Parsing the ICD10 PCS code

Jul 08, 2020 - ICD10 python

This is my attempt at parsing the ICD10 PCS codes using Python. The original file from CMS.gov is in XML format and has been converted to CSV.

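# Import libraries and load the data.
# DataFrame.to_markdown() needs the tabulate package installed, but no explicit import.
import pandas as pd

# Read the codes as strings so that leading zeros (e.g. 0016070) are kept.
new_separate = pd.read_csv('new_separate.csv', dtype={'code': str})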

# Drop the first two columns, which are not needed.
new_separate.drop(columns=new_separate.columns[:2], inplace=True)
new_separate_head = new_separate.head()
print(new_separate_head.to_markdown())

|    |    code | col1                 | col2                                      | col3   | col4               | col5   | col6                         | col7           |
|---:|--------:|:---------------------|:------------------------------------------|:-------|:-------------------|:-------|:-----------------------------|:---------------|
|  0 | 0016070 | Medical and Surgical | Central Nervous System and Cranial Nerves | Bypass | Cerebral Ventricle | Open   | Autologous Tissue Substitute | Nasopharynx    |
|  1 | 0016071 | Medical and Surgical | Central Nervous System and Cranial Nerves | Bypass | Cerebral Ventricle | Open   | Autologous Tissue Substitute | Mastoid Sinus  |
|  2 | 0016072 | Medical and Surgical | Central Nervous System and Cranial Nerves | Bypass | Cerebral Ventricle | Open   | Autologous Tissue Substitute | Atrium         |
|  3 | 0016073 | Medical and Surgical | Central Nervous System and Cranial Nerves | Bypass | Cerebral Ventricle | Open   | Autologous Tissue Substitute | Blood Vessel   |
|  4 | 0016074 | Medical and Surgical | Central Nervous System and Cranial Nerves | Bypass | Cerebral Ventricle | Open   | Autologous Tissue Substitute | Pleural Cavity |
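Each of the seven characters in a code lines up with one of the columns col1 to col7. As a small sketch, here is the first row of the table split by position. The axis names (Section, Body System, and so on) follow the usual ICD-10-PCS layout for the Medical and Surgical section; they are my assumption, since the CSV only has col1 to col7, and `split_code` is a hypothetical helper.

```python
# Axis names are assumed from the usual ICD-10-PCS layout; the CSV itself
# only calls these columns col1..col7.
AXES = ["Section", "Body System", "Root Operation", "Body Part",
        "Approach", "Device", "Qualifier"]


def split_code(code, labels):
    """Pair each character of a 7-character code with its axis and label."""
    if len(code) != len(AXES) or len(labels) != len(AXES):
        raise ValueError("expected 7 characters and 7 labels")
    return list(zip(AXES, code, labels))


# First row of the table above.
labels = ["Medical and Surgical", "Central Nervous System and Cranial Nerves",
          "Bypass", "Cerebral Ventricle", "Open",
          "Autologous Tissue Substitute", "Nasopharynx"]

for axis, char, label in split_code("0016070", labels):
    print(f"{char}  {axis:<15} {label}")
```

This is why every prefix of a code is meaningful on its own: the first three characters, for example, already identify the section, body system and operation.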

# Create an empty dictionary to put the ICD10 codes and display names
new_data_dict = {}

print("Parsing new codes... please be patient LOL")
for index, row in new_separate.iterrows():
    # Prepare variables to create new record.
    code = ''
    display_name = ''
    
    # Walk through the code one character at a time, building up each prefix.
    for character in row['code']:
        code += character

        # After the 7th character, leave the loop; the full code is recorded below.
        if len(code) > 6:
            break
        
        # If code already exists, deal with the next new code combination.
        if code in new_data_dict:
            continue
        
        # Join the display names of the first len(code) columns (col1, col2, ...).
        display_name = " @".join(row[f'col{n}'] for n in range(1, len(code) + 1))
        
        # Fill the new_data_dict with new record
        new_data_dict[code] = [display_name.rstrip(), code]

    # Record the full 7-character code as [display name, code] as well.
    display_name = " @".join(row[f'col{n}'] for n in range(1, 8))
    new_data_dict[code] = [display_name.rstrip(), code]

# Build a data frame from the collected records
print("Creating new data frame...")
result = pd.DataFrame(new_data_dict.values(),
                      columns = ['ICD10PCS_display_name', 'ICD10PCS_code'])
# Quick check on the result
# result.head(10)
# result.tail(10)

# Export result to CSV
print("Exporting to CSV file...")
result.to_csv('icd10pcs_official.csv')
print("Done. YAYYY :D XD~~")

Parsing new codes... please be patient LOL
Creating new data frame...
Exporting to CSV file...
Done. YAYYY :D XD~~
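
The loop above boils down to this: for every code, register each prefix of 1 to 7 characters together with the display names of the matching columns. Here is a minimal sketch of that idea as a standalone function, run on two rows from the table instead of the CSV. `expand_prefixes` and the `(code, labels)` row format are hypothetical names I made up for illustration, not part of the script above.

```python
def expand_prefixes(rows, sep=" @"):
    """Map every code prefix (1 to 7 characters) to its joined display name.

    rows: iterable of (code, labels) pairs, where labels holds the
    col1..col7 values for that code.
    """
    names = {}
    for code, labels in rows:
        for n in range(1, len(code) + 1):
            prefix = code[:n]
            # The first row that reaches a prefix decides its display name.
            names.setdefault(prefix, sep.join(labels[:n]))
    return names


shared = ["Medical and Surgical", "Central Nervous System and Cranial Nerves",
          "Bypass", "Cerebral Ventricle", "Open", "Autologous Tissue Substitute"]
rows = [
    ("0016070", shared + ["Nasopharynx"]),
    ("0016071", shared + ["Mastoid Sinus"]),
]

names = expand_prefixes(rows)
for prefix, display_name in names.items():
    print(prefix, "->", display_name)
```

The two sample rows share their first six characters, so they produce 8 records rather than 14: six shared prefixes plus the two full codes.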
