Parsing the ICD10 PCS code
- ICD10 python
This is my attempt at parsing the ICD10 PCS codes with Python. The original file from CMS.gov was in XML format and has been converted to CSV.
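# Import libraries and load the data.
import pandas as pd
from tabulate import tabulate  # to_markdown() below relies on the tabulate package
# Read the codes as strings so that leading zeros are preserved.
new_separate = pd.read_csv('new_separate.csv', dtype={'code': str})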
# Remove the first two columns, which are not needed.
new_separate = new_separate.drop(columns=new_separate.columns[:2])
new_separate_head = new_separate.head()
print(new_separate_head.to_markdown())
| | code | col1 | col2 | col3 | col4 | col5 | col6 | col7 |
|---:|--------:|:---------------------|:------------------------------------------|:-------|:-------------------|:-------|:-----------------------------|:---------------|
| 0 | 0016070 | Medical and Surgical | Central Nervous System and Cranial Nerves | Bypass | Cerebral Ventricle | Open | Autologous Tissue Substitute | Nasopharynx |
| 1 | 0016071 | Medical and Surgical | Central Nervous System and Cranial Nerves | Bypass | Cerebral Ventricle | Open | Autologous Tissue Substitute | Mastoid Sinus |
| 2 | 0016072 | Medical and Surgical | Central Nervous System and Cranial Nerves | Bypass | Cerebral Ventricle | Open | Autologous Tissue Substitute | Atrium |
| 3 | 0016073 | Medical and Surgical | Central Nervous System and Cranial Nerves | Bypass | Cerebral Ventricle | Open | Autologous Tissue Substitute | Blood Vessel |
| 4 | 0016074 | Medical and Surgical | Central Nervous System and Cranial Nerves | Bypass | Cerebral Ventricle | Open | Autologous Tissue Substitute | Pleural Cavity |
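Each ICD10 PCS code has seven characters, and each position lines up with one of the seven description columns shown above (`col1` through `col7`). The parsing step relies on that: every prefix of a code gets its own record. As a rough illustration only (the `labels` list simply copies row 0 of the table, and `prefix_names` is a name made up for this sketch), the prefixes of a single code can be listed like this:

```python
# Values copied from row 0 of the table above.
code = "0016070"
labels = [
    "Medical and Surgical",
    "Central Nervous System and Cranial Nerves",
    "Bypass",
    "Cerebral Ventricle",
    "Open",
    "Autologous Tissue Substitute",
    "Nasopharynx",
]

# Map every prefix of the code to the labels of the characters it covers,
# joined with the same " @" separator used in the rest of this post.
prefix_names = {
    code[:n]: " @".join(labels[:n]) for n in range(1, len(code) + 1)
}

for prefix, name in prefix_names.items():
    print(f"{prefix:<7} {name}")
```

So one full code yields seven records, from the one-character section `0` down to the complete code `0016070`.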
# Create an empty dictionary to put the ICD10 codes and display names
new_data_dict = {}
print("Parsing new codes... please be patient LOL")
for index, row in new_separate.iterrows():
    # Prepare variables to create new record.
    code = ''
    display_name = ''
    # For each character of the row's code, create new record to new_data_dict
    for character in row['code']:
        code += character
        # If code length is more than 6, then get out of the loop.
        if len(code) > 6:
            break
        # If code already exists, deal with the next new code combination.
        if code in new_data_dict:
            continue
        # Append display names according to the code characters.
        if len(code) >= 1:
            display_name = row['col1']
        if len(code) >= 2:
            display_name += " @" + row['col2']
        if len(code) >= 3:
            display_name += " @" + row['col3']
        if len(code) >= 4:
            display_name += " @" + row['col4']
        if len(code) >= 5:
            display_name += " @" + row['col5']
        if len(code) >= 6:
            display_name += " @" + row['col6']
        # Fill the new_data_dict with new record
        new_data_dict[code] = [display_name.rstrip(), code]
    # Convert existing record into [title, code] as well.
    display_name = row['col1'] + " @" + row['col2'] + " @" + row['col3'] \
        + " @" + row['col4'] + " @" + row['col5'] + " @" + row['col6'] \
        + " @" + row['col7']
    new_data_dict[code] = [display_name.rstrip(), code]
# Merging the data
print("Creating new data frame...")
result = pd.DataFrame(new_data_dict.values(),
                      columns=['ICD10PCS_display_name', 'ICD10PCS_code'])
# Quick check on the result
# result.head(10)
# result.tail(10)
# Export result to CSV
print("Exporting to CSV file...")
result.to_csv('icd10pcs_official.csv')
print("Done. YAYYY :D XD~~")
Parsing new codes... please be patient LOL
Creating new data frame...
Exporting to CSV file...
Done. YAYYY :D XD~~
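To sanity-check the logic without the full CMS file, the same idea can be wrapped in a small function and run on a hand-made frame. This is only a sketch, not the code that produced the file above: `expand_prefixes`, `COLS` and the two-row `toy` frame are hypothetical names introduced here, and it assumes the same `code` and `col1` to `col7` columns. It also writes the result to an in-memory CSV and reads it back with `dtype=str`, because short prefixes such as `00` would otherwise be parsed as numbers when every value in the column looks numeric.

```python
import io

import pandas as pd

COLS = [f"col{i}" for i in range(1, 8)]


def expand_prefixes(df):
    """Return {prefix: display_name} for every prefix of every code in df."""
    records = {}
    label_rows = df[COLS].itertuples(index=False, name=None)
    for code, labels in zip(df["code"], label_rows):
        for n in range(1, len(code) + 1):
            # Keep the first display name seen for each prefix.
            records.setdefault(code[:n], " @".join(labels[:n]))
    return records


# Two rows copied from the table near the top of this post.
shared = ["Medical and Surgical", "Central Nervous System and Cranial Nerves",
          "Bypass", "Cerebral Ventricle", "Open", "Autologous Tissue Substitute"]
toy = pd.DataFrame(
    [["0016070", *shared, "Nasopharynx"],
     ["0016071", *shared, "Mastoid Sinus"]],
    columns=["code", *COLS],
)

records = expand_prefixes(toy)
result = pd.DataFrame({
    "ICD10PCS_display_name": list(records.values()),
    "ICD10PCS_code": list(records.keys()),
})

# Round trip through CSV, forcing the code column to stay text.
buffer = io.StringIO()
result.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, dtype={"ICD10PCS_code": str})
print(reloaded["ICD10PCS_code"].tolist())
```

The two toy rows share their first six characters, so they produce six shared prefix records plus one record per full code.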