CS 598: Deep Learning For Healthcare Final Project

By: Armaan R. Butt and Harikrishna Bojja {arbutt2, hbojja2}@illinois.edu

Group ID: 213, Paper ID: 283

Paper: Categorization of free-text drug orders using character-level recurrent neural networks [1]

Dependencies

Computer Specs

You will need a machine with the following specs:

CPU: 2.9 GHz - 8 Cores
Memory: 64 GB

Runtime

You will need a machine with Python 3.9.7 installed.

Python Libraries

Please have the following Python libraries installed. We have provided the requirements.txt file in the project root for your convenience.

keras==2.8.0
matplotlib==3.5.1
nltk==3.7
numpy==1.22.0
pandarallel==1.6.1
pandas==1.4.2
scikit_learn==1.0.2
tensorflow==2.8.0
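As a convenience, the pins above can be compared against the active environment before running any notebook. The following is a minimal sketch using only the standard library; the helper names `parse_pins` and `check_pins` are hypothetical and are not part of this repository.

```python
from importlib import metadata

# Pins copied from the list above (the same content as requirements.txt).
PINNED = """\
keras==2.8.0
matplotlib==3.5.1
nltk==3.7
numpy==1.22.0
pandarallel==1.6.1
pandas==1.4.2
scikit_learn==1.0.2
tensorflow==2.8.0
"""


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(parse_pins(PINNED)).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result from `check_pins` means every listed package is installed at the pinned version.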

Data Download Instructions

Please download the data from https://physionet.org/content/mimiciii-demo/1.4/ and extract the NOTEEVENTS.csv and PRESCRIPTIONS.csv to /data/real-mimic-iii-database.
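Before running the notebooks, it may help to confirm that both files landed in the expected directory. This is a minimal sketch; `missing_files` is a hypothetical helper, and treating the data directory as relative to the project root is an assumption.

```python
from pathlib import Path

# File names come from the instructions above.
EXPECTED_FILES = ("NOTEEVENTS.csv", "PRESCRIPTIONS.csv")


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: the script is run from the project root.
    missing = missing_files("data/real-mimic-iii-database")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```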

Exploratory Data Analysis (EDA)

The Jupyter notebooks below show how we used Seaborn and Matplotlib to analyze the Prescriptions and Noteevents data.

/src/data_profiling/profile_prescription_data.ipynb
/src/data_profiling/profile_free_text.ipynb

Data Pre-Processing Code

Run the following Jupyter notebooks in order. Data preprocessing takes approximately 1 hour to finish.

/src/data_processing/extract_drug_codes.ipynb
/src/data_processing/note_events_processing.ipynb

Once complete, the notebooks generate two new files in /data/processed/:

NOTEEVENTS_ML_DATASET.csv
ndc_codes_extracted.csv
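The two preprocessing notebooks can also be executed non-interactively, in the required order. The sketch below assumes that `jupyter nbconvert` is available on the PATH (it is not in the dependency list above) and that the script is run from the project root; `nbconvert_command` and `run_in_order` are hypothetical helper names.

```python
import subprocess

# Notebook paths from the list above, in the order they must run.
NOTEBOOKS = [
    "src/data_processing/extract_drug_codes.ipynb",
    "src/data_processing/note_events_processing.ipynb",
]


def nbconvert_command(notebook):
    """Build the command that executes a notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_in_order(notebooks=NOTEBOOKS, runner=subprocess.run):
    """Execute each notebook sequentially, stopping at the first failure."""
    for notebook in notebooks:
        # check=True raises CalledProcessError if the notebook fails.
        runner(nbconvert_command(notebook), check=True)


if __name__ == "__main__":
    run_in_order()
```

Running the notebooks sequentially matters because the first one must finish before the second one starts.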

Train and Evaluate Models Code

To train and evaluate the SVM and GRU models, run the following Jupyter notebooks:

/src/ml/multi_class_svm.ipynb
/src/ml/gru_model.ipynb

The results are written to two CSV files in the /data/results directory:

GRU_RESULTS.csv
SVM_results.csv

Results

Baseline SVM

The baseline SVM model was trained on the 22 most common drugs (identified by NDC) in our dataset. The SVM used a linear kernel (LinearSVC), and the input text was vectorized at the character level with scikit-learn's TfidfVectorizer, which was configured to generate trigrams from the text data.
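The configuration described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code in /src/ml/multi_class_svm.ipynb: all hyperparameters other than the character analyzer and the trigram range are left at library defaults, and the texts and labels are made-up placeholders rather than MIMIC-III data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_baseline():
    """Character-trigram TF-IDF features followed by a linear SVM."""
    return make_pipeline(
        # analyzer="char" with ngram_range=(3, 3) yields character trigrams only.
        TfidfVectorizer(analyzer="char", ngram_range=(3, 3)),
        LinearSVC(),
    )


# Hypothetical toy data; the labels stand in for NDC codes.
texts = [
    "aspirin 81 mg po daily",
    "aspirin 325 mg po once",
    "heparin 5000 units sc tid",
    "heparin flush 10 units iv",
]
labels = ["drug_a", "drug_a", "drug_b", "drug_b"]

model = build_baseline()
model.fit(texts, labels)
predictions = model.predict(["aspirin 81 mg tablet"])
```

Character trigrams tend to tolerate misspellings and abbreviations in free text better than word-level tokens, which is one plausible reason for choosing them here.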

NDC	Accuracy (%)	Precision (%)	Recall (%)
00713016550	90	83	76
00487950125	92	89	76
00517391025	92	92	91
51079001920	90	90	91
11098003002	95	88	44
00054829725	93	88	72
00045025501	98	80	5
00338055002	86	88	91
00409131230	94	90	59
00045152510	93	87	65
00074407532	87	87	77
51079080120	91	90	85
51079025520	91	90	85
00074176201	87	86	71
00781305714	97	87	13
00054465025	95	86	65
00008084199	94	91	74
58177000104	93	95	91
00781188313	96	93	55
00517293025	93	94	95
00338355248	90	84	71
00002735501	89	89	83
Average	92	88	70
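For reference, the three metrics in the table can be computed from binary confusion counts as sketched below. Treating each row as a one-vs-rest evaluation of a single NDC is an assumption, and the counts are hypothetical; they are chosen only to show how a rare positive class can produce high accuracy alongside very low recall, a pattern similar to the row for 00045025501.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 1000 samples, 20 true positives in total,
# of which the classifier finds only 1 and raises no false alarms.
accuracy, precision, recall = binary_metrics(tp=1, fp=0, fn=19, tn=980)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Because the 980 true negatives dominate the total, accuracy stays high even though 19 of the 20 positives are missed.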

GRU - RNN

Model	Hidden State Size	Number of Epochs	Mean Training Accuracy (%)	Mean Test Accuracy (%)	Mean Training Loss (%)	Mean Test Loss (%)
Bidirectional GRU	32	3	75.44	75.69	56.44	55.49
Bidirectional GRU	64	3	75.45	75.69	56.22	55.52
Bidirectional GRU	128	3	75.44	75.69	56.26	55.66
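A bidirectional GRU classifier with a configurable hidden state size can be sketched in Keras as shown below. This is not the architecture in /src/ml/gru_model.ipynb: the vocabulary size, embedding dimension, sequence length, number of classes, optimizer, and loss are all assumptions made for illustration, and `build_bigru` is a hypothetical helper name.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_bigru(hidden_size, vocab_size=128, embed_dim=32, num_classes=22):
    """Character-index input -> embedding -> bidirectional GRU -> softmax."""
    model = keras.Sequential([
        # Maps each character index to a dense vector (sizes are assumptions).
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        # hidden_size is the per-direction GRU state size (32, 64, or 128 above).
        layers.Bidirectional(layers.GRU(hidden_size)),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_bigru(hidden_size=32)
# Dummy batch: 2 sequences of 50 character indices (all zeros).
dummy_batch = np.zeros((2, 50), dtype="int32")
probs = model(dummy_batch)
```

Calling the model on the dummy batch returns one probability vector per input sequence, with one entry per class.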

References

[1] Raiskin Y, Eickhoff C, Beeler PE. Categorization of free-text drug orders using character-level recurrent neural networks. Int J Med Inform. 2019 Sep;129:20-28. doi: 10.1016/j.ijmedinf.2019.05.020. Epub 2019 May 23. PMID: 31445256.
