This repository contains the code for the experiments in the paper Fast and Interpretable Mortality Risk Scores for Critical Care Patients. We detail the specific steps to reproduce our results below. Our work has been accepted by JAMIA.
- Install packages using `requirements.txt`.
- Set `PYTHONPATH` to `src/common`. This is because most of the scripts use modules in the `mimic_pipeline` directory.
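
For scripts or notebooks launched from an environment where `PYTHONPATH` is not set, a rough in-process equivalent is to prepend `src/common` to `sys.path` before importing anything from `mimic_pipeline`. This is only a sketch: the helper name `add_common_to_path` is hypothetical and is not part of the repository, and it assumes the current working directory is the repository root.

```python
import sys
from pathlib import Path


def add_common_to_path(repo_root: str) -> str:
    """Prepend <repo_root>/src/common to sys.path (hypothetical helper)."""
    common_dir = str(Path(repo_root).resolve() / "src" / "common")
    if common_dir not in sys.path:
        # Position 0 so these modules take precedence over same-named ones.
        sys.path.insert(0, common_dir)
    return common_dir


# Assumption: the script is started from the repository root.
common = add_common_to_path(".")
print("module search path now includes:", common)
```

Setting `PYTHONPATH` in the shell, as described above, remains the documented route; the snippet only helps when that is inconvenient.
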
- MIMIC III: https://physionet.org/content/mimiciii/1.4/.
- eICU: https://physionet.org/content/eicu-crd/2.0/.
- Follow the official tutorial for MIMIC III to build a local postgres database (https://mimic.mit.edu/docs/gettingstarted/local/install-mimic-locally-ubuntu/).
- Create concepts for MIMIC III using the code in the official repository (https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iii/concepts_postgres).
- Run `src/common/sql/mimic3/extract/union_features.sql` and `src/common/sql/mimic3/extract/union_features_prep.sql`. This step preprocesses the features considered in our study, selects our study cohorts, and generates tabular data for the patients.
- Use `src/exp_6.6_to_6.27/union_train_test.ipynb` to generate train and test splits for the MIMIC III study cohort.
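
To illustrate what a train/test split over a mortality cohort might look like, here is a minimal standard-library sketch of a stratified split, which keeps the mortality rate similar in both parts. The stratification, the 80/20 ratio, the seed, and the function name `stratified_split` are all assumptions made for illustration; the notebook may split the cohort differently.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with a similar label ratio in each part."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy cohort (made-up data): 10 deaths (1) among 100 patients.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```
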
- As with MIMIC III, build a local postgres database (https://eicu-crd.mit.edu/tutorials/install_eicu_locally/). Then, create concepts for eICU using the code in the official repository (https://github.com/MIT-LCP/eicu-code/tree/main/concepts).
- Run `src/common/sql/eicu/extract/union_features.sql`. This script selects the cohorts for our study and generates the data in tabular form.
- Use `src/exp_6.6_to_6.27/union_eicu_generate.ipynb` to save the data locally as a `.csv` file.
- Use `src/exp_6.6_to_6.27/mimic_groupfasterrisk_train.py` to train GroupFasterRisk (GFR) models. Set the group sparsity to 15. This creates the models without monotonicity constraints.
- To generate risk scores with monotonicity constraints, run `src/exp_6.6_to_6.27/card_generation.ipynb`.
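
As background on how a risk score card is read, the sketch below converts integer points into a predicted mortality risk. It assumes the usual logistic link for scoring systems, where the summed points plus an intercept are divided by a multiplier and passed through a sigmoid. The point values, intercept, and multiplier are made-up numbers, and `predicted_risk` is a hypothetical name; the cards produced by the notebook define their own values.

```python
import math


def predicted_risk(points, intercept, multiplier):
    """Map score-card points to a probability (assumed logistic link)."""
    total = sum(points) + intercept
    return 1.0 / (1.0 + math.exp(-total / multiplier))


# Hypothetical card: two features contribute 2 and 3 points.
low = predicted_risk([0, 0], intercept=-5, multiplier=1.0)
mid = predicted_risk([2, 3], intercept=-5, multiplier=1.0)
high = predicted_risk([4, 6], intercept=-5, multiplier=1.0)
print(f"{low:.3f} {mid:.3f} {high:.3f}")
```

More points always give a higher predicted risk under this link, which is what makes the card readable at the bedside.
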
- Use `src/exp_6.6_to_6.27/mimic_cross_validation.py` to perform nested cross-validation on MIMIC III. Please note that the hyperparameters in that file are tuned individually on each of the five folds of MIMIC III. If you would like to reproduce our results for hyperparameter optimization, perform the following steps: (1) run `src/exp_6.6_to_6.27/mimic_kfold_generation.ipynb` to generate 5 folds for MIMIC III; (2) use `src/exp_6.6_to_6.27/sweep.py` and `src/exp_6.6_to_6.27/sweep_oasis.py` to perform hyperparameter optimization. The possible hyperparameter combinations are stored in the `params` directory.
- Run `src/exp_6.6_to_6.27/mimic_baselines.ipynb` to obtain results for OASIS and SAPS II on MIMIC III.
- Run `src/exp_6.6_to_6.27/OOD.ipynb` to obtain results for OASIS, SAPS II, and APACHE IV/IVa on eICU.
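
The idea behind nested cross-validation, where hyperparameters are tuned separately inside each outer fold, can be sketched with the standard library alone. Everything named here (`kfold_indices`, `nested_cv`, the toy `evaluate` function, and the candidate values) is hypothetical and only illustrates the control flow; it is not the logic of `mimic_cross_validation.py` or `sweep.py`.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and deal it into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def nested_cv(n, candidates, evaluate, outer_k=5, inner_k=5):
    """Return one (best_param, outer_test_score) pair per outer fold.

    evaluate(param, fit_idx, eval_idx) must return a score, higher is better.
    """
    results = []
    outer_folds = kfold_indices(n, outer_k)
    for i, test_idx in enumerate(outer_folds):
        train_idx = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        # Inner folds are positions into train_idx, so test data never leaks.
        inner_folds = kfold_indices(len(train_idx), inner_k, seed=i + 1)

        def inner_score(param):
            scores = []
            for v, val_pos in enumerate(inner_folds):
                val = [train_idx[p] for p in val_pos]
                fit = [train_idx[p]
                       for u, fold in enumerate(inner_folds) if u != v
                       for p in fold]
                scores.append(evaluate(param, fit, val))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)
        results.append((best, evaluate(best, train_idx, test_idx)))
    return results


# Toy stand-in for "train a model and return its validation metric".
def evaluate(param, fit_idx, eval_idx):
    return -abs(param - 15)


results = nested_cv(n=50, candidates=[10, 15, 20], evaluate=evaluate)
print(results)
```
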
- Run `src/exp_6.6_to_6.27/OOD_visualize.ipynb` to visualize the ROC and PR curves on eICU.
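
For readers who want to check the number that summarizes an ROC curve, the area under it (AUROC) equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. It is a plain-Python illustration on made-up data, and the function name `auroc` is hypothetical; the notebook most likely relies on a plotting or metrics library instead.

```python
def auroc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy labels and predicted risks (made-up data).
area = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(area)
```

The pairwise loop is quadratic in the cohort size, so it suits sanity checks on small samples rather than full-cohort evaluation.
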
- Using `src/exp_6.6_to_6.27/mimic_cross_validation.py`, train GroupFasterRisk models with group sparsity of 10, 15, 20, 25, 30, 35, 40, and 45.
- After training is complete, run the Group Sparsity cell in `src/exp_6.6_to_6.27/visualize.ipynb`.
- Run `src/exp_6.6_to_6.27/time_fasterrisk.py` to obtain an estimate of the runtime for training GroupFasterRisk models.
- Plot the figure using the Time Consumption cell in `src/exp_6.6_to_6.27/visualize.ipynb`.
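
A runtime estimate of this kind is typically obtained by timing several repeated training runs and reporting the mean and spread. The sketch below shows that pattern with the standard library; `time_training` and the dummy workload are hypothetical stand-ins and do not reflect how `time_fasterrisk.py` is actually written.

```python
import statistics
import time


def time_training(train_fn, repeats=3):
    """Run train_fn several times; return (mean, stdev) in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        train_fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


# Dummy workload standing in for a model-training call.
def dummy_train():
    return sum(i * i for i in range(10_000))


mean_s, spread_s = time_training(dummy_train, repeats=3)
print(f"mean {mean_s:.6f} s, stdev {spread_s:.6f} s")
```
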
- Obtain tabular data for each disease-specific cohort using `src/exp_6.6_to_6.27/union_disease_generate.ipynb`. Then, generate the folds for MIMIC III subpopulations using `src/exp_6.6_to_6.27/mimic_kfold_disease_generation.ipynb`.
- Train GroupFasterRisk models on MIMIC III subpopulations using `src/exp_6.6_to_6.27/mimic_cross_validation_disease.py`.
- Plot the results using the Disease Specific cell in `src/exp_6.6_to_6.27/visualize.ipynb`.
- Run `src/exp_6.6_to_6.27/OOD_disease.ipynb` to obtain the results.
- Use `src/exp_6.6_to_6.27/mimic_feature_selection.py` to train ML models. Train one set of models with GroupFasterRisk features, and train another set of models with OASIS features.
- Run the Feature Selection cell in `src/exp_6.6_to_6.27/visualize.ipynb`.
- Use `src/exp_6.6_to_6.27/mimic_cross_validation.py` to train GFR-14, GFR-OASIS, and GFR-40. These models are for internal evaluation (MIMIC III).
- Use `src/exp_6.6_to_6.27/mimic_cross_validation.py` to train ML models. Train one set of models with OASIS features, and train another set of models with all 49 features. These models are for internal evaluation (MIMIC III).
- Use `src/exp_6.6_to_6.27/mimic_groupfasterrisk_train.py` to train GFR-14, GFR-OASIS, and GFR-40 for out-of-distribution evaluation (eICU).
- Use `src/exp_6.6_to_6.27/mimic_ml_train.py` to train ML models. Similarly, train one set of models with OASIS features, and train another set of models with all 49 features. These models are for out-of-distribution evaluation (eICU).
- The results for AutoScore on both MIMIC III and eICU can be obtained with `src/exp_6.6_to_6.27/autoscore.r`.
- Use the Complexity Graph cell in `src/exp_6.6_to_6.27/visualize.ipynb` to generate the figures.
- Use `src/exp_6.6_to_6.27/OOD_calibrate.ipynb` to calibrate GFR models on the eICU dataset using a subset of 2000 patients.
- Use `src/exp_6.6_to_6.27/OOD_fairness.ipynb` to obtain the numerical results in Table 1.
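
To give a sense of what recalibrating a model on a small target-domain subset involves, the sketch below fits a Platt-style logistic recalibration (a slope and an offset applied to the model's logit) by gradient descent. Platt scaling is used here purely as a stand-in: this README does not state which calibration method `OOD_calibrate.ipynb` uses, and the function names, learning rate, and toy data are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(logits, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a * logit + b) by minimizing log loss (assumed method)."""
    a, b = 1.0, 0.0  # start from the uncalibrated model
    n = len(logits)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for z, y in zip(logits, labels):
            err = sigmoid(a * z + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * z
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy calibration subset (made-up): the model is overconfident, since
# logit -2 implies ~12% risk but 25% died, and logit +2 implies ~88% but 75% died.
logits = [-2, -2, -2, -2, 2, 2, 2, 2]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(logits, labels)
print(f"slope {a:.3f}, offset {b:.3f}, calibrated risk at logit 2: {sigmoid(2 * a + b):.3f}")
```

A fitted slope below 1 shrinks overconfident predictions toward the observed event rates in the calibration subset.
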
@article{zhu2023fast,
title={Fast and Interpretable Mortality Risk Scores for Critical Care Patients},
author={Zhu, Chloe Qinyu and Tian, Muhang and Semenova, Lesia and Liu, Jiachang and Xu, Jack and Scarpa, Joseph and Rudin, Cynthia},
journal={arXiv preprint arXiv:2311.13015},
year={2023}
}