A Disease Progression Modelling (DPM) implementation of the Goldilocks framework for data-driven model configuration.
Goldilocks (conference paper: Oxtoby, AAIC 2024) is a framework that helps users ensure their data-driven model of choice is configured "just right" for the available data. Conceptually, Goldilocks guides feature selection and hyperparameter tuning according to the signal present in the data. This complements work in the field of explainable AI.
- goldilocks_dpm: Python module for implementing the Goldilocks framework within data-driven DPM.
- demos: folder containing demo implementations of various DPMs.
- README: describes the goldilocks-dpm workflow
- goldilocks-pysustain.py: Subtype and Stage Inference (SuStaIn)
- Calls plot_SuStaIn_model_arbitrarycolours.py which demos how to use arbitrary colours in SuStaIn model plotting (thanks to Alex Young for help with this).
- TODO. goldilocks_ebm.py: Event-Based Model (GMM, KDE, Ordinal Scored Events)
- ADNIMERGE2023_synthetic.csv: Data mimicking ADNI data based on ADNIMERGE.csv downloaded in May 2023.
- synthetic_data.ipynb: Jupyter notebook for generating the above, FYI (don't try to run it unless you have an ADNIMERGE CSV to feed into it).
See goldilocks-pysustain.py for a worked example using ZScoreSustain.
## 1. Prepare your data
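As a rough sketch of this step, the snippet below builds the four inputs that the call in step 2 expects: `X`, `y`, `biomarkers` and `direction_abnormal`. The biomarker names and toy values are hypothetical, and the encoding of `direction_abnormal` (+1 when higher values are more abnormal, -1 when lower values are) is an assumption, so check the module for the convention it actually uses. Dropping incomplete rows is likewise one possible choice, not a requirement stated here.

```python
import numpy as np

# Hypothetical biomarker names; replace with the columns of your own table.
biomarkers = ["biomarker_A", "biomarker_B", "biomarker_C"]

# Toy data: 6 individuals (rows) x 3 biomarkers (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, len(biomarkers)))
X[2, 1] = np.nan  # simulate one missing measurement

# Class labels: 0 = control, 1 = case (matching ctrl_label and case_label).
y = np.array([0, 0, 0, 1, 1, 1])

# ASSUMPTION: +1 means "higher is more abnormal", -1 means "lower is more abnormal".
direction_abnormal = np.array([1, -1, 1])

# Keep only individuals with a complete set of biomarker values.
complete = ~np.isnan(X).any(axis=1)
X, y = X[complete], y[complete]

# Sanity checks: one label per row, one name and one direction per column.
assert X.shape == (len(y), len(biomarkers))
assert len(direction_abnormal) == len(biomarkers)
```

With real data, `X` and `y` would come from your own table (for example the synthetic ADNI-like CSV in demos) rather than from a random number generator.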
## 2. Create a Goldilocks DPM object and run the framework
## ZScoreSuStaIn
    from goldilocks_dpm import goldilocks_ZscoreSustain
    gdpm = goldilocks_ZscoreSustain(
        classes = y,
        dpmData = X,
        output_folder = "path/to/output_folder",
        robust_zscores = False,
        case_label = 1,
        ctrl_label = 0,
        direction_abnormal = direction_abnormal,
        biomarker_labels = biomarkers
    )
    gdpm.run_goldilocks()
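To give a feel for what the `robust_zscores` option might control, here is a minimal sketch of z-scoring biomarkers relative to the control group, in a standard (mean and standard deviation) and a robust (median and scaled median absolute deviation) variant. The function `zscore_vs_controls` is hypothetical and is not part of `goldilocks_dpm`; the module's own calculation may differ.

```python
import numpy as np

def zscore_vs_controls(X, y, ctrl_label=0, robust=False):
    """Z-score every column of X against the rows labelled as controls.

    Hypothetical helper for illustration only; not the goldilocks_dpm API.
    """
    X = np.asarray(X, dtype=float)
    controls = X[np.asarray(y) == ctrl_label]
    if robust:
        # Median and MAD; 1.4826 makes the MAD comparable to a standard
        # deviation for normally distributed data.
        centre = np.median(controls, axis=0)
        scale = 1.4826 * np.median(np.abs(controls - centre), axis=0)
    else:
        # Sample mean and sample standard deviation of the controls.
        centre = controls.mean(axis=0)
        scale = controls.std(axis=0, ddof=1)
    return (X - centre) / scale

# Toy example: three controls and one case, a single biomarker.
X_toy = np.array([[1.0], [2.0], [3.0], [10.0]])
y_toy = np.array([0, 0, 0, 1])

z = zscore_vs_controls(X_toy, y_toy)
z_robust = zscore_vs_controls(X_toy, y_toy, robust=True)
```

The robust variant is less sensitive to outliers among the controls, which can matter when the control group is small.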
## 3. Interrogate the resulting output
    print(gdpm.Z_vals)
    print(gdpm.Z_max)
License: MIT.
