This work is published in npj Dementia (https://doi.org/10.1038/s44400-025-00031-1).
This repository provides a comprehensive pipeline for preprocessing, feature extraction, modeling, and evaluation of neuroradiology imaging data. It integrates FreeSurfer-based volumetric analysis, white matter hyperintensity (WMH) burden estimation, and machine learning models (XGBoost) for Alzheimer’s disease (AD) and Other Imaging Evident Dementia (OIED) predictions.
- Create a Conda environment:

      conda env create -f environment.yml
      # or, on macOS:
      conda env create -f environment-macos.yml

- Activate the environment:

      conda activate neurorad-radiomics
- data/ml_data/: preprocessed tabular datasets (ml_data_filtered.csv, ml_test_data_filtered.csv, etc.)
- dev-model/: saved XGBoost models and tuning parameters
- feature_config.json: definitions of volumetric, WMH, imaging, and demographic features
- lobe_mapping.json, feature_names_map.json, etc.
Scripts under preprocessing/ handle MRI data conversion and feature extraction:
- fs_scripts.py: filter and convert FreeSurfer outputs, parse volume statistics
- lst_scripts.py: run LST (Lesion Segmentation Tool) for WMH estimation
- NACC_mris.py & prep_array_job_csv_.py: generate batch arguments for array jobs on NACC datasets
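
As a rough illustration of the "parse volume statistics" step, the sketch below reads text laid out like a FreeSurfer `aseg.stats` file: `# Measure` header lines followed by whitespace-separated rows with columns Index, SegId, NVoxels, Volume_mm3, StructName. The function name `parse_aseg_stats` and all sample values are hypothetical, and fs_scripts.py may well handle this differently.

```python
def parse_aseg_stats(text):
    """Return (measures, volumes) parsed from aseg.stats-style text."""
    measures, volumes = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# Measure"):
            # Header form: "# Measure <name>, <field>, <description>, <value>, <unit>"
            parts = [p.strip() for p in line[len("# Measure"):].split(",")]
            measures[parts[1]] = float(parts[3])
        elif line and not line.startswith("#"):
            # Table row: Index SegId NVoxels Volume_mm3 StructName ...
            cols = line.split()
            volumes[cols[4]] = float(cols[3])
    return measures, volumes


# Made-up sample content, for illustration only (not real subject data).
sample = """
# Measure BrainSeg, BrainSegVol, Brain Segmentation Volume, 1150000.0, mm^3
# Measure EstimatedTotalIntraCranialVol, eTIV, Estimated Total Intracranial Volume, 1500000.0, mm^3
# ColHeaders  Index SegId NVoxels Volume_mm3 StructName normMean normStdDev normMin normMax normRange
  1   4   7000   7000.5  Left-Lateral-Ventricle  30.1 10.2 5.0 80.0 75.0
 12  17   4000   4100.2  Left-Hippocampus  60.3 8.1 20.0 95.0 75.0
"""

measures, volumes = parse_aseg_stats(sample)

# Regional volumes are often expressed relative to intracranial volume;
# whether this pipeline does so is an assumption.
hippocampus_fraction = volumes["Left-Hippocampus"] / measures["eTIV"]
```

Returning plain dictionaries keeps the parsed values easy to merge into a tabular dataset later.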
Training scripts are in train/:
- model.py: preprocessing and label generation
- wandb_sweep.py: hyperparameter sweep setup with Weights & Biases
- cv.py: cross-validation with GroupKFold
- final_train.py: train the final AD and OIED models with hyperparameters selected from the sweeps
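
The grouped cross-validation mentioned for cv.py can be sketched with scikit-learn's `GroupKFold`, which keeps all samples that share a group label in the same fold. The toy arrays and the choice of subject ID as the grouping key are assumptions for illustration; the grouping variable actually used in cv.py is not stated here.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical toy data: 8 scans from 4 subjects, two scans each.
X = np.arange(16, dtype=float).reshape(8, 2)
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])
subject_ids = np.array(["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"])

cv = GroupKFold(n_splits=4)
folds = []
for train_idx, val_idx in cv.split(X, y, groups=subject_ids):
    train_subjects = set(subject_ids[train_idx])
    val_subjects = set(subject_ids[val_idx])
    # No subject may contribute scans to both sides of a split.
    assert train_subjects.isdisjoint(val_subjects)
    # A model (for example an XGBoost classifier) would be fit on
    # X[train_idx] and scored on X[val_idx] at this point.
    folds.append((train_idx, val_idx))
```

Grouping this way avoids leakage when one subject has several scans, since repeated scans of the same person would otherwise appear in both training and validation data.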
Miscellaneous analysis and plotting scripts are in results/.
- utils/load_data.py: load and derive feature lists from JSON config
- utils/dump_dkt_atlas.py: export region names grouped by lobes
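
A minimal sketch of deriving feature lists from a JSON config is shown below. The config layout (one list of column names per feature group), the group keys, the feature names, and the helper `load_feature_lists` are all hypothetical; the real feature_config.json and utils/load_data.py may be organized differently.

```python
import json

# Hypothetical config layout; the real feature_config.json may differ.
example_config = """
{
  "volumetric": ["Left-Hippocampus", "Right-Hippocampus"],
  "wmh": ["wmh_total_volume"],
  "imaging": ["field_strength"],
  "demographic": ["age", "sex"]
}
"""


def load_feature_lists(config_text, groups=None):
    """Concatenate the feature names of the requested groups, in order."""
    config = json.loads(config_text)
    selected = groups if groups is not None else list(config)
    features = []
    for group in selected:
        features.extend(config[group])
    return features


all_features = load_feature_lists(example_config)
demo_wmh = load_feature_lists(example_config, groups=["demographic", "wmh"])
```

Selecting groups by name makes it straightforward to compare models trained on different feature subsets, for example demographics alone versus demographics plus WMH burden.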