This repository contains the implementation of a two-stage machine learning framework for classifying cognitive status in Parkinson's Disease (PD) patients. The model distinguishes among three cognitive states: PD with normal cognition (PD-NC), PD with mild cognitive impairment (PD-MCI), and PD dementia (PDD).
- Two-stage classification framework: Stage 1 (PD-NC vs. non-PD-NC) and Stage 2 (PD-MCI vs. PDD)
- Explainable AI: Uses SHAP (SHapley Additive exPlanations) for model interpretability
- Ensemble approach: Combines XGBoost and Multilayer Perceptron (MLP) classifiers
- Class imbalance handling: SMOTE-Tomek method for balanced learning
- Subitem-level features: Fine-grained clinical and neuropsychological assessments
- Clinical feasibility: Uses only routine clinical scales without neuroimaging
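The two-stage routing described in the first bullet can be sketched in plain Python. This is an illustration of the idea only, not the repository's `TwoStageClassifier`: the helper names, the toy `score` field, and the thresholds are all hypothetical and carry no clinical meaning.

```python
from typing import Callable, Dict, List, Sequence

# Label names follow the README; every helper below is illustrative only.
PD_NC, PD_MCI, PDD = "PD-NC", "PD-MCI", "PDD"


def stage1_targets(labels: Sequence[str]) -> List[int]:
    """Stage 1 target: 0 = PD-NC, 1 = non-PD-NC (PD-MCI or PDD)."""
    return [0 if label == PD_NC else 1 for label in labels]


def stage2_targets(labels: Sequence[str]) -> List[int]:
    """Stage 2 target, defined only for non-PD-NC cases: 0 = PD-MCI, 1 = PDD."""
    return [0 if label == PD_MCI else 1 for label in labels if label != PD_NC]


def two_stage_predict(
    rows: Sequence[Dict[str, float]],
    stage1: Callable[[Dict[str, float]], int],
    stage2: Callable[[Dict[str, float]], int],
) -> List[str]:
    """Send each row to Stage 1, and to Stage 2 only if flagged as non-PD-NC."""
    predictions = []
    for row in rows:
        if stage1(row) == 0:
            predictions.append(PD_NC)
        else:
            predictions.append(PDD if stage2(row) == 1 else PD_MCI)
    return predictions


# Toy stand-ins for trained classifiers (made-up thresholds, not clinical cut-offs).
def toy_stage1(row: Dict[str, float]) -> int:
    return int(row["score"] < 26)


def toy_stage2(row: Dict[str, float]) -> int:
    return int(row["score"] < 18)


rows = [{"score": 28}, {"score": 23}, {"score": 15}]
result = two_stage_predict(rows, toy_stage1, toy_stage2)
print(result)  # ['PD-NC', 'PD-MCI', 'PDD']
```

Splitting the problem this way lets each stage be tuned for its own binary decision, and Stage 2 only ever sees cases that Stage 1 flagged as impaired.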
This study uses data from the Parkinson's Progression Markers Initiative (PPMI):
- Total participants: 1,439 individuals with PD
- PD-NC: 1,030 participants
- PD-MCI: 330 participants
- PDD: 79 participants
- Data available from PPMI database
- Requires registration and approval by PPMI Data Access Committee
- Institutional ethics approval required
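The class counts listed above imply a strong imbalance, which is the motivation for resampling. A quick calculation, using only the counts stated in this README, makes the proportions explicit:

```python
# Participant counts as stated in this README.
counts = {"PD-NC": 1030, "PD-MCI": 330, "PDD": 79}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Ratio of the largest class to the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, n in counts.items():
    print(f"{name}: {n} participants ({shares[name]:.1%})")
print(f"Total: {total}")
print(f"Largest-to-smallest class ratio: {imbalance_ratio:.1f}")
```

PD-NC makes up roughly 72% of the cohort and PDD about 5%, so overall accuracy can look high even when the PDD class is poorly detected.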
- UPDRS (NP1-NP4): Unified Parkinson's Disease Rating Scale subscales
- MoCA: Montreal Cognitive Assessment (total and subdomains)
- VF_LT: Verbal Fluency Letter Task
- JLO: Judgment of Line Orientation
- ESS: Epworth Sleepiness Scale
- GDS: Geriatric Depression Scale
- Clock drawing: Six individual items
Python >= 3.9
scikit-learn >= 1.2
XGBoost >= 1.7
pandas >= 1.3.0
numpy >= 1.21.0
matplotlib >= 3.4.0
seaborn >= 0.11.0
shap >= 0.40.0
imbalanced-learn >= 0.8.0
- Clone this repository:
git clone https://github.com/yc199911/Machine-Learning-for-Three-Class-Cognitive-Status-Classification-in-Parkinson-s-Disease.git
cd Machine-Learning-for-Three-Class-Cognitive-Status-Classification-in-Parkinson-s-Disease
- Install required packages:
pip install -r requirements.txt

from src.preprocessing import DataPreprocessor
# Initialize preprocessor
preprocessor = DataPreprocessor()
# Load and preprocess PPMI data
X_train, X_test, y_train, y_test = preprocessor.load_and_split_data('path/to/ppmi_data.csv')

from src.two_stage_model import TwoStageClassifier
# Initialize two-stage classifier
model = TwoStageClassifier(
    stage1_model='xgboost',
    stage2_model='mlp',
    use_smote_tomek=True,
    random_state=42
)
# Train the model
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)

from src.interpretability import SHAPAnalyzer
# Initialize SHAP analyzer
shap_analyzer = SHAPAnalyzer(model)
# Generate SHAP explanations
shap_analyzer.explain_predictions(X_test)
shap_analyzer.plot_feature_importance()

- Algorithm: XGBoost
- Features: Top 10 SHAP-selected features
- Balancing: SMOTE-Tomek resampling
- Validation: 5-fold stratified cross-validation
- Algorithm: Multilayer Perceptron (MLP)
- Architecture: Two hidden layers (100, 50 neurons)
- Features: Top 10 features + interaction terms
- Activation: ReLU
- Optimizer: Adam
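A scikit-learn sketch of the Stage 2 settings listed above might look like this. It assumes the MLP is scikit-learn's `MLPClassifier` and that "interaction terms" means pairwise products of the 10 features; neither is confirmed by this README, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the top 10 features (not PPMI data).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=42
)

stage2 = make_pipeline(
    StandardScaler(),
    # Assumption: interaction terms are pairwise products of the 10 features.
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    MLPClassifier(
        hidden_layer_sizes=(100, 50),  # two hidden layers, as listed above
        activation="relu",
        solver="adam",
        max_iter=500,
        random_state=42,
    ),
)
stage2.fit(X, y)

# 10 original features + 45 pairwise products = 55 network inputs.
n_inputs = stage2.named_steps["polynomialfeatures"].n_output_features_
print(f"Inputs to the MLP: {n_inputs}")
```

Scaling before the MLP is a common default for gradient-based training; whether the repository scales its inputs is not stated here.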
- Accuracy: 0.92
- Macro F1-score: 0.79
- Weighted F1-score: 0.92
| Class | Precision | Recall | F1-score |
|---|---|---|---|
| PD-NC | 0.94 | 1.00 | 0.97 |
| PD-MCI | 0.96 | 0.71 | 0.81 |
| PDD | 0.50 | 0.71 | 0.59 |
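As a consistency check, the F1 column and the macro F1-score can be recomputed from the precision and recall values in the table. Because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones by about 0.01.

```python
# Precision, recall, and reported F1 per class, copied from the table above.
table = {
    "PD-NC": (0.94, 1.00, 0.97),
    "PD-MCI": (0.96, 0.71, 0.81),
    "PDD": (0.50, 0.71, 0.59),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {name: f1_score(p, r) for name, (p, r, _) in table.items()}

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(recomputed.values()) / len(recomputed)

for name, (_, _, reported) in table.items():
    print(f"{name}: recomputed F1 = {recomputed[name]:.3f}, reported = {reported:.2f}")
print(f"Macro F1 = {macro_f1:.2f}")
```

The macro average weights the three classes equally, which is why it sits well below the accuracy and the weighted F1-score: the PDD class, with a precision of 0.50, pulls it down.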
- PD-NC: 0.85
- PD-MCI: 0.85
- PDD: 0.84
- Superior performance compared to traditional screening methods (37% improvement over MoCA-only)
- Subitem-level features significantly outperform total scores (107% accuracy increase)
- Clinical interpretability through SHAP analysis reveals key predictive patterns
- Balanced detection across all cognitive subgroups, including minority PDD class
If you use this code in your research, please cite our paper:
@article{chen2024subitem,
  title={Subitem-Level Multi-Scale Assessment and Machine Learning for Three-Class Cognitive Status Classification in Parkinson's Disease},
  author={Chen, Ying-Che and Yu, Rwei-Ling and Hsieh, Sun-Yuan},
  journal={[Journal Name]},
  year={2024},
  note={In review}
}

We welcome contributions to improve this framework. Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Corresponding Authors:
- Sun-Yuan Hsieh: hsiehsy@mail.ncku.edu.tw
- Rwei-Ling Yu: lingyu@mail.ncku.edu.tw
First Author:
- Ying-Che Chen: q56121036@gs.ncku.edu.tw
Institution: National Cheng Kung University, Taiwan
- Parkinson's Progression Markers Initiative (PPMI) for providing the dataset
- Movement Disorder Society for cognitive assessment guidelines
- All PPMI participants and research teams
This code is provided for research purposes only. The model has not been clinically validated for diagnostic use. Please consult healthcare professionals for medical decisions.