Machine Learning for Three-Class Cognitive Status Classification in Parkinson's Disease

Overview

This repository contains the implementation of a two-stage machine learning framework for classifying cognitive status in Parkinson's Disease (PD) patients. The model distinguishes among three cognitive states: PD with normal cognition (PD-NC), PD with mild cognitive impairment (PD-MCI), and PD dementia (PDD).

Key Features

Two-stage classification framework: Stage 1 (PD-NC vs. non-PD-NC) and Stage 2 (PD-MCI vs. PDD)
Explainable AI: Uses SHAP (SHapley Additive exPlanations) for model interpretability
Hybrid approach: Uses XGBoost for Stage 1 and a Multilayer Perceptron (MLP) for Stage 2
Class imbalance handling: SMOTE-Tomek method for balanced learning
Subitem-level features: Fine-grained clinical and neuropsychological assessments
Clinical feasibility: Uses only routine clinical scales without neuroimaging
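
To make the class imbalance handling concrete, the sketch below illustrates the two ideas that SMOTE-Tomek combines: SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, and Tomek-link cleaning looks for pairs of mutual nearest neighbours that carry different labels. This is a simplified pure-Python illustration on made-up data, not the repository's code; the function names `smote_sample` and `tomek_links` are hypothetical. A full implementation is available as `SMOTETomek` in imbalanced-learn.

```python
import math
import random

def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and one of its
    k nearest minority neighbours (the core SMOTE interpolation step)."""
    rng = rng or random.Random(0)
    i = rng.randrange(len(minority))
    base = minority[i]
    others = [p for j, p in enumerate(minority) if j != i]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment, in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]

def tomek_links(X, y):
    """Return index pairs (i, j) that are mutual nearest neighbours
    but carry different labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    links = []
    for i in range(len(X)):
        j = nearest(i)
        if i < j and nearest(j) == i and y[i] != y[j]:
            links.append((i, j))
    return links

# Toy one-dimensional data (hypothetical, not PPMI values).
X = [[0.0], [1.0], [5.0], [6.0]]
y = [0, 1, 1, 1]
print(tomek_links(X, y))                      # samples 0 and 1 form a link
print(smote_sample([[0.0], [1.0], [2.0]]))    # a point between two minority samples
```

In the combined method, oversampling runs first and samples involved in Tomek links are then removed, which tends to clean up the class boundary.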

Dataset

This study uses data from the Parkinson's Progression Markers Initiative (PPMI):

Total participants: 1,439 individuals with PD
PD-NC: 1,030 participants
PD-MCI: 330 participants
PDD: 79 participants
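
The class counts above imply a strong imbalance, which is what motivates the resampling step. The short calculation below derives the class shares and the majority-to-minority ratio for each stage directly from the reported counts; the helper name `imbalance_ratio` is introduced here for illustration only.

```python
# Class counts as reported in the Dataset section.
counts = {"PD-NC": 1030, "PD-MCI": 330, "PDD": 79}
total = sum(counts.values())

# Share of each class in the full cohort.
shares = {label: n / total for label, n in counts.items()}

# Binary splits implied by the two-stage design.
stage1 = {"PD-NC": counts["PD-NC"],
          "non-PD-NC": counts["PD-MCI"] + counts["PDD"]}
stage2 = {"PD-MCI": counts["PD-MCI"], "PDD": counts["PDD"]}

def imbalance_ratio(split):
    """Majority-to-minority ratio for a dict of class counts."""
    return max(split.values()) / min(split.values())

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"Stage 1 imbalance ratio: {imbalance_ratio(stage1):.2f}")
print(f"Stage 2 imbalance ratio: {imbalance_ratio(stage2):.2f}")
```

PDD accounts for about 5.5% of the cohort, and the Stage 2 split (330 vs. 79) is more skewed than the Stage 1 split (1,030 vs. 409).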

Data Access

Data available from PPMI database
Requires registration and approval by PPMI Data Access Committee
Institutional ethics approval required

Features Used

UPDRS (NP1-NP4): Unified Parkinson's Disease Rating Scale subscales
MoCA: Montreal Cognitive Assessment (total and subdomains)
VF_LT: Verbal Fluency Letter Task
JLO: Judgment of Line Orientation
ESS: Epworth Sleepiness Scale
GDS: Geriatric Depression Scale
Clock drawing: Six individual items

Requirements

Python >= 3.9
scikit-learn >= 1.2
XGBoost >= 1.7
pandas >= 1.3.0
numpy >= 1.21.0
matplotlib >= 3.4.0
seaborn >= 0.11.0
shap >= 0.40.0
imbalanced-learn >= 0.8.0

Installation

Clone this repository:

git clone https://github.com/yc199911/Machine-Learning-for-Three-Class-Cognitive-Status-Classification-in-Parkinson-s-Disease.git
cd Machine-Learning-for-Three-Class-Cognitive-Status-Classification-in-Parkinson-s-Disease

Install required packages:

pip install -r requirements.txt

Usage

Data Preprocessing

from src.preprocessing import DataPreprocessor

# Initialize preprocessor
preprocessor = DataPreprocessor()

# Load and preprocess PPMI data
X_train, X_test, y_train, y_test = preprocessor.load_and_split_data('path/to/ppmi_data.csv')

Two-Stage Model Training

from src.two_stage_model import TwoStageClassifier

# Initialize two-stage classifier
model = TwoStageClassifier(
    stage1_model='xgboost',
    stage2_model='mlp',
    use_smote_tomek=True,
    random_state=42
)

# Train the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
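
The internals of `TwoStageClassifier` are not shown in this README. As a rough illustration of the routing logic that a two-stage design implies, the sketch below trains a first model on PD-NC vs. non-PD-NC and a second model only on the non-PD-NC samples, then chains them at prediction time. The class names `TwoStageSketch` and `ThresholdStub` and the integer label encoding are assumptions made for this example; any estimators exposing `fit` and `predict` could be plugged in.

```python
PD_NC, PD_MCI, PDD = 0, 1, 2  # assumed label encoding

class TwoStageSketch:
    """Hypothetical cascade: stage 1 separates PD-NC from the rest,
    stage 2 splits the rest into PD-MCI and PDD."""

    def __init__(self, stage1, stage2):
        # Any objects exposing fit(X, y) and predict(X) will do.
        self.stage1 = stage1
        self.stage2 = stage2

    def fit(self, X, y):
        # Stage 1 target: 0 for PD-NC, 1 for PD-MCI or PDD.
        y_stage1 = [0 if label == PD_NC else 1 for label in y]
        self.stage1.fit(X, y_stage1)
        # Stage 2 is trained only on the cognitively impaired samples.
        impaired = [(x, label) for x, label in zip(X, y) if label != PD_NC]
        self.stage2.fit([x for x, _ in impaired],
                        [label for _, label in impaired])
        return self

    def predict(self, X):
        flagged = self.stage1.predict(X)
        result = [PD_NC] * len(X)
        idx = [i for i, flag in enumerate(flagged) if flag == 1]
        if idx:
            # Only samples flagged by stage 1 are passed to stage 2.
            stage2_pred = self.stage2.predict([X[i] for i in idx])
            for i, label in zip(idx, stage2_pred):
                result[i] = label
        return result

class ThresholdStub:
    """Toy stand-in for a real estimator: predicts `high` when the first
    feature exceeds a fixed cut-off, otherwise `low`."""

    def __init__(self, cutoff, low, high):
        self.cutoff, self.low, self.high = cutoff, low, high

    def fit(self, X, y):
        return self  # nothing to learn in this stub

    def predict(self, X):
        return [self.high if x[0] > self.cutoff else self.low for x in X]

demo = TwoStageSketch(ThresholdStub(1.0, 0, 1), ThresholdStub(2.0, PD_MCI, PDD))
X_demo = [[0.5], [1.5], [2.5]]  # made-up single-feature samples
demo.fit(X_demo, [PD_NC, PD_MCI, PDD])
print(demo.predict(X_demo))
```

One consequence of this design is that a Stage 1 miss cannot be recovered: a sample labelled PD-NC in the first stage never reaches the second.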

SHAP Analysis

from src.interpretability import SHAPAnalyzer

# Initialize SHAP analyzer
shap_analyzer = SHAPAnalyzer(model)

# Generate SHAP explanations
shap_analyzer.explain_predictions(X_test)
shap_analyzer.plot_feature_importance()
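
A common way to turn per-sample SHAP values into a global feature ranking is to average their absolute values per feature. The sketch below shows that calculation on a small made-up matrix; whether `SHAPAnalyzer` ranks features in exactly this way is an assumption, and `top_k_features` is a hypothetical helper name.

```python
def top_k_features(shap_values, feature_names, k=10):
    """Rank features by mean absolute SHAP value and return the top k
    as (name, score) pairs."""
    n_samples = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical SHAP matrix: 3 samples x 3 features (values are made up).
names = ["feature_a", "feature_b", "feature_c"]
values = [
    [0.10, -0.40, 0.05],
    [-0.20, 0.30, 0.00],
    [0.30, -0.50, 0.10],
]
for name, score in top_k_features(values, names, k=2):
    print(f"{name}: {score:.2f}")
```

Taking absolute values before averaging matters: a feature that pushes some predictions up and others down would otherwise cancel out and look unimportant.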

Model Architecture

Stage 1: PD-NC vs. Non-PD-NC

Algorithm: XGBoost
Features: Top 10 SHAP-selected features
Balancing: SMOTE-Tomek resampling
Validation: 5-fold stratified cross-validation
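
To illustrate what stratification means for the Stage 1 validation, the sketch below assigns samples to five folds so that each fold keeps roughly the same class proportions. It is a simplified pure-Python version without shuffling, and it uses the full-cohort counts purely for illustration; the actual pipeline may apply cross-validation to the training split only. In practice, `StratifiedKFold` from scikit-learn provides this functionality.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_folds=5):
    """Assign each sample index to a fold so that every fold keeps
    roughly the same class proportions (round-robin within each class)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds

# Stage 1 labels built from the cohort counts: 1,030 PD-NC vs. 409 non-PD-NC.
labels = ["PD-NC"] * 1030 + ["non-PD-NC"] * 409
folds = stratified_folds(labels)
for number, fold in enumerate(folds, start=1):
    print(number, dict(Counter(labels[i] for i in fold)))
```

Each fold ends up with 206 PD-NC samples and 81 or 82 non-PD-NC samples, so every validation fold sees the minority class at about the same rate as the whole dataset.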

Stage 2: PD-MCI vs. PDD

Algorithm: Multilayer Perceptron (MLP)
Architecture: Two hidden layers (100, 50 neurons)
Features: Top 10 features + interaction terms
Activation: ReLU
Optimizer: Adam

Performance Results

Overall Three-Class Classification

Accuracy: 0.92
Macro F1-score: 0.79
Weighted F1-score: 0.92

Per-Class Performance

Class	Precision	Recall	F1-score
PD-NC	0.94	1.00	0.97
PD-MCI	0.96	0.71	0.81
PDD	0.50	0.71	0.59
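
The macro F1-score reported above can be reproduced from the per-class F1 values, since it is their unweighted mean. The sketch below shows that calculation and, for comparison, how a weighted average would be formed; the per-class test-set sizes needed for the weighted figure are not given in this README, so the weighted function is defined but not applied to the reported results.

```python
# Per-class F1 scores copied from the Per-Class Performance table.
f1_scores = {"PD-NC": 0.97, "PD-MCI": 0.81, "PDD": 0.59}

def macro_average(scores):
    """Unweighted mean: every class counts equally, whatever its size."""
    return sum(scores.values()) / len(scores)

def weighted_average(scores, support):
    """Mean weighted by the number of test samples per class."""
    total = sum(support.values())
    return sum(scores[c] * support[c] for c in scores) / total

print(f"Macro F1: {macro_average(f1_scores):.2f}")
```

The gap between the macro (0.79) and weighted (0.92) F1-scores reflects the class imbalance: the weighted score is dominated by the large PD-NC class, while the macro score gives the weaker PDD result equal weight.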

ROC-AUC Scores

PD-NC: 0.85
PD-MCI: 0.85
PDD: 0.84

Key Findings

Superior performance compared to traditional screening methods (37% improvement over MoCA-only)
Subitem-level features significantly outperform total scores (107% relative increase in accuracy)
Clinical interpretability through SHAP analysis reveals key predictive patterns
Detection across all cognitive subgroups, including the minority PDD class (recall of 0.71 for both PD-MCI and PDD), although PDD precision is lower at 0.50

Citation

If you use this code in your research, please cite our paper:

@article{chen2024subitem,
  title={Subitem-Level Multi-Scale Assessment and Machine Learning for Three-Class Cognitive Status Classification in Parkinson's Disease},
  author={Chen, Ying-Che and Yu, Rwei-Ling and Hsieh, Sun-Yuan},
  journal={[Journal Name]},
  year={2024},
  note={In review}
}

Contributing

We welcome contributions to improve this framework. Please:

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Corresponding Authors:

Sun-Yuan Hsieh: hsiehsy@mail.ncku.edu.tw
Rwei-Ling Yu: lingyu@mail.ncku.edu.tw

First Author:

Ying-Che Chen: q56121036@gs.ncku.edu.tw

Institution: National Cheng Kung University, Taiwan

Acknowledgments

Parkinson's Progression Markers Initiative (PPMI) for providing the dataset
Movement Disorder Society for cognitive assessment guidelines
All PPMI participants and research teams

Disclaimer

This code is provided for research purposes only. The model has not been clinically validated for diagnostic use. Please consult healthcare professionals for medical decisions.
