Heart Failure Survival Analysis, Winter 2026

This project analyzes clinical records of heart failure patients to identify the key factors that distinguish patients who survived from those who died. Students will learn visualization, statistical analysis, and machine learning techniques to predict patient outcomes.

Key Finding: Two features, serum creatinine and ejection fraction, are sufficient to distinguish survival from death across several different classifiers.

View Project Website

Poster

Structure

Below is a high-level overview of the main components of this project.

Dataset {heart_failure_clinical_records_dataset.csv}
Heart failure clinical records from 299 patients with 13 features including age, ejection fraction, serum creatinine, and follow-up time. Binary outcome: survival or death.

Week 1: Data Exploration {Week1.ipynb}
Introduction to the dataset, exploratory data analysis, and visualization techniques using Pandas, Seaborn, and Matplotlib.
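
As a rough sketch of the kind of first look Week 1 describes, the snippet below groups records by outcome and compares medians with Pandas. The four rows are made-up stand-ins, not real records, and the column names (including `DEATH_EVENT`) are assumptions about the CSV header; check them against the actual file before reusing this.

```python
import io

import pandas as pd

# Made-up rows for illustration only; NOT taken from the real dataset.
# The column names are assumptions about the CSV header.
SAMPLE_CSV = """age,ejection_fraction,serum_creatinine,time,DEATH_EVENT
75,20,1.9,4,1
55,38,1.1,6,1
65,60,0.9,120,0
50,35,1.0,200,0
"""


def summarize_by_outcome(df, outcome="DEATH_EVENT"):
    """Return the median of every numeric feature, grouped by outcome."""
    return df.groupby(outcome).median(numeric_only=True)


# With the real file, pd.read_csv would take the CSV path instead of a buffer.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize_by_outcome(df)
print(summary)
```

Medians are used here because clinical measurements are often skewed; Seaborn plots such as box plots per outcome would be a natural next step.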

Week 2: Statistical Analysis {Week2.ipynb}
Hypothesis testing (t-test, Mann-Whitney U), correlation analysis, multiple testing correction using the false discovery rate (FDR), feature variance analysis, and Variance Inflation Factor (VIF) for detecting multicollinearity.
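
A minimal sketch of per-feature testing followed by FDR correction, assuming the Benjamini-Hochberg procedure is the FDR method meant here. The data is synthetic and the feature names `feature_a` and `feature_b` are hypothetical; the notebook may use a library routine rather than this hand-written helper.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


rng = np.random.default_rng(0)
# Synthetic groups: feature_a is shifted between groups, feature_b is not.
survived = {"feature_a": rng.normal(0.0, 1.0, 100), "feature_b": rng.normal(0.0, 1.0, 100)}
died = {"feature_a": rng.normal(1.5, 1.0, 50), "feature_b": rng.normal(0.0, 1.0, 50)}

names = list(survived)
raw_p = [
    mannwhitneyu(survived[name], died[name], alternative="two-sided").pvalue
    for name in names
]
adj_p = benjamini_hochberg(raw_p)
for name, raw, adj in zip(names, raw_p, adj_p):
    print(f"{name}: raw p = {raw:.3g}, BH-adjusted p = {adj:.3g}")
```

Adjusted p-values are never smaller than the raw ones, which is why a feature can look significant on its own and still fail after correction.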

Week 3: Unsupervised Learning {Week3.ipynb}
Data normalization (Z-score), dimensionality reduction with PCA, K-Means clustering, hierarchical (agglomerative) clustering, confusion matrices, silhouette scores, and the elbow method.
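
The Week 3 steps can be chained roughly as follows. This is a sketch on synthetic data with two planted groups, not the notebook's code; the cluster range and `n_init` value are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data: two well-separated groups in 5 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, (60, 5)), rng.normal(4.0, 1.0, (60, 5))])

# Z-score normalization so every feature has mean 0 and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Compare silhouette scores across candidate cluster counts.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_2d)
    scores[k] = silhouette_score(X_2d, labels)

best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

The silhouette score is used for model selection here because it gives a single number per `k`; the elbow method would instead plot `KMeans.inertia_` against `k` and look for the bend.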

Week 4: Supervised Learning {Week4.ipynb}
Train/test splitting with stratification, feature normalization, and classification algorithms including Logistic Regression, Random Forest, Support Vector Machines (SVM), and K-Nearest Neighbors (KNN). Model evaluation using accuracy, precision, recall, and F1-score metrics.
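
A minimal sketch of the Week 4 workflow with one of the listed classifiers. The data comes from `make_classification` as an imbalanced stand-in for the clinical records, and the split size and `max_iter` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary problem standing in for the real data.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, weights=[0.7, 0.3], random_state=0
)

# stratify=y keeps the class ratio the same in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The pipeline fits the scaler on the training data only, avoiding leakage.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
print(metrics)
```

Swapping `LogisticRegression` for `RandomForestClassifier`, `SVC`, or `KNeighborsClassifier` leaves the rest of the code unchanged.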

Week 5: Hyperparameter Optimization {Week5.ipynb}
Advanced techniques for tuning machine learning models using GridSearchCV, Random Search, and Bayesian Optimization with Optuna. Learn to efficiently search hyperparameter space and find optimal model configurations.
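
To illustrate the difference between exhaustive and random search, here is a sketch using scikit-learn's `GridSearchCV` and `RandomizedSearchCV` on synthetic data. The parameter ranges are arbitrary, and Optuna is deliberately left out so the example depends only on scikit-learn and SciPy.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Grid search: evaluates every combination (2 x 2 = 4 candidates).
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid={"max_depth": [2, 4], "min_samples_leaf": [1, 5]},
    cv=3,
    scoring="f1",
)
grid.fit(X, y)

# Random search: samples n_iter candidates from the given distributions.
rand = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_distributions={"max_depth": randint(2, 8), "min_samples_leaf": randint(1, 10)},
    n_iter=5,
    cv=3,
    scoring="f1",
    random_state=0,
)
rand.fit(X, y)

print("grid:", grid.best_params_, round(grid.best_score_, 3))
print("random:", rand.best_params_, round(rand.best_score_, 3))
```

Grid search cost grows multiplicatively with each added parameter, while random search keeps a fixed budget (`n_iter`), which is the usual reason to prefer it for larger search spaces.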

Week 6: Ensemble Methods & Boosting {Week6.ipynb}
From Random Forest to Gradient Boosting to LightGBM. Learn how ensemble methods combine weak learners into strong predictors. Includes hyperparameter tuning with Optuna, evaluation with 4 metrics, and hands-on exercises.
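
A small sketch of the "weak learners into strong predictors" idea: a single decision stump compared with a Random Forest and with gradient-boosted stumps. It uses scikit-learn's `GradientBoostingClassifier` as a stand-in for LightGBM, synthetic data, and F1 as the only metric; all of these are simplifying assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

models = {
    # One depth-1 tree: a deliberately weak learner.
    "single stump": DecisionTreeClassifier(max_depth=1, random_state=0),
    # Bagging: many deep trees trained independently, then averaged.
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Boosting: 100 stumps trained sequentially, each correcting the last.
    "gradient boosting": GradientBoostingClassifier(
        n_estimators=100, max_depth=1, random_state=0
    ),
}

cv_f1 = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in models.items()
}
for name, score in cv_f1.items():
    print(f"{name}: mean 5-fold F1 = {score:.3f}")
```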

Week 7: Feature Selection Methods {Week7.ipynb}
Why fewer features often outperform more: Lasso (L1 penalty), Elastic Net (L1+L2), and the mRMR (minimum Redundancy Maximum Relevance) filter method. Learn to identify which clinical features truly drive heart failure survival prediction.
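
The sketch below shows how an L1 penalty performs feature selection by driving coefficients to exactly zero. It assumes an L1-penalized logistic regression as the classification counterpart of Lasso, uses synthetic data in which only the first two columns are informative, and picks `C=0.05` arbitrarily.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# shuffle=False keeps the two informative features in columns 0 and 1;
# the remaining eight columns are pure noise.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    shuffle=False,
    random_state=0,
)

# Penalized models are scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Smaller C means a stronger L1 penalty and therefore more zero coefficients.
lasso_clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
lasso_clf.fit(X_scaled, y)

kept = np.flatnonzero(lasso_clf.coef_[0])
print("non-zero coefficients at columns:", kept.tolist())
```

Elastic Net mixes in an L2 term, which tends to keep groups of correlated features together instead of picking one arbitrarily.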

Week 8: Deep Learning for Medical Data {Week8.ipynb}
Introduction to deep learning with Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. Learn neural network fundamentals, backpropagation, activation functions, and how to apply deep learning to clinical prediction tasks.
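
To make the fundamentals concrete, here is a sketch of a one-hidden-layer network trained with hand-written backpropagation in NumPy. It is not a CNN or LSTM and does not use TensorFlow or Keras; the layer size, learning rate, and synthetic two-blob data are all assumptions chosen to keep the example small.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, X):
    """Return hidden pre-activations, hidden activations, and output probabilities."""
    W1, b1, W2, b2 = params
    z1 = X @ W1 + b1
    h = np.maximum(z1, 0.0)  # ReLU activation
    p = sigmoid(h @ W2 + b2)
    return z1, h, p


def train_mlp(X, y, hidden=8, lr=0.05, epochs=500, seed=0):
    """Full-batch gradient descent on binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    params = [
        rng.normal(0.0, 0.5, (d, hidden)), np.zeros(hidden),
        rng.normal(0.0, 0.5, (hidden, 1)), np.zeros(1),
    ]
    y_col = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        z1, h, p = forward(params, X)
        eps = 1e-12  # avoids log(0)
        loss = -np.mean(y_col * np.log(p + eps) + (1 - y_col) * np.log(1 - p + eps))
        losses.append(float(loss))
        # Backpropagation: apply the chain rule from the output back to the input.
        dz2 = (p - y_col) / n           # gradient w.r.t. the output logit
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
        dz1 = (dz2 @ params[2].T) * (z1 > 0)  # ReLU passes gradient only where z1 > 0
        dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
        for param, grad in zip(params, (dW1, db1, dW2, db2)):
            param -= lr * grad
    return params, losses


rng = np.random.default_rng(0)
# Synthetic data: two Gaussian blobs, one per class.
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0.0] * 100 + [1.0] * 100)

params, losses = train_mlp(X, y)
pred = (forward(params, X)[2].ravel() >= 0.5).astype(float)
train_acc = float((pred == y).mean())
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy {train_acc:.2f}")
```

Frameworks such as Keras compute the same gradients automatically; writing them out once makes it clearer what `model.fit` is doing.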

Week 9: AI Model Interpretation (PLS-DA & SHAP) {Week9.ipynb}
Interpretability and explainability in machine learning. Partial Least Squares Discriminant Analysis (PLS-DA) for supervised dimensionality reduction, and SHAP (SHapley Additive exPlanations) for understanding model predictions and feature importance in black-box models.

Tutorials {Git_Tutorial.ipynb, Venv_Tutorial.ipynb}
Interactive Jupyter notebooks for Git basics and Python virtual environments. Also available as markdown guides in the tutorials/ folder.

Schedule

Week	Topic	Links
1	Data Exploration	Notebook, Seaborn Docs, Pandas Docs
2	Statistical Analysis	Notebook, Slides, Scipy Stats, Statsmodels VIF
3	Unsupervised Learning	Notebook, Slides, PCA Guide, Clustering
4	Supervised Learning	Notebook, Slides, Scikit-learn Classifiers, Model Evaluation
5	Hyperparameter Optimization	Notebook, Slides, Optuna Docs, GridSearchCV
6	Ensemble Methods & Boosting	Notebook, Slides, LightGBM Docs, Chicco & Jurman (2020)
7	Feature Selection Methods	Notebook, Scikit-learn Feature Selection
8	Deep Learning for Medical Data	Notebook, TensorFlow Docs, Keras API, Understanding Neural Networks
9	AI Model Interpretation (PLS-DA & SHAP)	Notebook, Slides, SHAP Docs, PLS-DA Guide

Research Background

This project is based on the paper by Chicco & Jurman (2020):

Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making, 20, 16.

Key Findings from the Paper:

Applied several ML classifiers (Random Forest, Gradient Boosting, SVM, etc.) to predict survival
Discovered that serum creatinine and ejection fraction alone achieve strong predictive performance
Random Forest achieved the best results with Matthews Correlation Coefficient (MCC) of 0.418
Feature ranking analysis revealed time, serum creatinine, and ejection fraction as top predictors
Demonstrated that complex models with all 13 features do not significantly outperform simpler 2-feature models
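
For readers unfamiliar with the Matthews Correlation Coefficient mentioned above, here is a short sketch of how it is computed from a confusion matrix. The counts in the example are hypothetical and are not taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced test set: 70 negatives, 30 positives.
# A classifier that always predicts "negative" looks good on accuracy...
tp, tn, fp, fn = 0, 70, 0, 30
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("accuracy:", accuracy)          # 0.7
print("MCC:", mcc(tp, tn, fp, fn))    # 0.0
```

Because MCC uses all four cells of the confusion matrix, it does not reward a model for simply predicting the majority class, which is why it suits imbalanced outcomes.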

Clinical Relevance:

Serum creatinine indicates kidney function, often impaired in heart failure patients
Ejection fraction measures heart pumping efficiency, a direct indicator of cardiac health
These two biomarkers are routinely measured and can guide clinical decision-making

Read the full paper

Resources

Project

Setup Tutorials (Interactive Jupyter Notebooks)

Git Tutorial - Installation, setup, commands, workflows, branching
Virtual Environment Tutorial - Python venv, package management, troubleshooting

Libraries & Tools

Scikit-learn - Machine learning
Pandas - Data manipulation
Seaborn - Data visualization
LightGBM - Gradient boosting
Optuna - Hyperparameter optimization

Getting Started

Quick Setup

git clone https://github.com/MichiganDataScienceTeam/W26-MDST-Project_Heart-Failure-Survival-Analysis.git
cd W26-MDST-Project_Heart-Failure-Survival-Analysis
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
jupyter notebook

Need Help?

New to Git? Start with Git_Tutorial.ipynb
New to Python virtual environments? Check Venv_Tutorial.ipynb
Both tutorials are interactive Jupyter notebooks - just open and read along!

Acknowledgements

Leads
Sina Bonakdar
Terry Zhang

License

This project is licensed under the MIT License - see the LICENSE file for details.
