This project compares the performance of four machine learning algorithms in predicting heart failure outcomes from clinical data: Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Gradient Boosting Machine (GBM).
## Objective

To determine the most effective machine learning model for predicting `DEATH_EVENT` (mortality) among heart failure patients using real-world clinical records.
## Project Structure

```
├── data/
│   └── heart_failure_clinical_records.csv
├── notebooks/
│   └── heart_failure_analysis.ipynb
└── outputs/
    ├── confusion_matrices/
    └── classification_reports
```
## Dataset

- Source: UCI Machine Learning Repository
- Records: 299 patients
- Features: 12 clinical variables
- Target variable: `DEATH_EVENT` (1 = death, 0 = survived)
## Data Preprocessing
- Loaded and cleaned dataset
- Standardized features for KNN and SVM
- Train/test split (80/20)
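The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: a synthetic frame stands in for `heart_failure_clinical_records.csv`, and the scaler is fitted on the training split only to avoid leakage.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical CSV (299 rows, 12 features).
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(299, 12)),
                 columns=[f"feature_{i}" for i in range(12)])
y = pd.Series(rng.integers(0, 2, size=299), name="DEATH_EVENT")

# 80/20 train/test split, stratified on the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Standardize features (fit on train only) for the scale-sensitive
# models, KNN and SVM.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```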
## Models Trained
- Logistic Regression
- SVM (Support Vector Machine)
- K-Nearest Neighbors (KNN)
- Gradient Boosting Machine (GBM)
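A minimal sketch of fitting these four models with scikit-learn, on synthetic stand-in data. GBM is shown here via `GradientBoostingClassifier`, although the project lists `xgboost` among its dependencies, and the notebook's actual hyperparameters may differ:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the UCI heart failure data.
X, y = make_classification(n_samples=299, n_features=12, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scale-sensitive models (SVM, KNN) get a StandardScaler in their pipeline.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model and record its test-set accuracy.
accuracies = {name: model.fit(X_train, y_train).score(X_test, y_test)
              for name, model in models.items()}
```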
## Evaluation Metrics
- Accuracy
- Precision
- Recall
- F1-Score
- Confusion Matrix
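All five metrics come straight from `sklearn.metrics`; a small example with toy labels (in the project, `y_pred` would come from each fitted model):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy ground truth and predictions for illustration.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

acc = accuracy_score(y_true, y_pred)    # fraction of correct predictions
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)
rec = recall_score(y_true, y_pred)      # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
cm = confusion_matrix(y_true, y_pred)   # rows: actual class, cols: predicted
```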
## Visualization
- Correlation Matrix
- Classification Reports
- Confusion Matrices
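Both heatmap styles can be produced with seaborn; a sketch on toy data (column names follow the UCI schema, and the output filenames here are illustrative, not the project's `outputs/` paths):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Toy frame standing in for the clinical dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)),
                  columns=["age", "ejection_fraction", "serum_creatinine"])

# Correlation matrix heatmap.
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", ax=ax)
fig.savefig("correlation_matrix.png", bbox_inches="tight")
plt.close(fig)

# Confusion matrix heatmap for one model's (toy) predictions.
cm = confusion_matrix([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])
fig, ax = plt.subplots(figsize=(3, 3))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
fig.savefig("confusion_matrix_example.png", bbox_inches="tight")
plt.close(fig)
```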
## Results

| Model | Accuracy | F1-Score | Remarks |
|---|---|---|---|
| Gradient Boosting | Highest | High | Best overall performance |
| SVM | Moderate | Good | Effective with standardized data |
| KNN | Moderate | Moderate | Handles non-linear patterns well |
| Logistic Regression | Moderate | Moderate | Baseline interpretable model |
## Key Findings

- Gradient Boosting consistently outperformed the other models.
- SVM and Logistic Regression are simpler and interpretable.
- KNN works well for non-linear patterns but needs tuning.
- Ensemble models like GBM offer high accuracy for clinical predictions.
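Since KNN's performance depends on tuning, a cross-validated grid search over its main hyperparameters is a common approach; a sketch on synthetic stand-in data (the parameter grid here is illustrative, not the one used in the notebook):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical data.
X, y = make_classification(n_samples=299, n_features=12, random_state=42)

# Scaling goes inside the pipeline so each CV fold is scaled independently.
pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsClassifier())])

grid = GridSearchCV(pipe,
                    param_grid={"knn__n_neighbors": [3, 5, 7, 9, 11],
                                "knn__weights": ["uniform", "distance"]},
                    cv=5, scoring="f1")
grid.fit(X, y)

best_k = grid.best_params_["knn__n_neighbors"]
```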
## Links

- Dataset: UCI Machine Learning Repository
- Report: see `MachineLearning_assessment.pdf`
- GitHub Repo: https://github.com/Sravani-Neelakantam/assessment
## Requirements

```
pandas
numpy
scikit-learn
matplotlib
seaborn
xgboost
```

## Author

Sravani Neelakantam
MSc Data Science, Coventry University
Email: [email protected]