An end-to-end machine learning classification pipeline that predicts heart attack risk from 13 clinical features. Demonstrates a realistic ML workflow: EDA → preprocessing → model training → cross-validation → evaluation → live prediction.
| Model | Accuracy | ROC-AUC | CV Accuracy |
|---|---|---|---|
| Logistic Regression | ~85% | ~92% | ~84% |
| Random Forest | ~85% | ~93% | ~83% |
| SVM | ~87% | ~94% | ~85% |
| Gradient Boosting | ~86% | ~93% | ~85% |
| XGBoost ⭐ | ~88% | ~95% | ~86% |
- Maximum heart rate (thalachh) and chest pain type (cp) are the strongest predictors
- XGBoost achieves the highest ROC-AUC (~95%) and is the best overall model
- Stratified 5-fold CV keeps the class balance the same in every fold, which gives more stable estimates on the small dataset (303 patients)
- Exercise-induced angina (exng) and ST depression (oldpeak) are highly informative
- Full end-to-end ML pipeline (raw data → prediction)
- 5 models compared: Logistic Regression, Random Forest, SVM, Gradient Boosting, XGBoost
- Stratified K-Fold cross-validation
- Confusion matrix for best model
- Feature importance analysis (Gradient Boosting)
- Interactive prediction form (13 clinical inputs)
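The stratified cross-validation comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: it assumes scikit-learn is installed, uses a synthetic stand-in dataset from `make_classification` instead of heart.csv, and compares only two of the five models (XGBoost is left out to keep the dependencies to scikit-learn). The variable names are hypothetical, and the printed scores will not match the table above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for heart.csv (assumption): 303 rows, 13 features.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=42
)

# Scaling sits inside the pipeline so it is re-fit on each training fold only.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Stratified folds preserve the class ratio in every train/test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting `StandardScaler` inside the pipeline is a design choice worth keeping in a real run: scaling the full dataset before splitting would leak test-fold statistics into training.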
| Layer | Technology |
|---|---|
| ML Models | Scikit-Learn, XGBoost |
| Preprocessing | StandardScaler |
| Evaluation | ROC-AUC, Confusion Matrix, StratifiedKFold |
| Web Framework | Flask |
| Dataset | Heart Attack Analysis & Prediction (303 patients) |
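To make the confusion-matrix evaluation listed above concrete, here is a small standard-library sketch of how accuracy, precision, and recall fall out of the four confusion-matrix counts. The labels are toy values made up for illustration, not results from this project, and the function name is hypothetical. In the project itself these numbers would more likely come from scikit-learn's metrics.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = high risk."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tn, fp, fn, tp


# Toy labels (assumption, for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of predicted high-risk cases, how many were right
recall = tp / (tp + fn)     # of true high-risk cases, how many were caught

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

For a risk screen, recall is usually the number to watch, since a false negative means a high-risk patient was missed.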
1. Clone the repo
git clone https://github.com/manny2341/heart-attack-predictor.git
cd heart-attack-predictor
2. Download the dataset
Download heart.csv from Kaggle and place it in:
dataset/heart.csv
3. Install dependencies
pip install -r requirements.txt
4. Start the app
python3 app.py
5. Open in browser
http://127.0.0.1:5009
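Before starting the app, it can help to confirm that the dataset landed in the expected place and has the expected header. The following is an optional standard-library sketch, not part of the repository: the helper name `missing_columns` is hypothetical, and the column list is taken from the feature table in this README.

```python
import csv
from pathlib import Path

# Column names as listed in this README's feature table.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trtbps", "chol", "fbs", "restecg",
    "thalachh", "exng", "oldpeak", "slp", "caa", "thall", "output",
]


def missing_columns(csv_path):
    """Return the expected column names that are absent from the CSV header."""
    # utf-8-sig strips a byte-order mark if the file has one.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


if __name__ == "__main__":
    path = Path("dataset/heart.csv")
    if not path.exists():
        print(f"{path} not found; download heart.csv from Kaggle first")
    else:
        missing = missing_columns(path)
        print("Header OK" if not missing else f"Missing columns: {missing}")
```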
| Feature | Description |
|---|---|
| age | Patient age |
| sex | 1 = Male, 0 = Female |
| cp | Chest pain type (0–3) |
| trtbps | Resting blood pressure (mmHg) |
| chol | Serum cholesterol (mg/dl) |
| fbs | Fasting blood sugar > 120 mg/dl (1 = True, 0 = False) |
| restecg | Resting ECG results (0–2) |
| thalachh | Maximum heart rate achieved |
| exng | Exercise-induced angina (1 = Yes, 0 = No) |
| oldpeak | ST depression induced by exercise |
| slp | Slope of peak exercise ST segment |
| caa | Number of major vessels (0–4) |
| thall | Thalassemia (0–3) |
| output | 1 = High risk, 0 = Low risk |
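The coded ranges in the table above lend themselves to a simple input check before a row is passed to a model. The sketch below shows one possible way to validate the 13 form inputs with the standard library only. It is not taken from app.py: `validate_inputs`, `CODED_RANGES`, and `FEATURE_ORDER` are hypothetical names, the sample values are made up, and only the ranges stated in the table are enforced (slp has no stated range, so it is only checked for being a number).

```python
import math

# Feature order follows the table above; output is the label, not an input.
FEATURE_ORDER = [
    "age", "sex", "cp", "trtbps", "chol", "fbs", "restecg",
    "thalachh", "exng", "oldpeak", "slp", "caa", "thall",
]

# Inclusive integer ranges for the coded features, as given in the table.
CODED_RANGES = {
    "sex": (0, 1), "cp": (0, 3), "fbs": (0, 1), "restecg": (0, 2),
    "exng": (0, 1), "caa": (0, 4), "thall": (0, 3),
}


def validate_inputs(form):
    """Return (row, errors); row is None when any input fails validation."""
    row, errors = [], []
    for name in FEATURE_ORDER:
        try:
            value = float(form.get(name))
        except (TypeError, ValueError):
            errors.append(f"{name}: missing or not a number")
            continue
        if not math.isfinite(value):
            errors.append(f"{name}: must be a finite number")
            continue
        if name in CODED_RANGES:
            low, high = CODED_RANGES[name]
            if not value.is_integer() or not low <= value <= high:
                errors.append(f"{name}: expected an integer in {low}..{high}")
                continue
        row.append(value)
    return (row if not errors else None), errors


# Made-up sample values for illustration, not a real patient record.
sample = {
    "age": "54", "sex": "1", "cp": "2", "trtbps": "130", "chol": "246",
    "fbs": "0", "restecg": "1", "thalachh": "150", "exng": "0",
    "oldpeak": "1.2", "slp": "1", "caa": "0", "thall": "2",
}
row, errors = validate_inputs(sample)
print(row, errors)
```

Returning the full list of errors rather than stopping at the first one lets a form show every problem field in a single round trip.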
heart-attack-predictor/
├── app.py # Flask server, full ML pipeline, prediction API
├── dataset/
│ └── heart.csv # Download from Kaggle (see above)
├── templates/
│ └── index.html # Confusion matrix, model comparison, prediction form
├── static/
│ └── style.css # Dark medical theme
└── requirements.txt
| Project | Description | Repo |
|---|---|---|
| Diabetes Classifier | Model comparison + feature scaling demo | diabetes-classifier |
| Medical Cost Predictor | Regression — what drives healthcare costs | medical-cost-predictor |
| Stock Price Predictor | LSTM forecasting for 5,884 tickers + crypto | stock-price-predictor |
| Crop Disease Detector | EfficientNetV2 — 15 plant diseases | crop-disease-detector |