A machine learning pipeline to predict heart disease presence based on the UCI Cleveland dataset.
Check out the live application here: Heart Disease Predictor
The model was evaluated on a reserved test set (20% of data).
| Metric | Score | Description |
|---|---|---|
| Accuracy | 83% | Overall correctness of predictions. |
| Recall (Class 1) | 0.79 | The model correctly identifies 79% of positive heart disease cases. |
| Precision (Class 1) | 0.85 | When the model predicts disease, it is correct 85% of the time. |
The model successfully identified 22 out of 28 positive cases in the test batch.
data/: Contains raw and processed data (ignored by git).notebooks/: Jupyter notebooks for exploration and prototyping.src/: Production-ready Python scripts.models/: Serialized models.
This project uses a decoupled architecture with a separate frontend and backend, containerized for portability.
- Machine Learning:
scikit-learn: Model training and evaluation.pandas&numpy: Data manipulation and preprocessing.joblib: Model serialization.
- Backend API:
FastAPI: High-performance web framework for serving predictions.Uvicorn: ASGI server for production.
- Frontend UI:
Streamlit: Interactive web interface for users.
- DevOps & Deployment:
Docker: Containerization of services.Docker Compose: Orchestration of multi-container environments.Render: Cloud platform for deployment.
- Clone the repo.
- Run the app on your local machine:
- Install dependencies:
pip install -r requirements.txt - Run the API:
uvicorn src.app:app --reload - Access the Streamlit app:
streamlit run src/streamlit_app.py
- Install dependencies:
- Alternatively, use Docker Compose to run both services:
docker-compose up- Access the app at
http://localhost:8501 - Access the API at
http://localhost:8000
- Access the app at
The dataset used in this project is the UCI Cleveland Heart Disease dataset, which contains various medical attributes related to heart disease. The target variable indicates the presence or absence of heart disease.
Source: UCI Machine Learning Repository
Here's a brief description of the features:
| Column | Full Name | Description | Key Values |
|---|---|---|---|
age |
Age | Patient's age in years. | |
sex |
Sex | Biological sex of the patient. | 1 = Male0 = Female |
cp |
Chest Pain Type | The type of chest pain experienced. | 1 = Typical Angina2 = Atypical Angina3 = Non-anginal Pain4 = Asymptomatic |
trestbps |
Resting Blood Pressure | Blood pressure reading upon admission (mm Hg). | |
chol |
Cholesterol | Serum cholesterol in mg/dl. | |
fbs |
Fasting Blood Sugar | Whether fasting blood sugar > 120 mg/dl. | 1 = True0 = False |
restecg |
Resting ECG | Resting electrocardiographic results. | 0 = Normal1 = ST-T wave abnormality2 = Left ventricular hypertrophy |
thalach |
Max Heart Rate | Maximum heart rate achieved during the test. | |
exang |
Exercise-Induced Angina | Chest pain caused by exercise? | 1 = Yes0 = No |
oldpeak |
ST Depression | ST depression induced by exercise relative to rest (indicates heart stress). | |
slope |
Slope | The slope of the peak exercise ST segment. | 1 = Upsloping2 = Flat3 = Downsloping |
ca |
Major Vessels | Number of major vessels colored by fluoroscopy. | 0 to 3 |
thal |
Thalassemia | A blood disorder classification. | 3 = Normal6 = Fixed Defect7 = Reversable Defect |
target |
Diagnosis | Presence of heart disease. | 0 = No Disease1 = Disease |

