AI-Powered Proactive Heart Disease Risk Assessment
An intelligent medical application that combines machine learning with an intuitive GUI to predict an individual's risk of heart disease. Built with Python, Scikit-Learn, and Tkinter, this system helps healthcare professionals and individuals assess cardiovascular health through comprehensive data analysis and real-time visualization.
Heart disease remains one of the leading causes of death worldwide. Early detection and risk assessment are crucial for prevention and treatment. This Heart Disease Prediction System leverages machine learning to analyze 13 critical medical parameters and provide instant risk assessments, empowering both patients and healthcare providers to make informed decisions about cardiovascular health.
The system features a user-friendly graphical interface that allows for easy data entry, real-time visualization of health metrics, and comprehensive patient reports.
Machine Learning Model:
- Logistic Regression algorithm for binary classification (at risk vs. not at risk)
- Trained on a labeled heart disease dataset (heart.csv) of clinical parameters
- Stratified 80/20 train-test split keeps the at-risk/not-at-risk ratio the same in both sets, so test accuracy reflects the real class balance
- Real-time predictions based on 13 clinical features
- Proactive risk assessment for early intervention
Application Features:
- Patient Registration System with unique ID tracking
- Comprehensive Data Entry for all medical parameters
- Real-time Visualization with 4 dynamic graphs
- Instant Results with color-coded risk assessment
- Report Generation with numbered tracking
- Professional medical interface designed around a clinical workflow
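The registration and report-numbering features can be backed by two simple counters. The sketch below is hypothetical: the `Registry` class, the `P<date>-<sequence>` ID format, and in-memory storage are assumptions for illustration, not this project's code.

```python
import itertools
from datetime import date


class Registry:
    """Hypothetical in-memory registry for patient IDs and report numbers."""

    def __init__(self, prefix="P"):
        self.prefix = prefix
        self._patient_seq = itertools.count(1)  # next patient sequence number
        self._report_seq = itertools.count(1)   # next report number
        self.patients = {}                      # patient_id -> name

    def register(self, name, on=None):
        """Issue an ID such as 'P20240115-0001' and remember the patient."""
        on = on or date.today()
        patient_id = f"{self.prefix}{on:%Y%m%d}-{next(self._patient_seq):04d}"
        self.patients[patient_id] = name
        return patient_id

    def next_report_number(self):
        """Return the next sequential report number (1, 2, 3, ...)."""
        return next(self._report_seq)
```

The counters live in memory, so IDs are unique only within one run of the application; a persistent version would store the last issued numbers (for example in a file or database) and resume from them.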
Four real-time graphs displaying:
- Categorical Features - Sex, FBS, Exang
- Vital Signs - Age, Blood Pressure, Cholesterol, Heart Rate
- Cardiac Metrics - Oldpeak, RestECG, Chest Pain Type
- Advanced Indicators - Slope, CA, Thal
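One way to draw the four panels is a 2×2 grid of bar charts on a Matplotlib `Figure`. This is a minimal sketch, assuming lower-case feature names; `PANELS` and `build_dashboard` are hypothetical names. Inside the Tkinter window such a figure would typically be embedded with `FigureCanvasTkAgg(fig, master=...)` from `matplotlib.backends.backend_tkagg`.

```python
from matplotlib.figure import Figure

# Panel grouping from the list above; lower-case keys are assumed column names.
PANELS = {
    "Categorical Features": ["sex", "fbs", "exang"],
    "Vital Signs": ["age", "trestbps", "chol", "thalach"],
    "Cardiac Metrics": ["oldpeak", "restecg", "cp"],
    "Advanced Indicators": ["slope", "ca", "thal"],
}


def build_dashboard(patient):
    """Draw one bar chart per panel for a single patient's values."""
    fig = Figure(figsize=(8, 6))
    axes = fig.subplots(2, 2)  # 2x2 array of Axes, filled row by row
    for ax, (title, keys) in zip(axes.flat, PANELS.items()):
        ax.bar(keys, [patient[k] for k in keys])  # string keys give a categorical x-axis
        ax.set_title(title)
    return fig
```

Because age, cholesterol and heart rate share one axis in the Vital Signs panel, values of very different magnitude sit side by side; a real dashboard might normalize them first.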
13 Critical Features:
- Age - Patient's age in years
- Sex - Gender (1 = Male, 0 = Female)
- CP - Chest Pain Type (0-3)
- Trestbps - Resting Blood Pressure (mm Hg)
- Chol - Serum Cholesterol (mg/dl)
- FBS - Fasting Blood Sugar > 120 mg/dl (1 = True, 0 = False)
- RestECG - Resting Electrocardiographic Results (0-2)
- Thalach - Maximum Heart Rate Achieved
- Exang - Exercise Induced Angina (1 = Yes, 0 = No)
- Oldpeak - ST Depression Induced by Exercise
- Slope - Slope of Peak Exercise ST Segment (0-2)
- CA - Number of Major Vessels (0-3)
- Thal - Thalassemia (0-3)
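Because the model expects the 13 values in a fixed column order, entered data has to be mapped onto that order and checked before prediction. A sketch, assuming the lower-case names below match the heart.csv headers; `FEATURES`, `ALLOWED_CODES`, and `to_feature_row` are hypothetical names, and the allowed codes are taken from the ranges listed above.

```python
# Column order of the 13 features as listed above (assumed to match heart.csv headers).
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

# Allowed codes for the categorical features, from the ranges above.
ALLOWED_CODES = {
    "sex": {0, 1}, "cp": {0, 1, 2, 3}, "fbs": {0, 1}, "restecg": {0, 1, 2},
    "exang": {0, 1}, "slope": {0, 1, 2}, "ca": {0, 1, 2, 3}, "thal": {0, 1, 2, 3},
}


def to_feature_row(patient):
    """Return values in model column order; reject missing or out-of-range codes."""
    missing = [name for name in FEATURES if name not in patient]
    if missing:
        raise ValueError(f"missing features: {missing}")
    for name, allowed in ALLOWED_CODES.items():
        if patient[name] not in allowed:
            raise ValueError(f"{name}={patient[name]!r} not in {sorted(allowed)}")
    return [float(patient[name]) for name in FEATURES]
```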
Prediction Workflow:
Patient Data Entry (GUI)
↓
Data Validation
↓
Feature Extraction (13 parameters)
↓
Data Preprocessing (NumPy array conversion)
↓
Model Prediction (Logistic Regression)
↓
Risk Assessment (0 = No Risk, 1 = At Risk)
↓
Visualization (4 Real-time Graphs)
↓
Report Generation (Color-coded results)
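The extraction, preprocessing, prediction and color-coding steps above can be sketched as one function that works with any fitted scikit-learn classifier (anything exposing `predict` and `predict_proba`). The `to_model_input` and `assess` helpers and the green/red color choice are assumptions for illustration.

```python
import numpy as np

# Model column order; assumed to match the order used at training time.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

# Hypothetical display mapping for the 0/1 prediction.
RESULT_STYLES = {0: ("No Risk", "green"), 1: ("At Risk", "red")}


def to_model_input(patient):
    """Feature extraction + NumPy conversion: a single row of shape (1, 13)."""
    return np.array([[float(patient[name]) for name in FEATURES]])


def assess(model, patient):
    """Predict for one patient and return (label, color, probability of class 1)."""
    x = to_model_input(patient)
    prediction = int(model.predict(x)[0])
    # Column 1 of predict_proba is P(class 1) when the classes are 0 and 1.
    probability = float(model.predict_proba(x)[0, 1])
    label, color = RESULT_STYLES[prediction]
    return label, color, probability
```

The `(1, 13)` shape matters: scikit-learn estimators expect a 2-D array of samples, even when predicting for a single patient.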
Model Training Pipeline:
Heart Disease Dataset (heart.csv)
↓
Data Loading & Exploration
↓
Feature Selection (X) & Target (Y)
↓
Train-Test Split (80/20, Stratified)
↓
Logistic Regression Model
↓
Model Training
↓
Accuracy Evaluation
↓
Model Serialization
↓
Real-time Prediction
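The training pipeline maps onto a few scikit-learn calls. This sketch uses `make_classification` as a stand-in for heart.csv so it runs anywhere; with the real file you would load it with pandas, e.g. `df = pd.read_csv("heart.csv")`, `X = df.drop(columns="target")`, `y = df["target"]`, assuming the label column is named `target`. Serialization here uses the standard-library `pickle`, which may differ from what the project uses; `train_and_save`, `random_state=42`, and `max_iter=1000` are assumptions.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_save(X, y, path="heart_model.pkl", seed=42):
    """Stratified 80/20 split, fit logistic regression, score it, optionally pickle it."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    # Raised iteration cap: unscaled clinical features can make the solver converge slowly.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    if path is not None:
        with open(path, "wb") as f:
            pickle.dump(model, f)  # model serialization
    return model, accuracy


if __name__ == "__main__":
    # Synthetic stand-in for heart.csv: 303 rows, 13 features, binary target.
    X, y = make_classification(n_samples=303, n_features=13, random_state=0)
    model, accuracy = train_and_save(X, y)
    print(f"test accuracy: {accuracy:.3f}")
```

`stratify=y` is what makes the split stratified: the 80% training set and the 20% test set both keep the original proportion of at-risk and not-at-risk patients.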
Machine Learning:
- Scikit-Learn (Logistic Regression)
- Pandas (Data manipulation)
- NumPy (Numerical operations)
GUI Framework:
- Tkinter (Main application)
- ttk (Themed Tk widgets)
Data Visualization:
- Matplotlib (Graph generation)
- matplotlib.backends.backend_tkagg (Tkinter integration)
Additional Libraries:
- datetime (Date handling)
- messagebox (User alerts)
Why Logistic Regression?
- Ideal for binary classification (risk vs. no risk)
- Provides probability estimates
- Interpretable coefficients
- Fast training and prediction
- Well suited to small, tabular medical datasets such as heart.csv
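For reference, the model behind these points applies the logistic (sigmoid) function to a weighted sum of the 13 features x1 to x13, where β0 is the intercept and β1 to β13 are the learned coefficients (exposed in scikit-learn as `intercept_` and `coef_`):

```latex
p = P(y = 1 \mid \mathbf{x})
  = \sigma\!\left(\beta_0 + \sum_{j=1}^{13} \beta_j x_j\right)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{13} \beta_j x_j\right)\right)}

\hat{y} =
\begin{cases}
  1 \ (\text{At Risk}) & \text{if } p > 0.5 \\
  0 \ (\text{No Risk}) & \text{otherwise}
\end{cases}

\log\frac{p}{1 - p} = \beta_0 + \sum_{j=1}^{13} \beta_j x_j
```

Because the log-odds are linear in the features, e^βj is the multiplicative change in the odds of being at risk for a one-unit increase in xj with the other features held fixed; this is what makes the coefficients interpretable. For binary problems, scikit-learn's `predict` applies the 0.5 cut-off shown above.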
Feature Details:
Age:
- Patient's age in years
- Risk factor: Increases with age
Sex:
- 1 = Male
- 0 = Female
- Males typically at higher risk
CP (Chest Pain Type):
- 0 = Typical angina
- 1 = Atypical angina
- 2 = Non-anginal pain
- 3 = Asymptomatic
Trestbps (Resting Blood Pressure):
- Measured in mm Hg
- Normal: < 120 mm Hg
- Elevated: 120-129 mm Hg
- High: ≥ 130 mm Hg
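The blood-pressure bands can be applied to the single `trestbps` value as below. This assumes `trestbps` is the systolic reading; `bp_category` is a hypothetical display helper, and the model itself uses the raw number.

```python
def bp_category(systolic_mm_hg):
    """Map a resting systolic reading (trestbps) to the bands listed above."""
    if systolic_mm_hg < 120:
        return "Normal"
    if systolic_mm_hg < 130:  # 120-129 mm Hg
        return "Elevated"
    return "High"             # >= 130 mm Hg
```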
Chol (Serum Cholesterol):
- Measured in mg/dl
- Desirable: < 200 mg/dl
- Borderline high: 200-239 mg/dl
- High: ≥ 240 mg/dl
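The cholesterol bands translate directly into cut points. In this sketch, `bisect_right` from the standard library picks the band so that a value equal to a cut point falls into the higher band, matching the "200-239" and "≥ 240" boundaries; `chol_category` is a hypothetical helper.

```python
from bisect import bisect_right

# Cut points from the list above: < 200, 200-239, >= 240 mg/dl.
CHOL_CUTOFFS = [200, 240]
CHOL_LABELS = ["Desirable", "Borderline high", "High"]


def chol_category(chol_mg_dl):
    """Return the cholesterol band for a serum cholesterol value in mg/dl."""
    return CHOL_LABELS[bisect_right(CHOL_CUTOFFS, chol_mg_dl)]
```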
FBS (Fasting Blood Sugar):
- 1 = True (> 120 mg/dl)
- 0 = False (≤ 120 mg/dl)
- Indicates diabetes risk
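If the entry form captures the raw fasting glucose rather than the 0/1 flag, the encoding is a single comparison. `encode_fbs` is a hypothetical helper; the strict inequality means exactly 120 mg/dl is coded 0, as in the list above.

```python
def encode_fbs(fasting_glucose_mg_dl):
    """Return the FBS code: 1 if fasting blood sugar exceeds 120 mg/dl, else 0."""
    return int(fasting_glucose_mg_dl > 120)
```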
RestECG (Resting Electrocardiographic Results):
- 0 = Normal
- 1 = Having ST-T wave abnormality
- 2 = Showing probable or definite left ventricular hypertrophy
Thalach (Maximum Heart Rate Achieved):
- Measured during an exercise test
- Higher values generally better
- Age-adjusted norms apply
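A common way to age-adjust is the rule-of-thumb estimate of maximum heart rate, 220 - age. The helper below is a sketch under that assumption; the estimate is not part of the model, which uses the raw `thalach` value, and `percent_of_predicted_max` is a hypothetical name.

```python
def percent_of_predicted_max(thalach, age):
    """Express thalach as a percentage of the (220 - age) estimate of maximum heart rate."""
    predicted_max = 220 - age
    return 100.0 * thalach / predicted_max
```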
Exang (Exercise-Induced Angina):
- 1 = Yes (chest pain during exercise)
- 0 = No
- Significant risk indicator
Oldpeak (ST Depression):
- ST depression induced by exercise relative to rest
- Measured in millimeters on the ECG
- Higher values indicate a more severe condition
Slope (Slope of Peak Exercise ST Segment):
- 0 = Upsloping (better)
- 1 = Flat
- 2 = Downsloping (worse)
CA (Number of Major Vessels):
- 0-3 vessels colored by fluoroscopy
- More vessels = higher risk
Thal (Thalassemia):
- 0 = Missing / unrecorded
- 1 = Fixed defect
- 2 = Normal
- 3 = Reversible defect
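For the patient report, coded values can be turned back into readable labels. A sketch with a hypothetical `describe` helper and `CODE_LABELS` table covering the categorical features whose codes are listed above (cp, restecg, slope, exang, sex); the output format is an assumption.

```python
# Code-to-label tables copied from the feature details above.
CODE_LABELS = {
    "cp": {0: "Typical angina", 1: "Atypical angina",
           2: "Non-anginal pain", 3: "Asymptomatic"},
    "restecg": {0: "Normal", 1: "ST-T wave abnormality",
                2: "Probable or definite left ventricular hypertrophy"},
    "slope": {0: "Upsloping", 1: "Flat", 2: "Downsloping"},
    "exang": {0: "No", 1: "Yes"},
    "sex": {0: "Female", 1: "Male"},
}


def describe(patient):
    """Return 'feature: label' report lines for the coded features present in patient."""
    return [f"{name}: {labels[patient[name]]}"
            for name, labels in CODE_LABELS.items() if name in patient]
```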
IMPORTANT NOTICE:
This Heart Disease Prediction System is designed as a supportive tool and should NOT replace professional medical diagnosis or treatment.
- For Educational and Research Purposes: This application demonstrates machine learning in healthcare
- Not a Medical Device: Not approved or certified for clinical diagnosis
- Consult Healthcare Professionals: Always seek advice from qualified medical practitioners
- Emergency Situations: Call emergency services immediately if experiencing chest pain or cardiac symptoms
- Risk Assessment Only: Predictions indicate probability, not definitive diagnosis
This project is open source and available under the MIT License.