This project aims to identify the most suitable ML algorithm for heart disease prediction. To that end, I selected two candidates to compare: SVM and Random Forest.
Introduction: This project uses machine learning algorithms to develop a predictive model for heart disease. Such a model could contribute to early detection and thereby help reduce global mortality from cardiovascular disease.
Defining the ML problem:
- Task (T): Build a tool that predicts a person's risk of heart disease over the next 10 years using a machine learning model.
- Experience (E): A dataset with demographic and behavioral details and past and current medical conditions will be used to train the model.
- Performance Metric (P): Model performance will be measured with metrics such as Accuracy or Mean Absolute Error (MAE).
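
To make the two performance metrics named above concrete, here is a minimal sketch in plain Python. The label lists are made up for illustration and do not come from the dataset; the encoding (1 = heart disease within 10 years, 0 = none) is an assumption. Note that when both labels and predictions are 0/1, MAE is simply the error rate, so it equals 1 minus accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels (assumed encoding: 1 = disease within 10 years, 0 = none).
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1]

acc = accuracy(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)

print(f"accuracy = {acc}")  # 6 of 8 predictions are correct
print(f"MAE      = {mae}")  # 2 of 8 predictions are off by 1
```

For hard 0/1 predictions the two numbers carry the same information, so reporting accuracy alone is usually enough; MAE becomes more informative if the model outputs a risk score rather than a class label.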
Data: Web link for the data set: https://www.kaggle.com/datasets/dileep070/heart-disease-prediction-usinglogistic-regression/data
Features - 14 features
- Sex
- Age
- Current Smoker
- Cigarettes Per Day (Cigs Per Day)
- Blood Pressure Medication (BP Meds)
- Previously had Stroke (Prevalent Stroke)
- Hypertensive (Prevalent Hyp)
No. of examples: 4238
Possible methods of handling missing data:
- Imputation: Use the median value to fill in the blanks.
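
The median imputation mentioned above could look roughly like the following sketch using pandas. The small table and its column names (`age`, `cigs_per_day`, `bp_meds`) are hypothetical stand-ins, not the actual headers or values of the Kaggle file.

```python
import numpy as np
import pandas as pd

# Toy data with gaps; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "age": [39, 46, 48, 61, 46],
    "cigs_per_day": [0.0, np.nan, 20.0, 30.0, np.nan],
    "bp_meds": [0.0, 0.0, np.nan, 0.0, 1.0],
})

# Count the missing values in each column before imputing.
missing_before = df.isna().sum()

# Per-column medians; NaN entries are skipped when computing them.
medians = df.median(numeric_only=True)

# Fill each column's gaps with that column's own median.
df_imputed = df.fillna(medians)

print(missing_before)
print(df_imputed)
```

In a full pipeline, the medians would typically be computed on the training split only and then reused for the test split, so that no information leaks from the test data.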
Algorithms:
- SVM
- Random Forest
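
As a rough sketch of how the two algorithms listed above might be compared, the following uses scikit-learn on synthetic data generated by `make_classification`, not the heart disease dataset. The number of samples, the train/test split, the RBF kernel, and the number of trees are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 14 features, binary target.
X, y = make_classification(
    n_samples=500, n_features=14, n_informative=6, random_state=0
)

# Hold out 25% of the rows for evaluation, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # SVMs are sensitive to feature scale, so standardize the inputs first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Evaluating both models on the same held-out split keeps the comparison fair; cross-validation would give a more stable estimate than a single split.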