
nayana1729/Predictive-Modeling-Heart-Disease-Risk


Predictive Modeling of Heart Disease Risk

Cardiovascular disease (CVD) remains the leading cause of death in the United States, yet risk manifests heterogeneously across comorbid populations. We used the 2020 CDC Behavioral Risk Factor Surveillance System (BRFSS) “Key Indicators of Heart Disease” dataset (N = 319,795 adults; 17 demographic, lifestyle, and clinical variables) to ask two questions: 1) Does the predictive weight of traditional CVD risk factors differ between people with and without diabetes? 2) Can we prospectively flag individuals who report no prior heart-disease diagnosis but nevertheless exhibit a high latent risk?

After cleaning the data and randomly down-sampling the majority class to correct the 0.91/0.09 class imbalance, we conducted an exploratory analysis that confirmed previously established associations between heart disease and BMI, smoking, hypertension, and poor self-reported physical health. We then benchmarked four classifiers (a majority-class baseline, logistic regression, a decision tree, and a random forest) before optimizing a LightGBM model with stratified 5-fold cross-validation and Optuna hyperparameter tuning.
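The README does not include code, so the following is only a minimal, standard-library sketch of the two resampling steps described above: random down-sampling of the majority class and stratified 5-fold splitting. It assumes each respondent is a dict with a `HeartDisease` field coded `"Yes"`/`"No"` and a 1:1 target ratio after down-sampling (the README does not state the ratio); the helper names `downsample_majority` and `stratified_kfold` are hypothetical. The LightGBM fit and each Optuna trial would then run on the indices produced for every fold. In practice, pandas' `DataFrame.sample` and scikit-learn's `StratifiedKFold` cover the same ground.

```python
import random
from collections import defaultdict


def downsample_majority(rows, label_key="HeartDisease", seed=42):
    """Randomly drop rows from larger classes until every class matches the smallest.

    Assumption: a 1:1 balance; the README does not state the target ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    minority_size = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sampling without replacement; for the minority class this keeps every row.
        balanced.extend(rng.sample(group, minority_size))
    rng.shuffle(balanced)
    return balanced


def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, test_idx) pairs whose class proportions mirror `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share of that class.
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Toy data with roughly the 0.91/0.09 split described in the README.
    rows = ([{"HeartDisease": "No"} for _ in range(90)]
            + [{"HeartDisease": "Yes"} for _ in range(10)])
    balanced = downsample_majority(rows)
    labels = [r["HeartDisease"] for r in balanced]
    print(labels.count("Yes"), labels.count("No"))  # 10 10
    for train_idx, test_idx in stratified_kfold(labels):
        # A model fit (e.g. one Optuna trial) would train on train_idx here.
        print(len(train_idx), len(test_idx))
```

Stratifying the folds keeps each validation split's class ratio equal to the training data's, so per-fold scores compared during hyperparameter tuning are not skewed by an unlucky split.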

The tuned LightGBM model achieved an overall accuracy of 93% and a macro F1-score of 0.78 on the held-out test set, outperforming all baselines. Feature-importance scores showed that BMI, days of poor physical health, high blood pressure, and elevated cholesterol were dominant predictors in both cohorts, but advanced age contributed more strongly among non-diabetic participants. Applying the trained model to the 279,000 respondents who reported no prior heart-disease diagnosis, we identified more than 17,000 adults with a predicted probability of ≥ 0.80. These high-risk individuals clustered around the same adverse lifestyle and clinical profiles as those with diagnosed heart disease, which supports the model's face validity and its potential for targeted early intervention.
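As a toy illustration (not the project's data) of why accuracy and macro F1 are reported together: macro F1 averages the per-class F1 scores without weighting by class size, so weak recall on the rarer heart-disease class pulls it down even when accuracy stays high. The sketch also shows the screening step of keeping respondents without a prior diagnosis whose predicted probability meets the 0.80 cutoff. The field name `HeartDisease`, its `"Yes"`/`"No"` coding, and the helper names `macro_f1` and `flag_high_risk` are assumptions; the probabilities would come from the trained model (for example, the positive-class column of LightGBM's `predict_proba`).

```python
def macro_f1(y_true, y_pred, classes=("No", "Yes")):
    """Unweighted mean of the per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def flag_high_risk(respondents, probabilities, threshold=0.80):
    """Keep respondents with no prior diagnosis whose predicted risk is >= threshold."""
    return [
        r for r, p in zip(respondents, probabilities)
        if r["HeartDisease"] == "No" and p >= threshold
    ]


if __name__ == "__main__":
    # 18 negatives predicted correctly; of the 2 positives, only one is caught.
    y_true = ["No"] * 18 + ["Yes"] * 2
    y_pred = ["No"] * 18 + ["Yes", "No"]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"accuracy={accuracy:.2f} macro_f1={macro_f1(y_true, y_pred):.2f}")
    # accuracy=0.95 macro_f1=0.82

    respondents = [{"id": 1, "HeartDisease": "No"},
                   {"id": 2, "HeartDisease": "No"},
                   {"id": 3, "HeartDisease": "Yes"}]
    print([r["id"] for r in flag_high_risk(respondents, [0.85, 0.40, 0.95])])  # [1]
```

Respondent 3 is excluded despite a high score because the screening question only concerns people who do not already report a diagnosis.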

In summary, conventional risk factors remain prominent regardless of diabetes status; however, the relative weight of each factor shifts subtly with diabetes. The stratified LightGBM model runs efficiently on large datasets and offers a scalable screening tool for prioritizing preventive care.

Contributors 5