Heart-Disease-Predictor---Group16

Authors: Shrabanti Bala Joya, Sarisha Das, Omowunmi Obadero, Mantram Sharma

About

Here we attempt to build a classification model to predict whether an individual is at risk of a heart disease. The dataset contains 1000 unique examples and 14 features containing information on the individuals' cholesterol, blood pressure, fasting blood sugar, etc. Our target column contains binary encoding where 1 translates to 'heart disease' and 0 to 'no heart disease'.

We performed exploratory data analysis (EDA) and applied SciKit Learn's preprocessing tools such as StandardScaler, OneHotEncoder and Ordinal encoder to preprocess the data based on the EDA. We built four different models - Decision Tree, Support Vector Machine (SVM) with Radial Basis Function (RBF) kernel, Logistic Regression and a Dummy Classifier. We used the Dummy Classifier as the baseline and compared cross-validation scores achieved from the other three models. The Support Vector Machine (Classifier) performed reasonably well than the other models with 0.98 test accuracy with recall = 0.98 and precision = 0.98.

It is imperative to ensure accurate diagnosis of heart disease based on a individuals clinical features. Among the evaluated models, we believe that the Support Vector Machine with RBF Kernel will yield the most reliable results as reflected in it's overall performance.

The dataset used in this project has been obtained from Mendeley Data. A detailed explanation of all the important features are provided in our analysis. You can find the raw and processed datasets in the data directory of this repository. Our train and test dataset are represented in train_heart.csv and test_heart.csv respectively.

Dependencies

Docker

Report

The final report can be found here.

Usage (Attributed from Breast-Cancer-Predictor Project README)

Setup

If you are using Windows or Mac, make sure Docker Desktop is running.

Clone this GitHub repository.

Running the analysis

Navigate to the root of this project on your computer using the command line and enter the following command:

docker compose up

In the terminal, look for a URL that starts with http://127.0.0.1:8888/lab?token= (for an example, see the highlighted text in the terminal below). Copy and paste that URL into your browser.

Open a terminal and execute the following commands from the root to run the analysis:

3a. Reset the project to a clean state (remove all generated files from the analysis)

make clean

3b. Run the analysis in its entirity, including generation of new HTML report, run the following:

make all

Clean up

To shut down the container and clean up the resources, type Ctrl + C in the terminal where you launched the container, and then type

docker compose rm

Developer notes

Developer dependencies

conda (version 23.9.0 or higher)
conda-lock (version 2.5.7 or higher)

Running the analysis on local

Open a terminal in the root of the folder. Please make sure conda and conda-lock is installed and the base environment is activated. To ensure the base is active run:

conda activate base

Run the following to create a new environment for the analysis (Replace <env_name> with a relevent name for your new environment)

conda-lock install --name <env_name> conda-lock.yml

Please wait a while for the packages to download.

Now activate the new environment using

conda activate <env_name>

Now in the terminal execute the following commands from the root to run the analysis:

Reset the project to a clean state (remove all generated files from the analysis)

make clean

Run the analysis in its entirity, including generation of new HTML report, run the following:

make all

Adding a new dependency

Add the dependency to the environment.yml file on a new branch.
To update the conda-linux-64.lock file run the following.

conda-lock -k explicit --file environment.yml -p linux-64

Note: This may create additional lockfiles for multiple OS types, please ignore/delete the irrelevent lock-files.

Re-build the Docker image locally to ensure it builds and runs properly. Replace <your_tag> with a tag of your choice.

docker build --tag <your_tag> .

Push the changes to GitHub. A new Docker image will be built and pushed to Docker Hub automatically. It will be tagged with the SHA for the commit that changed the file.
Update the docker-compose.yml file on your branch to use the new container image (make sure to update the tag specifically).
Send a pull request to merge the changes into the main branch.

License

The Heart Disease Predictor report contained in this repository is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. Please refer to the license file for full details. If you reuse or adapt any part of this report, kindly provide proper attribution and include a link to this webpage.

The software code included in this repository is licensed under the MIT License See the license file for further information.

References

Ttimbers. (n.d.). TTIMBERS/breast-cancer-predictor. GitHub. https://github.com/ttimbers/breast-cancer-predictor/tree/main?tab=readme-ov-file

Doppala, B. P., & Bhattacharyya, D. (2021, April 16). Cardiovascular Disease Dataset (Version 1) [Data Set]. Mendeley Data. https://doi.org/10.17632/dzz48mvjht.1

Name		Name	Last commit message	Last commit date
Latest commit History 322 Commits
.github/workflows		.github/workflows
analysis		analysis
data		data
images		images
results		results
scripts		scripts
tests		tests
utils		utils
.gitignore		.gitignore
.lesshst		.lesshst
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
conda-linux-64.lock		conda-linux-64.lock
conda-lock.yml		conda-lock.yml
docker-compose.yml		docker-compose.yml
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Heart-Disease-Predictor---Group16

About

Dependencies

Report

Usage (Attributed from Breast-Cancer-Predictor Project README)

Setup

Running the analysis

Clean up

Developer notes

Developer dependencies

Running the analysis on local

Adding a new dependency

License

References

About

Uh oh!

Releases 3

Packages

Contributors 5

Uh oh!

Languages

License

sjbalagit/Heart-Disease-Predictor---Group16

Folders and files

Latest commit

History

Repository files navigation

Heart-Disease-Predictor---Group16

About

Dependencies

Report

Usage (Attributed from Breast-Cancer-Predictor Project README)

Setup

Running the analysis

Clean up

Developer notes

Developer dependencies

Running the analysis on local

Adding a new dependency

License

References

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Contributors 5

Uh oh!

Languages

Packages