Diabetes Project Topic

Predicting the presence of diabetes (1) vs the absence of diabetes (0)

Research Question

Logistic Regression vs KNN vs Decision Tree for predicting the presence of diabetes

Why are we using these three models?

All three are suitable for binary classification.

Purpose of this project

Improve my knowledge of current supervised ML algorithms by implementing them from scratch and evaluating them with standard metrics.
I am interested in healthcare and want to begin my ML journey with some interesting and manageable health data.
Diabetes is common and can be linked to genetic factors. Early detection is key, so I wanted to build a model that can accurately determine the presence or absence of diabetes in an individual.
Practice for future experimentation with larger health datasets, including cancer and other health conditions. This includes CNNs for MRI and CT imaging specifically.

Outline of this project

Data Exploration

To understand the Kaggle dataset we are working with, it is important to first perform some baseline exploratory analysis. To do this, we must first ask ourselves a few questions:

Is there any missing data?

There is no missing data.
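One way to run this check is to count the empty cells in each column. The sketch below is only illustrative: the inline CSV and its column names (`glucose`, `bmi`, `diabetes`) are hypothetical stand-ins, and one cell is deliberately left empty so the check has something to report.

```python
import csv
import io

# Hypothetical stand-in for the real CSV file; column names are illustrative only.
SAMPLE_CSV = """glucose,bmi,diabetes
148,33.6,1
85,,0
183,23.3,1
"""

def count_missing(rows, columns):
    """Return {column: number of empty or absent cells}."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or value.strip() == "":
                missing[col] += 1
    return missing

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
missing_counts = count_missing(rows, reader.fieldnames)
print(missing_counts)
```

A dataset with no missing data would give a count of zero for every column.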

Are the features in this dataset normally distributed?

Is this dataset imbalanced?

We can measure this by grouping the examples by class and computing each class's share of the total number of examples.
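The grouping described above could be sketched as follows. The label list is made up for illustration and is not taken from the project's dataset.

```python
from collections import Counter

def class_proportions(labels):
    """Return {class: share of all examples} for a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: count / total for cls, count in counts.items()}

# Made-up labels: 1 = diabetes present, 0 = diabetes absent.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
proportions = class_proportions(labels)
print(proportions)
```

Proportions close to each other suggest a balanced dataset, while a very small share for one class suggests imbalance.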

Does multicollinearity exist?
Which features are important? Which features are good predictors for classification?
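A common first check for multicollinearity is the pairwise Pearson correlation between features: values close to 1 or -1 suggest that two features carry nearly the same information. The sketch below assumes this approach; it is not confirmed to be the check used in the project, and the feature values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up feature columns for illustration.
feature_a = [1.0, 2.0, 3.0, 4.0]
feature_b = [2.0, 4.0, 6.0, 8.0]   # exactly 2 * feature_a
feature_c = [4.0, 3.0, 2.0, 1.0]   # decreases as feature_a increases

corr_ab = pearson(feature_a, feature_b)
corr_ac = pearson(feature_a, feature_c)
print(corr_ab, corr_ac)
```

The same function applied between each feature and the label gives a rough first look at which features may be useful predictors.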

Logistic Regression model

Sigmoid function: We define the sigmoid function to map the model's linear output to a value between 0 and 1, which can be read as a probability.
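A minimal sketch of the sigmoid, sigma(z) = 1 / (1 + e^(-z)), is shown below. Splitting on the sign of `z` is an added precaution against overflow for large negative inputs, not necessarily how the project implements it.

```python
import math

def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) so that
    # math.exp never receives a large positive argument.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

print(sigmoid(0.0))
```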

Cost function: Defined as the average loss over all training examples. For binary classification, the loss is the binary cross-entropy.
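As a sketch, the binary cross-entropy cost averages -(y log(p) + (1 - y) log(1 - p)) over the examples, where y is the true label and p is the predicted probability. The clipping constant `eps` is an assumption added here to avoid taking log(0).

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy between labels (0 or 1) and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so log() never receives exactly 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# A model that always predicts 0.5 has a cost of log(2) per example.
cost = binary_cross_entropy([1, 0], [0.5, 0.5])
print(cost)
```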

Gradient Descent: We calculate the partial derivatives of the cost function with respect to every parameter (the weights and the bias).

Update step: We simultaneously update the weights and the bias by subtracting the learning rate multiplied by the corresponding partial derivative.
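The gradient calculation and the update step can be sketched together as one function. The function name `gradient_step`, the toy data, the learning rate, and the iteration count are all illustrative assumptions rather than the project's actual code.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

def gradient_step(X, y, weights, bias, learning_rate):
    """Run one gradient descent step and return the new (weights, bias)."""
    m = len(X)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for features, label in zip(X, y):
        z = sum(w * x for w, x in zip(weights, features)) + bias
        error = sigmoid(z) - label  # derivative of the loss with respect to z
        for j, x in enumerate(features):
            grad_w[j] += error * x
        grad_b += error
    # Simultaneous update: every gradient above was computed with the old parameters.
    new_weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b / m
    return new_weights, new_bias

# Toy one-feature dataset: small values are class 0, large values are class 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = [0.0], 0.0
for _ in range(1000):
    weights, bias = gradient_step(X, y, weights, bias, learning_rate=0.1)
print(weights, bias)
```

Returning new values instead of modifying the parameters in place keeps the update simultaneous, since no gradient is computed from a partially updated parameter set.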

Analysis: Plotting the cost against the number of iterations showed the cost oscillating rather than decreasing, which indicated that feature scaling was needed. After standardizing the training data with a standard scaler, the cost decreased with every iteration.

Performance:
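Standardization rescales each feature to zero mean and unit variance, so no single large-valued feature dominates the gradient. The from-scratch sketch below assumes this is what the scaler does; the function names and sample values are hypothetical.

```python
import math

def fit_standardizer(X):
    """Compute the per-column mean and population standard deviation."""
    n_rows, n_cols = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        variance = sum((row[j] - means[j]) ** 2 for row in X) / n_rows
        # Fall back to 1.0 for constant columns to avoid dividing by zero.
        stds.append(math.sqrt(variance) or 1.0)
    return means, stds

def transform(X, means, stds):
    """Apply (x - mean) / std to every column."""
    return [[(row[j] - means[j]) / stds[j] for j in range(len(row))] for row in X]

# Made-up training data with features on very different scales.
X_train = [[100.0, 0.2], [150.0, 0.3], [200.0, 0.4]]
means, stds = fit_standardizer(X_train)
X_scaled = transform(X_train, means, stds)
print(X_scaled)
```

The means and standard deviations should be computed from the training data only and then reused to transform the test data.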

KNN Model

Decision Tree Model

Conclusions/Findings

Evaluation Metrics

There are multiple metrics for evaluating classification models. However, since this dataset isn't imbalanced, we will evaluate the performance of our three algorithms by measuring their accuracy. Furthermore, a confusion matrix will be visualized so that other metrics such as precision and recall can be considered later.
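These metrics can all be derived from the four confusion matrix counts. The sketch below uses made-up labels and predictions, not results from the project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp

# Made-up labels and predictions for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tn + fp + fn + tp)
precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many are found
print(accuracy, precision, recall)
```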

Future Consideration

I will be deploying this model online so that it can indicate whether a user may have diabetes.

shinji-Yama77/ml-models
