- Explaining the issues in the data and addressing them through data cleaning and preparation
- Predicting whether the patient will develop diabetes from the threeormore indicator, i.e. whether the mother has three or more children, and then calculating the probability of developing diabetes given that the mother has three or more children, and vice versa
- Predicting whether the patient will develop diabetes from multiple parameters, choosing the best model and using the ToPredict dataset as the test set
The first task is to analyse the data and perform some data-cleaning steps.
- From the figure above, we can see that the Insulin and SkinThickness columns contain many zeros. A value of zero is not physiologically possible for either measurement, so these zeros are really missing values. I therefore decided to drop both columns, as they would not be useful in the prediction process.
- For the remaining columns, we replace the missing values (recorded as zeros) with the column median.
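A minimal, library-free sketch of these two cleaning steps follows. It assumes each row is a dict keyed by the PimaDiabetes.csv column names; the names Pregnancies, Glucose, BloodPressure and BMI, and the choice of which columns treat zero as missing, are assumptions for illustration rather than details taken from the text. pandas would be the usual tool here; plain Python just keeps the logic visible.

```python
from statistics import median

# Columns dropped because most of their values are impossible zeros.
DROP_COLUMNS = ("Insulin", "SkinThickness")
# Assumption: columns in which a zero is treated as a missing value.
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "BMI")


def clean(records):
    """Return new rows without the dropped columns, with zeros median-imputed."""
    # Copy each row, leaving out the dropped columns (the input is not mutated).
    rows = [{k: v for k, v in r.items() if k not in DROP_COLUMNS} for r in records]
    for col in ZERO_AS_MISSING:
        # Median of the observed (non-zero) values only.
        fill = median(r[col] for r in rows if r[col] != 0)
        for r in rows:
            if r[col] == 0:
                r[col] = fill
    return rows


# Toy rows for illustration only (not taken from the dataset).
toy = [
    {"Pregnancies": 1, "Glucose": 100, "BloodPressure": 70, "SkinThickness": 0,
     "Insulin": 0, "BMI": 30.0, "Outcome": 0},
    {"Pregnancies": 4, "Glucose": 0, "BloodPressure": 80, "SkinThickness": 20,
     "Insulin": 0, "BMI": 0, "Outcome": 1},
    {"Pregnancies": 2, "Glucose": 140, "BloodPressure": 0, "SkinThickness": 0,
     "Insulin": 90, "BMI": 35.0, "Outcome": 1},
]
cleaned = clean(toy)
print(cleaned[1]["Glucose"], cleaned[2]["BloodPressure"], cleaned[1]["BMI"])  # 120.0 75.0 32.5
```

Taking the median over non-zero values only keeps the impossible zeros from dragging the fill value down. Pregnancies is deliberately left out of the imputation list, since zero pregnancies is a legitimate value.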
First, let's check the correlations between the parameters.
No pair of parameters has a correlation above 0.7. This is good for prediction: the predictors are not strongly collinear, so each one can contribute separate information to the models.
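The 0.7 check can be sketched by computing the Pearson correlation for every pair of columns and flagging any pair whose absolute value exceeds the threshold. This is a plain-Python illustration; the helper names `pearson` and `high_correlations` are hypothetical, and the toy columns are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def high_correlations(columns, threshold=0.7):
    """List (a, b, r) for every column pair whose |r| exceeds threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Toy columns for illustration only: B is an exact multiple of A.
toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]}
for a, b, r in high_correlations(toy):
    print(a, b, round(r, 3))  # A B 1.0
```

An empty result on the real predictor columns corresponds to the finding above that no correlation exceeds 0.7.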
Next, we move on to the bivariate analysis.
From the pairplot, it is difficult to separate the patients who developed diabetes from those who did not using the scatterplots alone.
In the first step, we create a column called threeormore, which indicates whether the patient has three or more children. We then calculate the probability after fitting a logistic regression model.
Bayes' rule was used to calculate the probabilities in this step.
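With D meaning "develops diabetes" (Outcome = 1) and T meaning "has three or more children", Bayes' rule gives P(D | T) = P(T | D) P(D) / P(T), where P(T) = P(T | D) P(D) + P(T | not D) P(not D). The sketch below computes this from counts. It assumes, hypothetically, that the number of children is read from a Pregnancies column and that threeormore means Pregnancies >= 3; the function name and toy records are illustrative only.

```python
def diabetes_given_children(records, threshold=3):
    """Return (P(D | T), P(D | not T)) via Bayes' rule.

    D: Outcome == 1.  T: the threeormore indicator, assumed here to be
    Pregnancies >= threshold (an assumption, not stated in the text).
    """
    n = len(records)
    t = [r["Pregnancies"] >= threshold for r in records]
    d = [r["Outcome"] == 1 for r in records]

    n_d = sum(d)                    # number of diabetic patients
    p_d = n_d / n                   # prior P(D)
    p_t_d = sum(ti and di for ti, di in zip(t, d)) / n_d              # P(T | D)
    p_t_nd = sum(ti and not di for ti, di in zip(t, d)) / (n - n_d)   # P(T | not D)
    p_t = p_t_d * p_d + p_t_nd * (1 - p_d)   # law of total probability

    p_d_given_t = p_t_d * p_d / p_t
    p_d_given_not_t = (1 - p_t_d) * p_d / (1 - p_t)
    return p_d_given_t, p_d_given_not_t


# Toy records for illustration only: 5 mothers with T (3 diabetic),
# 5 without T (1 diabetic).
toy = (
    [{"Pregnancies": 4, "Outcome": 1}] * 3
    + [{"Pregnancies": 4, "Outcome": 0}] * 2
    + [{"Pregnancies": 1, "Outcome": 1}] * 1
    + [{"Pregnancies": 0, "Outcome": 0}] * 4
)
given_t, given_not_t = diabetes_given_children(toy)
print(round(given_t, 3), round(given_not_t, 3))  # 0.6 0.2
```

Bayes' rule agrees with counting directly within each group (3 of the 5 toy mothers with T developed diabetes), but writing it through P(T | D) makes the prior P(D) explicit. The sketch assumes both outcome classes and both indicator values occur, otherwise a division by zero is raised.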
First, we need to check which model performs best.
We can see that logistic regression performs better than DecisionTreeRegressor, so we will use it for the subsequent predictions.
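The comparison can be sketched as a held-out evaluation. The text does not say which metric was used, so accuracy on a validation split is assumed here, and the labels and scores below are made up. Because DecisionTreeRegressor predicts a continuous value rather than a class, its output is thresholded at 0.5 (also an assumption) so that both models are scored the same way.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Share of correct labels after thresholding scores at `threshold`.

    Thresholding lets a regressor's continuous output (such as the output of
    DecisionTreeRegressor) be compared with a classifier's probabilities.
    """
    labels = [1 if s >= threshold else 0 for s in scores]
    return sum(int(lab == y) for lab, y in zip(labels, y_true)) / len(y_true)


def pick_best(y_true, predictions):
    """Return (model_name, accuracy) for the most accurate model."""
    results = {name: accuracy(y_true, s) for name, s in predictions.items()}
    best = max(results, key=results.get)
    return best, results[best]


# Hypothetical held-out labels and model scores, for illustration only.
y_val = [1, 0, 1, 1, 0]
preds = {
    "LogisticRegression": [0.9, 0.2, 0.7, 0.6, 0.4],
    "DecisionTreeRegressor": [1.0, 1.0, 0.0, 1.0, 0.0],
}
print(pick_best(y_val, preds))  # ('LogisticRegression', 1.0)
```

The selected model would then be applied to the ToPredict dataset, as described in the task list. On ties, `max` keeps the first model in insertion order.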
The data used to create the dataset PimaDiabetes.csv, which is used in this coursework, was originally collected by the National Institute of Diabetes and Digestive and Kidney Diseases in the United States. The dataset includes a 0/1 variable, Outcome, which indicates whether the subject ultimately tested positive for diabetes, along with several diagnostic measures recorded from 750 women.