🩺 Exploratory Data Analysis (EDA) on Diabetes Dataset

📌 Project Overview

This project performs an in-depth Exploratory Data Analysis (EDA) on the Pima Indians Diabetes Dataset. The goal is to investigate the relationship between various health metrics (like Glucose, BMI, and Age) and the onset of diabetes.

By analyzing feature distributions and correlations, we identify which factors are most strongly associated with a diabetes diagnosis in this dataset.

📊 Dataset Features

The dataset includes several medical predictor variables and one target variable (Outcome):

Pregnancies: Number of times pregnant.
Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test.
BloodPressure: Diastolic blood pressure (mm Hg).
SkinThickness: Triceps skin fold thickness (mm).
Insulin: 2-hour serum insulin (mu U/ml).
BMI: Body mass index (weight in kg/(height in m)^2).
DiabetesPedigreeFunction: Diabetes likelihood based on family history.
Age: Age in years.
Outcome: Class variable (0 = Non-diabetic, 1 = Diabetic).
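
As a quick sanity check before any analysis, the column list above can be turned into a small schema check. This is a minimal sketch: `EXPECTED_COLUMNS` and `check_schema` are illustrative names, not part of the notebook, and the file path in the usage comment assumes `diabetes.csv` sits in the working directory.

```python
import pandas as pd

# Column names exactly as listed in the feature description above.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def check_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: verify the listed columns exist and Outcome is 0/1."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    unexpected = set(df["Outcome"].unique()) - {0, 1}
    if unexpected:
        raise ValueError(f"Unexpected Outcome values: {sorted(unexpected)}")
    # Return only the documented columns, in the documented order.
    return df[EXPECTED_COLUMNS]


# Typical use (assumes diabetes.csv is in the working directory):
# df = check_schema(pd.read_csv("diabetes.csv"))
# df.info()
```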

🚀 Key Analysis Steps

Data Cleaning: Identifying and handling missing values, which this dataset encodes as zeros in columns such as Glucose, Insulin, and BloodPressure.
Descriptive Statistics: Summary of mean, median, and variance across health metrics.
Distribution Analysis: Using Histograms and KDE plots to see the spread of the data.
Outlier Detection: Using Boxplots to identify extreme health readings.
Correlation Mapping: Using Heatmaps to see how features like BMI and Glucose relate to the Outcome.
Class Balance: Checking the ratio of Diabetic vs. Non-diabetic cases.
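
The numeric steps above can be sketched with pandas alone. This is one possible arrangement, not the notebook's actual code: the helper names are hypothetical, and the choice of columns in `ZERO_AS_MISSING` is an assumption based on the fact that a zero is not a physiologically plausible reading for them.

```python
import numpy as np
import pandas as pd

# Assumption: a zero in these columns means "not recorded".
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def mark_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with placeholder zeros replaced by NaN."""
    out = df.copy()
    cols = [c for c in ZERO_AS_MISSING if c in out.columns]
    out[cols] = out[cols].replace(0, np.nan)
    return out


def iqr_outlier_counts(df: pd.DataFrame) -> pd.Series:
    """Count values beyond 1.5 * IQR per column (the usual boxplot whisker rule)."""
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    outside = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)
    return outside.sum()


def summarize(df: pd.DataFrame) -> dict:
    """Bundle the numeric EDA steps into one dictionary of results."""
    clean = mark_missing(df)
    features = clean.drop(columns="Outcome")
    corr = clean.corr()["Outcome"].drop("Outcome")
    return {
        "missing": clean.isna().sum(),                               # data cleaning
        "stats": features.agg(["mean", "median", "var"]).T,          # descriptive statistics
        "outliers": iqr_outlier_counts(features),                    # outlier detection
        "class_share": clean["Outcome"].value_counts(normalize=True),  # class balance
        "corr_with_outcome": corr.sort_values(ascending=False),      # correlation mapping
    }
```

Calling `summarize(df)` on the loaded DataFrame returns plain pandas objects, so each result can be printed or plotted separately in the notebook.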

🛠️ Tech Stack

Language: Python
Libraries:
- Pandas (Data Cleaning)
- NumPy (Mathematical Operations)
- Matplotlib & Seaborn (Visualizations)
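
For the visual steps (histograms, boxplots, correlation heatmap), a Matplotlib-only sketch might look like the following. It is deliberately simpler than what the notebook may do: Seaborn's `histplot(kde=True)`, `boxplot`, and `heatmap` are the higher-level equivalents and add the KDE curve, which plain Matplotlib does not provide. The function names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

CLASS_LABELS = ["Non-diabetic", "Diabetic"]  # Outcome 0 and 1


def plot_feature(df: pd.DataFrame, column: str):
    """Histogram and per-class boxplot for one feature (hypothetical helper)."""
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
    # Split the column by class and drop NaN, which neither plot can handle.
    groups = [
        df.loc[df["Outcome"] == k, column].dropna().to_numpy() for k in (0, 1)
    ]

    # Side-by-side histogram bars show the spread of each class.
    ax_hist.hist(groups, bins=20, label=CLASS_LABELS)
    ax_hist.set_xlabel(column)
    ax_hist.set_ylabel("Count")
    ax_hist.legend()

    # Boxplots expose extreme readings as points beyond the whiskers.
    ax_box.boxplot(groups)
    ax_box.set_xticks([1, 2])
    ax_box.set_xticklabels(CLASS_LABELS)
    ax_box.set_ylabel(column)
    fig.tight_layout()
    return fig


def plot_correlations(df: pd.DataFrame):
    """Heatmap of the pairwise Pearson correlations between all columns."""
    corr = df.corr()
    fig, ax = plt.subplots(figsize=(7, 6))
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax)
    fig.tight_layout()
    return fig
```

Both functions return the figure instead of calling `plt.show()`, so in a notebook the plot renders as the cell output and in a script it can be saved with `fig.savefig(...)`.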

📈 Key Insights (Sample)

Glucose & BMI: Of all features, these two correlate most strongly (and positively) with a diabetic Outcome.
Age Factor: Diabetes is more frequent among older individuals in this dataset.
Insulin Levels: The Insulin column contains a large number of zeros that stand in for missing values, so it needs to be imputed before further analysis.
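
Two of these observations lend themselves to a short sketch: filling the missing Insulin readings and checking how the diabetic share changes with age. The approach below (median within each Outcome class, fixed age bands) is only one reasonable option and is not confirmed to be the strategy used in the notebook; the function names and band edges are illustrative.

```python
import numpy as np
import pandas as pd


def impute_by_outcome(df: pd.DataFrame, column: str) -> pd.Series:
    """Fill missing values in `column` with the median of the same Outcome class."""
    values = df[column].replace(0, np.nan)  # zeros are treated as missing
    medians = values.groupby(df["Outcome"]).transform("median")
    return values.fillna(medians)


def diabetic_rate_by_age(
    df: pd.DataFrame, edges=(20, 30, 40, 50, 60, 120)
) -> pd.Series:
    """Share of diabetic cases per age band; the band edges are illustrative."""
    bands = pd.cut(df["Age"], bins=list(edges), right=True)
    return df.groupby(bands, observed=True)["Outcome"].mean()


# Typical use on the loaded DataFrame:
# df["Insulin"] = impute_by_outcome(df, "Insulin")
# print(diabetic_rate_by_age(df))
```

Imputing per Outcome class is convenient for exploration, but it uses the target to fill a feature, so an overall median would be the safer choice if the cleaned data later feeds a predictive model.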

📂 Project Structure

├── diabetes.csv         # Raw dataset
├── EDA_Diabetes.ipynb   # Main Jupyter Notebook
├── requirements.txt     # List of dependencies
└── README.md            # Project documentation

⚙️ Installation

Clone the repo:

git clone https://github.com/Akshat8510/EDA-on-Diabetes_Dataset.git

Install libraries:

pip install pandas seaborn matplotlib numpy

Developed by Akshat as part of a Data Science portfolio.
