This repository hosts a Python-based machine learning project that predicts diabetes from patient health data, using the Kaggle dataset at https://www.kaggle.com/datasets/mathchi/diabetes-data-set.
The project uses common machine learning techniques and several popular libraries, including pandas, NumPy, scikit-learn, Matplotlib, and Seaborn, to preprocess data, train models, and evaluate their performance.
The dataset used in this project originates from the National Institute of Diabetes and Digestive and Kidney Diseases. It includes several diagnostic measurements such as glucose concentration, blood pressure, skin thickness, insulin level, BMI, age, and more.
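As a rough illustration of the dataset's shape, the sketch below loads a CSV and checks that the expected columns are present. The column names are an assumption based on how this dataset is commonly distributed, `load_diabetes` is a hypothetical helper rather than a function from this repository, and the two sample rows are made-up values that only demonstrate the call.

```python
import io

import pandas as pd

# Column names assumed from the commonly distributed CSV; adjust if yours differ.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_diabetes(source):
    """Read the CSV (path or file-like object) and verify the expected columns."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Tiny in-memory sample with made-up values, standing in for the real file.
sample_csv = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,25,80,28.5,0.4,33,0\n"
    "5,160,85,0,0,35.1,0.7,51,1\n"
)
df = load_diabetes(sample_csv)
print(df.dtypes)
```

With the real data, pass the path of the downloaded CSV to `load_diabetes` instead of the in-memory sample.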
- Data Preprocessing: Includes handling missing values, feature scaling, and data transformations to prepare the dataset for modeling.
- Model Training and Evaluation: Employs three different machine learning models:
  - Logistic Regression
  - K-Nearest Neighbors (KNN)
  - Support Vector Machine (SVM)
- Performance Analysis: Evaluates models based on accuracy, precision, and recall. Includes detailed visualizations of model performance.
- Data Visualization: Uses Matplotlib and Seaborn for insightful visualizations of the dataset distribution and model outcomes.
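The workflow in the list above can be sketched roughly as follows. This is a minimal illustration and not necessarily the code in this repository: `evaluate_models` is a hypothetical helper, the median imputation, 25% test split, and model hyperparameters are assumptions, and the data is synthetic so the snippet runs without the CSV.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_models(X, y, random_state=0):
    """Train the three classifiers and return accuracy, precision, and recall."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(kernel="rbf"),
    }
    results = {}
    for name, model in models.items():
        # Impute missing values, scale features, then fit the classifier.
        # Fitting inside a pipeline keeps test data out of the preprocessing fit.
        pipe = make_pipeline(
            SimpleImputer(strategy="median"), StandardScaler(), model
        )
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
        }
    return results


# Synthetic stand-in data (not real patient records), with a few missing values.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Glucose": rng.normal(120, 30, n),
    "BMI": rng.normal(32, 6, n),
    "Age": rng.integers(21, 70, n).astype(float),
})
y = (X["Glucose"] + rng.normal(0, 15, n) > 125).astype(int)
X.loc[rng.choice(n, 10, replace=False), "BMI"] = np.nan

results = evaluate_models(X, y)
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

To try this on the real data, replace the synthetic `X` and `y` with the feature columns and the label column of the loaded dataset.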
Contributions are welcome! For major changes, please open an issue first to discuss what you would like to change. Please make sure to update tests as appropriate.
This project is licensed under the MIT License. See the LICENSE file for details.