Welcome to the Kidney Disease Prediction Using Machine Learning repository! This project aims to help healthcare professionals by providing tools for predicting Chronic Kidney Disease (CKD) using various machine learning classifiers. The comprehensive pipeline includes data cleaning, preprocessing, model training, evaluation, and inference.
- Introduction
- Project Overview
- Installation
- Usage
- Data
- Modeling
- Evaluation
- Deployment
- Contributing
- License
- Contact
Chronic Kidney Disease (CKD) affects millions worldwide. Early detection can save lives and improve treatment outcomes. This project uses machine learning to predict CKD based on various health metrics. By leveraging predictive analytics, healthcare practitioners can make informed decisions quickly.
This repository contains:
- Data cleaning scripts
- Preprocessing steps
- Multiple machine learning models, including:
  - Logistic Regression
  - Random Forest
  - XGBoost
  - Naive Bayes
  - AdaBoost
- Evaluation metrics to assess model performance
- Inference tools for real-time predictions
You can download the latest release from the repository's Releases page.
To get started, clone this repository to your local machine:
git clone https://github.com/macenkrace/Kidney-Disease-Prediction-Using-ML.git
Navigate to the project directory:
cd Kidney-Disease-Prediction-Using-ML
Next, install the required packages. It is recommended to use a virtual environment. You can create one using:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
Then install the necessary libraries:
pip install -r requirements.txt
After setting up the environment, you can start using the Jupyter Notebook for model training and evaluation. Run the following command to launch Jupyter:
jupyter notebook
Open the relevant notebook files and follow the instructions to train your models. For inference, use the provided scripts to make predictions based on new input data.
The dataset used in this project is derived from various health metrics. It includes features such as:
- Age
- Blood Pressure
- Specific Gravity
- Albumin
- Sugar
- Blood Glucose
- Serum Creatinine
- Hemoglobin
- Packed Cell Volume
You can find the dataset in the `data` folder. Make sure to clean and preprocess the data before training your models.
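As a rough illustration of the cleaning step, the sketch below converts text columns to numbers, imputes missing values with the column median, and encodes the label with pandas. The column names (`age`, `bp`, `sc`, `classification`), the `?` missing-value marker, and the tiny inline table are assumptions made for this example, not the repository's actual schema or data.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset; "?" marks a missing value.
raw = pd.DataFrame(
    {
        "age": ["48", "62", "?", "51"],
        "bp": ["80", "?", "70", "90"],
        "sc": ["1.2", "3.8", "0.9", "?"],
        "classification": ["ckd", "ckd", "notckd", "notckd"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Convert numeric columns, impute medians, and encode the label as 0/1."""
    out = df.copy()
    numeric_cols = [c for c in out.columns if c != "classification"]
    for col in numeric_cols:
        # Non-numeric entries such as "?" become NaN, then take the column median.
        out[col] = pd.to_numeric(out[col], errors="coerce")
        out[col] = out[col].fillna(out[col].median())
    # Assumed label encoding: 1 = CKD, 0 = no CKD.
    out["classification"] = out["classification"].str.strip().map({"ckd": 1, "notckd": 0})
    return out


cleaned = clean(raw)
print(cleaned)
```

Median imputation is only one option; whether it suits a given feature depends on how the values are distributed and why they are missing.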
This project implements several machine learning algorithms. Here’s a brief overview of each:
Logistic regression is used for binary classification. It predicts the probability of CKD based on input features.
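To make the probability concrete: logistic regression passes a weighted sum of the features through the sigmoid function. The sketch below uses made-up weights, bias, and feature names purely to show the arithmetic; they are not coefficients learned by this project.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features: dict, weights: dict, bias: float) -> float:
    """Return P(CKD) as sigmoid(bias + sum of weight * feature value)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical weights and bias, chosen only to demonstrate the calculation.
weights = {"serum_creatinine": 1.5, "hemoglobin": -0.4}
bias = 2.0
patient = {"serum_creatinine": 2.0, "hemoglobin": 10.0}

p = predict_proba(patient, weights, bias)
print(f"P(CKD) = {p:.3f}")  # z = 2.0 + 3.0 - 4.0 = 1.0, so this prints 0.731
```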
Random Forest builds multiple decision trees and aggregates their predictions to improve accuracy and control overfitting.
XGBoost is a gradient boosting implementation optimized for speed and predictive performance. It builds trees sequentially, each one correcting the errors of the previous ones, and scales well to large datasets.
Naive Bayes applies Bayes' theorem for classification. It assumes the features are independent of one another given the class, making it simple yet effective.
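The independence assumption means the score for each class is the prior multiplied by one likelihood per feature. A minimal Gaussian sketch follows; the per-class means, standard deviations, and priors are invented for illustration and are not estimates from the project's data.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution at x."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mean) / std) ** 2)


# Hypothetical per-class parameters: feature -> (mean, standard deviation).
params = {
    "ckd": {"hemoglobin": (10.0, 2.0), "serum_creatinine": (4.0, 2.0)},
    "notckd": {"hemoglobin": (15.0, 1.5), "serum_creatinine": (1.0, 0.5)},
}
priors = {"ckd": 0.5, "notckd": 0.5}


def posterior(sample: dict) -> dict:
    """Multiply the prior by each feature likelihood, then normalize."""
    scores = {}
    for label, feature_params in params.items():
        score = priors[label]
        for name, value in sample.items():
            mean, std = feature_params[name]
            score *= gaussian_pdf(value, mean, std)
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


result = posterior({"hemoglobin": 9.5, "serum_creatinine": 3.5})
print(result)
```

With many features, implementations usually sum log-likelihoods instead of multiplying densities to avoid numerical underflow.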
AdaBoost combines multiple weak classifiers to create a strong classifier. After each round it increases the weights of the samples that earlier classifiers misclassified, so later classifiers focus on the harder cases.
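One way to compare the classifiers described above is to cross-validate them on the same data. The sketch below is an assumption about how that could look with scikit-learn, not the repository's training code: it uses a synthetic dataset from `make_classification` in place of the CKD data, and the hyperparameters are arbitrary. XGBoost is left out so the sketch depends only on scikit-learn; its `XGBClassifier` follows the same fit/predict interface and could be added to the dictionary if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in with nine features; this is not the project's dataset.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=0
)

# Hyperparameters here are illustrative defaults, not tuned values.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds for each classifier.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```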
Model evaluation is crucial for understanding performance. This project includes metrics such as:
- Accuracy
- Precision
- Recall
- F1 Score
- ROC-AUC
These metrics help determine which model performs best for CKD prediction.
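The metrics listed above can all be computed with `sklearn.metrics`. The following sketch uses eight hand-written labels and predicted probabilities, which are hypothetical and not results from this project, and assumes a 0.5 decision threshold.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical labels (1 = CKD) and predicted probabilities for eight patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed threshold of 0.5

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC-AUC is computed from the probabilities, not the thresholded labels.
    "roc_auc": roc_auc_score(y_true, y_prob),
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Here the thresholded predictions contain three true positives, one false negative, one false positive, and three true negatives, so accuracy, precision, recall, and F1 all equal 0.75, while ROC-AUC is 0.9375 because 15 of the 16 positive/negative pairs are ranked correctly. In a screening setting, recall often deserves particular attention because a missed CKD case is usually costlier than a false alarm.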
For deploying the model as a web application, we use Streamlit. To run the app, execute:
streamlit run app.py
This command will launch a web interface where users can input health metrics and receive predictions on CKD.
We welcome contributions! If you want to help improve this project, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Commit your changes.
- Push your branch to your forked repository.
- Create a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or suggestions, feel free to reach out:
- Author: Your Name
- Email: [email protected]
You can also check the latest releases on the repository's Releases page.
Thank you for your interest in this project! Together, we can improve healthcare outcomes through data-driven insights.