This project explores the Indian Liver Patient Dataset using both unsupervised and supervised learning methods.
The goal is to analyze patient data, uncover patterns, and build predictive models for liver disease.
- Source: Kaggle, Indian Liver Patient Dataset
- 583 samples, 10 features
- Features include age, gender, total and direct bilirubin, liver enzyme levels, total proteins, albumin, and the albumin/globulin ratio
- Target: Liver disease diagnosis (1 = Disease, 2 = No Disease)
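As a rough illustration of the dataset layout described above, the sketch below loads a CSV and recodes the target so that 1 means disease and 0 means no disease. The column names (including the `Dataset` target column) are assumed to match the commonly distributed Kaggle file and should be checked against the actual download. The three rows are made up for the example and are not real patient records, and `load_liver_data`, `has_disease`, and `is_male` are hypothetical names.

```python
import io

import pandas as pd

# Three made-up rows in the assumed Kaggle column layout (not real patient data).
SAMPLE_CSV = """Age,Gender,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
65,Female,0.7,0.1,187,16,18,6.8,3.3,0.90,1
45,Male,1.2,0.4,210,40,55,7.0,3.5,,1
30,Male,0.8,0.2,150,20,22,7.2,4.0,1.20,2
"""


def load_liver_data(source):
    """Read the CSV and add numeric helper columns (hypothetical helper)."""
    df = pd.read_csv(source)
    # Recode the target: 1 (disease) -> 1, 2 (no disease) -> 0.
    df["has_disease"] = (df["Dataset"] == 1).astype(int)
    # Encode Gender as 0/1 so numeric models can use it.
    df["is_male"] = (df["Gender"] == "Male").astype(int)
    return df


df = load_liver_data(io.StringIO(SAMPLE_CSV))
print(df[["Age", "Gender", "is_male", "Dataset", "has_disease"]])
print("Missing values per column:")
print(df.isna().sum())
```

With the real file, `load_liver_data("path/to/file.csv")` would replace the in-memory sample. The missing-value count is worth checking before modeling.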
Exploratory Data Analysis (EDA)
- Correlation heatmap
- Gender distribution
- Feature distributions
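The three EDA views listed above could be produced along these lines. This is a minimal sketch using plain matplotlib on synthetic stand-in data, not the notebook's actual plotting code. The column names and the generated values are assumptions for illustration only.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; replace with the real DataFrame.
rng = np.random.default_rng(0)
n = 200
total_bilirubin = rng.lognormal(mean=0.0, sigma=0.8, size=n)
df = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Gender": rng.choice(["Male", "Female"], size=n, p=[0.75, 0.25]),
    "Total_Bilirubin": total_bilirubin,
    # Built to correlate strongly with total bilirubin.
    "Direct_Bilirubin": 0.4 * total_bilirubin + rng.normal(0, 0.05, size=n),
    "Albumin": rng.normal(3.2, 0.7, size=n),
})

corr = df.select_dtypes("number").corr()   # pairwise Pearson correlations
gender_counts = df["Gender"].value_counts()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 1. Correlation heatmap
im = axes[0].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[0].set_xticks(range(len(corr.columns)))
axes[0].set_xticklabels(corr.columns, rotation=45, ha="right")
axes[0].set_yticks(range(len(corr.columns)))
axes[0].set_yticklabels(corr.columns)
axes[0].set_title("Correlation heatmap")
fig.colorbar(im, ax=axes[0])

# 2. Gender distribution
axes[1].bar(list(gender_counts.index), gender_counts.to_numpy())
axes[1].set_title("Gender distribution")

# 3. Distribution of one feature
axes[2].hist(df["Total_Bilirubin"], bins=30)
axes[2].set_title("Total_Bilirubin distribution")

fig.tight_layout()
# Call fig.savefig("eda_overview.png") or plt.show() to view the result.
```

Skewed lab values such as bilirubin often read better on a log scale, which may be worth trying on the real data.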
Clustering (Unsupervised)
- KMeans
- Gaussian Mixture Models (GMM)
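A minimal sketch of fitting the two clustering methods above with scikit-learn follows. It assumes two clusters (to mirror the disease / no-disease split) and standardized features, since both methods are sensitive to feature scale. The data are synthetic, and the notebook's actual settings may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for three numeric lab features in two loose groups.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[1.0, 40.0, 3.5], scale=[0.3, 10.0, 0.4], size=(150, 3))
group_b = rng.normal(loc=[4.0, 90.0, 2.6], scale=[1.0, 25.0, 0.5], size=(150, 3))
X = np.vstack([group_a, group_b])

# Put all features on a comparable scale before clustering.
X_scaled = StandardScaler().fit_transform(X)

# KMeans: hard assignment to the nearest of two centroids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(X_scaled)

# GMM: two Gaussian components with full covariance matrices.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=42)
gmm_labels = gmm.fit_predict(X_scaled)

print("KMeans cluster sizes:", np.bincount(kmeans_labels))
print("GMM cluster sizes:", np.bincount(gmm_labels))
```

Cluster ids are arbitrary: cluster 0 from KMeans need not correspond to cluster 0 from the GMM, or to either diagnosis class.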
Classification (Supervised)
- Random Forest Classifier
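The supervised step might look roughly like the following. It is a sketch on synthetic data with an assumed 75/25 stratified split and 200 trees; none of these settings are taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 5 numeric features, binary label driven by the first two.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 5))
y = ((X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)) > 0).astype(int)

# Hold out 25% of the rows, keeping the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
print("Feature importances:", clf.feature_importances_.round(2))
```

On an imbalanced medical dataset, accuracy alone can be misleading, so recall or a confusion matrix on the held-out set would be a natural complement.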
| Model | Accuracy | Notes |
|---|---|---|
| KMeans | ~69% | Captured partial structure |
| GMM | ~57% | Poor clustering |
| Random Forest | ~75-80% | Best performing supervised model |
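Clustering has no built-in notion of accuracy, so the clustering rows in the table above presumably come from comparing cluster ids with the diagnosis labels. One common way to do that for two clusters is to score both possible cluster-to-class mappings and keep the better one. The sketch below assumes that approach; the function names are hypothetical and the labels are toy values. It also computes the majority-class baseline, which matters here because the commonly distributed version of this dataset is imbalanced (416 of the 583 records are liver patients, about 71%).

```python
import numpy as np


def clustering_accuracy(y_true, cluster_ids):
    """Accuracy of a 2-cluster assignment under the better of the two
    possible cluster-to-class mappings (both inputs use values 0/1)."""
    y_true = np.asarray(y_true)
    cluster_ids = np.asarray(cluster_ids)
    direct = float(np.mean(cluster_ids == y_true))
    # Flipping the cluster ids turns every match into a mismatch.
    return max(direct, 1.0 - direct)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    positive_rate = float(np.mean(np.asarray(y_true) == 1))
    return max(positive_rate, 1.0 - positive_rate)


# Toy labels: 7 disease (1), 3 no disease (0).
y_true = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
cluster_ids = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

acc = clustering_accuracy(y_true, cluster_ids)
base = majority_baseline(y_true)
print(f"clustering accuracy: {acc:.2f}, majority baseline: {base:.2f}")
```

In this toy case both numbers come out at 0.70, which shows why a clustering accuracy close to the majority-class share says little about diagnostic value.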
- Supervised learning (Random Forest) outperformed clustering approaches.
- Clustering showed some structure but was not reliable for diagnosis.
- This project highlights the importance of preprocessing and appropriate model choice for medical datasets.
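To make the preprocessing point concrete, here is one way the steps could be bundled with the classifier so that imputation and encoding are fitted only on the training folds. This is a sketch under assumptions: the data are synthetic, the column names are illustrative, and median imputation plus one-hot encoding are example choices rather than the notebook's documented method.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data with a few missing values.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n).astype(float),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Total_Bilirubin": rng.lognormal(0.0, 0.8, size=n),
    "Albumin_and_Globulin_Ratio": rng.normal(1.0, 0.3, size=n),
})
y = (df["Total_Bilirubin"] + rng.normal(0, 0.5, size=n) > 1.2).astype(int)
df.loc[df.index[:5], "Albumin_and_Globulin_Ratio"] = np.nan

numeric_cols = ["Age", "Total_Bilirubin", "Albumin_and_Globulin_Ratio"]
categorical_cols = ["Gender"]

preprocess = ColumnTransformer([
    # Fill gaps with the median, then standardize. Scaling is not required
    # by a random forest but matters for KMeans and GMM.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=1)),
])

# 5-fold cross-validation; preprocessing is refit inside each fold.
scores = cross_val_score(model, df, y, cv=5, scoring="accuracy")
print("Fold accuracies:", scores.round(2))
print(f"Mean accuracy: {scores.mean():.2f}")
```

Keeping preprocessing inside the pipeline avoids leaking statistics from the validation folds into training.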
pip install -r requirements.txt
jupyter notebook liver_patient_analysis.ipynb