Tdoggie/diabetic-readmission-ml

🏥 Hospital Readmission Risk Prediction

This project builds a machine learning model that predicts whether a diabetic patient is likely to be readmitted to the hospital within 30 days. Preventing avoidable readmissions is vital for improving patient outcomes and reducing healthcare costs. The project combines structured medical records with a feature importance analysis, so it produces risk predictions and also shows which factors drive them.


📌 Problem Statement

Hospital readmissions are costly and often avoidable. This project develops a predictive solution that identifies high-risk patients at discharge, enabling hospitals to intervene proactively. Our model uses features from patient demographics, hospitalization history, medications, diagnoses, and more.


🧪 Dataset Overview

The dataset used is sourced from the UCI Machine Learning Repository, containing over 100,000 hospital encounter records for diabetic patients from 130 U.S. hospitals.

Key characteristics:

  • 50 features
  • Categorical and numeric variables
  • Target variable: readmitted (within 30 days, after 30 days, or not at all)
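Because the project predicts readmission within 30 days, the three-level target has to be collapsed into a binary label at some point. A minimal sketch of one way to do that, assuming the raw column uses the codes "<30", ">30", and "NO" and that ">30" is treated as a negative case; the column name `readmit_30d` is hypothetical and is not taken from the notebook:

```python
import pandas as pd

# Toy stand-in for the `readmitted` column (assumed codes: "<30", ">30", "NO").
encounters = pd.DataFrame(
    {"readmitted": ["NO", ">30", "<30", "NO", "<30", ">30"]}
)

# Positive class = readmitted within 30 days; ">30" and "NO" both become 0.
# `readmit_30d` is a hypothetical column name used only for illustration.
encounters["readmit_30d"] = (encounters["readmitted"] == "<30").astype(int)

print(encounters["readmit_30d"].value_counts().to_dict())
```

Grouping ">30" with "NO" keeps the label aligned with the 30-day question, at the cost of discarding the distinction between late readmission and none.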

🧠 Data Science Process

1. 📂 Data Cleaning

  • Removed irrelevant or high-missing columns (weight, payer_code, medical_specialty)
  • Filtered invalid or ambiguous entries (e.g., encounters with unknown gender, and patients who died during the encounter and therefore cannot be readmitted)
  • Consolidated category levels using mappings provided in the dataset
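The cleaning steps above could look roughly like the sketch below. It runs on a small toy frame rather than `diabetic_data.csv`. The "Unknown/Invalid" gender label and the set of discharge disposition IDs treated as "expired" are assumptions to check against `IDS_mapping.csv`, and the `clean` helper is hypothetical:

```python
import pandas as pd

# Toy rows that mimic a few columns of the raw file.
raw = pd.DataFrame({
    "gender": ["Female", "Male", "Unknown/Invalid", "Male"],
    "weight": ["?", "?", "?", "[75-100)"],
    "payer_code": ["?", "MC", "?", "HM"],
    "medical_specialty": ["?", "Cardiology", "?", "?"],
    "discharge_disposition_id": [1, 11, 1, 6],
    "time_in_hospital": [3, 7, 2, 5],
})

HIGH_MISSING = ["weight", "payer_code", "medical_specialty"]
# Assumption: these IDs mean the patient expired; verify in IDS_mapping.csv.
EXPIRED_IDS = {11, 19, 20, 21}


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop high-missing columns and filter rows that cannot be modeled."""
    out = df.drop(columns=HIGH_MISSING)
    out = out[out["gender"] != "Unknown/Invalid"]
    out = out[~out["discharge_disposition_id"].isin(EXPIRED_IDS)]
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```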

2. 🔍 Exploratory Data Analysis (EDA)

  • Assessed feature distributions, class imbalance, and correlations
  • Investigated relationships between patient demographics and readmission
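As an illustration of these checks, the sketch below computes the class balance, the readmission rate per age band, and a simple correlation on made-up data. The binary column `readmit_30d` is a hypothetical name, and none of the numbers reflect the real dataset:

```python
import pandas as pd

# Made-up encounters; values are illustrative only.
df = pd.DataFrame({
    "age": ["[50-60)", "[70-80)", "[70-80)", "[50-60)", "[70-80)", "[50-60)"],
    "time_in_hospital": [2, 9, 4, 3, 7, 1],
    "readmit_30d": [0, 1, 0, 0, 1, 0],  # hypothetical binary target
})

# Class imbalance: share of each target class.
class_share = df["readmit_30d"].value_counts(normalize=True)

# Demographics vs. readmission: positive rate per age band.
rate_by_age = df.groupby("age")["readmit_30d"].mean()

# Pairwise correlation between a numeric feature and the target.
corr = df[["time_in_hospital", "readmit_30d"]].corr()

print(class_share, rate_by_age, corr, sep="\n\n")
```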

3. 🏗️ Feature Engineering

  • Created new features such as:
    • total_visits
    • is_chronic_patient
    • num_medication_changes
  • Encoded ordinal/categorical variables appropriately
  • Standardized numeric features
  • Applied one-hot encoding where necessary
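The README does not define the engineered features, so the sketch below makes two assumptions: `total_visits` is the sum of the outpatient, emergency, and inpatient visit counts, and `num_medication_changes` counts medication columns marked "Up" or "Down". `is_chronic_patient` is left out because its definition is not given. Scaling and one-hot encoding are shown with a scikit-learn `ColumnTransformer`, which may differ from what the notebook does:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows; column names follow the UCI file, values are made up.
df = pd.DataFrame({
    "number_outpatient": [0, 2, 1, 0],
    "number_emergency": [0, 1, 0, 3],
    "number_inpatient": [1, 0, 0, 2],
    "metformin": ["No", "Up", "Steady", "Down"],
    "insulin": ["Up", "No", "Steady", "Down"],
    "time_in_hospital": [3, 7, 2, 5],
    "race": ["Caucasian", "AfricanAmerican", "Caucasian", "Other"],
})

# Assumed definition: all prior visits, regardless of setting.
visit_cols = ["number_outpatient", "number_emergency", "number_inpatient"]
df["total_visits"] = df[visit_cols].sum(axis=1)

# Assumed definition: number of drugs whose dose went up or down.
med_cols = ["metformin", "insulin"]
df["num_medication_changes"] = df[med_cols].isin(["Up", "Down"]).sum(axis=1)

numeric = ["time_in_hospital", "total_visits", "num_medication_changes"]
categorical = ["race"]

# Standardize numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
X = preprocess.fit_transform(df)
print(X.shape)
```

In a real pipeline the transformer would be fit on the training split only, so that test-set statistics do not leak into the scaling.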

4. 🤖 Modeling & Evaluation

  • Used Logistic Regression as a baseline
  • Trained advanced models: Decision Tree, Random Forest, and XGBoost
  • Applied random oversampling to balance the classes
  • Evaluated models using:
    • Accuracy
    • Recall
    • F1-score
    • ROC-AUC
    • Classification report
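A minimal sketch of the baseline workflow follows, using synthetic data from `make_classification` in place of the real encounters. Random oversampling is done here with `sklearn.utils.resample`, which is a stand-in and not necessarily the tool the notebook uses. It is applied to the training split only, so duplicated minority rows never appear in the test set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             f1_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced data (about 10% positives) standing in for the real set.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Random oversampling: resample the minority class with replacement
# until it matches the majority class size (training split only).
majority = X_train[y_train == 0]
minority = X_train[y_train == 1]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=0)
X_bal = np.vstack([majority, minority_up])
y_bal = np.concatenate([np.zeros(len(majority), dtype=int),
                        np.ones(len(minority_up), dtype=int)])

# Baseline model, evaluated on the untouched test split.
model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "roc_auc": roc_auc_score(y_test, proba),
}
print(metrics)
print(classification_report(y_test, pred))
```

The same fit-and-score pattern applies to the tree-based models; only the estimator changes.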

5. 📊 Feature Importance Analysis

  • Used model-native feature importance metrics (e.g., feature_importances_ for tree-based models, coef_ for Logistic Regression)
  • Identified top predictors of readmission risk
  • Visualized feature importance to interpret global model behavior
  • Used insights to support real-world decision-making in healthcare operations
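The sketch below shows how both kinds of model-native importance can be pulled into ranked tables. It uses synthetic data with placeholder feature names, so the ranking says nothing about the real predictors. Coefficients are compared by absolute value after standardizing the inputs, which is one common convention and an assumption here:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data; feature names are placeholders.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)

# Tree ensembles expose impurity-based importances that sum to 1.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
forest_rank = (pd.Series(forest.feature_importances_, index=feature_names)
               .sort_values(ascending=False))

# Linear models expose coefficients; scale inputs first so that
# coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
logit = LogisticRegression(max_iter=1000).fit(X_scaled, y)
logit_rank = (pd.Series(logit.coef_[0], index=feature_names)
              .abs().sort_values(ascending=False))

print(forest_rank)
print(logit_rank)
```

Either series can be charted with `forest_rank.plot.barh()` when matplotlib is available.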

🧰 Tech Stack

  • Language: Python
  • Libraries: pandas, NumPy, scikit-learn, XGBoost, matplotlib, seaborn
  • Tools: Jupyter Notebook / VS Code

📁 File Structure

project_directory/
│
├── Data/
│   ├── diabetic_data.csv   # Main dataset containing patient hospital records
│   └── IDS_mapping.csv     # Mapping file for categorical ID fields
├── index.ipynb             # Jupyter Notebook containing the full ML pipeline
├── README.md               # Project overview and documentation (this file)
└── presentation.pdf        # Non-technical presentation


💡 Key Insights

  • Features like num_medications, time_in_hospital, and age were among the most predictive
  • Patients with more procedures and chronic diagnoses showed higher readmission risk
  • Certain discharge types and gender patterns were also statistically significant

📈 Business Value

This model can be integrated into hospital EHR systems to:

  • Flag high-risk patients before discharge
  • Optimize care plans and follow-ups
  • Reduce costs from preventable readmissions
  • Improve patient satisfaction and clinical outcomes

🚀 Future Work

  • Deploy model via a Streamlit dashboard
  • Explore cost-sensitive learning techniques
  • Collect more recent or real-time data from hospitals
  • Expand scope to other chronic conditions
