🏥 Hospital Readmission Risk Prediction

This project builds a machine learning model that predicts whether a diabetic patient is likely to be readmitted to the hospital within 30 days of discharge. Preventing avoidable readmissions is vital for improving patient outcomes and reducing healthcare costs. Using structured medical records and a feature importance analysis, the project provides both risk predictions and interpretable insight into what drives them.

📌 Problem Statement

Hospital readmissions are costly and often avoidable. This project develops a predictive solution that identifies high-risk patients at discharge, enabling hospitals to intervene proactively. Our model uses features from patient demographics, hospitalization history, medications, diagnoses, and more.

🧪 Dataset Overview

The dataset comes from the UCI Machine Learning Repository and contains over 100,000 hospital encounter records for diabetic patients from 130 U.S. hospitals.

Key characteristics:

50 features
Categorical and numeric variables
Target variable: readmitted, with three levels (<30 for readmission within 30 days, >30 for readmission after 30 days, NO for no readmission)
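
Because the project predicts 30-day readmission, the three-level target has to be collapsed into a binary label at some point. The sketch below shows one way this might be done on a tiny in-memory frame. The helper name `to_binary_target` and the column name `readmitted_30d` are assumptions made for illustration, not names taken from the notebook.

```python
import pandas as pd


def to_binary_target(df: pd.DataFrame) -> pd.DataFrame:
    """Add a 0/1 column marking readmission within 30 days (hypothetical helper)."""
    out = df.copy()
    # Only "<30" counts as positive; ">30" and "NO" are both treated as negative.
    out["readmitted_30d"] = (out["readmitted"] == "<30").astype(int)
    return out


# Toy stand-in for the real file, which would be loaded with
# pd.read_csv("Data/diabetic_data.csv").
sample = pd.DataFrame({"readmitted": ["NO", "<30", ">30", "NO", "<30"]})
labeled = to_binary_target(sample)
positive_rate = labeled["readmitted_30d"].mean()
print(labeled["readmitted_30d"].tolist(), positive_rate)
```

Checking the positive rate right after this step is a quick way to see how imbalanced the classes are before modeling.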

🧠 Data Science Process

1. 📂 Data Cleaning

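Removed columns that were irrelevant or had a high share of missing values (weight, payer_code, medical_specialty)
Filtered out invalid or ambiguous entries (e.g., encounters with unknown gender, patients who expired and therefore cannot be readmitted)
Consolidated category levels using the ID mappings provided with the dataset (IDS_mapping.csv)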
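
The cleaning steps above could be sketched roughly as follows. This is an illustration on toy data, not the notebook's code: the function name `clean_encounters` is hypothetical, the valid gender labels are assumed to be `Male` and `Female`, and the set of discharge_disposition_id codes that mean "expired" is an assumption that should be checked against IDS_mapping.csv.

```python
import pandas as pd

# Columns named in the cleaning step.
DROP_COLUMNS = ["weight", "payer_code", "medical_specialty"]
# Assumption: code 11 marks an expired patient; verify against IDS_mapping.csv.
EXPIRED_DISPOSITION_IDS = [11]


def clean_encounters(df: pd.DataFrame) -> pd.DataFrame:
    """Drop sparse columns and filter out invalid or expired encounters."""
    out = df.drop(columns=[c for c in DROP_COLUMNS if c in df.columns])
    # Keep only rows with a known gender (assumed labels).
    out = out[out["gender"].isin(["Male", "Female"])]
    # Expired patients cannot be readmitted, so they are excluded.
    out = out[~out["discharge_disposition_id"].isin(EXPIRED_DISPOSITION_IDS)]
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "gender": ["Male", "Female", "Unknown/Invalid", "Female"],
    "discharge_disposition_id": [1, 11, 1, 3],
    "weight": ["?", "?", "?", "?"],
    "payer_code": ["MC", "?", "HM", "?"],
    "medical_specialty": ["?", "Cardiology", "?", "?"],
    "time_in_hospital": [3, 7, 2, 5],
})
cleaned = clean_encounters(raw)
print(cleaned)
```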

2. 🔍 Exploratory Data Analysis (EDA)

Assessed feature distributions, class imbalance, and correlations
Investigated relationships between patient demographics and readmission

3. 🏗️ Feature Engineering

Created new features such as:
- total_visits
- is_chronic_patient
- num_medication_changes
Encoded ordinal variables so that their order is preserved
Applied one-hot encoding to nominal categorical variables
Standardized numeric features
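
A minimal sketch of this step, under stated assumptions: the README does not define the engineered features, so here total_visits is assumed to be the sum of the outpatient, emergency, and inpatient visit counts, and num_medication_changes is assumed to count medication columns whose value is `Up` or `Down`. is_chronic_patient is omitted because its definition is not given. The column subsets and the helper name `add_engineered_features` are illustrative only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed inputs for the derived features (not confirmed by the notebook).
VISIT_COLUMNS = ["number_outpatient", "number_emergency", "number_inpatient"]
MEDICATION_COLUMNS = ["metformin", "insulin"]  # small subset for illustration


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add total_visits and num_medication_changes under the assumptions above."""
    out = df.copy()
    out["total_visits"] = out[VISIT_COLUMNS].sum(axis=1)
    # A dose marked "Up" or "Down" is counted as one medication change.
    changed = out[MEDICATION_COLUMNS].isin(["Up", "Down"])
    out["num_medication_changes"] = changed.sum(axis=1)
    return out


frame = pd.DataFrame({
    "number_outpatient": [0, 2, 1],
    "number_emergency": [1, 0, 0],
    "number_inpatient": [0, 3, 0],
    "metformin": ["No", "Up", "Steady"],
    "insulin": ["Down", "Up", "No"],
    "time_in_hospital": [2, 9, 4],
    "gender": ["Male", "Female", "Female"],
})
features = add_engineered_features(frame)

numeric_columns = ["time_in_hospital", "total_visits", "num_medication_changes"]
categorical_columns = ["gender"]

# Standardize numeric columns and one-hot encode nominal ones in a single step.
preprocessor = ColumnTransformer([
    ("scale", StandardScaler(), numeric_columns),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
])
X = preprocessor.fit_transform(features)
print(features[["total_visits", "num_medication_changes"]])
print(X.shape)
```

Wrapping the scaler and encoder in a ColumnTransformer keeps the same transformation available for new patients at prediction time, provided it is fitted on training data only.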

4. 🤖 Modeling & Evaluation

Used Logistic Regression as a baseline
Trained advanced models: Decision Tree, Random Forest, and XGBoost
Applied random oversampling to balance the classes
Evaluated models using:
- Accuracy
- Recall
- F1-score
- ROC-AUC
- Classification report
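
The baseline workflow above might look roughly like the sketch below, which uses synthetic data from scikit-learn in place of the real records. Two choices here are assumptions rather than details from the notebook: the oversampling is written by hand with `sklearn.utils.resample` for a binary target, and it is applied only to the training split so that duplicated rows do not leak into the test set. The helper name `random_oversample` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report, f1_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, imbalanced stand-in for the readmission data (about 10% positives).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)


def random_oversample(X, y, random_state=42):
    """Duplicate minority-class rows with replacement until classes are equal."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = int(counts.max() - counts.min())
    X_extra = resample(X[y == minority], replace=True,
                       n_samples=n_extra, random_state=random_state)
    X_bal = np.vstack([X, X_extra])
    y_bal = np.concatenate([y, np.full(n_extra, minority)])
    return X_bal, y_bal


X_bal, y_bal = random_oversample(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)

pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "roc_auc": roc_auc_score(y_test, proba),
}
print(metrics)
print(classification_report(y_test, pred))
```

With an imbalanced target, accuracy alone can look good for a model that rarely flags anyone, which is why recall, F1-score, and ROC-AUC are reported alongside it.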

5. 📊 Feature Importance Analysis

Used model-native feature importance metrics (e.g., feature_importances_ for tree-based models, coef_ for Logistic Regression)
Identified top predictors of readmission risk
Visualized feature importance to interpret global model behavior
Used insights to support real-world decision-making in healthcare operations
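
As a rough illustration of how model-native importances can be collected and ranked, the sketch below fits a Random Forest and a Logistic Regression on synthetic data and puts both measures in one table. The feature names and data are placeholders, and taking the absolute value of the coefficients as a ranking signal is a simplification that only makes sense when features share a common scale, which is why the inputs are standardized first.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Placeholder names; the real pipeline would use its own column names.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
logreg = LogisticRegression(max_iter=1000).fit(X, y)

importance = (
    pd.DataFrame({
        "feature": feature_names,
        # Impurity-based importances; they sum to 1 across features.
        "forest_importance": forest.feature_importances_,
        # Coefficient magnitude on standardized inputs.
        "logreg_abs_coef": np.abs(logreg.coef_[0]),
    })
    .sort_values("forest_importance", ascending=False)
    .reset_index(drop=True)
)
print(importance.head(3))
```

The resulting table can be passed to a bar plot to visualize the top predictors.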

🧰 Tech Stack

Language: Python
Libraries: pandas, NumPy, scikit-learn, XGBoost, matplotlib, seaborn
Tools: Jupyter Notebook / VSCode

📁 File Structure

project_directory/
│
├── Data/
│   ├── diabetic_data.csv   # Main dataset containing patient hospital records
│   └── IDS_mapping.csv     # Mapping file for categorical ID fields
├── index.ipynb             # Jupyter Notebook containing the full ML pipeline
├── README.md               # Project overview and documentation (this file)
└── presentation.pdf        # Non-technical presentation

💡 Key Insights

Features like num_medications, time_in_hospital, and age were among the most predictive
Patients with more procedures and chronic diagnoses showed higher readmission risk
Certain discharge types and gender were also statistically significant predictors

📈 Business Value

This model can be integrated into hospital EHR systems to:

Flag high-risk patients before discharge
Optimize care plans and follow-ups
Reduce costs from preventable readmissions
Improve patient satisfaction and clinical outcomes

🚀 Future Work

Deploy model via a Streamlit dashboard
Explore cost-sensitive learning techniques
Collect more recent or real-time data from hospitals
Expand scope to other chronic conditions
