An AI-powered healthcare analytics system that predicts patient appointment no-shows using machine learning and surfaces risk insights through an interactive Streamlit dashboard.
Missed clinical appointments lead to:
- Resource wastage and idle clinic capacity
- Increased patient waiting times
- Revenue loss for healthcare providers
- Disrupted continuity of patient care
NoShowAI estimates the probability that an appointment will result in a no-show using historical scheduling data. Clinics can identify high-risk appointments early and take preventive actions — sending reminders, rescheduling, or initiating social-work outreach.
- Data Preprocessing & Feature Engineering: cleaned appointment records with lead-time features, demographic encoding, and reminder flags
- Multi-Model ML Training: Logistic Regression, Decision Tree, and Random Forest trained with class balancing (SMOTE or class_weight='balanced')
- Best-Model Auto-Selection: the model with the highest F1 (No-Show) score is saved as model/best_model.pkl
- Risk Probability Estimation: per-appointment probability scores with a configurable high-risk threshold
- Interactive Streamlit Dashboard: 3-page workflow (Upload → Risk Dashboard → Analytics)
- Model Performance Comparison: Accuracy, F1, and R² metrics surfaced per model in the UI
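The preprocessing step in the feature list can be pictured with a minimal pandas sketch. The column names used here (scheduled_day, appointment_day, gender, sms_received) and the helper add_features are hypothetical placeholders chosen for illustration; the project's actual schema and notebook logic may differ.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive model-ready features from raw appointment records.

    All column names are hypothetical placeholders, not the project's schema.
    """
    out = df.copy()
    scheduled = pd.to_datetime(out["scheduled_day"])
    appointment = pd.to_datetime(out["appointment_day"])

    # Lead time: whole days between the booking date and the appointment date.
    lead = (appointment.dt.normalize() - scheduled.dt.normalize()).dt.days
    # Negative lead times would indicate bad records, so clip them to zero.
    out["lead_time_days"] = lead.clip(lower=0)

    # Demographic encoding: one-hot encode the categorical gender column.
    out = pd.get_dummies(out, columns=["gender"], prefix="gender", dtype=int)

    # Reminder flag: coerce to a 0/1 integer.
    out["sms_received"] = out["sms_received"].astype(int)

    # The raw timestamps are no longer needed once lead time is derived.
    return out.drop(columns=["scheduled_day", "appointment_day"])
```

Clipping negative lead times is only one possible policy; dropping such rows is an equally reasonable choice if they are known to be data-entry errors.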
| Layer | Libraries |
|---|---|
| Data | pandas, numpy |
| ML | scikit-learn, imbalanced-learn (optional, for SMOTE) |
| Serialisation | joblib |
| Visualisation | matplotlib, seaborn, Streamlit native charts |
| Dashboard | streamlit |
Dependencies: requirement.txt
Raw Data (CSV)
│
▼
Preprocessing / Feature Engineering ←── notebooks/
│
▼
Model Training (src/train.py)
├── Logistic Regression → model/logistic_regression.pkl
├── Decision Tree → model/decision_tree.pkl
├── Random Forest → model/random_forest.pkl
└── Best Model (by F1) → model/best_model.pkl
│
▼
Streamlit Dashboard (ui/app.py)
├── Page 1: Upload Dataset
├── Page 2: Risk Dashboard (predictions + colour-coded risk table)
└── Page 3: Analytics (distributions, model metrics, feature importance)
NoShowAI/
├── data/ # Raw and processed datasets
├── notebooks/ # EDA, preprocessing, and modelling notebooks
├── src/
│ └── train.py # Full training pipeline (3 models + best-model export)
├── model/ # Saved model artefacts (.pkl files)
│ ├── best_model.pkl
│ ├── logistic_regression.pkl
│ ├── decision_tree.pkl
│ ├── random_forest.pkl
│ └── scaler.pkl
├── ui/
│ └── app.py # Streamlit dashboard (Upload → Risk → Analytics)
├── docs/ # Architecture notes and documentation
├── requirement.txt # Python dependencies
└── README.md
python -m venv .venv
source .venv/bin/activate
pip install -r requirement.txt
Optionally install imbalanced-learn for SMOTE oversampling during training:
pip install imbalanced-learn
Place your cleaned dataset at data/noshow_cleaned (1).csv, then run:
python src/train.py
This trains all three models, prints a comparison table, and saves the best-performing model as model/best_model.pkl.
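As a rough illustration of what such a training run involves, the sketch below fits the three model types with class_weight='balanced' and picks the one with the highest F1 score on the no-show class. It is not the contents of src/train.py: the function train_and_select, the synthetic data, the 25% hold-out split, and the assumption that label 1 means no-show are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_and_select(X, y, random_state=42):
    """Fit three classifiers and return (best_name, best_model, f1_scores)."""
    # Stratify so the minority (no-show) class is represented in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    candidates = {
        "logistic_regression": LogisticRegression(
            class_weight="balanced", max_iter=1000
        ),
        "decision_tree": DecisionTreeClassifier(
            class_weight="balanced", random_state=random_state
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=random_state
        ),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        # F1 on the positive class; label 1 = no-show is an assumption.
        scores[name] = f1_score(y_test, model.predict(X_test), pos_label=1)
    best_name = max(scores, key=scores.get)
    return best_name, candidates[best_name], scores


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for the real appointment data.
    X, y = make_classification(
        n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=0
    )
    best_name, best_model, scores = train_and_select(X, y)
    for name, score in scores.items():
        print(f"{name:20s} F1 (no-show) = {score:.3f}")
    print("best:", best_name)
```

Persisting the winner would then be a single joblib.dump(best_model, "model/best_model.pkl") call. Selecting on F1 rather than accuracy matters here because no-shows are the minority class, so a model that always predicts "show" can look accurate while being useless.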
streamlit run ui/app.py
Open http://localhost:8501 in your browser and follow the 3-step in-app workflow:
- Upload Dataset — upload the preprocessed CSV
- Risk Dashboard — click Predict No-Show Risk to score all appointments; high-risk rows are highlighted in red
- Analytics — view model performance (Accuracy / F1 / R²), no-show distribution, probability histogram, and feature importance
| Step | Page | Action |
|---|---|---|
| 1 | Upload Dataset | Upload a preprocessed appointments CSV |
| 2 | Risk Dashboard | Run predictions; adjust high-risk threshold (default 0.70) |
| 3 | Analytics | Review per-model metrics, charts, and feature importances |
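The thresholding in step 2 can be sketched in a few lines. This is a hedged illustration, not the code in ui/app.py: the helper score_appointments is hypothetical, it assumes the model follows the scikit-learn predict_proba interface with class index 1 as the no-show class, and treating a probability equal to the threshold as high-risk is also an assumption.

```python
def score_appointments(model, rows, threshold=0.70):
    """Return one dict per appointment with its no-show probability and risk flag.

    Assumes `model.predict_proba(rows)` yields [P(show), P(no-show)] per row,
    i.e. that class index 1 is the no-show class.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    probabilities = [float(p[1]) for p in model.predict_proba(rows)]
    return [
        {"row": i, "no_show_probability": p, "high_risk": p >= threshold}
        for i, p in enumerate(probabilities)
    ]


# Possible usage with the saved artefact (paths as described in this README):
#   import joblib
#   model = joblib.load("model/best_model.pkl")
#   results = score_appointments(model, feature_rows, threshold=0.70)
```

Lowering the threshold flags more appointments for outreach at the cost of more false alarms, so the right value depends on how much follow-up capacity a clinic has.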
| Member | Responsibility |
|---|---|
| Harsh | Data preprocessing & feature engineering |
| Manmath | ML modelling & evaluation |
| Ansh | Analysis, visualisation & insights |
| Yash | UI development & system integration |
- Automated intervention recommendations (reminders / rescheduling suggestions)
- Explainability layer (SHAP values for per-patient rationale)
- Deployment for clinic workflow integration (Docker / cloud)
- Real-time appointment ingestion via API
Developed for academic and research purposes.