GitHub - purvanshjoshi/clinical-risk-predictor: 🩺 AstraMed: A Clinical Risk Intelligence Platform. Powered by SOTA Ensemble ML & BioMistral-7B for predictive medical analytics and explainable risk scoring.

🏥 AstraMed: Clinical Risk AI

Next-Generation Predictive Analytics & Decision Support System

AstraMed represents a paradigm shift in ...

Explore Live App | Test API Engine | View Code

🌍 Live Deployment

Component	Status	Stack	Link
Prediction Engine	🟢 Online	FastAPI + XGBoost/CatBoost	API Docs
ML Inference Node	🟢 Online	Python 3.10	Model Spaces
Frontend App	🟢 Online	React + TypeScript	Live App

Interactive Demo: Visit the API Docs link to explore the Swagger documentation and test the model inference directly.

🎯 Problem Statement

🏥 Track 1: Clinical Decision Support

graph LR
    A[😷 Silent Disease<br/>Progression] --> B[⏰ Late Detection]
    B --> C[💰 Costly<br/>Interventions]
    C --> D[📉 Poor<br/>Outcomes]
    
    style A fill:#ff6b6b,stroke:#c92a2a,color:#fff
    style B fill:#ffa94d,stroke:#e8590c,color:#fff
    style C fill:#ffd43b,stroke:#fab005,color:#000
    style D fill:#ff6b6b,stroke:#c92a2a,color:#fff

🔍 Context

Chronic diseases such as diabetes often develop silently. By the time symptoms appear, interventions become costly and outcomes worsen. Clinicians operate under:

⏱️ Time Pressure — Limited consultation windows
📊 Data Gaps — Incomplete historical records
❓ Uncertainty — Complex probabilistic assessments

Meanwhile, patients struggle to understand probabilistic health risks and preventive actions.

⚠️ The Challenge

Design a clinical decision support workflow that:

✅ Surfaces early risk signals from routine patient data

✅ Supports informed, timely interventions

✅ Doesn't overwhelm doctors or mislead patients

💡 What Our Solution Enables

🔬 For Clinicians

📈 High-density risk scores with confidence intervals
🎯 Key contributing factors ranked by importance
📊 SHAP-based explanations and visualizations
💊 Evidence-based action recommendations
📉 Longitudinal trend analysis

👤 For Patients

🚦 Simple risk gauges (traffic light system)
📝 Plain-language summaries
🥗 Personalized lifestyle guidance
📱 Progress tracking over time
✨ AI-generated action plans

💡 Solution Architecture

🏗️ System Design

🔄 Data Flow Pipeline

🚀 Key Features

🎯 Core Capabilities

1️⃣ Risk Scoring & Stratification

📊 Multi-Level Classification: Low / Medium / High risk tiers
📈 Confidence Intervals: Uncertainty quantification
📉 Longitudinal Tracking: Risk velocity over time
🎯 Percentile Rankings: Population-based context

2️⃣ Explainability & Transparency

🔍 SHAP Values: Feature importance rankings
📊 Force Plots: Visual explanation of predictions
🎨 Interactive Charts: Drill-down analysis
📋 Audit Trails: Complete decision logs

3️⃣ Counterfactual Reasoning

🎛️ What-If Scenarios: "Reduce BMI by 5% → Risk ↓15%"
🔄 Interactive Simulation: Real-time slider controls
🎯 Modifiable Factors: Focus on actionable changes
📈 Impact Visualization: Before/after comparisons

4️⃣ AI-Powered Reports

📝 Clinical Summaries: Technical detail for providers
👤 Patient Explanations: Plain-language versions
🤖 BioMistral-7B: Medical-grade language model
📄 PDF Generation: Exportable reports

5️⃣ Population Analytics

👥 Digital Twin Matching: Find similar patient outcomes
📊 Cohort Analysis: Demographic comparisons
🎯 Percentile Context: "Your risk is higher than 82% of peers"
📈 Trend Detection: Population-level patterns

6️⃣ Pro Max UI/UX Experience

💎 Glassmorphism 2.0: Premium, accessible aesthetic
🔲 Bento Grid Layout: Information-dense, organized dashboard
🎛️ Interactive Sliders: Real-time "What-If" adjustments
📊 Radial Risk Gauges: Dynamic, animated risk visualization
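The percentile context listed under Population Analytics can be illustrated with a small sketch. This is not taken from the AstraMed codebase: the function name `percentile_rank` and the cohort values below are hypothetical, and the sketch assumes the percentile is defined as the share of cohort members with a strictly lower risk score.

```python
from bisect import bisect_left


def percentile_rank(cohort_risks, patient_risk):
    """Return the percentage of cohort members with a strictly lower risk."""
    ordered = sorted(cohort_risks)
    if not ordered:
        raise ValueError("cohort_risks must not be empty")
    # bisect_left counts how many sorted values are strictly below patient_risk.
    below = bisect_left(ordered, patient_risk)
    return 100.0 * below / len(ordered)


# Hypothetical cohort of predicted risk scores (illustrative values only).
cohort = [0.05, 0.08, 0.12, 0.15, 0.21, 0.27, 0.33, 0.41, 0.58, 0.72]
rank = percentile_rank(cohort, 0.45)
print(f"Your risk is higher than {rank:.0f}% of peers")
# prints "Your risk is higher than 80% of peers"
```

In practice the cohort would presumably be filtered to demographically similar patients before ranking; that filtering step is omitted here.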

🧠 The Machine Learning Engine

AstraMed is powered by an enterprise-grade Ensemble Learning Pipeline designed for high-stakes clinical environments where accuracy and explainability are paramount.

🔬 Architecture: The "Tri-Force" Ensemble

Instead of relying on a single model, we use a soft-voting ensemble of three gradient boosting libraries and combine their predicted probabilities into one risk score:

XGBoost (eXtreme Gradient Boosting): Optimized for speed and performance on structured clinical data.
CatBoost (Categorical Boosting): Handles categorical features (e.g., "Gender", "Smoking History") natively, using ordered target statistics to limit target leakage.
LightGBM: Provides high efficiency on large-scale datasets.

graph TD
    A[Patient Data] --> B[Preprocessing Pipeline]
    B --> C{Ensemble Core}
    C -->|Probability| D[XGBoost]
    C -->|Probability| E[CatBoost]
    C -->|Probability| F[LightGBM]
    D & E & F --> G[Soft Voting Aggregator]
    G --> H[Final Risk Score]
    H --> I[Calibration Layer]
    I --> J[Risk Stratification]
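
The aggregation and stratification steps in the diagram above can be sketched in a few lines. This is a minimal illustration, not the project's inference code: `soft_vote`, `stratify`, the example probabilities, and the 0.30 / 0.70 tier cut-offs are all assumptions, and the calibration layer is left out.

```python
def soft_vote(probabilities, weights=None):
    """Weighted average of per-model positive-class probabilities."""
    names = list(probabilities)
    if weights is None:
        # Equal weights reduce soft voting to a plain mean.
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    return sum(probabilities[name] * weights[name] for name in names) / total_weight


def stratify(risk, low=0.30, high=0.70):
    """Map a probability to a risk tier. The cut-offs are hypothetical."""
    if risk < low:
        return "Low"
    if risk < high:
        return "Medium"
    return "High"


# Hypothetical outputs of the three boosters for one patient.
model_outputs = {"xgboost": 0.62, "catboost": 0.58, "lightgbm": 0.66}
risk = soft_vote(model_outputs)
print(round(risk, 2), stratify(risk))  # prints "0.62 Medium"
```

With fitted scikit-learn-compatible estimators, the same averaging is what `sklearn.ensemble.VotingClassifier(voting="soft")` performs over `predict_proba` outputs.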

🔍 Explainable AI (XAI) with SHAP

We solve the "Black Box" problem using SHAP (SHapley Additive exPlanations). Every prediction comes with a mathematical justification:

Local Interpretability: Why did this specific patient get a high risk score? (e.g., "+15% due to High HbA1c").
Global Interpretability: What factors drive disease risk across the entire population?
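
To make local interpretability concrete, the sketch below computes exact Shapley values by enumerating feature coalitions. It is an illustration of the underlying idea rather than the project's explainability module: `toy_risk`, its coefficients, and the patient and reference values are hypothetical, and features outside a coalition are simply set to a reference patient's values. Brute-force enumeration is exponential in the number of features; for tree ensembles the `shap` library provides `shap.TreeExplainer` instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    features = list(instance)
    n = len(features)

    def value(coalition):
        # Features in the coalition come from the patient, the rest from the baseline.
        mixed = {f: instance[f] if f in coalition else baseline[f] for f in features}
        return predict(mixed)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(p):
    # Hypothetical risk function with an interaction term (not a fitted model).
    a, b = p["hba1c"] - 5.0, p["bmi"] - 22.0
    return 0.05 + 0.04 * a + 0.01 * b + 0.002 * a * b


patient = {"hba1c": 7.5, "bmi": 32.0}
reference = {"hba1c": 5.5, "bmi": 24.0}
phi = shapley_values(toy_risk, patient, reference)

# Additivity: contributions sum to the gap between patient and reference risk.
gap = toy_risk(patient) - toy_risk(reference)
print(phi, abs(sum(phi.values()) - gap) < 1e-9)
```

The additivity check at the end is the property that lets a dashboard state contributions such as "+15% due to High HbA1c": the per-feature values add up to the difference from the reference prediction.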

🔄 "What-If" Counterfactual Simulation

AstraMed goes beyond static predictions. Our Counterfactual Engine allows clinicians to simulate outcomes:

"If the patient reduces BMI by 2 points and lowers HbA1c to 5.7%, how does their 5-year risk change?"

This empowers shared decision-making and personalized goal setting.

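A what-if query like the one quoted above amounts to scoring the patient twice, once as recorded and once with the modifiable factors changed. The sketch below shows that pattern under stated assumptions: `what_if` is a hypothetical helper, and `toy_risk` is a stand-in logistic function with made-up coefficients, not the AstraMed ensemble.

```python
from math import exp


def toy_risk(patient):
    # Hypothetical logistic model; coefficients are illustrative, not fitted.
    z = -12.0 + 0.10 * patient["bmi"] + 1.2 * patient["hba1c"] + 0.03 * patient["age"]
    return 1.0 / (1.0 + exp(-z))


def what_if(predict, patient, changes):
    """Compare predicted risk before and after applying hypothetical changes."""
    unknown = set(changes) - set(patient)
    if unknown:
        raise KeyError(f"unknown features: {sorted(unknown)}")
    # Build a modified copy so the original record is left untouched.
    modified = {**patient, **changes}
    before, after = predict(patient), predict(modified)
    return {"before": before, "after": after, "delta": after - before}


patient = {"age": 54, "bmi": 31.0, "hba1c": 6.4}
result = what_if(toy_risk, patient, {"bmi": patient["bmi"] - 2, "hba1c": 5.7})
print(f"risk {result['before']:.2f} -> {result['after']:.2f}")
```

Passing the model in as a callable keeps the simulation independent of which predictor sits behind it, so the same helper could wrap a calibrated ensemble.
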
🛠️ Tech Stack

📊 Technology Matrix

Layer	Technology	Purpose
🎨 Frontend	React + TypeScript	Interactive UI components
🎨 Styling	Tailwind CSS	Responsive design system
⚡ Backend	FastAPI	High-performance REST API
🧠 ML Engine	XGBoost + LightGBM + CatBoost	SOTA ensemble prediction
🔍 Explainability	SHAP	Feature importance analysis
🤖 AI Engine	BioMistral-7B	Medical language model
💾 Database	JSON Store (MVP) → PostgreSQL	Patient history & records
🐳 Container	Docker + Docker Compose	Consistent deployment
🚀 Deployment	Hugging Face (Backend) + Vercel (Frontend)	Cloud hosting

👥 Team Structure

🎯 4-Member Multidisciplinary Team

🔬 ML Engineer

Model Development & Explainability

🎯 Responsibilities

📊 Dataset cleaning and exploratory data analysis
🤖 Risk model development (XGBoost, LightGBM, CatBoost)
📈 Uncertainty quantification and calibration
🔍 SHAP-based feature importance
🎲 Counterfactual reasoning implementation
⚖️ Bias detection and fairness analysis

📦 Deliverables

ml-research/train.py — Model training pipeline
backend/models/risk_model.py — Inference engine
backend/models/explainability.py — SHAP integration
Model performance reports and visualizations

⚙️ Backend Engineer

FastAPI Services & Infrastructure

🎯 Responsibilities

🏗️ API architecture and endpoint design
📥 Patient data ingestion and validation
🔐 Authentication and authorization
📊 Risk computation API endpoints
👥 Cohort analysis and digital twin matching
🚀 Deployment setup (Docker, Render)

📦 Deliverables

backend/app.py — Main FastAPI application
backend/routes/ — All API endpoints
backend/schemas/ — Pydantic models
API documentation (OpenAPI/Swagger)

👨‍⚕️ Frontend Engineer (Clinician)

Professional Dashboard Interface

🎯 Responsibilities

🎨 Clinician dashboard UI/UX design
🔍 Patient search and filtering system
📊 Risk score visualization (gauges, charts)
🎯 Key driver display components
📋 Explanation panels and tooltips
💊 Action recommendation interface

📦 Deliverables

frontend/src/components/Clinician/ — Dashboard components
Risk visualization library
Clinical workflow integration
Responsive design implementation

👤 Frontend Engineer (Patient)

Patient Portal & Documentation

🎯 Responsibilities

🎨 Patient portal UI/UX design
🚦 Simple risk gauge (traffic light)
📝 Plain-language explanation generation
🥗 Lifestyle recommendation interface
📈 Progress tracking visualizations
📚 Project documentation and pitch deck

📦 Deliverables

frontend/src/components/Patient/ — Patient components
User-friendly health guidance interface
docs/ — Comprehensive documentation
Presentation slides and demo materials

📅 Development Timeline

📋 Detailed Sprint Plan

🗓️ Week 1: Design & Core Model (by Jan 24)

🗓️ Week 2: Full Stack Development (by Jan 31)

🗓️ Week 3: Polish & Submission (by Feb 9)

⚡ Quick Start

📋 Prerequisites

# Required software
✅ Python 3.10+
✅ Node.js 18+
✅ Git 2.30+
✅ Docker 24.0+ (optional)

🐍 Backend Setup

# Navigate to backend directory
cd backend

# Create virtual environment
python -m venv venv

# Activate virtual environment
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

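# Run the server. The module path backend.api:app is resolved from the current
# directory, so if the import fails inside backend/, run this from the repository root.
uvicorn backend.api:app --reload --port 8001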

# 🎉 Server running at http://localhost:8001
# 📚 API docs at http://localhost:8001/docs

⚛️ Frontend Setup

# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# 🎉 App running at http://localhost:5173

🧠 ML Model Training

# Navigate to ML research directory
cd ml-research

# Install dependencies
pip install -r requirements.txt

# Train the model
python train_pro.py

# 📦 Models saved to backend/models/
# 📊 Performance metrics in outputs/

🐳 Docker Deployment (Recommended)

# Build and run all services
docker-compose up --build

# Services available at:
# 🎨 Frontend: http://localhost:3000
# ⚡ Backend: http://localhost:8001
# 📚 API Docs: http://localhost:8001/docs

📦 Project Structure

clinical-risk-predictor/
│
├── 📁 backend/                     # FastAPI Server
│   ├── 📄 app.py                   # Main application entry point
│   ├── 📄 requirements.txt         # Python dependencies
│   │
│   ├── 📁 models/                  # ML Risk Models
│   │   ├── 📄 risk_model.py        # Ensemble prediction engine
│   │   ├── 📄 counterfactuals.py   # What-if analysis logic
│   │   └── 📄 explainability.py    # SHAP feature importance
│   │
│   ├── 📁 routes/                  # API Endpoints
│   │   ├── 📄 patient.py           # Patient data management
│   │   ├── 📄 risk.py              # Risk computation APIs
│   │   └── 📄 cohort.py            # Population analytics
│   │
│   ├── 📁 schemas/                 # Data Validation
│   │   ├── 📄 patient.py           # Patient data models
│   │   └── 📄 prediction.py        # Prediction schemas
│   │
│   └── 📁 utils/                   # Helper Functions
│       ├── 📄 preprocessing.py     # Feature engineering
│       └── 📄 validation.py        # Data validation
│
├── 📁 frontend/                    # React Application
│   ├── 📄 package.json             # Node dependencies
│   ├── 📄 vite.config.ts           # Vite configuration
│   │
│   ├── 📁 public/                  # Static Assets
│   │   └── 🖼️ logo.svg
│   │
│   └── 📁 src/
│       ├── 📄 App.tsx              # Root component
│       ├── 📄 main.tsx             # Entry point
│       │
│       ├── 📁 components/          # React Components
│       │   │
│       │   ├── 📁 Clinician/       # Doctor Dashboard
│       │   │   ├── 📄 RiskDashboard.tsx
│       │   │   ├── 📄 PatientList.tsx
│       │   │   ├── 📄 RiskDetail.tsx
│       │   │   └── 📄 CohortAnalysis.tsx
│       │   │
│       │   ├── 📁 Patient/         # Patient Portal
│       │   │   ├── 📄 RiskGauge.tsx
│       │   │   ├── 📄 SimpleReport.tsx
│       │   │   ├── 📄 ActionPlan.tsx
│       │   │   └── 📄 Progress.tsx
│       │   │
│       │   └── 📁 Common/          # Shared Components
│       │       ├── 📄 Header.tsx
│       │       ├── 📄 Footer.tsx
│       │       └── 📄 LoadingSpinner.tsx
│       │
│       ├── 📁 pages/               # Page Components
│       │   ├── 📄 ClinicianView.tsx
│       │   └── 📄 PatientView.tsx
│       │
│       ├── 📁 hooks/               # Custom Hooks
│       │   └── 📄 useRiskPrediction.ts
│       │
│       └── 📁 utils/               # Utilities
│           └── 📄 api.ts           # API client
│
├── 📁 ml-research/                 # ML Development
│   ├── 📄 train.py                 # Model training script
│   ├── 📄 evaluate.py              # Model evaluation
│   ├── 📄 requirements.txt         # ML dependencies
│   │
│   ├── 📁 notebooks/               # Jupyter Notebooks
│   │   ├── 📓 01_EDA.ipynb         # Exploratory analysis
│   │   ├── 📓 02_Modeling.ipynb    # Model development
│   │   └── 📓 03_Evaluation.ipynb  # Performance analysis
│   │
│   └── 📁 experiments/             # Experiment Logs
│       └── 📄 model_metrics.json
│
├── 📁 data/                        # Datasets
│   ├── 📊 diabetes_dataset.csv     # Training data (provided)
│   ├── 📊 synthetic_patients.csv   # Test data
│   └── 📊 population_stats.json    # Cohort statistics
│
├── 📁 docs/                        # Documentation
│   ├── 📄 ARCHITECTURE.md          # System design details
│   ├── 📄 API_SPEC.md              # API documentation
│   ├── 📄 MODEL_CARD.md            # Model specifications
│   ├── 📄 ETHICS_AND_LIMITATIONS.md # Safety considerations
│   ├── 📄 TEAM_ROLES.md            # Team structure
│   ├── 📄 TIMELINE.md              # Sprint planning
│   └── 📄 DEPLOYMENT.md            # Deployment guide
│
├── 📁 .github/                     # GitHub Configuration
│   └── 📁 workflows/
│       ├── 📄 backend-tests.yml    # Backend CI/CD
│       └── 📄 frontend-tests.yml   # Frontend CI/CD
│
├── 📄 docker-compose.yml           # Multi-container setup
├── 📄 .gitignore                   # Git ignore rules
├── 📄 README.md                    # This file
├── 📄 CONTRIBUTING.md              # Contribution guidelines
└── 📄 LICENSE                      # MIT License

📊 Expected Deliverables

🎯 Final Showcase Outputs

📦 1. Public GitHub Repository

Complete Source Code with Documentation

✅ Well-organized file structure
✅ Comprehensive README.md
✅ Code comments and docstrings
✅ Architectural diagrams
✅ API documentation (OpenAPI)
✅ Version control history

Repository Link: github.com/purvanshjoshi/clinical-risk-predictor

💻 2. Working Prototype

Full-Stack Application Demo

✅ FastAPI backend (deployed)
✅ React frontend (deployed)
✅ Clinician dashboard interface
✅ Patient portal interface
✅ Real-time risk predictions
✅ Interactive visualizations

Live Demo: app.clinical-risk.demo

🎥 3. Demo Video

5-7 Minute Walkthrough

✅ Problem statement explanation
✅ Solution architecture overview
✅ Live feature demonstration
✅ Key technical insights
✅ Impact and use cases
✅ Future roadmap

Video Link: YouTube/Product-Demo

📚 4. Comprehensive Documentation

Technical & Clinical Documentation

✅ MODEL_CARD.md — ML model details
✅ ETHICS_AND_LIMITATIONS.md — Safety analysis
✅ ARCHITECTURE.md — System design
✅ API_SPEC.md — Endpoint reference
✅ DEPLOYMENT.md — Setup guide
✅ Presentation slides (PDF)

📚 Documentation

📖 Available Documentation

🏗️ Architecture

System design, data flow, component interactions

🔌 API Reference

Endpoint documentation, request/response schemas

🤖 Model Card

ML model details, performance metrics

⚖️ Ethics & Safety

Bias analysis, limitations, safety guidelines

👥 Team Structure

Detailed role breakdown, deliverables

🚀 Deployment

Production setup, Docker guide

📄 License

MIT License — See LICENSE file for details

Ready to Transform Healthcare Through AI?
⭐ Star this repository • 🍴 Fork and contribute • 📧 Get in touch

Last Updated: January 2025 | Version: 1.0.0 | Status: 🚧 In Active Development
