CardioSense AI is a multi-layered Clinical Decision Support System (CDSS) designed for high-performance cardiovascular risk assessment with a focus on trust, interpretability, and safety.
The system follows a decoupled architecture where the Core Intelligence Layer is wrapped by a Production API (FastAPI) and served through a Clinical Dashboard (Streamlit).
```mermaid
graph TB
    subgraph "Frontend Layer (Streamlit)"
        UI[Clinical Dashboard]
        Report[PDF Report Generator]
    end
    subgraph "Backend Layer (FastAPI)"
        API[RESTful API]
        Auth[Security & Audit Gate]
    end
    subgraph "Core Intelligence Layer (Python)"
        Predictor[HeartDiseasePredictor]
        Explainer[SHAP Explainability Engine]
        Safety[Safety & Confidence Engine]
        Simulator[Intervention Simulator]
        Recommender[Clinical Recommendation Engine]
        Audit[Bandit/Safety Audit Layer]
    end
    subgraph "Data & Artifacts"
        Model[(XGBoost Model)]
        Meta[(Model Metadata)]
        Hist[(Inference History SQLite)]
        Data[(UCI Patient Data)]
    end
    UI <--> API
    API --> Predictor
    UI --> Predictor
    Predictor --> Model
    Explainer --> Predictor
    Safety --> Predictor
    Simulator --> Model
    Recommender --> Explainer
    Report --> Predictor
    API --> Hist
    UI --> Hist
    Audit -.-> API
    Audit -.-> UI
```
We employ XGBoost as the primary engine, optimized via Optuna and supported by a Production Preprocessing Pipeline to ensure medical-grade accuracy and inference stability.
- Robust Feature Engineering:
  - Numerical Normalization: `StandardScaler` is applied to all continuous vitals (`age`, `trestbps`, `chol`, `thalach`, `oldpeak`) to prevent feature dominance and ensure gradient stability.
  - Categorical Encoding: `OneHotEncoder(drop='if_binary')` converts clinical categorical markers (`sex`, `cp`, `fbs`, `restecg`, `exang`, `slope`, `ca`, `thal`) into a sparse, machine-readable format.
- Pipeline Orchestration: The entire transformation is wrapped in a Scikit-Learn `Pipeline`. This ensures that the exact same mathematical shifts are applied during real-time inference as were used during training, eliminating training-serving skew.
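The preprocessing described above can be sketched as a single `ColumnTransformer`; column names follow the UCI Heart Disease dataset, and the commented model step is a stand-in, not the project's actual training code.

```python
# Sketch of the training-time preprocessing pipeline: scaled vitals plus
# one-hot clinical markers, fitted once and reused verbatim at inference.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

NUMERIC = ["age", "trestbps", "chol", "thalach", "oldpeak"]
CATEGORICAL = ["sex", "cp", "fbs", "restecg", "exang", "slope", "ca", "thal"]

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), NUMERIC),                     # zero-mean / unit-variance vitals
    ("cat", OneHotEncoder(drop="if_binary"), CATEGORICAL),  # sparse one-hot clinical markers
])

# Chaining the fitted transformer and the estimator in one Pipeline object
# guarantees identical transforms at training and serving time, e.g.:
# pipeline = Pipeline([("prep", preprocessor), ("model", xgb_classifier)])
```

Persisting the whole `Pipeline` (rather than the model alone) is what eliminates training-serving skew.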
```mermaid
graph LR
    A[(Raw Clinical Data)] --> B[ColumnTransformer]
    B --> C[StandardScaler / OHE]
    C --> D{Optuna Meta-Learner}
    D --> E[XGBoost Hyper-Tuning]
    E --> F[Cross-Validation]
    F --> D
    D --> G[Optimized Model / .joblib]
    D --> H[Fitted Preprocessor / .joblib]
    style D fill:#f9f,stroke:#333,stroke-width:2px
```
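The tuning loop in the diagram can be illustrated with a minimal stand-in: plain random search over two hypothetical XGBoost hyperparameters against a toy scoring function (the real system uses Optuna with cross-validated scores instead).

```python
# Stand-in for the Optuna meta-learning loop: sample hyperparameters, score,
# keep the best. `score` is a toy surrogate, not a cross-validation run.
import random

def score(params):
    # Toy surrogate that peaks near max_depth=5, learning_rate=0.1.
    return -((params["max_depth"] - 5) ** 2) - abs(params["learning_rate"] - 0.1)

def tune(n_trials=50, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "max_depth": rng.randint(2, 10),
            "learning_rate": rng.uniform(0.01, 0.3),
        }
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params
```

In the production pipeline, the winning parameters and the fitted preprocessor are both serialized to `.joblib` artifacts.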
This sequence illustrates the path from a patient profile to a "What-If" intervention simulation.
```mermaid
sequenceDiagram
    participant C as Clinician
    participant U as UI (Streamlit)
    participant S as Safety Engine
    participant M as Model (XGBoost)
    participant X as Explainability (SHAP)
    participant I as Intervention Engine
    C->>U: Input Patient Vitals
    U->>S: Run OOD & Guardrail Checks
    S-->>U: Confidence & Safety Alerts
    U->>M: Request Risk Probability
    M-->>U: 92% Risk Score
    U->>X: Generate Driver Analysis
    X-->>U: Waterfall SHAP Plot
    C->>I: Simulate BP reduction (-20 mmHg)
    I->>M: Re-predict with modified vitals
    M-->>I: 75% Risk Score
    I-->>U: Risk Delta: -17%
```
In medical AI, "Black Box" models are unusable. We implement six layers of trust:
- Clinical Overrides: Hard-stop medical rules based on ACC/AHA guidelines (e.g., Hypertensive Crisis, Ischemia detected in ECG) that trigger alerts regardless of AI probability.
- Out-of-Distribution (OOD) Monitoring: Compares input data against the statistical bounds of the training set (e.g., age ranges, BP maximums).
- Entropy-Based Confidence: Calculates the mathematical uncertainty of the model's output, labeled as High, Moderate, or Low based on probability distribution clusters.
- Audit Hashes: Every inference result is cryptographically linked to the model version and timestamp (hashes are flagged `usedforsecurity=False` for audit compliance).
- Adaptive Monitoring Gateway: Implements a `MonitoringEngine` that tracks Data Drift (using Evidently AI) and Performance Decay (Concept Drift). The engine uses an Adaptive Search pattern to maintain compatibility across varying host environments.
- Persistent Telemetry: All inference events are logged to a local SQLite database, enabling the clinical dashboard to run longitudinal drift analysis and performance audits.
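The entropy-based confidence layer above can be sketched as a binary-entropy bucketing of the predicted probability; the thresholds (0.3 and 0.7 bits) are illustrative, not the production values.

```python
# Sketch of entropy-based confidence labelling: a probability far from 0.5
# has low binary entropy and therefore earns a "High" confidence label.
import math

def confidence_label(p: float) -> str:
    """Map a predicted risk probability to High / Moderate / Low confidence."""
    eps = 1e-12  # guard against log(0) at p == 0 or p == 1
    entropy = -(p * math.log2(p + eps) + (1 - p) * math.log2(1 - p + eps))
    if entropy < 0.3:   # decisive output (p near 0 or 1)
        return "High"
    if entropy < 0.7:   # moderately decisive
        return "Moderate"
    return "Low"        # output near the 50/50 decision boundary
```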
The Risk Optimization Engine moves beyond simple simulation to find the Least Effort Path to clinical stability.
- Clinical Cost-Weights: Each modifiable factor is assigned a "difficulty" (e.g., Blood Pressure: 1.0 vs. Max Heart Rate: 2.0) representing lifestyle feasibility.
- Greedy Optimization: The core algorithm identifies which changes yield the greatest risk reduction relative to their "effort."
- Roadmap Generation: Converts numerical optima into a prioritized, actionable clinical sequence.
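A minimal sketch of the greedy search, ranking candidate interventions by risk reduction per unit of effort; the interventions, risk deltas, and cost weights below are illustrative placeholders, not clinical values.

```python
# Greedy "least effort path": rank interventions by reduction/effort ratio
# and accumulate them until the target risk reduction is reached.

CANDIDATES = [
    # (intervention, estimated risk reduction, effort weight)
    ("Reduce systolic BP by 20 mmHg", 0.17, 1.0),
    ("Lower cholesterol by 40 mg/dL", 0.10, 1.5),
    ("Raise max heart rate via exercise", 0.12, 2.0),
]

def roadmap(candidates, target_reduction):
    """Return a prioritized plan and the total risk reduction it achieves."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    plan, achieved = [], 0.0
    for name, reduction, _effort in ranked:
        if achieved >= target_reduction:
            break
        plan.append(name)
        achieved += reduction
    return plan, achieved
```

The sorted ratio is what encodes "lifestyle feasibility": a large risk reduction with a high effort weight can still rank below a smaller, easier win.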
We utilize a dual-engine interpretability layer to ensure every prediction is explainable from multiple mathematical perspectives:
- SHAP (SHapley Additive exPlanations): Provides globally consistent local risk attribution using game-theoretic Shapley values.
  - Local Explanations: Waterfall plots showing exactly how each vital contributed to a specific patient's risk.
- LIME (Local Interpretable Model-agnostic Explanations): Provides a local "linear surrogate" that approximates the complex model around a specific patient's data point.
  - Sensitivity Analysis: Reveals how small changes in patient vitals would affect the model's confidence, identifying the most "fragile" risk factors.
- Model Reasoning Layer: NLP-driven summarization of both SHAP and LIME signals to provide a plain-text "Physician's Summary" of the AI's logic.
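The sensitivity analysis can be sketched as a one-at-a-time perturbation loop; `predict_risk` below is a hypothetical logistic surrogate standing in for the real XGBoost pipeline, and its weights are made up for illustration.

```python
# Sketch of sensitivity analysis: bump each vital by `step`, re-predict,
# and rank features by how much the risk estimate moves.
import math

def predict_risk(vitals):
    # Hypothetical surrogate scorer over two vitals (not the real model).
    z = 0.04 * (vitals["trestbps"] - 120) + 0.01 * (vitals["chol"] - 200)
    return 1 / (1 + math.exp(-z))

def sensitivity(vitals, step=1.0):
    """Return (feature, risk delta) pairs for a +step perturbation, largest first."""
    base = predict_risk(vitals)
    deltas = {}
    for name in vitals:
        bumped = dict(vitals, **{name: vitals[name] + step})
        deltas[name] = predict_risk(bumped) - base
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Features at the top of the ranking are the "fragile" risk factors: small real-world changes in those vitals move the model's output the most.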
To meet modern medical-legal standards for AI, the training pipeline includes a Bias Audit Layer:
- Slicing: After the XGBoost model is optimized, the validation set is sliced into protected and clinical subgroups (e.g., `Sex`, `Age_LT45`, `Age_GT65`).
- Parity Metrics: The system calculates specialized metrics for each slice:
  - Recall Parity: Ensuring high sensitivity across all groups to prevent false negatives in vulnerable populations.
  - Precision Stability: Monitoring for consistent diagnostic quality.
- Reporting: These metrics are baked into `model_metadata.json` and surfaced in the System Integrity dashboard for clinical transparency.
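A recall-parity check of the kind described above can be sketched with plain Python; the slice names mirror those in the text, and the labels in the usage example are synthetic.

```python
# Sketch of slice-level recall parity: compute recall per subgroup and
# report the maximum pairwise gap (a smaller gap means better parity).

def recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")

def recall_parity(slices):
    """slices: {name: (y_true, y_pred)} -> (per-slice recall, max recall gap)."""
    per_slice = {name: recall(y, p) for name, (y, p) in slices.items()}
    values = [v for v in per_slice.values() if v == v]  # drop NaN slices
    gap = max(values) - min(values) if values else float("nan")
    return per_slice, gap
```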
CardioSense AI adheres to modern security standards for medical software:
- Static Analysis (SAST): Integrated Bandit scanning to identify and mitigate insecure Python code patterns (e.g., shell injection, weak hashes).
- Dependency Auditing: Integrated Safety scans to ensure all third-party libraries in `requirements.txt` are free from known CVEs.
- Hardened API: Configurable `HOST` and `PORT` via environment variables to prevent accidental exposure of the inference gateway.
- Middleware Integrity: Request-ID injection ensures full traceability from the frontend to the backend logs.
- `api/`: Production FastAPI gateway and middleware.
- `app/`: Clinical Streamlit dashboard and UI logic.
- `src/models/`: Training and real-time inference wrappers.
- `src/explainability/`: Logic for SHAP values and model reasoning.
- `src/simulation/`: The cost-weighted Risk Optimization Engine.
- `src/recommendation/`: Pattern-based medical advice generation.
- `src/utils/`: Safety engines, PDF report orchestration, and logging.
- `tests/`: Multi-modal test suite (Clinical, API, and Inference).