-# Time-Series ICU Patient Deterioration Predictor 📉⏳
-
-Early warning system predicting ICU patient deterioration on MIMIC-IV Clinical Demo v2.2 dataset (100 patients), framework comparing LightGBM vs Temporal Convolutional Network across 3 targets: peak deterioration (max_risk), typical risk (median_risk), and proportion of admission in high-risk states (pct_time_high).
-
-Temporal and aggregated feature engineering with clinical-validity-aware missing data handling, using custom NEWS2-derived ground-truth values (GCS/LOC mapping, supplemental O₂ and CO₂ retainer custom logic).
-
-TCN trained on 171 temporal-features x 96 timestamps, 24hr rolling windows across 8 vital parameters, with 3-layer TemporalBlock stack, kernel=3, dropout=0.2, head_hidden=64, batch=32, 50 epochs, early stopping at epoch 10;
-
-LightGBM trained on 40 aggregated patient-features with 5-fold stratified CV, with hyperparameter tuning.
-
-TCN greater sensitivity on max_risk (AUC +9.3%, AP +1.25%), LightGBM greater reliability and calibration on median_risk (AUC +17%, Brier ↓68%, ECE ↓63%) and more precise pct_time_high (RMSE ↓32%, R² +44%, residual SD ↓42%).
-
-evaluating interpretability (SHAP) versus saliency-based explanations for clinical adoption.
-
-Deployed, reproducible auditable pipeline with deployment-lite, and full documentation for clinical validation.
-
-TCN_refined captures short-term acute events with rapid early detection, whereas LightGBM provides robust, calibrated estimates of sustained deterioration exposure; supports ICU triage, continuous monitoring, and escalation decisions with quantified, actionable confidence.
-TCN excels at short-term acute events with rapid detection while LightGBM provides reliable long-term estimates of sustained deterioration exposure, suggesting ensemble approach for production deployment.
-
-Portfolio-ready, deployed, and clinically-informed.
-
-**Tech stack**: python, pandas, NumPy, LightGBM, PyTorch
-
-**Pipeline**
-```text
-Raw ICU Vitals (long format, MIMIC-style)
- └─> compute_news2.py
- ├─ Input: raw vitals CSV
- ├─ Action: compute NEWS2 scores per timestamp
- └─ Output: news2_scores.csv (wide format with vitals, NEWS2 score, escalation labels), news2_patient_summary.csv (patient-level summary)
-
-news2_scores.csv
- └─> make_timestamp_features.py
- ├─ Action:
- │ ├─ Aggregate per patient
- │ ├─ Add missingness flags
- │ ├─ Apply LOCF per vital
- │ ├─ Compute carried-forward flags
- │ ├─ Compute rolling window stats (1h/4h/24h)
- │ ├─ Compute time-since-last-observation
- │ └─ Encode risk/escalation as ordinal numeric
- └─ Output: news2_features_timestamp.csv
- (ML-ready timestamp-level features)
-
-news2_scores.csv
- └─> make_patient_features.py
- ├─ Action:
- │ ├─ Aggregate per patient
- │ ├─ Compute median, mean, min, max per vital
- │ └─ Include % missingness per vital
- └─ Output: news2_features_patient.csv
- (ML-ready patient-level summary features)
-```
-
-# Timestamp features rationale
-- We compute rolling window features over 1h, 4h, and 24h intervals.
- - Mean, min, max capture the magnitude and variability of vitals.
- - Slope gives the trend — whether the vital is rising or falling and how fast.
- - AUC measures cumulative exposure, i.e., how much and for how long a patient has experienced abnormal values.
-- These features provide temporal context for the ML model, so it doesn’t just see isolated values but also their trajectory over time.
-
-
-# LightGBM vs Neural Network (TCN) Pipeline
-```text
-ML Model (LightGBM)
- ├─ Input: news2_features_patient.csv
- │ ├─ Median, mean, min, max per vital
- │ ├─ Impute missing values
- │ ├─ % missing per vital
- │ └─ Risk summary stats (max, median, % time at high risk)
- ├─ Action:
- │ ├─ Train predictive model for deterioration / escalation
- │ ├─ Use timestamp trends + missingness flags
- │ └─ Evaluate performance (AUROC, precision-recall, etc.)
- └─ Output: predictions, feature importances, evaluation metrics
-
-ML Model (Neural Network, TCN)
- ├─ Input: news2_features_timestamp.csv
- │ ├─ Timestamp-level vitals & rolling features (mean, min, max, std, slopes, AUC)
- │ ├─ Missingness flags
- │ ├─ Carried-forward flags
- │ └─ Time since last observation
- ├─ Action:
- │ ├─ Train predictive model for deterioration / escalation
- │ ├─ Learn temporal patterns, trends, and interactions
- │ ├─ Can handle sequences of variable length per patient
- │ └─ Evaluate performance (AUROC, precision-recall, calibration)
- └─ Output:
- ├─ Predictions per timestamp or per patient
- ├─ Learned feature embeddings / attention weights (if applicable)
- └─ Evaluation metrics
-```
-
-# LightGBM vs Neural Network (TCN) Pipeline Visualisation
-```text
- Raw EHR Data (vitals, observations, lab results)
- │
- ▼
-Timestamp Feature Engineering (news2_scores.csv)
- - Rolling statistics (mean, min, max, std)
- - Slopes, AUC, time since last observation
- - Imputation & missingness flags
- │
- ├─────────────► TCN Neural Network Model (v2)
- │ - Input: full time-series per patient
- │ - Can learn temporal patterns, trends, dynamics
- │
- ▼
-Patient-Level Feature Aggregation (make_patient_features.py → news2_features_patient.csv)
- - Median, mean, min, max per vital
- - % missing per vital
- - Risk summary stats (max, median, % time at high risk)
- - Ordinal encoding for risk/escalation
- │
- ▼
-LightGBM Model (v1)
- - Input: one row per patient (fixed-length vector)
- - Uses aggregated statistics only
- - Cannot handle sequences or variable-length time series
-```
-
- # Model Comparison: LightGBM vs Neural Network (V1 & V2)
-
-| Aspect | LightGBM (V1) | Temporal Convolutional Network (TCN) (V2) |
-|--------|-------------------|-------------------|
-| **ML Model Name / Type** | LightGBM (Gradient Boosted Decision Trees) | Temporal Convolutional Network (TCN)(Neural network) |
-| **V1 / V2** | V1: uses patient-level features, baseline interpretable patient summary (classic tabular ML) | V2: uses timestamp-level features, advanced sequence modeling (modern deep learning) |
-| **Input Datasets** | `news2_features_patient.csv` (patient-level summaries) | `news2_features_timestamp.csv` (time series of vitals, missingness flags) |
-| **Optional Inputs** | Timestamp features could be added later for hybrid model | Patient-level summary features from `news2_features_patient.csv` can be appended but not mandatory |
-| **Reason for this input choice** | LightGBM is a tree-based model: handles static features and aggregates well; does not naturally model temporal sequences | Neural networks (LSTM/TCN) can model temporal trends, sequences, and interactions over time; need full timestamp features to exploit sequential information |
-| **Why two different models** | LightGBM: fast, interpretable (feature importance), strong baseline. | Neural network: captures temporal dynamics, can potentially improve predictive performance on time-series deterioration<br>Complements LightGBM; addresses potential limitations of static patient summaries by using sequential information in timestamp features |
-| **Strengths** | - Handles missing values gracefully.<br>- Fast training and inference.<br>- Provides feature importances.<br>- Works well with tabular summary features. | - Models temporal trends and interactions.<br>- Can capture subtle patterns in sequences of vitals.<br>- Potentially better performance on real-time deterioration prediction. |
-| **Weaknesses / Limitations** | - Ignores sequence and timing of events.<br>- May lose some granularity of patient trajectory.<br>- Cannot capture interactions over time. | - Requires more computation and tuning.<br>- Harder to interpret.<br>- Sensitive to missing data; requires careful imputation or masking. |
-| **Output** | Predictions per patient, feature importances, evaluation metrics (AUROC, PR-AUC, etc.) | Predictions per timestamp or per patient trajectory, evaluation metrics (AUROC, PR-AUC, potentially time-dependent metrics) |
-| **Use case / Deployment** | Baseline model; interpretable; fast deployment; can be used for early warning systems using summary features | Advanced model for final deployment or v2 experimentation; may be integrated in real-time monitoring dashboards for continuous deterioration prediction |
-
-
-Portfolio narrative framing (objective and honest)
-
-Here’s how you can present this:
- 1. State the limitation upfront:
- • “Synthetic dataset contains very few high-risk events; patient-level deterioration classification targets were largely zero. Standard classification tasks were infeasible.”
- 2. Pivot your narrative to learnable outcomes:
- • LightGBM: Predict patient-level NEWS2 / continuous risk burden, analyze feature importances to show clinical insights.
- • TCN: Predict timestamp-level NEWS2 trends to capture dynamic risk evolution.
- 3. Metrics and comparison:
- • Report regression metrics (RMSE, R², MAE).
- • Compare to simple baselines (mean NEWS2, last observation carried forward) to show your model improves predictive performance.
- • Highlight trend detection and feature influence, which is a clinically relevant skill.
- 4. Why this is still strong for a portfolio:
- • Demonstrates data wrangling, preprocessing, CV, feature engineering, ML pipeline, model selection, hyperparameter tuning, and neural networks.
- • Shows clinical insight (feature importance, temporal trends).
- • Recruiters and technical reviewers care about how you solved real-world limitations, not just “predicted rare events.”
+# Time-Series ICU Patient Deterioration Predictor
+
+## *Hybrid Machine Learning System for Early Warning in Critical Care*
+
+---
+
+## Executive Summary
+
+**Tech stack:** *Python, pandas, NumPy, LightGBM, PyTorch, scikit-learn*
+
+This project implements a dual-architecture early warning system that compares gradient-boosted decision trees (LightGBM) with temporal convolutional networks (TCN) for predicting ICU patient deterioration across three NEWS2-derived targets: maximum risk attained, median (typical) sustained risk, and percentage of time spent in a high-risk state. Built on the MIMIC-IV Clinical Demo v2.2 dataset (100 patients), the system processes 171 temporal features across 24-hour windows and 40 aggregated patient-level features to support continuous monitoring and escalation decisions.
+
+The hybrid approach reveals complementary strengths. LightGBM achieves better calibration and regression fidelity for sustained risk assessment: on `median_risk` it has a 68% lower Brier score and +17% AUC, and on `pct_time_high` it has +44% R². TCN shows stronger discrimination of acute events on `max_risk` (+9.3% AUC, higher sensitivity) for detecting rapid deterioration.
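
The comparison above reports relative differences in the Brier score and related metrics. The sketch below shows how those quantities are defined. It assumes binary labels and predicted probabilities. The function names and the 10 equal-width bins used for expected calibration error (ECE) are illustrative choices, not taken from the project code.

```python
import numpy as np


def brier_score(y_true, y_prob) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


def expected_calibration_error(y_true, y_prob, n_bins: int = 10) -> float:
    """Sample-weighted mean |observed event rate - mean predicted probability| per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Equal-width bins on [0, 1]; clip so that p == 1.0 lands in the last bin.
    bin_ids = np.clip((y_prob * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        gap = abs(y_true[mask].mean() - y_prob[mask].mean())
        ece += mask.mean() * gap  # weight by the fraction of samples in this bin
    return float(ece)


def relative_change(new: float, baseline: float) -> float:
    """Relative change of `new` versus `baseline`; -0.68 means a 68% reduction."""
    return (new - baseline) / baseline
```

For example, `relative_change(lightgbm_brier, tcn_brier)` (both hypothetical variables) would return about -0.68 for a 68% Brier reduction.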
+
+The complete pipeline includes NHS-standard NEWS2 preprocessing with CO₂ retainer logic, GCS-to-consciousness mapping, and supplemental O₂ scoring. It also includes extensive evaluation metrics and model-specific interpretability methods for clinical validation (SHAP for LightGBM, absolute gradient×input saliency for TCN). Finally, a deployment-ready dual inference system (batch and per-patient) provides end-to-end usability.
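
In NEWS2, CO₂ retainers are scored on SpO₂ Scale 2 instead of Scale 1. These are patients with hypercapnic respiratory failure and a prescribed 88–92% saturation target. Supplemental oxygen adds 2 points on either scale. The sketch below covers these two components using the published NEWS2 thresholds. It assumes integer SpO₂ percentages, and the function names are hypothetical. The project's own rule for identifying retainers is not reproduced here.

```python
def spo2_points(spo2: float, on_oxygen: bool, co2_retainer: bool) -> int:
    """NEWS2 SpO2 sub-score (hypothetical helper); Scale 2 applies to CO2 retainers."""
    if not co2_retainer:
        # Scale 1: <=91 -> 3, 92-93 -> 2, 94-95 -> 1, >=96 -> 0
        if spo2 <= 91:
            return 3
        if spo2 <= 93:
            return 2
        if spo2 <= 95:
            return 1
        return 0
    # Scale 2 (target 88-92%): low saturations score as on Scale 1 but shifted down
    if spo2 <= 83:
        return 3
    if spo2 <= 85:
        return 2
    if spo2 <= 87:
        return 1
    if spo2 <= 92 or not on_oxygen:
        return 0  # 88-92%, or >=93% while breathing air
    # Above-target saturations on oxygen are penalised to avoid over-oxygenation
    if spo2 <= 94:
        return 1
    if spo2 <= 96:
        return 2
    return 3


def oxygen_points(on_oxygen: bool) -> int:
    """Supplemental oxygen adds 2 points; room air adds 0."""
    return 2 if on_oxygen else 0
```

Keeping the oxygen points separate mirrors the NEWS2 chart, where "air or oxygen" is its own parameter.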
+
+**Key Contributions:**
+- Clinical validity pipeline with robust NEWS2 computation
+- Dual feature engineering (patient-level and timestamp-level) for both classical and deep learning models
+- Dual-model training with hyperparameter tuning
+- Rigorous refinement and model evaluation
+- Transparent interpretability validated against domain knowledge
+- Deployment-lite inference pipeline demonstrating end-to-end usability
+
+---
+
+## Table of Contents
+1. [Introduction](#introduction)
+2. [Clinical Motivation](#clinical-motivation)
+3. [Data Pipeline Overview](#data-pipeline-overview)
+4. [Phase 1: CO₂ Retainer Identification & NEWS2 Tracker](#phase-1-co2-retainer-identification--news2-tracker)
+5. [Phase 2: ML-Ready Feature Engineering](#phase-2-ml-ready-feature-engineering)
+6. [Phase 3: LightGBM Training & Validation](#phase-3-lightgbm-training--validation)
+7. [Next Steps](#next-steps)
+
+---
+
+
+## 1. Clinical Background & Motivation
+
+### The Problem
+ICU patient deterioration manifests through subtle vital-sign changes hours before critical events. The National Early Warning Score 2 (NEWS2) is widely used in UK hospitals to detect and escalate care for deteriorating patients. Accurate, real-time scoring and risk stratification can:
+- Enable earlier intervention and ICU escalation
+- Support clinical decision-making with actionable, interpretable metrics
+- Provide a foundation for advanced ML models to improve patient outcomes
+
+Although NEWS2 is the national standard for deterioration detection, it has well-recognised constraints:
+- **No true temporal modelling:** Although observations are charted sequentially, the scoring algorithm treats each set of vitals independently and does not incorporate trend, slope, variability, or rate of change.
+- **Discrete scoring limitations:** NEWS2 discretises continuous physiological signals into coarse bands and does not model interactions between multiple variables, which limits sensitivity to subtle multivariate deterioration patterns.
+- **Escalation overload:** Threshold-based scoring generates many false positives in elderly and multimorbid cohorts, contributing to alert burden and escalation fatigue.
+- **Limited predictive horizon:** NEWS2 typically identifies deterioration only after thresholds are crossed, offering limited early-warning capability compared with models that can detect sub-threshold physiological drift.
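
The first limitation can be made concrete with a trend feature such as a rolling slope, which per-observation NEWS2 scoring does not use. The pandas sketch below assumes one patient's single vital sign as a Series with a sorted DatetimeIndex. The 4-hour window and the `rolling_slope` name are illustrative.

```python
import numpy as np
import pandas as pd


def rolling_slope(series: pd.Series, window: pd.Timedelta = pd.Timedelta(hours=4)) -> pd.Series:
    """Least-squares slope (units per hour) over a trailing, time-based window."""

    def _slope(win: pd.Series) -> float:
        win = win.dropna()
        if len(win) < 2:
            return np.nan  # a trend needs at least two observations
        # Hours elapsed since the first observation inside this window
        hours = np.asarray((win.index - win.index[0]) / pd.Timedelta(hours=1), dtype=float)
        return float(np.polyfit(hours, win.to_numpy(dtype=float), 1)[0])

    # raw=False passes each window as a Series, so its DatetimeIndex is available
    return series.rolling(window).apply(_slope, raw=False)
```

For example, heart rate rising by 2 bpm per hour yields a slope of about 2.0 from the second observation onwards. A single NEWS2 observation set cannot express this.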
+
+### Clinical Escalation Context
+NEWS2 scoring bands map directly to clinical monitoring frequency and escalation actions; these operational consequences define the clinical targets we aim to predict:
+
+| NEWS2 Score | Clinical Risk | Monitoring Frequency | Clinical Response |
+|-----------------------------------|---------------|--------------------------------------------------------|------------------------------------------------------------------------------------|
+| **0** | Low | Minimum every **12 hours** | Routine monitoring by registered nurse. |
+| **1–4** | Low | Minimum every **4–6 hours** | Nurse to assess need for change in monitoring or escalation. |
+| **Score of 3 in any single parameter** | Low–Medium | Minimum every **1 hour** | **Urgent** review by ward-based doctor to decide monitoring/escalation. |
+| **5–6** | Medium | Minimum every **1 hour** | **Urgent** review by ward-based doctor or acute team nurse; consider critical-care team review. |
+| **≥7** | High | **Continuous** monitoring | **Emergent** assessment by clinical/critical-care team; usually transfer to HDU/ICU. |
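
The banding in this table can be written as a small function. The sketch assumes the aggregate NEWS2 score and the highest single-parameter score have already been computed. The ordinal codes in `RISK_ORDINAL` are a hypothetical encoding, not the project's documented one.

```python
# Hypothetical ordinal encoding of the bands, usable as a numeric model target
RISK_ORDINAL = {"Low": 0, "Low–Medium": 1, "Medium": 2, "High": 3}


def news2_risk_band(total_score: int, max_parameter_score: int) -> str:
    """Map an aggregate NEWS2 score (plus the worst single parameter) to a risk band."""
    if total_score >= 7:
        return "High"
    if total_score >= 5:
        return "Medium"
    if max_parameter_score == 3:
        return "Low–Medium"  # a single 'red' parameter with an aggregate of 1-4
    return "Low"
```

Checking the aggregate thresholds first means a single red parameter only lowers the band to Low–Medium when the total is below 5, matching the table.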
+
+#### Why this matters
+- Transitions between risk bands (especially into medium or high) drive clinical workload and resource allocation, including urgent reviews and ICU involvement.
+- Predicting imminent transitions into these categories (e.g., entering high risk within the next 4–6 hours) enables earlier intervention, reducing delayed escalations and improving critical-care resource planning.
+
+#### Why NEWS2 is used as the reference standard
+- NEWS2 is the nationally accepted standard for ward-based clinical deterioration assessment. Using it as the ground truth ensures that ML models are trained and evaluated against a clinically validated reference.
+- ML models predict summary outcomes derived from NEWS2 clinical-risk categories:
+  - `max_risk`: Maximum risk attained during the observation window
+  - `median_risk`: Median (typical) risk level across the stay
+  - `pct_time_high`: Percentage of time spent in a high-risk state
+- Evaluating ML predictions against these NEWS2-derived outcomes allows assessment of **predictive horizon**, **sensitivity**, and the ability to anticipate **clinically actionable deterioration trends** before standard escalation would occur.
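
The three targets can be derived per patient from timestamp-level risk. This pandas sketch assumes a long table with a `subject_id` column and a hypothetical `risk_ordinal` column (0 = Low up to 3 = High). Here `pct_time_high` is the share of observations at high risk. That share is not time-weighted and may differ from the project's exact definition.

```python
import pandas as pd


def patient_targets(df: pd.DataFrame, high_level: int = 3) -> pd.DataFrame:
    """Per-patient max_risk, median_risk and pct_time_high from timestamp-level risk."""
    risk = df.groupby("subject_id")["risk_ordinal"]
    # Fraction of each patient's observations at or above the high-risk level
    is_high = (df["risk_ordinal"] >= high_level).groupby(df["subject_id"]).mean()
    return pd.DataFrame(
        {
            "max_risk": risk.max(),
            "median_risk": risk.median(),
            "pct_time_high": is_high * 100.0,  # expressed as a percentage
        }
    )
```

A time-weighted variant would weight each observation by the interval until the next one. That matters when charting frequency rises as patients deteriorate.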
+
+### Why Machine Learning?
+ICU deterioration is complex and often subtle, involving multivariate temporal patterns that standard threshold-based systems cannot fully capture. ML models allow us to go beyond static scoring by predicting summary outcomes derived from NEWS2 clinical-risk categories.
+
+#### LightGBM (classical, non-temporal ML)
+- LightGBM, a gradient-boosted decision tree (GBDT) algorithm, provides a strong baseline for tabular clinical data
+- Captures nonlinear interactions between vital signs
+- Fast to train and tune; handles missing values natively
+- Interpretable via SHAP feature attributions
+- Often competitive or superior when temporal structure is weak
+
+#### Temporal Convolutional Network (TCN) (temporal deep learning)
+- TCN captures time-dependent patterns, slopes, and variability
+- Models long-range temporal context through stacked dilated causal convolutions
+- Handles irregular sampling when given missingness flags and time-since-last-observation features
+- Potentially detects subtle deterioration earlier than threshold-based approaches
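
TCNs gain long-range context by stacking causal, dilated 1D convolutions. The output at time t sees only inputs at t, t−d, t−2d and so on, and increasing d at each level widens the receptive field quickly. The NumPy sketch below shows one such convolution for a single channel, without the bias, nonlinearity, dropout or residual path of a full TemporalBlock. Names are illustrative.

```python
import numpy as np


def causal_dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int = 1) -> np.ndarray:
    """y[t] = sum_i w[i] * x[t - i*dilation], treating inputs before t=0 as zero.

    Note: w[0] multiplies the current input, the reverse of PyTorch's Conv1d weight order.
    """
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])  # left padding only
    y = np.empty(len(x))
    for t in range(len(x)):
        # Positions t, t-d, t-2d, ... in the original series (shifted by pad in xp)
        taps = xp[t + pad - np.arange(k) * dilation]
        y[t] = np.dot(w, taps)
    return y


def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field of stacked causal convolutions, one convolution per level."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With kernel size 3 and assumed dilations 1, 2, 4, `receptive_field(3, [1, 2, 4])` gives 15 timesteps. Blocks with two convolutions per level, as in common TemporalBlock designs, extend this to 1 + 2(k−1)Σd. Because of left-only padding, changing a future input never changes earlier outputs.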
+
+#### Why compare both
+- LightGBM provides a robust classical-ML baseline for tabular clinical data.
+- TCN evaluates whether temporal modelling yields measurable gains by capturing sequential patterns and slopes in vital signs.
+- This comparison reflects realistic deployment: classical ML may suffice for lower-frequency ward data, whereas temporal models exploit high-resolution ICU monitoring to detect early deterioration.
+- The evaluation clarifies where temporal modelling adds value, where classical ML is sufficient, and the trade-offs between interpretability and predictive performance.
+
+This project therefore systematically evaluates temporal vs. non-temporal ML approaches for predicting ICU deterioration, using clinically meaningful NEWS2-derived summary outcomes as targets.