2. Create missingness flags for each vital (before any fills).
3. Apply LOCF (last observation carried forward) forward-fill per subject; optionally backward-fill initial missing values or leave them as NaN. Do not use the population median.
4. Create carried-forward flags (binary indicator: 1 if the value came from LOCF). These help the model distinguish observed values from values assumed stable, and exploit missingness patterns (e.g. vitals are measured more frequently when patients deteriorate).
5. **Compute rolling windows (1h, 4h, 24h)**: mean, min, max, std, count, slope, AUC.
6. Compute time since last observation (`time_since_last_obs`) for each vital (staleness).
7. Convert textual escalation/risk labels to a numeric ordinal encoding (Low=0, Low-Medium=1, Medium=2, High=3) for ML. This keeps things simple: one column, easy to track in feature importance.
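
The timestamp-level steps above could be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual script: the function name `add_timestamp_features` and the column names `subject_id`, `charttime`, `heart_rate`, and `risk` are assumptions. It also assumes rolling statistics are taken over the raw (pre-fill) observations, leaves initial missing values as NaN, and omits slope and AUC for brevity.

```python
import numpy as np
import pandas as pd

RISK_ORDER = {"Low": 0, "Low-Medium": 1, "Medium": 2, "High": 3}


def add_timestamp_features(df, vitals, id_col="subject_id", time_col="charttime",
                           risk_col="risk", windows=("1h", "4h", "24h")):
    """Hypothetical helper: one row per subject and timestamp in, features out."""
    df = df.sort_values([id_col, time_col]).reset_index(drop=True)
    indexed = df.set_index(time_col)  # snapshot of the raw, unfilled values
    for v in vitals:
        observed = df[v].notna()
        # Step 2: missingness flag, taken before any fill.
        df[f"{v}_missing"] = (~observed).astype(int)
        # Step 6: hours since this subject's last real observation (staleness).
        last_obs = df[time_col].where(observed).groupby(df[id_col]).ffill()
        df[f"{v}_time_since_last_obs"] = (
            (df[time_col] - last_obs).dt.total_seconds() / 3600
        )
        # Step 5: time-based rolling stats per subject (assumed: raw values only).
        for w in windows:
            roll = indexed.groupby(id_col)[v].rolling(w)
            for stat in ("mean", "min", "max", "std", "count"):
                # Row order matches df because df is sorted by subject, then time.
                df[f"{v}_{stat}_{w}"] = getattr(roll, stat)().to_numpy()
        # Step 3: LOCF within each subject; leading gaps stay NaN.
        df[v] = df.groupby(id_col)[v].ffill()
        # Step 4: carried-forward flag = originally missing, now filled by LOCF.
        df[f"{v}_carried"] = (~observed & df[v].notna()).astype(int)
    # Step 7: ordinal encoding of the textual risk label.
    df[f"{risk_col}_numeric"] = df[risk_col].map(RISK_ORDER)
    return df


example = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2],
    "charttime": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:30", "2024-01-01 02:00",
        "2024-01-01 00:00", "2024-01-01 01:00",
    ]),
    "heart_rate": [80.0, np.nan, 90.0, np.nan, 70.0],
    "risk": ["Low", "Low-Medium", "High", "Medium", "Low"],
})
out = add_timestamp_features(example, ["heart_rate"])
```

In this toy input, subject 1's missing reading at 00:30 is filled from 00:00 and flagged as carried forward, while subject 2's first reading has nothing to carry forward and stays NaN.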

**Pipeline (make_patient_features.py)**:
1. Start from news2_scores.csv.
2. **Group by patient**: Aggregate vitals per patient timeline (median, mean, min, max per vital).
3. **Median imputation**: Fill missing values for each vital using the patient-specific median (so their profile isn’t biased by others). If a patient never had a vital recorded, fall back to the population median.
4. **% Missing per vital**: Track the proportion of missing values per vital before imputation (e.g. HR missing in 30% of a patient's rows = 0.3). Missingness itself may signal clinical patterns (e.g. some vitals are only measured in deteriorating patients).
5. **Encode risk/escalation labels**: Apply ordinal encoding (Low=0, Low-Medium=1, Medium=2, High=3), then calculate summary stats per patient: max risk (highest escalation they reached), median risk (typical risk level), and % time at High risk (the fraction of their trajectory spent at High).
6. **Output**: news2_features_patient.csv (compact, one row per patient, ML-ready summary).
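
One possible reading of the pipeline steps above, sketched in pandas. This is an illustration under assumptions rather than the contents of make_patient_features.py: the function name `make_patient_features`, the column names `subject_id`, `heart_rate`, `spo2`, and `risk`, and the output column names are all hypothetical, and the sketch assumes imputation is applied row-wise before aggregation.

```python
import numpy as np
import pandas as pd

RISK_ORDER = {"Low": 0, "Low-Medium": 1, "Medium": 2, "High": 3}


def make_patient_features(df, vitals, id_col="subject_id", risk_col="risk"):
    """Hypothetical helper: collapse a per-timestamp table to one row per patient."""
    ids = df[id_col]
    # Step 4: proportion of missing rows per vital, measured before imputation.
    pct_missing = df[vitals].isna().groupby(ids).mean().add_suffix("_pct_missing")
    # Step 3: fill with the patient's own median, then the population median
    # for patients who never had that vital recorded.
    patient_median = df.groupby(id_col)[vitals].transform("median")
    filled = df[vitals].fillna(patient_median).fillna(df[vitals].median())
    # Step 2: per-patient aggregates of each vital.
    agg = filled.groupby(ids).agg(["median", "mean", "min", "max"])
    agg.columns = [f"{vital}_{stat}" for vital, stat in agg.columns]
    # Step 5: ordinal risk encoding plus per-patient summaries.
    risk = df[risk_col].map(RISK_ORDER)
    risk_stats = pd.DataFrame({
        "risk_max": risk.groupby(ids).max(),
        "risk_median": risk.groupby(ids).median(),
        "risk_pct_high": (risk == RISK_ORDER["High"]).groupby(ids).mean(),
    })
    combined = pd.concat([agg, pct_missing, risk_stats], axis=1)
    return combined.rename_axis(id_col).reset_index()


example = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2],
    "heart_rate": [80.0, np.nan, 90.0, 70.0, 60.0],
    "spo2": [np.nan, np.nan, np.nan, 95.0, 97.0],
    "risk": ["Low", "High", "High", "Medium", "Low"],
})
features = make_patient_features(example, ["heart_rate", "spo2"])
```

Step 6 would then be a single call such as `features.to_csv("news2_features_patient.csv", index=False)`. In the toy input, patient 1 never has `spo2` recorded, so the sketch falls back to the population median for that vital while still reporting its missing proportion as 1.0.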

**Rationale**:
- Median imputation preserves patient-specific patterns without introducing bias from other patients.
- % Missing captures signal from incomplete measurement patterns.
- Ordinal risk encoding simplifies downstream ML model input while retaining interpretability. Together, the three risk summary features (max risk, median risk, % time at High) summarise a patient’s escalation profile across their stay. Proportion features (like % time at High) are standard numeric features, not encoded categories.
- This is enough for the model; optional metrics such as streaks, AUC, or rolling windows are not needed for the patient summary.