Implemented Steps 2–4 (missingness, LOCF, carried-forward flags) and drafted Step 5 rolling features
• Added add_missingness_flags() to create *_missing columns for each vital.
• Implemented apply_locf() with forward-fill + backfill per (subject_id, stay_id).
• Added add_carried_forward_flags() using missingness flags to distinguish true vs imputed values.
• Began add_rolling_features() to compute rolling-window stats (mean, min, max, std, slope, AUC) for 1h/4h/24h windows on numeric vitals.
• Verified output with df.head() checks after each step.
notes.md (64 additions & 1 deletion)
@@ -427,4 +427,67 @@ Only Step 1 was implemented today; Steps 2–8 remain.
427
427
- Add `_missing` columns for each vital before LOCF.
428
428
- Confirm flags align with actual NaNs.
429
429
- If possible, progress into **Step 3 (LOCF imputation)** and **Step 4 (Carried-forward flags)**.
430
-
- Keep using small previews (`.head()`, `.isna().sum()`) to verify correctness.
430
+
- Keep using small previews (`.head()`, `.isna().sum()`) to verify correctness.
431
+
432
+
---
433
+
434
+
## Day 5 Notes - Missingness, Carried-Forward Flags & Rolling Features
435
+
436
+
### Goals
437
+
- Continue building `make_timestamp_features.py` pipeline.
438
+
- **Extend Step 2 → Step 5**:
439
+
- **Step 2**: Add missingness flags.
440
+
- **Step 3**: Apply forward-filling (LOCF).
441
+
- **Step 4**: Add carried-forward flags.
442
+
- **Step 5**: Start rolling window features (mean, min, max, std, slope, AUC).
443
+
444
+
### What We Did
445
+
#### Step 2: Missingness Flags
446
+
- Implemented `add_missingness_flags(df)` to generate new columns like `respiratory_rate_missing`, `spo2_missing`, etc.
447
+
- **Logic**: for each vital, `df[v].isna().astype(int)` creates a flag column where `1 = missing` and `0 = observed`.
448
+
- Called after loading + sorting the CSV with `load_and_sort_data(INPUT_FILE)`.
449
+
- Verified output by printing `df.head()`.
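A minimal sketch of the Step 2 logic, for reference. The `vitals` list and the toy frame are assumptions made for illustration; the real function in `make_timestamp_features.py` may differ in signature and in which vitals it covers.

```python
import numpy as np
import pandas as pd


def add_missingness_flags(df, vitals):
    """Add a <vital>_missing column (1 = missing, 0 = observed) per vital."""
    out = df.copy()  # leave the caller's frame untouched
    for v in vitals:
        out[f"{v}_missing"] = out[v].isna().astype(int)
    return out


# Toy data (hypothetical values, not taken from the real CSV).
demo_flags = pd.DataFrame(
    {
        "respiratory_rate": [18.0, np.nan, 20.0],
        "spo2": [np.nan, 97.0, 98.0],
    }
)
flagged = add_missingness_flags(demo_flags, ["respiratory_rate", "spo2"])
print(flagged)
```

Comparing `flagged.filter(like="_missing").sum()` against `demo_flags.isna().sum()` is a quick way to confirm the flags align with the actual NaNs.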
450
+
#### Step 3: LOCF (Forward- and Back-Fill)
451
+
- Wrote `apply_locf(df)` to handle missing values by carrying the last observed measurement forward (`ffill`) within each patient stay (`groupby(['subject_id', 'stay_id'])`).
452
+
- Added an extra `.bfill()` so the very first row of each stay (if missing) is backfilled with the next available measurement.
453
+
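- Leaves no missing values for the chosen vitals as long as each stay has at least one observation of that vital; a vital that is never recorded during a stay stays `NaN`.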
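A rough sketch of how the per-stay forward-fill plus backfill might look. The `vitals` argument and the toy frame are assumptions for illustration, and the rows are assumed to be already sorted by time within each stay.

```python
import numpy as np
import pandas as pd

KEYS = ["subject_id", "stay_id"]


def apply_locf(df, vitals):
    """Forward-fill, then backfill, each vital within a patient stay."""
    out = df.copy()
    # LOCF: carry the last observed value forward inside each stay.
    out[vitals] = out.groupby(KEYS, sort=False)[vitals].ffill()
    # Backfill only affects leading rows that had nothing to carry forward.
    out[vitals] = out.groupby(KEYS, sort=False)[vitals].bfill()
    return out


# Toy data: stay 10 has one observation, stay 20 has none (hypothetical values).
demo_locf = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2],
        "stay_id": [10, 10, 10, 20, 20],
        "respiratory_rate": [np.nan, 18.0, np.nan, np.nan, np.nan],
    }
)
filled = apply_locf(demo_locf, ["respiratory_rate"])
print(filled)
```

In this toy case stay 10 ends up fully filled with 18.0, while stay 20 remains `NaN` because there is nothing to fill from. Grouping by stay keeps values from leaking across patients.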
454
+
#### Step 4: Carried-Forward Flags
455
+
- Added `add_carried_forward_flags(df)` to track which values in the filled dataset are real vs imputed.
456
+
- Used missingness flags from Step 2 as ground truth:
457
+
- Carried = `value is not NaN after fill` **AND** `was missing before fill` (this also marks rows that were back-filled at the start of a stay).
458
+
- Output = new columns like `respiratory_rate_carried`, `spo2_carried`, etc.
459
+
- This avoids the problem of falsely marking naturally repeated values as carried-forward.
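The rule above could be sketched roughly as follows. This assumes the `*_missing` columns from Step 2 are still present after the fill; the toy frame is made up for illustration.

```python
import pandas as pd


def add_carried_forward_flags(df, vitals):
    """Mark values that exist only because of imputation (1 = imputed)."""
    out = df.copy()
    for v in vitals:
        was_missing = out[f"{v}_missing"].astype(bool)
        # Imputed = has a value now AND was missing before the fill.
        out[f"{v}_carried"] = (out[v].notna() & was_missing).astype(int)
    return out


# Toy post-fill data: row 1 was imputed, row 2 is a genuine repeat of 18.0.
demo_carried = pd.DataFrame(
    {
        "respiratory_rate": [18.0, 18.0, 18.0, 20.0],
        "respiratory_rate_missing": [0, 1, 0, 0],
    }
)
carried = add_carried_forward_flags(demo_carried, ["respiratory_rate"])
print(carried)
```

Only row 1 is flagged, even though row 2 holds the same value. A check based on "equals the previous value" would have flagged both.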
460
+
#### Step 5: Rolling Features (in progress)
461
+
- Started `add_rolling_features(df)` to compute rolling-window statistics on numeric vitals (`respiratory_rate`, `spo2`, `temperature`, `systolic_bp`, `heart_rate`).
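Since Step 5 is still in progress, here is only one possible shape for the time-based rolling statistics, not the final implementation. The timestamp column name `charttime`, the output naming pattern `<vital>_roll<window>_<stat>`, and the toy data are all assumptions; slope and AUC are left out of this sketch.

```python
import pandas as pd

KEYS = ["subject_id", "stay_id"]
# Window labels mapped to widths (1h / 4h / 24h as planned in the notes).
WINDOWS = {
    "1h": pd.Timedelta(hours=1),
    "4h": pd.Timedelta(hours=4),
    "24h": pd.Timedelta(hours=24),
}
STATS = ("mean", "min", "max", "std")


def add_rolling_features(df, vitals, time_col="charttime"):
    """Add per-stay, time-windowed rolling stats for each vital."""
    # Sort so the grouped rolling output lines up row-for-row with `out`.
    out = df.sort_values(KEYS + [time_col]).reset_index(drop=True)
    indexed = out.set_index(time_col)  # time-based windows need a datetime index
    for v in vitals:
        for label, width in WINDOWS.items():
            rolled = indexed.groupby(KEYS)[v].rolling(width)
            for stat in STATS:
                out[f"{v}_roll{label}_{stat}"] = getattr(rolled, stat)().to_numpy()
    return out


# Toy data for a single stay (hypothetical values and timestamps).
demo_roll = pd.DataFrame(
    {
        "subject_id": [1, 1, 1],
        "stay_id": [10, 10, 10],
        "charttime": pd.to_datetime(
            ["2024-01-01 00:00", "2024-01-01 00:30", "2024-01-01 02:00"]
        ),
        "heart_rate": [80.0, 90.0, 100.0],
    }
)
rolled_demo = add_rolling_features(demo_roll, ["heart_rate"])
print(rolled_demo.filter(like="roll1h"))
```

Each window looks backward from the current row's timestamp, so the 02:00 row sees only itself in the 1h window but all three readings in the 4h window. The standard deviation is `NaN` whenever a window holds a single observation.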