| 1 | +# EXP-2859 — Bootstrap Confidence Replaces Boolean Simpson Flag (2026-04-22) |
| 2 | + |
| 3 | +**Stream**: B (operational) — also methodological for Stream A |
| 4 | +**Predecessor**: EXP-2853 (point Simpson), EXP-2856 (rolling stability), EXP-2858 (no flip drivers) |
| 5 | +**Productionized**: ✅ `p_simpson` field + 3 new severity rules |
| 6 | + |
| 7 | +## Headline |
| 8 | + |
| 9 | +Block bootstrap (N=200, 48h block size) over per-patient β_fast and |
| 10 | +β_slow gives the **explicit confidence** estimate that the noisy |
| 11 | +boolean Simpson flag lacked: |
| 12 | + |
| 13 | +| Cohort | n | |
| 14 | +|--------|---| |
| 15 | +| **High-confidence Simpson** (P ≥ 0.9) | **2/26** | |
| 16 | +| **High-confidence non-Simpson** (P ≤ 0.1) | **12/26** | |
| 17 | +| **Boundary / uncertain** (0.1 < P < 0.9) | **12/26** | |
| 18 | + |
| 19 | +The boolean EXP-2853 Simpson flag tagged 9/29 patients as Simpson. |
| 20 | +The bootstrap shows **only 2 of those are statistically robust**; |
| 21 | +the other 7 sit near the regime boundary. Median P across the 9 flagged |
| 22 | +patients is 0.76 (still "more likely than not" but far from confident). |
| 23 | +EXP-2856 saw this as agreement-fraction; bootstrap quantifies it as a probability. |
| 24 | + |
| 25 | +## Method |
| 26 | + |
| 27 | +Block bootstrap to preserve within-window correlation: |
| 28 | + |
| 29 | +1. Slice each patient's data into non-overlapping 48h chunks |
| 30 | + `(WIN_SIZE = 48 × 12)`. |
| 31 | +2. Resample chunks with replacement N=200 times. |
| 32 | +3. For each replicate, compute β_fast (5-min OLS over flattened |
| 33 | + chunks) and β_slow (OLS over per-chunk means). |
| 34 | +4. P(simpson) = fraction of replicates with `sign(β_fast) ≠ sign(β_slow)` |
| 35 | + AND both magnitudes > 1e-6. |
| 36 | + |
| 37 | +Block bootstrap is essential — naive sample-with-replacement of |
| 38 | +5-min rows would destroy the slow-window structure (β_slow is |
| 39 | +defined on 48h means). |
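
The four steps above can be sketched as follows. This is a minimal illustration, not the EXP-2859 driver: the function names `ols_slope` and `p_simpson`, the generic predictor/response arrays `x` and `y`, and the fixed seed are assumptions made for the example. Only the block size, replicate count, and magnitude threshold come from the text.

```python
import numpy as np

WIN_SIZE = 48 * 12  # 48 h of 5-min samples, as stated in the Method section
N_BOOT = 200        # number of bootstrap replicates
EPS = 1e-6          # magnitude threshold for both slopes


def ols_slope(x, y):
    """Slope of the least-squares line y ~ a + b*x (0.0 if x is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    denom = float(np.dot(xc, xc))
    if denom == 0.0:
        return 0.0
    return float(np.dot(xc, y - y.mean()) / denom)


def p_simpson(x, y, win_size=WIN_SIZE, n_boot=N_BOOT, eps=EPS, seed=0):
    """Fraction of block-bootstrap replicates where beta_fast and beta_slow
    have opposite signs and both exceed eps in magnitude.

    x, y are hypothetical equal-length series sampled every 5 minutes.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_chunks = len(x) // win_size
    if n_chunks < 2:
        raise ValueError("need at least two full chunks")
    # Step 1: non-overlapping chunks (any trailing partial chunk is dropped).
    xs = x[: n_chunks * win_size].reshape(n_chunks, win_size)
    ys = y[: n_chunks * win_size].reshape(n_chunks, win_size)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_boot):
        # Step 2: resample whole chunks with replacement.
        idx = rng.integers(0, n_chunks, size=n_chunks)
        bx, by = xs[idx], ys[idx]
        # Step 3: fast slope over flattened rows, slow slope over chunk means.
        beta_fast = ols_slope(bx.ravel(), by.ravel())
        beta_slow = ols_slope(bx.mean(axis=1), by.mean(axis=1))
        # Step 4: count sign disagreements with non-negligible magnitudes.
        if (abs(beta_fast) > eps and abs(beta_slow) > eps
                and np.sign(beta_fast) != np.sign(beta_slow)):
            hits += 1
    return hits / n_boot
```

The sketch omits missing-data handling and the minimum-chunk filter described under Results; a real driver would need both.
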
| 40 | + |
| 41 | +## Results |
| 42 | + |
| 43 | +- N=26 patients with ≥7 chunks (336 h = 14 days) of data. |
| 44 | +- Median P(simpson) overall: 0.16 |
| 45 | +- Median P(simpson) | point Simpson = True: **0.76** |
| 46 | +- Median P(simpson) | point Simpson = False: **0.01** |
| 47 | + |
| 48 | +The point flag and bootstrap agree on direction (median P 0.76 vs |
| 49 | +0.01 across the two subsets), but bootstrap reveals that the |
| 50 | +"True" subset is heterogeneous — only 2 are confidently Simpson; |
| 51 | +7 are boundary cases. |
| 52 | + |
| 53 | +## Visualization (Charter V8) |
| 54 | + |
| 55 | + |
| 56 | + |
| 57 | +Left: P(simpson) distribution with 0.1 / 0.9 cut lines. |
| 58 | +Right: β_fast × β_slow scatter with bootstrap 95% CI bars colored |
| 59 | +by P(simpson). Points near the axes have wide CIs and intermediate |
| 60 | +P; points far from axes have tight CIs and clear classification. |
| 61 | + |
| 62 | +## Production change |
| 63 | + |
| 64 | +`AuditionInputs` gains optional `p_simpson: Optional[float]` field |
| 65 | +(audition_matrix.py:69). New top-priority branch in |
| 66 | +`classify_triage_flags`: |
| 67 | + |
| 68 | +| `p_simpson` | Severity | Action | |
| 69 | +|---|---|---| |
| 70 | +| ≥ 0.9 | **medium** | "high-confidence Simpson regime" | |
| 71 | +| 0.1 < P < 0.9 | **low** | "boundary case ... sanity-check" | |
| 72 | +| ≤ 0.1 | (suppress) | "confidently non-Simpson" | |
| 73 | +| `None` | fall through | EXP-2854/2856 boolean+stability path | |
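
The threshold table above can be read as a small decision function. The sketch below is illustrative only and is not the body of `classify_triage_flags`: the helper name `simpson_severity`, the `"fallback"` and `"suppress"` labels, and the range check are assumptions. The elided part of the boundary message is left out rather than guessed.

```python
from typing import Optional, Tuple


def simpson_severity(p_simpson: Optional[float]) -> Tuple[str, str]:
    """Map a bootstrap P(simpson) to a (severity, note) pair.

    Hypothetical helper mirroring the table: thresholds 0.9 and 0.1 are
    inclusive on the confident sides, and None defers to the older path.
    """
    if p_simpson is None:
        # No bootstrap data: defer to the boolean + stability path.
        return ("fallback", "EXP-2854/2856 boolean+stability path")
    if not 0.0 <= p_simpson <= 1.0:
        # Also rejects NaN, since every comparison with NaN is False.
        raise ValueError("p_simpson must be a probability in [0, 1]")
    if p_simpson >= 0.9:
        return ("medium", "high-confidence Simpson regime")
    if p_simpson <= 0.1:
        return ("suppress", "confidently non-Simpson")
    return ("low", "boundary case")
```

Checking `None` first keeps the fall-through explicit, and placing both confident branches before the boundary branch makes the inclusive edges at 0.1 and 0.9 easy to verify against the table.
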
| 74 | + |
| 75 | +3 new tests: |
| 76 | +- `test_p_simpson_high_emits_medium` |
| 77 | +- `test_p_simpson_boundary_emits_low` |
| 78 | +- `test_p_simpson_low_suppresses` (overrides up_shift phenotype proxy) |
| 79 | + |
| 80 | +`SimpsonFactsLoader` extended: |
| 81 | +- New `bootstrap_path` arg (defaults to |
| 82 | + `externals/experiments/exp-2859_bootstrap_simpson.parquet`). |
| 83 | +- `SimpsonAuditionFacts` gains `p_simpson: Optional[float]` field. |
| 84 | +- Live smoke-test: 30 patients indexed (29 from EXP-2853 ∪ EXP-2856 |
| 85 | + + ~26 from EXP-2859), `b` returns `(True, 0.20, 0.39)` — |
| 86 | + boundary case as expected. |
| 87 | + |
| 88 | +19/19 audition + loader tests pass. |
| 89 | + |
| 90 | +## Findings invariants |
| 91 | + |
| 92 | +- **Bootstrap sharpens classification**: 12/26 confidently clean, |
| 93 | + 2/26 confidently Simpson, 12/26 boundary. The boolean flag overstated |
| 94 | + certainty for the 7 "boundary-Simpson" patients (0.1 < P < 0.9). |
| 95 | +- **Block bootstrap is mandatory**: resampling individual 5-min rows |
| 96 | + would break the 48h chunk structure on which β_slow is defined. |
| 97 | +- **2 confident-Simpson patients** (P ≥ 0.9) merit medium-severity |
| 98 | + attention; **12 confident-clean** can have Simpson flag suppressed |
| 99 | + outright; **12 boundary** get low-severity acknowledgment. |
| 100 | +- The point Simpson flag from EXP-2853 stays as a fallback when |
| 101 | + bootstrap data isn't available. |
| 102 | + |
| 103 | +## Deliverables |
| 104 | + |
| 105 | +| File | Purpose | |
| 106 | +|------|---------| |
| 107 | +| `tools/cgmencode/exp_bootstrap_simpson_2859.py` | Driver | |
| 108 | +| `externals/experiments/exp-2859_bootstrap_simpson.parquet` | Per-patient P + β_fast/β_slow CIs | |
| 109 | +| `externals/experiments/exp-2859_summary.json` | Cohort tabulation | |
| 110 | +| `docs/60-research/figures/exp-2859_bootstrap_simpson.png` | Two-panel chart | |
| 111 | +| `tools/cgmencode/production/audition_matrix.py` | `p_simpson` field + severity rules | |
| 112 | +| `tools/cgmencode/production/simpson_facts_loader.py` | Bootstrap artifact loader | |
| 113 | +| `tools/cgmencode/production/test_audition_matrix.py` | 3 new tests | |
| 114 | +| `tools/cgmencode/production/test_simpson_facts_loader.py` | bootstrap-path test fixtures | |
| 115 | + |
| 116 | +## Next experiments |
| 117 | + |
| 118 | +- **EXP-2860**: bootstrap CI on per-(patient, TOD) Simpson — |
| 119 | + combine EXP-2855's TOD slicing with EXP-2859's bootstrap to give |
| 120 | + TOD-aware confidence (do TOD buckets stabilize the boundary |
| 121 | + cases?). |
| 122 | +- **EXP-2861**: extend bootstrap to other audition signals (ISF gap, |
| 123 | + recovery fraction) — generalize the "confidence-band" pattern. |
| 124 | +- **viz-meal-overlay-absorption** (carryover): meal-event chart |
| 125 | + with declared vs modeled carb absorption. |