| 1 | +# Verification Report: tier3-therapy-phenotype-report-2026-04-18.md |
| 2 | + |
| 3 | +**Date of Review**: 2026-04-22 |
| 4 | +**Report Reviewed**: `/home/bewest/src/rag-nightscout-ecosystem-alignment/docs/60-research/tier3-therapy-phenotype-report-2026-04-18.md` |
| 5 | +**Experiments**: EXP-2291, EXP-2321, EXP-2331, EXP-2351 |
| 6 | +**Data Sources**: |
| 7 | +- `externals/experiments/exp-2351-2358_insulin_pk.json` |
| 8 | +- `externals/experiments/exp-2321-2328_phenotype.json` |
| 9 | +- `externals/experiments/exp-2331-2338_prediction_bias.json` |
| 10 | +- `externals/experiments/exp-2291-2298_integrated.json` |
| 11 | +- `externals/experiments/exp-2291-2298_integrated_dynisf.json` |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +## Executive Summary |
| 16 | + |
| 17 | +**CRITICAL FINDING**: The report contains multiple fabricated or significantly inaccurate numerical claims that are not supported by the underlying experiment JSON data. |
| 18 | + |
| 19 | +**Key Issues**: |
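| 20 | +1. **Cohort size mismatch**: Report claims 31 patients; the experimental data contains only 20 (11 patients, 35% of the claimed cohort, are unaccounted for and the gap is undisclosed) |
| 21 | +2. **Fabricated statistics**: Risk and benefit classifications are reported as specific counts, but every risk category in the data is 'unknown' and the benefit counts in the data (7 HIGH, 12 MODERATE, 1 skipped) do not match the report |
| 22 | +3. **Significant DIA error**: Mean DIA reported as 12.3 h vs. actual 16.7 h (actual is 36% higher than reported) |
| 23 | +4. **False safety claims**: Guardrail-pass and TIR-target counts are quoted as 16/31 and 20/31; the data supports only 10/20 and 15/20 |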
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +## Detailed Findings |
| 28 | + |
| 29 | +### CRITICAL ERRORS (Clearly Wrong) |
| 30 | + |
| 31 | +#### **CLAIM 1: Line 16 - "16/31 orig patients are safe to implement"** |
| 32 | +- **Reported**: 16/31 patients passed all guardrails |
| 33 | +- **Actual Data**: 10/20 patients (EXP-2297, `all_passed` field) |
| 34 | +- **Severity**: **CRITICAL** |
| 35 | +- **Error Type**: Fabrication/Omission of missing cohort |
| 36 | +- **Source**: `exp-2291-2298_integrated.json` → `exp_2297` → `all_passed` field across 20 patient IDs |
| 37 | + |
| 38 | +--- |
| 39 | + |
| 40 | +#### **CLAIM 2: Line 16 - "20/31 meet the 70% TIR target"** |
| 41 | +- **Reported**: 20/31 patients achieve ≥70% projected TIR |
| 42 | +- **Actual Data**: 15/20 patients meet ≥70% TIR (EXP-2294) |
| 43 | +- **Severity**: **CRITICAL** |
| 44 | +- **Error Type**: Fabrication combined with missing cohort |
| 45 | +- **Source**: `exp-2291-2298_integrated.json` → `exp_2294` → `projected.tir >= 70` across 20 patients |
| 46 | + |
| 47 | +--- |
| 48 | + |
| 49 | +#### **CLAIM 3: Line 13 - "mean 12.3 h vs. typical 5 h profile DIA"** |
| 50 | +- **Reported**: Mean DIA = 12.3 hours |
| 51 | +- **Actual Data**: Mean of `mean_dia_hours` = **16.7 hours** |
| 52 | +- **Patient Data**: `[17.4, 18.1, 20.4, 18.0, 19.6, 20.6, 12.2, 18.6, 19.1, 11.4, 19.4, 14.3, 6.8, 16.1, 14.0, 20.8, 20.4, 13.8, 15.4, 17.7]` |
| 53 | +- **Severity**: **CRITICAL** |
| 54 | +- **Magnitude of Error**: +4.4 hours (+36% difference from claim) |
| 55 | +- **Error Type**: Incorrect statistic |
| 56 | +- **Source**: `exp-2351-2358_insulin_pk.json` → `exp_2354` → `mean_dia_hours` field |
| 57 | + |
| 58 | +--- |
| 59 | + |
| 60 | +#### **CLAIM 4: Line 14 - "11 HIGH-risk, 15 MODERATE-risk, 5 LOW-risk"** |
| 61 | +- **Reported**: Specific risk distribution across 31 patients |
| 62 | +- **Actual Data**: All 20 patients show `risk_category: 'unknown'` |
| 63 | +- **Severity**: **CRITICAL** |
| 64 | +- **Error Type**: Fabricated statistical distribution |
| 65 | +- **Source**: `exp-2321-2328_phenotype.json` → `exp_2323` → `risk_category` field (all 'unknown') |
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +#### **CLAIM 5: Line 15 - "8 classified as HIGH benefit, 21 MODERATE"** |
| 70 | +- **Reported**: Specific benefit distribution totaling 29 patients |
| 71 | +- **Actual Data**: `{'HIGH': 7, 'MODERATE': 12}` across 19 classified patients (20 in the data, 1 skipped) |
| 72 | +- **Severity**: **CRITICAL** |
| 73 | +- **Error Type**: Fabricated numerical distribution |
| 74 | +- **Source**: `exp-2331-2338_prediction_bias.json` → `exp_2338` → `benefit` field |
| 75 | + |
| 76 | +--- |
| 77 | + |
| 78 | +### HIGH SEVERITY ERRORS (Significant Deviations) |
| 79 | + |
| 80 | +#### **UNDISCLOSED: Missing Cohort Members (35% missing)** |
| 81 | +- **Report states**: "31 patients in original cohort" (lines 5, 14, 16) |
| 82 | +- **Actual data**: Only 20 patient IDs in all experiments |
| 83 | +- **Missing**: 11 patients (35% of cohort) |
| 84 | +- **Severity**: **HIGH** |
| 85 | +- **Impact**: Every count reported against the 31-patient denominator is higher than the 20-patient data can support |
| 86 | +- **Source**: All EXP-2291-2298 and EXP-2321-2328 experiment files show only 20 actual patient IDs (excluding reference rows a-k) |
| 87 | + |
| 88 | +--- |
| 89 | + |
| 90 | +#### **CLAIM 6: Line 15 - "mean −7.65 mg/dL" prediction bias** |
| 91 | +- **Reported**: Mean bias = −7.65 mg/dL (29 analyzable patients) |
| 92 | +- **Actual Data**: Mean bias = **−9.46 mg/dL** (19 non-skipped patients, 1 excluded) |
| 93 | +- **Severity**: **HIGH** |
| 94 | +- **Magnitude**: −1.81 mg/dL difference (24% of the reported value) |
| 95 | +- **Note**: Report claims 29 patients analyzable; data shows 20 total with 1 skipped |
| 96 | +- **Source**: `exp-2331-2338_prediction_bias.json` → `exp_2338` → `bias` field |
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +### MEDIUM SEVERITY (Imprecise/Incomplete) |
| 101 | + |
| 102 | +#### **CLAIM 7: Line 16 - "Mean projected TIR change is −0.5 pp"** |
| 103 | +- **Reported**: −0.5 pp |
| 104 | +- **Actual Data**: −0.4 pp |
| 105 | +- **Severity**: **MEDIUM** (0.1 pp difference, 20% of the reported value; plausibly rounding) |
| 106 | +- **Acceptable?**: Borderline acceptable if attributed to rounding, but the report gives no explicit caveat |
| 107 | +- **Source**: `exp-2291-2298_integrated.json` → `exp_2294` → `changes.tir` field |
| 108 | + |
| 109 | +--- |
| 110 | + |
| 111 | +#### **CLAIM 8: Line 13 - "median peak 82 min"** |
| 112 | +- **Reported**: 82 minutes |
| 113 | +- **Actual Data**: 84 minutes (median of 20 values) |
| 114 | +- **Severity**: **MEDIUM** (2 min difference, about 2% of the reported value; acceptable) |
| 115 | +- **Status**: **VERIFIED** (within measurement uncertainty) |
| 116 | +- **Source**: `exp-2351-2358_insulin_pk.json` → `exp_2355` → `median_peak_min` |
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +#### **CLAIM 9: Line 14 - "27/31 in EXP-2291; 20/31 'unknown' in EXP-2328"** |
| 121 | +- **Reported**: 27/31 patients classified as over-correction in EXP-2291 |
| 122 | +- **Actual Data**: 20/20 patients classified as over-correction in EXP-2291 (100% of the available cohort) |
| 123 | +- **Severity**: **MEDIUM** |
| 124 | +- **Note**: The report's count of 20 'unknown' in EXP-2328 is verified |
| 125 | +- **Discrepancy**: The 27/31 claim cannot be reconciled with the 20/20 in the data, which again points to the unaccounted-for cohort members |
| 126 | +- **Source**: `exp-2291-2298_integrated.json` → `exp_2291` → `phenotype` field |
| 127 | + |
| 128 | +--- |
| 129 | + |
| 130 | +### VERIFIED AS CORRECT |
| 131 | + |
| 132 | +#### **CLAIM 10: Line 13 - "median onset 50 min"** |
| 133 | +- **Reported**: 50 minutes |
| 134 | +- **Actual Data**: 50.0 minutes (median of 20 values) |
| 135 | +- **Status**: ✓ **VERIFIED** |
| 136 | + |
| 137 | +--- |
| 138 | + |
| 139 | +#### **CLAIM 11: Line 18 - DynISF cohort outcomes** |
| 140 | +- **Reported**: "6/12 safe to implement, 11/12 meeting 70% TIR" |
| 141 | +- **Actual Data**: |
| 142 | + - Safe (all_passed): 6/12 ✓ |
| 143 | + - Meeting 70% TIR: 11/12 ✓ |
| 144 | +- **Status**: ✓ **VERIFIED** |
| 145 | +- **Note**: DynISF data is correctly reported; only original cohort has errors |
| 146 | + |
| 147 | +--- |
| 148 | + |
| 149 | +## Per-Patient Verification |
| 150 | + |
| 151 | +### Original Cohort Patient IDs Found (20 total) |
| 152 | +The experimental data contains patient records for: |
| 153 | +- Nightscout IDs (ns-prefix): 13 patients |
| 154 | +- ODC IDs (odc-prefix): 7 patients |
| 155 | +- **Total actual: 20 patients** |
| 156 | +- **Report claims: 31 patients** |
| 157 | + |
| 158 | +### Missing Tables |
| 159 | +The report includes per-patient tables (lines 43–53), but these cannot be verified because: |
| 160 | +1. The experimental data contains only 20 patients vs. the 31 claimed |
| 161 | +2. Risk categories all show 'unknown' (no HIGH/MODERATE/LOW distribution) |
| 162 | +3. Benefit categories show 7 HIGH and 12 MODERATE with 1 skipped (not the 8 HIGH and 21 MODERATE claimed) |
| 163 | +4. The `all_passed` guardrail field shows 10/20 passing (not 16/31) |
| 164 | + |
| 165 | +--- |
| 166 | + |
| 167 | +## Statistical Summary |
| 168 | + |
| 169 | +| Metric | Reported | Actual Data | Discrepancy | |
| 170 | +|--------|----------|------------|-------------| |
| 171 | +| Original cohort size | 31 | 20 | −11 (−35%) | |
| 172 | +| Mean DIA (hours) | 12.3 | 16.7 | +4.4 (+36%) | |
| 173 | +| Mean prediction bias (mg/dL) | −7.65 | −9.46 | −1.81 (−24%) | |
| 174 | +| Safe to implement | 16/31 | 10/20 | Count overstated by 6 (60%) | |
| 175 | +| TIR ≥70% | 20/31 | 15/20 | Count overstated by 5 (33%) | |
| 176 | +| HIGH-risk patients | 11 | 0 (all unknown) | Fabricated | |
| 177 | +| Median onset (min) | 50 | 50 | ✓ Match | |
| 178 | +| Median peak (min) | 82 | 84 | +2 (+2%) | |
| 179 | +| DynISF safe | 6/12 | 6/12 | ✓ Match | |
| 180 | +| DynISF TIR ≥70% | 11/12 | 11/12 | ✓ Match | |
| 181 | + |
| 182 | +--- |
| 183 | + |
| 184 | +## Conclusions |
| 185 | + |
| 186 | +### Issues Requiring Immediate Correction |
| 187 | + |
| 188 | +1. **Undisclosed Missing Data**: Report must disclose that only 20/31 patients have experimental data and adjust all statistics accordingly. |
| 189 | + |
| 190 | +2. **DIA Estimate Error**: Correct mean DIA from 12.3h to 16.7h. This is a substantial pharmacokinetic finding that should be highlighted rather than understated. |
| 191 | + |
| 192 | +3. **Risk & Benefit Classifications**: Acknowledge that risk categories are 'unknown' for all patients, and replace the claimed benefit distribution (8 HIGH, 21 MODERATE) with the one in the data (7 HIGH, 12 MODERATE, 1 skipped). |
| 193 | + |
| 194 | +4. **Safety Guardrail Claims**: Correct from 16/31 to 10/20 safe to implement. The guardrail analysis appears valid for the available data but applies to an incomplete cohort. |
| 195 | + |
| 196 | +5. **TIR Target Achievement**: Correct from 20/31 to 15/20 meeting 70% TIR. |
| 197 | + |
| 198 | +### Severity Assessment |
| 199 | + |
| 200 | +- **5 CRITICAL errors** involving fabricated or severely inaccurate statistics |
| 201 | +- **2 HIGH severity issues** (undisclosed missing data, significant bias miscalculation) |
| 202 | +- **3 MEDIUM severity issues** (imprecise TIR change, off-by-2 peak time, over-correction count mismatch) |
| 203 | +- **3 VERIFIED claims** with high confidence |
| 204 | + |
| 205 | +### Recommendation |
| 206 | + |
| 207 | +**REJECT** the report in its current form. Substantial revisions required: |
| 208 | +1. Acknowledge 20/31 cohort limitation |
| 209 | +2. Correct all DIA, bias, and safety statistics |
| 210 | +3. Remove or correct risk/benefit distributions |
| 211 | +4. Provide per-patient tables for 20 patients only |
| 212 | +5. Re-run analysis with full 31-patient cohort if available, or disclose why 11 patients are unavailable |
| 213 | + |
| 214 | +--- |
| 215 | + |
| 216 | +## Appendix: Source Code Verification |
| 217 | + |
| 218 | +All claims were verified against the JSON experiment files using Python analysis: |
| 219 | + |
| 220 | +```jsonc |
| 221 | +// Example: EXP-2354 (DIA Estimation) - First patient |
| 222 | +{ |
| 223 | +  "n_fits": 210, |
| 224 | +  "median_tau": 1.9, |
| 225 | +  "mean_tau": 3.48, |
| 226 | +  "median_dia_hours": 9.5, |
| 227 | +  "mean_dia_hours": 17.4, |
| 228 | +  "std_dia_hours": 15.6, |
| 229 | +  "mean_r2": 0.529, |
| 230 | +  "profile_dia": 5.0, |
| 231 | +  "dia_ratio": 1.9 |
| 232 | +} |
| 233 | +``` |
| 234 | + |
| 235 | +The report's 12.3 h figure does not match the cohort mean of `mean_dia_hours` (16.7 h), which suggests it was derived with a different aggregation than the underlying data supports. |
| 236 | + |
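The headline arithmetic in the review above can be re-derived from the numbers it quotes. The following is a minimal Python sketch, not the script actually used for the review. It assumes the per-patient `mean_dia_hours` values listed under Claim 3 and the reported/actual figures from the Statistical Summary; the helper names `relative_diff_pct` and `pass_rate_pct` are hypothetical. Percent differences are taken relative to the reported value, matching the "difference from claim" wording used for the DIA finding.

```python
from statistics import mean

# Per-patient mean_dia_hours values as listed under Claim 3 (EXP-2354).
MEAN_DIA_HOURS = [
    17.4, 18.1, 20.4, 18.0, 19.6, 20.6, 12.2, 18.6, 19.1, 11.4,
    19.4, 14.3, 6.8, 16.1, 14.0, 20.8, 20.4, 13.8, 15.4, 17.7,
]


def relative_diff_pct(reported: float, actual: float) -> float:
    """Signed difference (actual - reported) as a percentage of |reported|."""
    return (actual - reported) / abs(reported) * 100.0


def pass_rate_pct(passed: int, total: int) -> float:
    """Share of patients meeting a criterion, in percent."""
    return passed / total * 100.0


cohort_mean_dia = mean(MEAN_DIA_HOURS)
dia_diff = relative_diff_pct(12.3, cohort_mean_dia)    # reported 12.3 h
bias_diff = relative_diff_pct(-7.65, -9.46)            # reported -7.65 mg/dL

print(f"patients with DIA data: {len(MEAN_DIA_HOURS)}")   # 20
print(f"cohort mean DIA: {cohort_mean_dia:.1f} h")        # 16.7 h
print(f"DIA difference vs. reported: {dia_diff:+.0f}%")   # +36%
print(f"bias difference vs. reported: {bias_diff:+.0f}%") # -24%

# Reported vs. actual counts, taken from the Statistical Summary table.
for label, reported, actual in [
    ("safe to implement", (16, 31), (10, 20)),
    ("TIR >= 70%", (20, 31), (15, 20)),
]:
    print(
        f"{label}: reported {pass_rate_pct(*reported):.1f}%, "
        f"actual {pass_rate_pct(*actual):.1f}%"
    )
```

Printing the pass rates next to the raw counts makes the denominator change visible: 16/31 and 10/20 are close as rates (51.6% vs. 50.0%), while 20/31 and 15/20 are not (64.5% vs. 75.0%), so the discrepancy lies mainly in the counts and cohort size rather than in the proportions.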