docs(research): enrichment shot RESULTS — enriched_worse_check, per-leg line closed

JohnCCarter · claude · JohnCCarter · commit abd8cece28f9 · 2026-06-25T11:30:47.000+02:00
Blind Commit-2 of the enrichment LOCK (Track A). Parity confirmed: AP-baseline
(4h k=3) = Stage-2 headline 0.0567, n_test_pos=65, excl=0 (no look-ahead).
Causal `exclusivity` lowers pooled OOS AP 0.0567->0.0387; AP-lift -0.018, CI
[-0.070, -0.0019], p(lift<=0)=0.994 -> enriched_worse_check. Validity checks
pass (not a bug). Mechanism (Inferred): 0.80 collinear with cleanliness.
Per-leg-feature line CLOSED; fork (B grow-facit recommended; A' decorrelated
under new lock, low prior) surfaced to user. No edge/behaviour/PnL/Genesis
claim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/docs/research_wiki/handoff.md b/docs/research_wiki/handoff.md
@@ -27,32 +27,32 @@ append-only trail lives in [log.md](log.md).
 
 **ETH/USD:** blocked until BTC protocol approved.
 
-## Next Step (consolidated — requires explicit GO)
+## Next Step (requires explicit GO)
 
-Selection-learning is **paused at the enrichment lock** (Commit 1, docs-only; no code). Next is a
-**GO-fork** — direction is the user's call (AGENTS.md §1); recommendation: **A first**, then B.
+Enrichment shot **DONE** (Commit 2, 4h k=3): blind verdict **`enriched_worse_check`**. The enriched
+model is significantly *worse* than Stage-2 (AP-lift CI [−0.070, −0.0019]); validity checks pass (not a bug).
+**Per-leg-feature line is CLOSED.** Next is a **GO-fork** — direction is the user's call (AGENTS.md §1);
+recommendation: **B**.
 
-- **A — enrichment Commit 2:** build/run the `exclusivity` / leg-completeness shot
-  (`selection_learning_enrich.py`) — nested AP-lift vs the current Stage-2 model, 4h k=3, per the
-  [enrichment LOCK](reviews/btc-fib-selection-learning-enrichment-lock-20260624.md). Prior **low**;
-  a clean `no_enrichment_signal` routes pre-committed to B (§E8). Cheap/locked/blind → run first.
-- **B — grow facit:** park modeling, return to the main quest — more/better human labels
-  ([main-quest reset](reviews/btc-fib-selection-learning-main-quest-reset-20260624.md) §5).
+- **B — grow facit (recommended):** park modeling, return to the main quest — more/better human labels
+  ([main-quest reset](reviews/btc-fib-selection-learning-main-quest-reset-20260624.md) §5). Binding
+  constraint is now **data, not features**.
+- **A′ — decorrelated exclusivity (low prior, NEW lock):** `exclusivity` was 0.80 collinear with
+  `cleanliness`; an orthogonalized variant needs its **own** Commit-1 lock (reopens a closed line). Not free.
 
-**No code/run/build until a separate explicit GO** (discriminator = appetite: close per-leg features via A → then B, or pivot straight to B).
+**No code/run/build until a separate explicit GO.**
 
 ## Recent Changes
 
-- **2026-06-24 Fib SELECTION-LEARNING model-ENRICHMENT — LOCKED (Commit 1, docs-only); ⏸ PAUSED.**
-  Blind lock for one lean shot: does a causal **leg-completeness / `exclusivity`** feature
-  (pivot-structural, k*=3, distinct from `cleanliness`) raise pooled OOS AP **over the current Stage-2
-  model** (nested) on 4h k=3? Prior **low**; clean null routes to facit growth (§E8). Resume/fork now
-  under **## Next Step** above. No code started. [Enrichment LOCK](reviews/btc-fib-selection-learning-enrichment-lock-20260624.md).
+- **2026-06-25 Fib SELECTION-LEARNING model-ENRICHMENT — RUN → `enriched_worse_check` (4h k=3); line
+  CLOSED.** Blind Commit-2 of the [enrichment LOCK](reviews/btc-fib-selection-learning-enrichment-lock-20260624.md)
+  (`c80acb0`). Parity: AP-baseline = Stage-2 headline 0.0567, n_test_pos=65, excl=0 (no look-ahead).
+  Causal `exclusivity` *lowers* pooled OOS AP 0.0567→0.0387; AP-lift −0.018, CI [−0.070, −0.0019],
+  p(lift≤0)=0.994. Mechanism (Inferred): 0.80 collinear with `cleanliness`. Per-leg-feature line
+  **closed** → grow-facit (fork under **## Next Step**). [Results](reviews/btc-fib-selection-learning-enrichment-results-20260625.md).
 - **2026-06-24 Fib SELECTION-LEARNING — MAIN-QUEST RESET (docs-only).** Stop the mechanics drift,
   re-anchor to the north star (above). Controls/mechanics (artifact-probe, snapping/net-path mechanics,
-  flip) are **DONE; matched-null / detector-geometry side-quests PARKED.** Next step only if it improves
-  the model's human-like leg/range selection vs facit (behind a blind lock); else park the modeling and
-  return to the human BTC top-down labeling main quest.
+  flip) **DONE; matched-null / detector-geometry side-quests PARKED.**
   [Main-quest reset](reviews/btc-fib-selection-learning-main-quest-reset-20260624.md).
 
 - **2026-06-22 Fib SELECTION-LEARNING W-gap study — BUILT + module split, RUN PENDING (home).** Commit 2 of side-quest #1, built to the [W-gap LOCK](reviews/btc-fib-selection-learning-w-gap-lock-20260622.md) (`4f47d8e`): `gap(k)=AP(retro-W)−AP(live-k)` on identical rows, embargo=W, L5 verdict. New `research/selection_learning_gap.py` (+5 tests); W-gap code split out to keep `selection_learning.py` under the §6 size cap (was 995 lines); flushed-stderr `_progress` logging in `build_candidates`+`build_retro_features` so a long run is never blind (result-neutral). **Run NOT executed** — inherent ~2-3h per-endpoint-detect cost on the ~20k-bar 4h frame (leakage-bearing truncation, no legal shortcut); to run at home (see Next tracks). No gap results, no verdict. Commit `884d4c0`, gates green (pytest 549, cov 75%).
diff --git a/docs/research_wiki/log.md b/docs/research_wiki/log.md
@@ -16,6 +16,25 @@ Types: `ingest`, `decision`, `review`, `question`, `maintenance`.
 > Pre-reset (2026-06-10 and earlier): [part 3](log-archive-pre-btc-reset-part3.md) →
 > [part 2](log-archive-pre-btc-reset-part2.md) → [part 1](log-archive-pre-btc-reset-part1.md)
 
+## [2026-06-25] review | Fib SELECTION-LEARNING model-enrichment RUN → `enriched_worse_check`; line CLOSED
+
+Blind Commit-2 of the [enrichment LOCK](reviews/btc-fib-selection-learning-enrichment-lock-20260624.md)
+(Track A; harness `c80acb0`, seed 20260618, frozen-data parity, preflight READY). **Parity gate passed:**
+`ap_baseline_stage2` (4h k=3) = **0.056737** = the Stage-2 headline 0.0567, `n_test_pos` = 65, `excl=0`
+(every row reconstructs causally, no look-ahead) — the nested baseline IS the current model. Spec note:
+the pre-run "n_candidates ≈ 24852" was a label mix-up (24852 = n_test; full universe 86244).
+
+**Verdict (4h primary, powered):** adding causal `exclusivity` *lowers* pooled OOS AP 0.0567→0.0387;
+AP-lift −0.018, decision-point cluster bootstrap CI95 **[−0.070, −0.0019]** (entirely below 0),
+p(lift≤0)=0.994 → **`enriched_worse_check`**. Direction-guard checks (parity, excl=0, bootstrap unit,
+power) all pass → **not a bug**. Mechanism (Inferred, per E1): `corr(exclusivity, cleanliness)` = 0.80
+on train — near-proxy, variance cost on 65 positives. **Per-leg-feature modeling line CLOSED**;
+substantive north-star implication = the E8 route (grow the facit). The `enriched_worse_check` branch
+is not pre-committed to a direction, so the fork (B = grow facit, recommended; A′ = decorrelated
+exclusivity under a NEW lock, low prior) is surfaced to the user — no direction chosen by the agent.
+No edge/behaviour/PnL/Genesis/auto-fib claim. Artifacts gitignored.
+[Results](reviews/btc-fib-selection-learning-enrichment-results-20260625.md).
+
 ## [2026-06-25] maintenance | Consolidated the A/B next-step into a handoff `## Next Step` block
 
 Docs-only. The next step (the enrichment-lock GO-fork: **A** = build/run the `exclusivity` Commit 2,
diff --git a/docs/research_wiki/reviews/btc-fib-selection-learning-enrichment-results-20260625.md b/docs/research_wiki/reviews/btc-fib-selection-learning-enrichment-results-20260625.md
@@ -0,0 +1,81 @@
+# BTC Fib Selection-Learning — model-ENRICHMENT RESULTS (leg-completeness) (2026-06-25)
+
+Blind Commit-2 execution of the [enrichment LOCK
+(2026-06-24)](btc-fib-selection-learning-enrichment-lock-20260624.md). One pre-specified feature
+(`exclusivity` / leg-completeness, E1), one nested comparison vs the current Stage-2 model (E2), one
+blind verdict (E4). **No edge / behaviour / PnL / backtest claim** (E7). Harness:
+[`selection_learning_enrich.py`](../../../src/fibengine/research/selection_learning_enrich.py)
+(commit `c80acb0`); seed `20260618`; frozen-data parity (no `--refresh`); preflight READY before run.
+
+> **Verdict (blind, 4h primary k=3): `enriched_worse_check`.** The enriched model is *significantly
+> worse* than current Stage-2 (AP-lift 95% CI entirely below 0). The lock's direction-guard checks
+> (parity, no look-ahead, bootstrap unit, power) all pass → **not a bug**. For the north-star this is
+> a negative shot: the locked per-leg `exclusivity` feature does **not** add human-like leg-selection
+> signal over the model we already have. The per-leg-feature modeling line is **closed**.
+
+## Observed (measured — 4h primary, powered: 65 test positives)
+
+| quantity | value |
+|---|---|
+| AP baseline (current Stage-2, nested) | **0.056737** |
+| AP enriched (Stage-2 + `exclusivity`) | **0.038744** |
+| AP-lift (point) | **−0.017993** |
+| AP-lift bootstrap mean | −0.023651 |
+| **AP-lift 95% CI** | **[−0.070026, −0.001895]** (entirely below 0) |
+| p(lift ≤ 0), one-sided | 0.994 |
+| bootstrap | decision-point cluster by `anchor_b`, 2000 resamples, 2071 groups |
+| ROC-AUC enriched (secondary) | 0.9252 |
+| `corr(exclusivity, cleanliness)` (train) | **0.804** |
+| `exclusivity` standardized weight | +0.1142 (`cleanliness` +0.1502 still leads) |
+| n_candidates / n_train / n_test | 86244 / 61368 / 24852 |
+| rows excluded (endpoint beyond data / not reconstructible) | 0 / 0 |
+| `exclusivity` dist | mean 0.275, std 0.345, frac@0 0.497, frac@1 0.093 |
+
+**Parity gate (proves the nested baseline IS the current model):** `ap_baseline_stage2` =
+**0.056737** = the Stage-2 headline **0.0567**; `n_test_positives` = **65**, matching Stage-2.
+*Spec-reconciliation:* the pre-run note "n_candidates ≈ 24852" was a label mix-up — **24852 = n_test**;
+the full candidate universe is **86244** (= Stage-2's universe). Substantive parity holds.
+`rows_excluded = 0` confirms every row reconstructs causally (no look-ahead, no endpoint dropped).
+
+**Context cells (underpowered, never refuted — E3 power floor ≥10 positives):**
+
+| TF | test pos | AP base | AP enr | lift | note |
+|---|---|---|---|---|---|
+| 1M | 5 | 0.2636 | 0.2789 | +0.0153 | underpowered; corr 0.878 |
+| 1w | 0 | — | — | — | no positives |
+| 1d | 7 | 0.1617 | 0.1599 | −0.0018 | underpowered; corr 0.808 |
+
+Context is reported for completeness only; the verdict rests solely on the 4h powered cell (E3/E4).
+
+## Inferred (interpretation — not measured)
+
+- **The locked `exclusivity` definition does not enrich the current model.** A negative powered lift
+  with CI excluding 0 means the 6th feature does not help and, as fit, costs net OOS ranking power.
+- **Most likely mechanism (reported per E1, *not* a reason to discount the verdict): collinearity.**
+  `corr(exclusivity, cleanliness) = 0.804` on train — `exclusivity` is largely a `cleanliness` proxy.
+  Adding a near-collinear, noisier regressor on only 65 test positives plausibly inflates variance and
+  drags pooled test AP down. This is a *mechanism*, not grounds to soften the blind result.
+- **North-star read:** this closes the per-leg-feature line cleanly. The locked honest prior was low
+  (four per-leg features already ~0 at k=3; only `cleanliness` stuck); the shot confirms the per-leg
+  approach has hit its ceiling on this corpus. The binding constraint is now **data, not features**.
+- **Lock-routing nuance:** E8 pre-committed `no_enrichment_signal → grow the facit`. The realized
+  branch is `enriched_worse_check`, whose *substantive* north-star implication is the same (per-leg
+  features do not beat Stage-2 → grow the facit), but the **direction choice is not pre-committed** for
+  this branch — it is surfaced to the user (see handoff Next Step).
+
+## Unverified (open — would need a NEW lock)
+
+- Whether a **decorrelated / residualized** exclusivity (orthogonalized vs `cleanliness`) carries any
+  orthogonal signal. This is a **different feature needing its own Commit-1 lock**, not a continuation
+  of this one — and the prior is **low** (if the 0.80-collinear version's residual hurt here, the
+  orthogonal component is likely small). Not a free, natural next step: it would reopen a closed line.
+- The `cleanliness`-as-genuine-signal crux stays **OPEN** (E7) — this shot does not resolve it.
+- Absolute reproduction of human selection remains capped by the ~0.83 coverage ceiling (E7); no
+  edge / behaviour / PnL / Genesis / auto-fib-as-truth / label-mutation claim is made or implied.
+
+## Artifacts
+
+- Summary JSON: `experiments/review/fib_selection_learning/enrich/summary.json` (**gitignored**).
+- Per-cell checkpoints: `experiments/review/fib_selection_learning/enrich/cells/*.json` (**gitignored**).
+- Harness + tests: `selection_learning_enrich.py`,
+  `tests/research/test_selection_learning_enrich.py` (commit `c80acb0`; gates green — 601 pass, 74% cov).
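
Reviewer sketch (not part of the patch): the results table reports a "decision-point cluster bootstrap" of the AP-lift, resampled by `anchor_b` with 2000 resamples. The harness code is not shown in this diff, so the following is only a minimal illustration of that general technique under stated assumptions. The function names, the tie handling in `average_precision`, and the nearest-rank percentile rule are all hypothetical and may differ from what `selection_learning_enrich.py` does.

```python
import random
from collections import defaultdict


def average_precision(labels, scores):
    """Non-interpolated AP: mean precision@rank over the ranks of positives.

    Assumption: ties are broken by input order (stable sort); the real
    harness may resolve ties differently.
    """
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else float("nan")


def cluster_bootstrap_lift(labels, base_scores, enr_scores, groups,
                           n_resamples=2000, seed=20260618):
    """Bootstrap AP(enriched) - AP(baseline), resampling whole groups.

    `groups` plays the role of the decision-point key (e.g. `anchor_b`):
    all rows sharing a key are drawn together, with replacement.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    keys = sorted(by_group)
    rng = random.Random(seed)

    lifts = []
    for _ in range(n_resamples):
        idx = [i for g in rng.choices(keys, k=len(keys)) for i in by_group[g]]
        y = [labels[i] for i in idx]
        if not any(y):
            continue  # AP is undefined without positives; skip this draw
        ap_enr = average_precision(y, [enr_scores[i] for i in idx])
        ap_base = average_precision(y, [base_scores[i] for i in idx])
        lifts.append(ap_enr - ap_base)
    if not lifts:
        raise ValueError("no resample contained a positive row")

    lifts.sort()
    lo = lifts[int(0.025 * (len(lifts) - 1))]  # nearest-rank percentiles
    hi = lifts[int(0.975 * (len(lifts) - 1))]
    return {
        "mean": sum(lifts) / len(lifts),
        "ci95": (lo, hi),
        "p_lift_le_0": sum(x <= 0 for x in lifts) / len(lifts),
    }
```

Under this reading, an interval whose upper bound sits below 0 corresponds to the "CI entirely below 0" condition quoted in the verdict. Resampling whole groups rather than rows keeps candidates from the same decision point together, which is the usual reason for clustering when rows within a group are not independent.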