Commit abd8cec

JohnCCarter and claude committed
docs(research): enrichment shot RESULTS — enriched_worse_check, per-leg line closed
Blind Commit-2 of the enrichment LOCK (Track A). Parity confirmed: AP-baseline (4h k=3) = Stage-2 headline 0.0567, n_test_pos=65, excl=0 (no look-ahead). Causal `exclusivity` lowers pooled OOS AP 0.0567->0.0387; AP-lift -0.018, CI [-0.070, -0.0019], p(lift<=0)=0.994 -> enriched_worse_check. Validity checks pass (not a bug). Mechanism (Inferred): 0.80 collinear with cleanliness. Per-leg-feature line CLOSED; fork (B grow-facit recommended; A' decorrelated under new lock, low prior) surfaced to user. No edge/behaviour/PnL/Genesis claim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent c80acb0 commit abd8cec

3 files changed

Lines changed: 118 additions & 18 deletions

docs/research_wiki/handoff.md

Lines changed: 18 additions & 18 deletions
@@ -27,32 +27,32 @@ append-only trail lives in [log.md](log.md).
2727

2828
**ETH/USD:** blocked until BTC protocol approved.
2929

30-
## Next Step (consolidated — requires explicit GO)
30+
## Next Step (requires explicit GO)
3131

32-
Selection-learning is **paused at the enrichment lock** (Commit 1, docs-only; no code). Next is a
33-
**GO-fork** — direction is the user's call (AGENTS.md §1); recommendation: **A first**, then B.
32+
Enrichment shot **DONE** (Commit 2, 4h k=3): blind verdict **`enriched_worse_check`**; `exclusivity`
33+
is significantly *worse* than Stage-2 (AP-lift CI [−0.070, −0.0019]); validity checks pass (not a bug).
34+
**Per-leg-feature line is CLOSED.** Next is a **GO-fork** — direction is the user's call (AGENTS.md §1);
35+
recommendation: **B**.
3436

35-
- **A — enrichment Commit 2:** build/run the `exclusivity` / leg-completeness shot
36-
(`selection_learning_enrich.py`) — nested AP-lift vs the current Stage-2 model, 4h k=3, per the
37-
[enrichment LOCK](reviews/btc-fib-selection-learning-enrichment-lock-20260624.md). Prior **low**;
38-
a clean `no_enrichment_signal` routes pre-committed to B (§E8). Cheap/locked/blind → run first.
39-
- **B — grow facit:** park modeling, return to the main quest — more/better human labels
40-
([main-quest reset](reviews/btc-fib-selection-learning-main-quest-reset-20260624.md) §5).
37+
- **B — grow facit (recommended):** park modeling, return to the main quest — more/better human labels
38+
([main-quest reset](reviews/btc-fib-selection-learning-main-quest-reset-20260624.md) §5). Binding
39+
constraint is now **data, not features**.
40+
- **A′ — decorrelated exclusivity (low prior, NEW lock):** `exclusivity` was 0.80 collinear with
41+
`cleanliness`; an orthogonalized variant needs its **own** Commit-1 lock (reopens a closed line). Not free.
4142

42-
**No code/run/build until a separate explicit GO** (discriminator = appetite: close per-leg features via A → then B, or pivot straight to B).
43+
**No code/run/build until a separate explicit GO.**
4344

4445
## Recent Changes
4546

46-
- **2026-06-24 Fib SELECTION-LEARNING model-ENRICHMENT — LOCKED (Commit 1, docs-only); ⏸ PAUSED.**
47-
Blind lock for one lean shot: does a causal **leg-completeness / `exclusivity`** feature
48-
(pivot-structural, k*=3, distinct from `cleanliness`) raise pooled OOS AP **over the current Stage-2
49-
model** (nested) on 4h k=3? Prior **low**; clean null routes to facit growth (§E8). Resume/fork now
50-
under **## Next Step** above. No code started. [Enrichment LOCK](reviews/btc-fib-selection-learning-enrichment-lock-20260624.md).
47+
- **2026-06-25 Fib SELECTION-LEARNING model-ENRICHMENT — RUN → `enriched_worse_check` (4h k=3); line
48+
CLOSED.** Blind Commit-2 of the [enrichment LOCK](reviews/btc-fib-selection-learning-enrichment-lock-20260624.md)
49+
(`c80acb0`). Parity: AP-baseline = Stage-2 headline 0.0567, n_test_pos=65, excl=0 (no look-ahead).
50+
Causal `exclusivity` *lowers* pooled OOS AP 0.0567→0.0387; AP-lift −0.018, CI [−0.070, −0.0019],
51+
p(lift≤0)=0.994. Mechanism (Inferred): 0.80 collinear with `cleanliness`. Per-leg-feature line
52+
**closed** → grow-facit (fork under **## Next Step**). [Results](reviews/btc-fib-selection-learning-enrichment-results-20260625.md).
5153
- **2026-06-24 Fib SELECTION-LEARNING — MAIN-QUEST RESET (docs-only).** Stop the mechanics drift,
5254
re-anchor to the north star (above). Controls/mechanics (artifact-probe, snapping/net-path mechanics,
53-
flip) are **DONE; matched-null / detector-geometry side-quests PARKED.** Next step only if it improves
54-
the model's human-like leg/range selection vs facit (behind a blind lock); else park the modeling and
55-
return to the human BTC top-down labeling main quest.
55+
flip) **DONE; matched-null / detector-geometry side-quests PARKED.**
5656
[Main-quest reset](reviews/btc-fib-selection-learning-main-quest-reset-20260624.md).
5757

5858
- **2026-06-22 Fib SELECTION-LEARNING W-gap study — BUILT + module split, RUN PENDING (home).** Commit 2 of side-quest #1, built to the [W-gap LOCK](reviews/btc-fib-selection-learning-w-gap-lock-20260622.md) (`4f47d8e`): `gap(k)=AP(retro-W)−AP(live-k)` on identical rows, embargo=W, L5 verdict. New `research/selection_learning_gap.py` (+5 tests); W-gap code split out to keep `selection_learning.py` under the §6 size cap (was 995 lines); flushed-stderr `_progress` logging in `build_candidates`+`build_retro_features` so a long run is never blind (result-neutral). **Run NOT executed** — inherent ~2-3h per-endpoint-detect cost on the ~20k-bar 4h frame (leakage-bearing truncation, no legal shortcut); to run at home (see Next tracks). No gap results, no verdict. Commit `884d4c0`, gates green (pytest 549, cov 75%).

docs/research_wiki/log.md

Lines changed: 19 additions & 0 deletions
@@ -16,6 +16,25 @@ Types: `ingest`, `decision`, `review`, `question`, `maintenance`.
1616
> Pre-reset (2026-06-10 and earlier): [part 3](log-archive-pre-btc-reset-part3.md)
1717
> · [part 2](log-archive-pre-btc-reset-part2.md) · [part 1](log-archive-pre-btc-reset-part1.md)
1818
19+
## [2026-06-25] review | Fib SELECTION-LEARNING model-enrichment RUN → `enriched_worse_check`; line CLOSED
20+
21+
Blind Commit-2 of the [enrichment LOCK](reviews/btc-fib-selection-learning-enrichment-lock-20260624.md)
22+
(Track A; harness `c80acb0`, seed 20260618, frozen-data parity, preflight READY). **Parity gate passed:**
23+
`ap_baseline_stage2` (4h k=3) = **0.056737** = the Stage-2 headline 0.0567, `n_test_pos` = 65, `excl=0`
24+
(every row reconstructs causally, no look-ahead) — the nested baseline IS the current model. Spec note:
25+
the pre-run "n_candidates ≈ 24852" was a label mix-up (24852 = n_test; full universe 86244).
26+
27+
**Verdict (4h primary, powered):** adding causal `exclusivity` *lowers* pooled OOS AP 0.0567→0.0387;
28+
AP-lift −0.018, decision-point cluster bootstrap CI95 **[−0.070, −0.0019]** (entirely below 0),
29+
p(lift≤0)=0.994 → **`enriched_worse_check`**. Direction-guard checks (parity, excl=0, bootstrap unit,
30+
power) all pass → **not a bug**. Mechanism (Inferred, per E1): `corr(exclusivity, cleanliness)` = 0.80
31+
on train — near-proxy, variance cost on 65 positives. **Per-leg-feature modeling line CLOSED**;
32+
substantive north-star implication = the E8 route (grow the facit). The `enriched_worse_check` branch
33+
is not pre-committed to a direction, so the fork (B = grow facit, recommended; A′ = decorrelated
34+
exclusivity under a NEW lock, low prior) is surfaced to the user — no direction chosen by the agent.
35+
No edge/behaviour/PnL/Genesis/auto-fib claim. Artifacts gitignored.
36+
[Results](reviews/btc-fib-selection-learning-enrichment-results-20260625.md).
37+
1938
## [2026-06-25] maintenance | Consolidated the A/B next-step into a handoff `## Next Step` block
2039

2140
Docs-only. The next step (the enrichment-lock GO-fork: **A** = build/run the `exclusivity` Commit 2,

docs/research_wiki/reviews/btc-fib-selection-learning-enrichment-results-20260625.md

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
# BTC Fib Selection-Learning — model-ENRICHMENT RESULTS (leg-completeness) (2026-06-25)
2+
3+
Blind Commit-2 execution of the [enrichment LOCK
4+
(2026-06-24)](btc-fib-selection-learning-enrichment-lock-20260624.md). One pre-specified feature
5+
(`exclusivity` / leg-completeness, E1), one nested comparison vs the current Stage-2 model (E2), one
6+
blind verdict (E4). **No edge / behaviour / PnL / backtest claim** (E7). Harness:
7+
[`selection_learning_enrich.py`](../../../src/fibengine/research/selection_learning_enrich.py)
8+
(commit `c80acb0`); seed `20260618`; frozen-data parity (no `--refresh`); preflight READY before run.
9+
10+
> **Verdict (blind, 4h primary k=3): `enriched_worse_check`.** The enriched model is *significantly
11+
> worse* than current Stage-2 (AP-lift 95% CI entirely below 0). The lock's direction-guard checks
12+
> (parity, no look-ahead, bootstrap unit, power) all pass → **not a bug**. For the north-star this is
13+
> a negative shot: the locked per-leg `exclusivity` feature does **not** add human-like leg-selection
14+
> signal over the model we already have. The per-leg-feature modeling line is **closed**.
15+
16+
## Observed (measured — 4h primary, powered: 65 test positives)
17+
18+
| quantity | value |
19+
|---|---|
20+
| AP baseline (current Stage-2, nested) | **0.056737** |
21+
| AP enriched (Stage-2 + `exclusivity`) | **0.038744** |
22+
| AP-lift (point) | **−0.017993** |
23+
| AP-lift bootstrap mean | −0.023651 |
24+
| **AP-lift 95% CI** | **[−0.070026, −0.001895]** (excludes 0, below) |
25+
| p(lift ≤ 0), one-sided | 0.994 |
26+
| bootstrap | decision-point cluster by `anchor_b`, 2000 resamples, 2071 groups |
27+
| ROC-AUC enriched (secondary) | 0.9252 |
28+
| `corr(exclusivity, cleanliness)` (train) | **0.804** |
29+
| `exclusivity` standardized weight | +0.1142 (`cleanliness` +0.1502 still leads) |
30+
| n_candidates / n_train / n_test | 86244 / 61368 / 24852 |
31+
| rows excluded (endpoint beyond data / not reconstructible) | 0 / 0 |
32+
| `exclusivity` dist | mean 0.275, std 0.345, frac@0 0.497, frac@1 0.093 |
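
The bootstrap row in the table can be read as the procedure below. This is a minimal sketch, not the code in `selection_learning_enrich.py`: the function names, the percentile convention, and the reading of p(lift ≤ 0) as the share of resamples with non-positive lift are all assumptions made for illustration. `groups` stands in for the `anchor_b` decision-point key.

```python
import random
from collections import defaultdict


def average_precision(y_true, scores):
    """Mean of precision@rank taken at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else float("nan")


def cluster_bootstrap_lift(y, s_base, s_enr, groups, n_boot=2000, seed=20260618):
    """Resample whole groups with replacement and recompute the AP-lift each time.

    Assumption: all rows sharing a group key move together, so within-group
    dependence does not narrow the interval artificially.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    keys = sorted(by_group)
    rng = random.Random(seed)

    lifts = []
    for _ in range(n_boot):
        idx = [i for g in rng.choices(keys, k=len(keys)) for i in by_group[g]]
        yb = [y[i] for i in idx]
        if not any(yb):  # AP is undefined without positives; skip the draw
            continue
        ap_enr = average_precision(yb, [s_enr[i] for i in idx])
        ap_base = average_precision(yb, [s_base[i] for i in idx])
        lifts.append(ap_enr - ap_base)

    lifts.sort()
    n = len(lifts)
    return {
        "point": average_precision(y, s_enr) - average_precision(y, s_base),
        "boot_mean": sum(lifts) / n,
        "ci95": (lifts[int(0.025 * (n - 1))], lifts[int(0.975 * (n - 1))]),
        "p_lift_le_0": sum(l <= 0 for l in lifts) / n,
    }
```

Under this reading, a CI whose upper bound sits below 0 together with p(lift ≤ 0) near 1 is what the table reports for the 4h cell.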
33+
34+
**Parity gate (proves the nested baseline IS the current model):** `ap_baseline_stage2` =
35+
**0.056737** = the Stage-2 headline **0.0567**; `n_test_positives` = **65**, matching Stage-2.
36+
*Spec-reconciliation:* the pre-run note "n_candidates ≈ 24852" was a label mix-up — **24852 = n_test**;
37+
the full candidate universe is **86244** (= Stage-2's universe). Substantive parity holds.
38+
`rows_excluded = 0` confirms every row reconstructs causally (no look-ahead, no endpoint dropped).
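
The parity gate and power floor described above amount to a handful of equality and threshold checks. The sketch below is hypothetical: the function name, the flat dictionary layout, and the 4-decimal tolerance are assumptions, and the real `summary.json` schema is not shown in this commit. Only the key names that appear in the text (`ap_baseline_stage2`, `n_test_positives`, `rows_excluded`) are reused. The bootstrap-unit check is not covered.

```python
def direction_guard(summary, headline_ap=0.0567, expected_test_pos=65, power_floor=10):
    """Return a dict of named pass/fail checks for one cell (illustrative only)."""
    return {
        # Nested baseline must reproduce the Stage-2 headline to 4 decimals.
        "parity_ap": abs(summary["ap_baseline_stage2"] - headline_ap) < 5e-5,
        # Same test positives as Stage-2.
        "parity_n_pos": summary["n_test_positives"] == expected_test_pos,
        # Every row reconstructs causally; nothing dropped.
        "no_exclusions": summary["rows_excluded"] == 0,
        # E3 power floor: at least 10 test positives.
        "powered": summary["n_test_positives"] >= power_floor,
    }


# Values for the 4h primary cell as reported in this document.
cell_4h = {"ap_baseline_stage2": 0.056737, "n_test_positives": 65, "rows_excluded": 0}
checks_4h = direction_guard(cell_4h)
```

With the reported 4h values every check passes, which is the basis for reading the negative lift as a result rather than a bug.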
39+
40+
**Context cells (underpowered, never refuted — E3 power floor ≥10 positives):**
41+
42+
| TF | test pos | AP base | AP enr | lift | note |
43+
|---|---|---|---|---|---|
44+
| 1M | 5 | 0.2636 | 0.2789 | +0.0153 | underpowered; corr 0.878 |
45+
| 1w | 0 | n/a | n/a | n/a | no positives |
46+
| 1d | 7 | 0.1617 | 0.1599 | −0.0018 | underpowered; corr 0.808 |
47+
48+
Context is reported for completeness only; the verdict rests solely on the 4h powered cell (E3/E4).
49+
50+
## Inferred (interpretation — not measured)
51+
52+
- **The locked `exclusivity` definition does not enrich the current model.** A negative powered lift
53+
with CI excluding 0 means the 6th feature does not help and, as fit, costs net OOS ranking power.
54+
- **Most likely mechanism (reported per E1, *not* a reason to discount the verdict): collinearity.**
55+
`corr(exclusivity, cleanliness) = 0.804` on train — `exclusivity` is largely a `cleanliness` proxy.
56+
Adding a near-collinear, noisier regressor on only 65 test positives plausibly inflates variance and
57+
drags pooled test AP down. This is a *mechanism*, not grounds to soften the blind result.
58+
- **North-star read:** this closes the per-leg-feature line cleanly. The locked honest prior was low
59+
(four per-leg features already ~0 at k=3; only `cleanliness` stuck); the shot confirms the per-leg
60+
approach has hit its ceiling on this corpus. The binding constraint is now **data, not features**.
61+
- **Lock-routing nuance:** E8 pre-committed `no_enrichment_signal → grow the facit`. The realized
62+
branch is `enriched_worse_check`, whose *substantive* north-star implication is the same (per-leg
63+
features do not beat Stage-2 → grow the facit), but the **direction choice is not pre-committed** for
64+
this branch — it is surfaced to the user (see handoff Next Step).
65+
66+
## Unverified (open — would need a NEW lock)
67+
68+
- Whether a **decorrelated / residualized** exclusivity (orthogonalized vs `cleanliness`) carries any
69+
orthogonal signal. This is a **different feature needing its own Commit-1 lock**, not a continuation
70+
of this one — and the prior is **low** (if the 0.80-collinear version's residual hurt here, the
71+
orthogonal component is small). Not a free natural next step; reopening a closed line.
72+
- The `cleanliness`-as-genuine-signal crux stays **OPEN** (E7) — this shot does not resolve it.
73+
- Absolute reproduction of human selection remains capped by the ~0.83 coverage ceiling (E7); no
74+
edge / behaviour / PnL / Genesis / auto-fib-as-truth / label-mutation claim is made or implied.
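
For readers unfamiliar with the term, "residualized" in the first bullet means removing the part of one feature that a linear fit on another explains. The sketch below only illustrates that operation on toy lists; it is not a proposed implementation, the function names are hypothetical, and any real variant would need its own Commit-1 lock as stated above.

```python
from statistics import fmean


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residualize(x, z):
    """Return x minus its least-squares fit on z (with intercept).

    The result is uncorrelated with z on the sample it was fit on. To stay
    causal, the slope would have to be estimated on train rows only
    (assumption; not specified in this document).
    """
    mx, mz = fmean(x), fmean(z)
    szz = sum((b - mz) ** 2 for b in z)
    beta = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / szz
    return [a - mx - beta * (b - mz) for a, b in zip(x, z)]


# Toy values, not project data.
cleanliness_toy = [0.0, 1.0, 2.0, 3.0, 4.0]
exclusivity_toy = [0.1, 0.9, 2.2, 2.8, 4.0]
exclusivity_resid = residualize(exclusivity_toy, cleanliness_toy)
```

When the raw correlation is high, as with the 0.804 reported on train, the residual carries little variance, which is consistent with the low prior stated in the bullet.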
75+
76+
## Artifacts
77+
78+
- Summary JSON: `experiments/review/fib_selection_learning/enrich/summary.json` (**gitignored**).
79+
- Per-cell checkpoints: `experiments/review/fib_selection_learning/enrich/cells/*.json` (**gitignored**).
80+
- Harness + tests: `selection_learning_enrich.py`,
81+
`tests/research/test_selection_learning_enrich.py` (commit `c80acb0`; gates green — 601 pass, 74% cov).
