docs(research): main-quest reset + north-star guardrail (stop the mechanics drift)

JohnCCarter · claude · JohnCCarter · commit 3ca46e3d9bcb · 2026-06-24T15:15:52.000+02:00
Re-anchor the selection-learning line to its original goal: learn how the human
selects meaningful fib legs/ranges and draws Fib like a human analyst (facit = ground
truth) -- NOT explaining detector/snapping/measurement geometry.

- What directly helps the main quest: the human's leg choice is partly learnable on
  4H (cleaner legs, lift +0.052), live-available, not a detection problem (recall
  ~0.90), lives in the leg/range gestalt -- but agreement is LOW (AP ~0.057 vs the
  ~0.83 reachability ceiling) and thin (one feature), so the model does not yet
  draw like the human.
- Control/mechanics (prominence-family, k-sweep, W-gap, artifact-probe, mechanics +
  flip) added rigor, not capability -- the mechanics/flip work was the drift.
- PARK: artifact/snapping/net-path mechanics, matched-null / detector-independent
  universe (gate not met), further detector-geometry; exclusivity only if it improves
  selection.
- Next step IF it serves the goal: enrich the selection model toward the human's
  meaningful-leg criteria and measure facit-agreement, behind a blind lock. Else PARK
  modeling and return to the human BTC top-down labeling main quest.

Binding north-star guardrail added to handoff Current Focus + a dedicated reset doc:
every future selection-learning step must first answer "does this improve human-like
leg/range selection vs the facit?" -- if no, do not start it. Docs-only, no claim, no
code/run, no matched-null, no new universe, no Genesis/1H/ETH/refresh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
diff --git a/docs/research_wiki/handoff.md b/docs/research_wiki/handoff.md
@@ -5,6 +5,13 @@ append-only trail lives in [log.md](log.md).
 
 ## Current Focus
 
+> **NORTH STAR (binding — no drift):** the selection-learning line exists to *learn how the human
+> selects meaningful fib legs/ranges and draws Fib like a human analyst (facit = ground truth)* — **not**
+> to explain detector/snapping/measurement geometry. Every selection-learning step must first answer
+> *"does this improve the model's ability to select human-like legs/ranges vs the facit?"* — if no,
+> don't start it; park it. Controls + mechanics are **DONE/PARKED**
+> ([main-quest reset](reviews/btc-fib-selection-learning-main-quest-reset-20260624.md)).
+
 **BTC monthly-first top-down protocol** — re-labeling on BTC/USD only after the
 **2026-06-09 log-scale + profile reset** (prior linear / 0.236 labels archived).
 
@@ -22,6 +29,13 @@ append-only trail lives in [log.md](log.md).
 
 ## Recent Changes
 
+- **2026-06-24 Fib SELECTION-LEARNING — MAIN-QUEST RESET (docs-only).** Stop the mechanics drift,
+  re-anchor to the north star (above). Controls/mechanics (artifact-probe, snapping/net-path mechanics,
+  flip) are **DONE; matched-null / detector-geometry side-quests PARKED.** Next step only if it improves
+  the model's human-like leg/range selection vs facit (behind a blind lock); else park the modeling and
+  return to the human BTC top-down labeling main quest.
+  [Main-quest reset](reviews/btc-fib-selection-learning-main-quest-reset-20260624.md).
+
 - **2026-06-22 Fib SELECTION-LEARNING W-gap study — BUILT + module split, RUN PENDING (home).** Commit 2 of side-quest #1, built to the [W-gap LOCK](reviews/btc-fib-selection-learning-w-gap-lock-20260622.md) (`4f47d8e`): `gap(k)=AP(retro-W)−AP(live-k)` on identical rows, embargo=W, L5 verdict. New `research/selection_learning_gap.py` (+5 tests); W-gap code split out to keep `selection_learning.py` under the §6 size cap (was 995 lines); flushed-stderr `_progress` logging in `build_candidates`+`build_retro_features` so a long run is never blind (result-neutral). **Run NOT executed** — inherent ~2-3h per-endpoint-detect cost on the ~20k-bar 4h frame (leakage-bearing truncation, no legal shortcut); to run at home (see Next tracks). No gap results, no verdict. Commit `884d4c0`, gates green (pytest 549, cov 75%).
 - **2026-06-18 Fib SELECTION-LEARNING k-sweep {0,3,6,12} (4h) → `k_stable_live_selection_signal`.** Mandatory confirmation-buffer sweep (live-only), locked prominence-FAMILY survival rule (powered AND CI excludes 0 vs **every** §6 baseline — magnitude + prominence A/B). **k=0 degenerate** (0 candidates, reachable 0.0, unpowered — *not interpretable*, excluded); **k=3/6/12 all powered and survive** the locked family (`p_one_sided lift≤0 = 0/2000` throughout; lowest CI floor k=12 vs prom-sum 0.025). ≥2 survivors → cross-k verdict **`k_stable_live_selection_signal`**: the lead is **not** a narrow-buffer artifact. **Modest framing holds:** `cleanliness` still dominates (~0.20) at every powered k; at k=12 `scale_confluence` enters at ~0.13 only as a **secondary hint** (causally available there), not a second pillar; AP rises only 0.057→0.066, far under the 0.83 ceiling — **still single-feature, NOT a reproduction, no edge/behaviour/backtest/Genesis claim**; 1M/1w/1d **underpowered, not refuted**. Code+tests `ea6c2ea` (gates green). [Results](reviews/btc-fib-selection-learning-results-20260618.md).
 - **2026-06-18 Fib SELECTION-LEARNING prominence-baseline sensitivity (4h) → `survives_prominence_family`.** Locked pre-run (A=summed endpoint prominence = `prominence` feature col; B=max endpoint prominence) + locked verdict rule. Same universe/viewport/k/ε/split/model — only baseline rule differs. Model AP-lift robust vs **all three** §6 baselines: magnitude [0.023,0.120], prominence-A +0.043 [0.018,0.104], prominence-B +0.049 [0.021,0.116]; every CI excludes 0, 0/2000 ≤ 0. Sanity: prominence baselines beat magnitude (as expected); model beats both. Weights unchanged → **`cleanliness` still carries the lift** (0.20). So the lead is **not** a magnitude- or prominence-artifact — but still single-feature, low absolute AP (0.057 vs 0.83 ceiling), **not a reproduction**, no edge claim; 1M/1w/1d underpowered. Open: is `cleanliness` a detection/anchoring artifact? [Results](reviews/btc-fib-selection-learning-results-20260618.md).
diff --git a/docs/research_wiki/reviews/btc-fib-selection-learning-main-quest-reset-20260624.md b/docs/research_wiki/reviews/btc-fib-selection-learning-main-quest-reset-20260624.md
@@ -0,0 +1,71 @@
+# BTC Fib Selection-Learning — MAIN-QUEST RESET / north-star guardrail (2026-06-24)
+
+**Lean Fib Research. Docs-only, no code/run/claim.** A deliberate stop to the mechanics drift and a
+re-anchor to the original goal. Binding for the whole selection-learning line.
+
+> **NORTH STAR (Chamoun's original idea — binding):** *Get the machine to learn how the human selects
+> meaningful fib legs/ranges and draws Fib like a human analyst, using the human facit as ground
+> truth.* **NOT** to explain detector/snapping/measurement geometry in detail.
+
+## 1. What we now know that DIRECTLY helps the main quest
+
+- **The human's leg choice is partly learnable (4H).** Human-marked legs are measurably **cleaner /
+  more efficient**, and a model out-ranks the trivial baselines out-of-sample (Stage-2 lift **+0.052**,
+  CI excludes 0). → there **is** a real learnable selection signal.
+- **It is live-available** (`no_causal_gap`) — a human-like selector would not need hindsight.
+- **It is not a detection problem** (Stage-1 recall ~0.90): the human's anchors are already in the
+  candidate universe; the gap is in **ranking/selecting among candidates** — exactly what a model can
+  improve.
+- **The signal lives in the leg/range gestalt**, not the lone pivot (Stage-1 null) — a human-like
+  selector must model **legs/ranges**, not individual pivots.
+- **But agreement is LOW and thin:** AP ~0.057 vs the ~0.83 reachability ceiling, carried almost
+  entirely by **one feature** (cleanliness). The model does **not** yet draw like the human — it needs
+  a **richer representation of "meaningful."** *(This is the gap that defines the real next step.)*
+
+## 2. What was only control / mechanics (rigor, not new capability)
+
+- Prominence-family sensitivity + k-sweep (robustness), W-gap (hindsight control), the cleanliness
+  artifact-probe (is the lead a detector artifact?), and the mechanics + snapping-flip notes
+  (detector/snapping geometry). **All were necessary rigor or interesting mechanism — none added model
+  capability to pick better legs.** The mechanics/flip work is precisely the drift this reset stops.
+
+## 3. Sidetracks to PARK now
+
+- **Artifact / snapping / net-path mechanics** — PARK (questions answered descriptively; does not help
+  the model pick human-like legs).
+- **Matched-null / detector-independent universe** — PARK (gated, and its A8 gate was **not** met; high
+  methodological risk; an artifact-question tool, not a capability-builder).
+- **Set-level `exclusivity`** loose end — revisit **only** if it demonstrably improves leg selection.
+- **Further detector-geometry explanation** — PARK.
+
+## 4. Next step IF the goal is better human-like leg/range selection
+
+- The **only** directly-aligned move: **enrich the selection model toward the human's actual "meaningful
+  leg/range" criteria and measure agreement against the facit** (AP toward the 0.83 ceiling). Concretely
+  — go beyond "cleanest leg" to the multi-component gestalt the prereg already named (scale, pairing,
+  direction, exclusivity, HTF/context) and test whether **facit-agreement rises** — behind a **blind
+  design lock** (forking-paths discipline; same two-commit gate as every prior step).
+- **Precondition:** a concrete feature/representation hypothesis that *plausibly* raises agreement, AND
+  enough facit to fit/validate without overfitting (BTC-only, **365** 4h legs, one analyst, ~0.83
+  ceiling). If that precondition can't be met honestly, see §5.
+
+## 5. If the next step does NOT directly help the model pick better legs → stop/park
+
+- If we **cannot** specify a richer-feature hypothesis that plausibly raises facit-agreement without
+  forking-paths, **PARK the modeling line** and return to the **actual main quest**: the human BTC
+  top-down fib labeling (`1M → 1w → 1d → 4h`,
+  [protocol](../../BTC_FIRST_TOP_DOWN_FIB_PROTOCOL.md)) — which **is** "draw Fib like a human" and grows
+  the ground-truth corpus the model would learn from. Modeling resumes only with **more labels** or a
+  **concrete capability hypothesis**, never as another control/mechanics pass.
+
+## North-star guardrail (BINDING — no drift)
+
+Every future selection-learning step must answer one question first:
+
+> **"Does this improve the model's ability to select human-like fib legs/ranges, measured against the
+> facit?"**
+
+If the honest answer is **no** — it is a control, a mechanism explanation, or an artifact-geometry
+detail — **do not start it; log it as parked.** Controls and mechanics are **done**. The line either
+**advances selection capability** (§4, behind a blind lock) or it **pauses** (§5). No more
+detector/snapping/measurement-geometry side-quests.
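
The guardrail question is stated in terms of facit-agreement: AP against the human labels, and AP-lift over a baseline with a CI that excludes 0. As a rough illustration only, the sketch below shows one way such a check could be computed. It is not the repository's `research/selection_learning.py` code. The function names, the paired percentile bootstrap, and the 95% interval are all assumptions made for this example; only the metric names (AP, lift) and the 2000-resample count are taken from the notes above.

```python
import random


def average_precision(labels, scores):
    """AP: mean of precision@rank taken at each rank where a positive appears."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def bootstrap_lift(labels, model_scores, baseline_scores, n_boot=2000, seed=0):
    """Paired bootstrap of AP(model) - AP(baseline) on identical rows.

    Returns (ci_low, ci_high, share_of_resamples_with_lift_le_zero).
    Hypothetical helper: the resampling scheme is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n = len(labels)
    lifts = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if not any(y):
            continue  # no positives in this resample, AP is undefined: skip it
        ap_model = average_precision(y, [model_scores[i] for i in idx])
        ap_base = average_precision(y, [baseline_scores[i] for i in idx])
        lifts.append(ap_model - ap_base)
    if not lifts:
        raise ValueError("no resample contained a positive label")
    lifts.sort()
    ci_low = lifts[int(0.025 * len(lifts))]
    ci_high = lifts[min(int(0.975 * len(lifts)), len(lifts) - 1)]
    share_le_zero = sum(x <= 0 for x in lifts) / len(lifts)
    return ci_low, ci_high, share_le_zero
```

Under these assumptions, a candidate step would count as raising facit-agreement only if `ci_low` is above 0 for the enriched model against each locked baseline, with the labels, rows, and split frozen before the run.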