|
| 1 | +# BTC Fib Selection-Learning — `cleanliness` artifact-probe LOCK (2026-06-24) |
| 2 | + |
| 3 | +**DOCS-ONLY. Authorises no code, no run, no build, no dependency, no matched-null, no new candidate |
| 4 | +universe, no label/corpus change, no push.** This is the **blind Commit-1 lock** for the |
| 5 | +**cleanliness artifact-probe** — the cheap-first, existing-data diagnostic that interrogates the |
| 6 | +single open CRUX named in the |
| 7 | +[campaign checkpoint](btc-fib-selection-learning-checkpoint-20260624.md). It is **not** a new prereg |
| 8 | +line: it tests an **already-produced** result (the Stage-2 `cleanliness` lead) for a measurement |
| 9 | +defect, blind to any artifact-probe output. Execution needs a **separate explicit GO** (Commit 2). |
| 10 | + |
| 11 | +**Blindness attestation:** no artifact-probe harness exists; **no reached/unreached cleanliness mean, |
| 12 | +no snapping gap, no CI has ever been computed or seen.** Every rule below is fixed from the campaign |
| 13 | +locks, the frozen config, and existing code — not from any artifact-probe result. |
| 14 | + |
| 15 | +## A0. Question + role (binding framing) |
| 16 | + |
| 17 | +> **Is the Stage-2 `cleanliness` lead a genuine human leg-selection signal, or a detection / anchoring |
| 18 | +> artifact?** |
| 19 | + |
| 20 | +The campaign established (checkpoint) that the modest 4h selection lead is carried almost entirely by |
| 21 | +leg `cleanliness`, lives in the leg gestalt, is live-available (`no_causal_gap`) and not a |
| 22 | +coverage/pivot problem. The **one open crux** is whether `cleanliness` is partly **mechanical** — |
| 23 | +because the whole pipeline is conditioned on the detector's pivot universe and `cleanliness` is |
| 24 | +computed on detector-defined legs. This probe decomposes that crux into its **two** mechanisms and |
| 25 | +tests **both on existing facit data, with no new candidate universe**: |
| 26 | + |
| 27 | +1. **Surfacing-bias** — does the detector preferentially *surface* cleaner human legs? |
| 28 | +2. **Snapping-bias** — does *snapping* a human anchor to the nearest detector pivot mechanically |
| 29 | + *raise* the measured `cleanliness`? |
| 30 | + |
| 31 | +This is a **diagnostic, not a headline**; it adds **no positive claim** (A9). |
| 32 | + |
| 33 | +## A1. `cleanliness` formula (locked — source-bound, NOT redefined) |
| 34 | + |
| 35 | +The engine's existing feature, used verbatim by Stage-2 |
| 36 | +([`core/features.py::_cleanliness`](../../../src/fibengine/core/features.py)): |
| 37 | + |
| 38 | +``` |
| 39 | +cleanliness(span [lo,hi]) = |close[hi] − close[lo]| / Σ_{i∈(lo,hi]} |close[i] − close[i−1]| |
| 40 | +``` |
| 41 | + |
| 42 | +(net close-move ÷ total close-path; `1.0` for `<2` bars or zero path). **Key structural fact, locked: |
| 43 | +`cleanliness` depends ONLY on the endpoints' bar-index span `[lo,hi]` (close prices over that span) — |
| 44 | +NOT on the anchor's click-price.** Therefore: |
| 45 | + |
| 46 | +- The exact-vs-snapped contrast (A3) is a **pure index-span** contrast (human bar indices vs detector |
| 47 | + pivot bar indices). |
| 48 | +- Because every leg's span satisfies `[lo,hi] ⊆ (…, anchor_b]`, i.e. lies fully **before** the decision point, |
| 49 | + `cleanliness` is **inherently causal** — the `k=3` truncation (A4) cannot change it. No hindsight is |
| 50 | + introduced by either contrast. This lock does **not** redefine the formula; it pins which one is used. |
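
For illustration only (this is not the Commit-2 harness and not the engine's `_cleanliness` source), the locked formula can be sketched as below. The function name, argument order, and the toy closes are assumptions; only the ratio and the `1.0` fallbacks come from the text above.

```python
from typing import Sequence


def cleanliness(close: Sequence[float], lo: int, hi: int) -> float:
    """Net close-move divided by total close-path over bars lo..hi (inclusive)."""
    if hi - lo + 1 < 2:
        return 1.0  # fewer than 2 bars: defined as 1.0
    # Sum of absolute close-to-close steps for i in (lo, hi].
    path = sum(abs(close[i] - close[i - 1]) for i in range(lo + 1, hi + 1))
    if path == 0.0:
        return 1.0  # zero path: defined as 1.0
    return abs(close[hi] - close[lo]) / path


# Toy closes, not campaign data.
closes = [100.0, 102.0, 101.0, 104.0]
straight = cleanliness(closes, 0, 1)  # one up-move: net 2 over a path of 2
choppy = cleanliness(closes, 0, 3)    # net 4 over a path of 2 + 1 + 3 = 6
```

Only the index span and the close series enter the function, which is the structural fact the lock relies on: changing an anchor's click-price without changing its bar index cannot change the value.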
| 51 | + |
| 52 | +## A2. reached / unreached definition (locked — Stage-2 ε-reconstruction, ALL legs) |
| 53 | + |
| 54 | +- **Corpus = ALL human 4h legs** (the 365 `fib_*.json` source legs; facit-discipline, human-only |
| 55 | + sidecars via [`load_human_legs`](../../../src/fibengine/research/selection_learning.py)). **Unreached |
| 56 | + legs are NOT filtered out — they are the signal** (A0.1). |
| 57 | +- A leg is **reached** iff **BOTH** its anchors are ε-reconstructable by the detector under the **exact |
| 58 | + Stage-2 rule**: each human anchor has a **causally detected** pivot of matching kind within **ε** |
| 59 | + (`time_tol = 3` bars, `price_tol = 0.5` × causal ATR; A4 of the campaign), where detection is |
| 60 | + [`detect_pivots`](../../../src/fibengine/pivots/detect.py) on the frame **truncated at `anchor_b + |
| 61 | + k`** with the frozen config (`fractal_n=1, lookback=3, min_prominence_atr=0.5`), `k = 3` (the Stage-2 |
| 62 | + headline cell). **Unreached** = at least one anchor not ε-reconstructable. This reproduces the |
| 63 | + Stage-2 ~0.83 leg-reachability split (expected ≈ 62 unreached on 4h). |
| 64 | +- **Primary = causal `k=3` detection** (parity with the pipeline that produced the lead). Full-frame |
| 65 | + detection is a **named sensitivity only**, not the verdict basis. |
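
A minimal sketch of the reached/unreached rule, under stated assumptions: `Point` is a hypothetical stand-in for both human anchors and detector pivots, the pivot list is assumed to come from causal detection on the truncated frame, a single ATR value is passed in, and both tolerances are treated as inclusive. None of this is the Stage-2 code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

TIME_TOL_BARS = 3    # time_tol from the lock
PRICE_TOL_ATR = 0.5  # price_tol, as a multiple of causal ATR


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for a human anchor or a detector pivot."""
    idx: int      # bar index
    price: float
    kind: str     # "high" or "low"


def eps_match(anchor: Point, pivots: Sequence[Point], atr: float) -> Optional[Point]:
    """Nearest same-kind pivot within epsilon of the anchor, else None.

    Assumption: both tolerances are inclusive (<=).
    """
    candidates = [
        p for p in pivots
        if p.kind == anchor.kind
        and abs(p.idx - anchor.idx) <= TIME_TOL_BARS
        and abs(p.price - anchor.price) <= PRICE_TOL_ATR * atr
    ]
    if not candidates:
        return None
    # Nearest by time distance first, then by price distance.
    return min(candidates, key=lambda p: (abs(p.idx - anchor.idx), abs(p.price - anchor.price)))


def is_reached(anchor_a: Point, anchor_b: Point, pivots: Sequence[Point], atr: float) -> bool:
    """A leg is reached only if BOTH anchors are epsilon-reconstructable."""
    return (eps_match(anchor_a, pivots, atr) is not None
            and eps_match(anchor_b, pivots, atr) is not None)


# Toy data, not campaign output.
pivots = [Point(10, 100.0, "low"), Point(24, 130.0, "high")]
leg_ok = is_reached(Point(11, 101.0, "low"), Point(22, 129.0, "high"), pivots, atr=4.0)
leg_far = is_reached(Point(11, 101.0, "low"), Point(30, 129.0, "high"), pivots, atr=4.0)
```

In the toy data, `leg_far` is unreached because its second anchor sits 6 bars from the only `high` pivot, outside `time_tol = 3`.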
| 66 | + |
| 67 | +## A3. exact-vs-snapped definition (locked — paired, reached legs only, no imputation) |
| 68 | + |
| 69 | +For each **reached** leg: |
| 70 | + |
| 71 | +- **exact-anchor cleanliness** = `cleanliness` over the span `[idx(anchor_a), idx(anchor_b)]`, where |
| 72 | + `idx(·)` is the human anchor's bar index ([`_pos_of_ts`](../../../src/fibengine/research/selection_learning.py)). |
| 73 | +- **snapped-anchor cleanliness** = `cleanliness` over `[idx(piv_a), idx(piv_b)]`, where `piv_·` is the |
| 74 | + **ε-matched detector pivot nearest** to each human anchor (tie-break: smallest time distance, then |
| 75 | + smallest price distance; ties logged). |
| 76 | +- **Contrast = `gap_snap = snapped − exact`** (paired, within-leg). |
| 77 | +- **No imputation.** Unreached legs have no snapped endpoints → **excluded from THIS contrast only** |
| 78 | + (they remain in the A2 surfacing contrast). Any reached leg whose snap is ambiguous/degenerate |
| 79 | + (e.g. `piv_a == piv_b`) is **dropped and logged**, never imputed. |
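
The paired contrast for one reached leg might look like the sketch below. It assumes the exact and snapped bar indices have already been resolved; `paired_gap` and the compact `cleanliness` helper are hypothetical names for illustration, and the closes are toy values.

```python
from typing import Optional, Sequence, Tuple

Span = Tuple[int, int]


def cleanliness(close: Sequence[float], lo: int, hi: int) -> float:
    """Net close-move over total close-path; 1.0 for <2 bars or zero path."""
    path = sum(abs(close[i] - close[i - 1]) for i in range(lo + 1, hi + 1))
    return 1.0 if path == 0.0 else abs(close[hi] - close[lo]) / path


def paired_gap(close: Sequence[float], exact: Span, snapped: Span) -> Optional[float]:
    """Snapped minus exact cleanliness for one reached leg.

    Returns None when the snap is degenerate (both anchors map to the same
    pivot bar); the caller is expected to drop and log the leg, never impute.
    """
    s_lo, s_hi = sorted(snapped)
    if s_lo == s_hi:
        return None
    e_lo, e_hi = sorted(exact)
    return cleanliness(close, s_lo, s_hi) - cleanliness(close, e_lo, e_hi)


# Toy closes, not campaign data.
closes = [100.0, 101.0, 100.0, 103.0, 106.0, 105.0]
gap = paired_gap(closes, exact=(1, 5), snapped=(2, 4))      # 1.0 - 0.5
dropped = paired_gap(closes, exact=(1, 5), snapped=(3, 3))  # degenerate snap
```

In this toy case the snapped span trims a noisy bar from each end of the human span, so the snapped value is higher. That is the mechanism the snapping contrast is designed to detect; the example says nothing about whether it occurs in the real corpus.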
| 80 | + |
| 81 | +## A4. causal computation (locked) |
| 82 | + |
| 83 | +`cleanliness` and detection both computed on the frame **truncated at `anchor_b + k`, `k = 3`**, ATR |
| 84 | +**causal** (trailing Wilder to the decision point). Per A1, truncation is moot for `cleanliness` |
| 85 | +itself (span is pre-`anchor_b`); the lock keeps full parity with Stage-2/W-gap and forbids any |
| 86 | +full-series leakage in the **detection** step (A2). |
| 87 | + |
| 88 | +## A5. Statistic + bootstrap unit (locked — NOT row-level) |
| 89 | + |
| 90 | +- **Surfacing statistic:** `gap_surface = mean(cleanliness | reached) − mean(cleanliness | unreached)`. |
| 91 | +- **Snapping statistic:** `gap_snap = mean(snapped − exact)` over reached legs (paired). |
| 92 | +- **Bootstrap = block bootstrap by CALENDAR QUARTER of `anchor_b`** (detector-free, exogenous unit — |
| 93 | + the campaign's A3 structural-chunk used detector pivots and does **not** transfer to a detector-free probe). |
| 94 | + Resample whole quarters with replacement (each quarter carries its reached + unreached legs), |
| 95 | + recompute the statistic, **2000 resamples, seed `20260618`**. Report point estimate, 95% CI, and |
| 96 | + one-sided `p`. **Row-level bootstrap is explicitly rejected** (legs cluster by regime/quarter). |
| 97 | + Month-block is a named sensitivity, not the primary. |
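
A sketch of the quarter-block bootstrap for `gap_surface`, with several assumptions that the lock does not fix: the `(date, reached, cleanliness)` record layout is hypothetical, the CI is a simple percentile interval, resamples in which one group is empty are skipped, and the one-sided `p` is taken as the share of resampled gaps at or below zero.

```python
import random
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

Leg = Tuple[date, bool, float]  # (anchor_b date, reached?, cleanliness); hypothetical layout


def quarter_of(d: date) -> Tuple[int, int]:
    """Calendar quarter key, e.g. (2023, 2) for April through June 2023."""
    return (d.year, (d.month - 1) // 3 + 1)


def gap_surface(legs: Sequence[Leg]) -> Optional[float]:
    """mean(cleanliness | reached) minus mean(cleanliness | unreached)."""
    reached = [c for _, r, c in legs if r]
    unreached = [c for _, r, c in legs if not r]
    if not reached or not unreached:
        return None  # statistic undefined for this sample
    return mean(reached) - mean(unreached)


def quarter_block_bootstrap(legs: Sequence[Leg], n_resamples: int = 2000, seed: int = 20260618):
    """Resample whole quarters with replacement; return (point, (lo, hi), p)."""
    blocks: Dict[Tuple[int, int], List[Leg]] = defaultdict(list)
    for leg in legs:
        blocks[quarter_of(leg[0])].append(leg)
    keys = sorted(blocks)
    rng = random.Random(seed)
    draws: List[float] = []
    for _ in range(n_resamples):
        # Each drawn quarter carries all of its reached and unreached legs.
        sample = [leg for k in rng.choices(keys, k=len(keys)) for leg in blocks[k]]
        value = gap_surface(sample)
        if value is not None:  # assumption: undefined resamples are skipped
            draws.append(value)
    if not draws:
        raise ValueError("no resample produced a defined statistic")
    draws.sort()
    lo = draws[int(0.025 * (len(draws) - 1))]  # assumption: percentile CI
    hi = draws[int(0.975 * (len(draws) - 1))]
    p_one_sided = sum(v <= 0.0 for v in draws) / len(draws)
    return gap_surface(legs), (lo, hi), p_one_sided


# Toy legs, not campaign data: four quarters, same values in each.
toy = [(date(2023, m, 15), r, c)
       for m in (1, 4, 7, 10)
       for r, c in ((True, 0.8), (True, 0.8), (True, 0.8), (False, 0.5))]
point, (ci_lo, ci_hi), p = quarter_block_bootstrap(toy)
```

Because every toy quarter is identical, each resample yields the same gap and the interval collapses to a point. Real data would spread the draws, and the width of that spread is what the quarter-level resampling is meant to capture.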
| 98 | + |
| 99 | +## A6. Power floor (locked) |
| 100 | + |
| 101 | +- **Surfacing contrast powered** iff `min(n_reached, n_unreached) ≥ 10` **and** ≥ 3 distinct quarters |
| 102 | + contain an unreached leg (so the block bootstrap is non-degenerate). |
| 103 | +- **Snapping contrast powered** iff `n_reached ≥ 10`. |
| 104 | +- **Expected powered: 4h only.** 1M/1w/1d are **context if underpowered, never refuted** (too few |
| 105 | + unreached legs). |
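
The two power floors reduce to simple threshold checks. The sketch below uses hypothetical function names, and the counts passed in at the end are made-up illustrations rather than observed values.

```python
MIN_LEGS = 10                      # per-group floor from the lock
MIN_QUARTERS_WITH_UNREACHED = 3    # block-bootstrap non-degeneracy floor


def surfacing_powered(n_reached: int, n_unreached: int, quarters_with_unreached: int) -> bool:
    """Both groups large enough AND unreached legs spread over enough quarters."""
    return (min(n_reached, n_unreached) >= MIN_LEGS
            and quarters_with_unreached >= MIN_QUARTERS_WITH_UNREACHED)


def snapping_powered(n_reached: int) -> bool:
    """Snapping is a paired contrast on reached legs only."""
    return n_reached >= MIN_LEGS


# Made-up counts for illustration only.
enough = surfacing_powered(n_reached=40, n_unreached=12, quarters_with_unreached=4)
too_concentrated = surfacing_powered(n_reached=40, n_unreached=12, quarters_with_unreached=2)
```

The second example fails even though both groups clear the size floor, because all unreached legs fall in two quarters and the block bootstrap would have too few distinct blocks carrying the contrast.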
| 106 | + |
| 107 | +## A7. Verdict rules (pre-stated, falsifiable — 4h primary; applied verbatim) |
| 108 | + |
| 109 | +**Surfacing (`gap_surface`, 95% CI):** |
| 110 | +- **`detector_surfacing_artifact`** — CI **excludes 0 ABOVE** (reached significantly cleaner): the |
| 111 | + detector preferentially surfaces cleaner human legs → the lead is **partly** a surfacing artifact. |
| 112 | +- **`no_surfacing_artifact`** — CI **includes 0**: no evidence the detector surfaces cleaner human legs |
| 113 | + → surfacing artifact **not supported** (artifact risk reduced on this axis). |
| 114 | +- **`inverse_surfacing`** (direction guard) — CI **excludes 0 BELOW** (unreached cleaner): unexpected; |
| 115 | + **investigate, not a finding.** |
| 116 | + |
| 117 | +**Snapping (`gap_snap`, 95% CI):** |
| 118 | +- **`snapping_inflates_cleanliness`** — CI **excludes 0 ABOVE**: snapping to detector endpoints |
| 119 | + mechanically raises `cleanliness` → measurement-bias artifact present. |
| 120 | +- **`no_snapping_inflation`** — CI **includes 0**: snapping does not inflate `cleanliness`. |
| 121 | +- **`snapping_deflates`** (direction guard) — CI **excludes 0 BELOW**: **investigate, not a finding.** |
| 122 | + |
| 123 | +**Combined artifact reading (locked):** |
| 124 | +- `no_surfacing_artifact` **AND** `no_snapping_inflation` → **`artifact_risk_reduced`** — the strongest |
| 125 | + non-artifact evidence the cheap probe can give: the `cleanliness` lead is **not explained** by |
| 126 | + detector surfacing or snapping. **This is NOT "cleanliness is proven human intuition"** (A9). |
| 127 | +- **Either** contrast fires its artifact branch → **`detector_artifact_supported`** — the lead is |
| 128 | + **partly mechanical**; report **which half** and by how much. |
| 129 | +- Underpowered → **`inconclusive_underpowered`** (checked first). |
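
The verdict rules can be read as a small decision function. The sketch below is illustrative and makes three assumptions the lock leaves open: a CI endpoint exactly at 0 counts as "includes 0", a single `powered` flag covers both contrasts, and a direction guard firing without any artifact branch maps to a placeholder label `investigate` (the lock names no combined label for that case).

```python
from typing import Tuple

CI = Tuple[float, float]


def _classify(ci: CI, above: str, inside: str, below: str) -> str:
    lo, hi = ci
    if lo > 0.0:
        return above   # CI excludes 0 above
    if hi < 0.0:
        return below   # CI excludes 0 below (direction guard)
    return inside      # CI includes 0 (assumption: endpoints at 0 count as included)


def surfacing_verdict(ci: CI) -> str:
    return _classify(ci, "detector_surfacing_artifact", "no_surfacing_artifact", "inverse_surfacing")


def snapping_verdict(ci: CI) -> str:
    return _classify(ci, "snapping_inflates_cleanliness", "no_snapping_inflation", "snapping_deflates")


def combined_verdict(surfacing: str, snapping: str, powered: bool) -> str:
    if not powered:  # checked first
        return "inconclusive_underpowered"
    if surfacing == "detector_surfacing_artifact" or snapping == "snapping_inflates_cleanliness":
        return "detector_artifact_supported"
    if surfacing == "no_surfacing_artifact" and snapping == "no_snapping_inflation":
        return "artifact_risk_reduced"
    return "investigate"  # placeholder label, an assumption of this sketch


# Made-up intervals for illustration only.
example = combined_verdict(surfacing_verdict((-0.02, 0.05)), snapping_verdict((-0.01, 0.03)), powered=True)
```

Writing the branches out this way makes the precedence explicit: the power check comes before any CI is read, and either artifact branch alone is enough to return `detector_artifact_supported`.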
| 130 | + |
| 131 | +## A8. Gate-rule for the matched-null / new candidate universe (locked) |
| 132 | + |
| 133 | +The matched-null (detector-independent leg universe) **may be considered ONLY IF** the cheap probe |
| 134 | +returns **`detector_artifact_supported`** (surfacing artifact present) **AND** quantifying the residual |
| 135 | +is judged necessary. If the probe returns **`artifact_risk_reduced`**, the matched-null is |
| 136 | +**UNJUSTIFIED** — the crux is resolved on the cheap axis and the expensive universe is scope-creep. |
| 137 | +**No matched-null, and no new candidate universe, may be built under this lock.** Any future |
| 138 | +matched-null requires its **own separate blind lock** (own design-check, own prereg). |
| 139 | + |
| 140 | +## A9. Non-claims (binding) |
| 141 | + |
| 142 | +- **Not a reproduction** of human selection. **Not** an edge / behaviour / PnL / backtest / strategy |
| 143 | + claim. No Genesis, no auto-fib-as-truth. |
| 144 | +- **`artifact_risk_reduced` does NOT prove `cleanliness` is "human intuition."** It only narrows the |
| 145 | + artifact risk on **two specific mechanisms** (surfacing, snapping). The broader "is human-leg |
| 146 | + cleanliness special vs any matched non-human swing" question is **out of scope** (gated, A8). |
| 147 | +- Underpowered TFs are **context, not refuted.** |
| 148 | +- **No 1H, no ETH, no label/corpus mutation, no `data.fetch --refresh`** (frozen-data parity — same |
| 149 | + universe as Stage-2 / W-gap / Stage-1). |
| 150 | + |
| 151 | +## A10. Implementation plan (Commit 2 — NOT executed here) |
| 152 | + |
| 153 | +- **New module `src/fibengine/research/selection_learning_artifact.py`** with its **own CLI entry**; |
| 154 | + **no code added to `selection_learning.py`** (byte-capped). Reuse `_cleanliness` (or its exact |
| 155 | + formula), `load_human_legs`, `_pos_of_ts`, `detect_pivots`, `atr`, `load_candles`, the ε constants, |
| 156 | + and the `FROZEN_SNAPSHOT` preflight pattern from `selection_learning_gap.py`. |
| 157 | +- **Tests** `tests/research/test_selection_learning_artifact.py` (reached/unreached split, |
| 158 | + exact-vs-snapped span, quarter-block bootstrap, verdict branches, no-imputation drop/log, k=3 causal parity). |
| 159 | +- **Results doc** later (`btc-fib-selection-learning-artifact-results-YYYYMMDD.md`, Observed / Inferred |
| 160 | + / Unverified). Artifacts under `experiments/review/fib_selection_learning/artifact/` (**gitignored**). |
| 161 | +- **Preflight FIRST**, frozen-data parity, per-cell/contrast checkpoint as needed. |
| 162 | + |
| 163 | +## A11. Why this answers the crux better than a new candidate universe |
| 164 | + |
| 165 | +- **Symmetric, not conservative.** `reached-vs-unreached` is informative in **both** directions |
| 166 | + (reached ≈ unreached → no surfacing artifact; reached ≫ unreached → artifact), whereas a matched-null is **asymmetric** (fail-to-reject is |
| 167 | + inconclusive, not non-artifact) because its endogenous swing-validity rule correlates with |
| 168 | + `cleanliness` by construction and raises the null baseline. |
| 169 | +- **Detector-independent MEASUREMENT on existing data.** It uses **exact human anchors** (facit) — no |
| 170 | + new universe, no swing-filter, no `K`, no draw-pool, no arbitrary windowing (the convenience trap the |
| 171 | + B feasibility check flagged). |
| 172 | +- It targets the **exact** two mechanisms in "detection / anchoring artifact": `reached-vs-unreached` |
| 173 | + **is** the surfacing test; `exact-vs-snapped` **is** the anchoring/measurement test. |
| 174 | + |
| 175 | +## A12. What this doc does NOT do |
| 176 | + |
| 177 | +No code, no harness, no build, no run, no dependency, no matched-null, no new candidate universe, no |
| 178 | +label/corpus mutation, no push. Does **not** grant execution — Commit 2 requires a **separate explicit |
| 179 | +GO**, and must **halt and report before code** if any of {`cleanliness` formula, reached/unreached |
| 180 | +definition, exact-vs-snapped definition, bootstrap unit, power floor, verdict rules, matched-null |
| 181 | +gate-rule} is found unclear at build time. |