docs(research): update campaign checkpoint with cleanliness artifact-probe result

JohnCCarter · claude · JohnCCarter · commit 5c33c8be2dc3 · 2026-06-24T13:15:32.000+02:00
Adds verdict-chain row 6 (artifact-probe, 1573b56) and folds the result into the CRUX section: the inflationary detector-artifact got NO support on the 4h primary (both contrasts point against: inverse_surfacing + snapping_deflates), but this is NOT artifact_risk_reduced (both CIs exclude 0) and the snapping reversal does not replicate (flips to inflation on 1d) → "investigate, not a finding". Crux stays OPEN, now narrowed (inflationary mechanism unsupported) with a sharper investigate-target.

Reframes the next-step section: cheap-first B is DONE; the matched-null stays UNJUSTIFIED (A8 gate not met) and would need its own blind lock. No track recommended, no new track started.

Consolidation only: no new positive claim, no lock change, no matched-null, no new universe, no Genesis/1H/ETH, holds all non-claims.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
diff --git a/docs/research_wiki/reviews/btc-fib-selection-learning-checkpoint-20260624.md b/docs/research_wiki/reviews/btc-fib-selection-learning-checkpoint-20260624.md
@@ -2,10 +2,15 @@
 
 **Lean Fib Research. Research-only. Selection learning — NOT a behaviour/edge claim, no
 backtest/PnL, no Genesis, no auto-fib-as-truth, no label mutation.** This doc **locks what the
-selection-learning line has actually established** across its five committed runs, names the **single
-open crux**, and makes the **next-step choice (A / B / C) well-posed**. It starts **no** new track,
-authorises **no** code, run, or build, and adds **no** new positive claim. Decision needs a separate
-explicit GO.
+selection-learning line has actually established** across its committed runs and names the **single
+open crux**. It starts **no** new track, authorises **no** code, run, or build, and adds **no** new
+positive claim. Decision needs a separate explicit GO.
+
+> **Update 2026-06-24 (artifact-probe ran).** The cheap-first scope of track B (the existing-data
+> `cleanliness` artifact-probe) has now been built + run (`1573b56`). It **narrowed but did not close**
+> the crux: the *inflationary* detector-artifact got **no support** on the 4h primary (both contrasts
+> point the other way), but the reversal is **marginal / non-replicating** → "investigate, not a finding", **not**
+> `artifact_risk_reduced`. The crux stays **OPEN**. See the verdict chain row 6 + the CRUX section.
 
 ## Scope of this line (unchanged since the 2026-06-17 prereg)
 
@@ -23,6 +28,7 @@ explicit GO.
 | 3 | k-sweep {0,3,6,12} (06-18) | **`k_stable_live_selection_signal`** | k=3/6/12 all survive; k=0 degenerate | `ea6c2ea` |
 | 4 | W-gap causal-availability (06-23) | **`no_causal_gap`** | gap(k=3) **−0.0045**, CI [−0.070, +0.031] incl. 0 | `b515b08`↩`61c41d1` |
 | 5 | Stage-1 per-pivot (06-24) | **`no_pivot_signal_above_prominence`** | recall **0.902**; ranking lift +0.0228, CI [−0.035, +0.079] incl. 0 | `b515b08` |
+| 6 | `cleanliness` artifact-probe (06-24) | direction guards **`inverse_surfacing`** + **`snapping_deflates`** (A7-unregistered combined → `meta:` status, NOT a verdict) | surfacing gap **−0.0557** CI [−0.1150, −0.00095]; snapping gap **−0.0219** CI [−0.0320, −0.0102]; both exclude 0 **below** | `1573b56` |
 
 ## What we KNOW (positive, scoped to 4h + the frozen eight features)
 
@@ -52,32 +58,49 @@ explicit GO.
 The entire pipeline is **conditioned on the detector's pivot universe**, and `cleanliness` is computed
 on **detector-defined legs**. If the detector preferentially surfaces or anchors clean/efficient legs,
 then "human legs are cleaner" is partly **mechanical** — baked in by candidate generation, not a fact
-about human choice. This question has been flagged at **every** stage (06-18 sensitivity, k-sweep,
-W-gap, Stage-1) and **deliberately left OPEN**; nothing run so far can resolve it, because every run
-lives inside the same detector frame.
+about human choice.
+
+**What the artifact-probe settled, and what it did not** ([results](btc-fib-selection-learning-artifact-results-20260624.md),
+LOCK `b533385`): the cheap-first probe tested the two mechanisms of the *inflationary* version of this
+artifact on existing facit data. On the 4h primary (both contrasts powered, fidelity OK reached 0.860):
+
+- **Surfacing** — reached legs are *less* clean than unreached (gap −0.0557, CI excludes 0 **below**) →
+  guard `inverse_surfacing`. The detector does **not** surface cleaner human legs — if anything the
+  reverse (marginal: CI upper −0.00095).
+- **Snapping** — snapping anchors to detector pivots *lowers* cleanliness (gap −0.0219, CI excludes 0
+  **below**) → guard `snapping_deflates`. Snapping does **not** inflate it.
+
+So the simple **"the detector inflates cleanliness"** story gets **no support** on the powered primary.
+**But the crux stays OPEN:** (1) this is **not** `artifact_risk_reduced` (both CIs *exclude* 0 rather
+than include it, hence the direction guards); (2) the surfacing reversal is **marginal** and the snapping reversal
+**does not replicate** (it flips to *inflation* on the 1d context cell, +0.0222) → TF-dependent,
+**"investigate, not a finding"**; (3) the broader "is `cleanliness` special vs a matched non-human
+swing" question is **out of scope** (matched-null gated, A8 — **not built**; its gate condition
+`detector_artifact_supported` on the primary was **not** met, so it stays unjustified). The crux is now
+**narrowed** (the inflationary mechanism is unsupported) with a sharper investigate-target: *why
+reached/snapped legs are less clean, and why snapping flips sign by TF.*
 
 **Secondary loose end:** set-level **`exclusivity`** (`k*=3`) was specced in the
 [§12 addendum](btc-fib-selection-learning-addendum-20260618.md) but the Stage-2 live whitelist actually
 built was `{magnitude, cleanliness, duration, prominence, structure_alignment}` — **exclusivity was
 never implemented or run.** It is an unfinished feature, not a result.
 
-## The next-step choice — A / B / C (all roads lead to the crux)
-
-| | Track | What it does | Decides the crux? | Main risk |
-|--|-------|--------------|:-----------------:|-----------|
-| **A** | set-level exclusivity / `cleanliness`-artifact diagnostic | stays in the detector frame: (i) build the unbuilt `exclusivity` feature, does `cleanliness` survive its inclusion; (ii) a diagnostic separating "human prefers clean legs" from "detector surfaces clean legs" | **partly** | within-detector-frame circularity may be **irreducible** — a within-frame test may not fully break the circle |
-| **B** | detector-independent anchor-probe | removes the detector as candidate-generator and tests whether the anchor / `cleanliness` signal survives **without** the suspected source of circularity | **most directly** | requires **inventing a detector-independent frame** → new arbitrary choices (validity-over-convenience hazard); MUST be locked blind |
-| **C** | pause + write "current theory of human fib selection" | synthesis-only; consolidates KNOW/crux into a stated theory, defers the empirical crux | **no** (defers) | leaves the artifact question open; risks theorizing past the evidence |
-
-- **A and B are two angles on the SAME crux** (cleanliness-as-artifact / detector-circularity); **C
-  defers it.** Whichever empirical track is chosen, its verdict rule must be **locked blind before any
-  build** (two-commit gate, as with W-gap and Stage-1), and must respect validity-over-convenience —
-  no quietly-chosen control or frame.
-- **Tentative lean (not a decision — the GO is the user's next turn):** the crux *is* the artifact
-  question, and only **B** attacks its suspected source head-on; **A**'s within-frame test may not
-  break the circularity and **C** defers it. So **B, but only behind a tight blind lock** — and if a
-  detector-independent frame cannot be defined without arbitrary convenience choices, **fall back to
-  A**'s narrow cleanliness-artifact diagnostic. Recommend: **this checkpoint first, then A or B.**
+## The next-step choice — where it stands after the artifact-probe
+
+The original A/B/C framing has partly resolved: **cheap-first B (the existing-data `cleanliness`
+artifact-probe) is DONE** (row 6). The remaining candidate doors — **none started, none authorised:**
+
+| | Track | Status / what it would do | Main risk |
+|--|-------|---------------------------|-----------|
+| **(i)** | **investigate** the artifact-probe's own finding | on existing data: why reached/snapped legs are *less* clean, and why snapping flips sign 4h↔1d (mechanical hypothesis: detector reconstructs larger/longer swings; snapping extends spans → more path) | low — descriptive, detector-frame |
+| **(ii)** | gated **matched-null** / detector-independent universe | **stays UNJUSTIFIED** — its A8 gate (`detector_artifact_supported` on the primary) was **not** met; would need its **own** separate blind lock and risks inventing an arbitrary frame | high (validity-over-convenience) |
+| **A′** | set-level **`exclusivity`** feature (the unbuilt loose end) | build the specced-but-unimplemented feature; orthogonal to the artifact crux | low–medium |
+| **C** | pause + write **"current theory of human fib selection"** | synthesis-only; consolidate KNOW + the now-narrowed crux; defers further empirics | risks theorizing past the evidence |
+
+- Any empirical track must be **locked blind before any build** (two-commit gate, as with W-gap /
+  Stage-1 / the artifact-probe) and respect validity-over-convenience — no quietly-chosen control or
+  frame. **The matched-null specifically may not be built without meeting its A8 gate AND a new lock.**
+- **No track is recommended here** — this is consolidation. The GO is the user's next turn.
 
 ## Non-claims (binding — carried from every prior lock)