Commit 1573b56

JohnCCarter and claude committed
feat(research): Fib cleanliness artifact-probe BUILD+RUN → inflationary artifact not supported (investigate)
Commit 2 of the artifact LOCK (b533385), executed verbatim. Cheap-first track B: two existing-data contrasts decomposing the cleanliness-artifact crux. No matched-null, no new candidate universe (gated A8). The lock was NOT changed.

4h primary (both contrasts powered; fidelity OK, reached=0.860 ~ Stage-2 0.83):
- Surfacing: reached legs LESS clean than unreached (gap -0.0557, CI [-0.1150, -0.00095] excludes 0 below) -> locked guard inverse_surfacing (marginal).
- Snapping: snapping to detector pivots LOWERS cleanliness (gap -0.0219, CI [-0.0320, -0.0102]) -> locked guard snapping_deflates.

Both guards point AGAINST the inflationary detector-artifact hypothesis, but:
- NOT artifact_risk_reduced (both CIs exclude 0, not include).
- snapping flips sign on 1d (+0.0222, detector_artifact_supported context) -> TF-dependent, investigate, no sign/positive claim.

Combined: A7 did NOT pre-register a powered direction-guard outcome. No new combined verdict assigned; harness emits a descriptive meta: status (NOT inconclusive_underpowered, which would be a misnomer since the cells are powered). The two locked per-contrast direction-guards are reported verbatim as "investigate, not a finding".

New research/selection_learning_artifact.py (own --artifact CLI; no code into byte-capped selection_learning.py) + 13 tests. Quarter-block bootstrap (detector-free, seed 20260618), degenerate resamples skipped. Frozen-data parity (no --refresh).

Diagnostic; crux stays OPEN; no reproduction/edge/behaviour/Genesis/1H/ETH claim.

Gates green: ruff + format + 581 pytest (cov 74.39%) + bounds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent b533385 commit 1573b56

4 files changed

Lines changed: 929 additions & 14 deletions

docs/research_wiki/handoff.md

Lines changed: 19 additions & 14 deletions
@@ -133,20 +133,25 @@ legs/ranges* (labels = facit; **no edge/behaviour/backtest/PnL/Genesis/auto-fib
133133
single open CRUX (is `cleanliness` a genuine signal or a **detector/anchoring artifact**?). Frames
134134
the next-step choice A (exclusivity / artifact diagnostic) / B (detector-independent anchor-probe) /
135135
C (pause + theory) — **none started**. [Checkpoint](reviews/btc-fib-selection-learning-checkpoint-20260624.md).
136-
- **2026-06-24 Fib SELECTION-LEARNING `cleanliness` artifact-probe — LOCKED (Commit 1, docs-only),
137-
RUN PENDING separate GO.** Cheap-first scope of track B (chosen after a design-only feasibility
138-
check): tests the open crux — is the Stage-2 `cleanliness` lead genuine or a detector/anchoring
139-
artifact — on **existing facit data, no new candidate universe**. Two contrasts, blind-locked in the
140-
[artifact LOCK](reviews/btc-fib-selection-learning-artifact-lock-20260624.md): (1) **surfacing** =
141-
reached-vs-unreached human-leg cleanliness (ALL 365 4h legs, exact anchors, Stage-2 ε-reconstruction
142-
split — unreached are the signal, not filtered); (2) **snapping** = exact-vs-snapped paired contrast.
143-
Quarter-block bootstrap, verdict {`detector_surfacing_artifact` / `no_surfacing_artifact` /
144-
`snapping_inflates_cleanliness` / `no_snapping_inflation` → `artifact_risk_reduced` vs
145-
`detector_artifact_supported`}. **Matched-null / new universe NOT built** — gated optional rung,
146-
only if surfacing artifact is found AND needs quantifying, and then only behind its own separate
147-
blind lock (A8). **Diagnostic; `artifact_risk_reduced` ≠ "cleanliness proven human intuition"; no
148-
reproduction/edge/behaviour claim.** Commit 2 (build+run) needs a separate GO and new module
149-
`selection_learning_artifact.py`.
136+
- **2026-06-24 Fib SELECTION-LEARNING `cleanliness` artifact-probe — BUILT + RUN → inflationary
137+
artifact NOT supported on 4h, but marginal/non-replicating → "investigate, not a finding".**
138+
Commit 2 of the [artifact LOCK](reviews/btc-fib-selection-learning-artifact-lock-20260624.md)
139+
(`b533385`), executed verbatim. New `research/selection_learning_artifact.py` (+13 tests; own
140+
`--artifact` CLI, no code into byte-capped `selection_learning.py`). Fidelity OK (4h reached
141+
**0.860**, reproduces Stage-2 ~0.83). **Surfacing:** reached legs *less* clean than unreached (gap
142+
**−0.0557**, CI [−0.1150, −0.00095] excludes 0 below) → locked guard **`inverse_surfacing`** (marginal:
143+
CI upper −0.00095). **Snapping:** snapping to detector pivots *lowers* cleanliness (gap **−0.0219**,
144+
CI [−0.0320, −0.0102]) → locked guard **`snapping_deflates`**. Both guards point **against** the
145+
inflationary detector-artifact hypothesis — but it is **NOT `artifact_risk_reduced`** (both CIs
146+
*exclude* 0, not include) and the snapping effect **flips sign on 1d** (+0.0222,
146+
`detector_artifact_supported` context) → **TF-dependent, investigate, no sign/positive claim.** **Combined: A7 did not
148+
pre-register a powered direction-guard outcome → no new combined verdict; harness emits a descriptive
149+
`meta:` status (NOT `inconclusive_underpowered`, the cells are powered). The lock was NOT changed.**
150+
Matched-null / new universe **NOT built** (gated, A8). Crux stays OPEN, sharper investigate-target
151+
(why reached/snapped legs are less clean; why snapping flips sign by TF). No reproduction/edge/
152+
behaviour/Genesis/1H/ETH. [Results](reviews/btc-fib-selection-learning-artifact-results-20260624.md);
153+
summary + `cells/*.json` gitignored/regenerable. Re-run (deterministic, frozen data, no `--refresh`):
154+
`PYTHONUNBUFFERED=1 uv run --no-sync python -u -m fibengine.research.selection_learning_artifact --artifact`.
150155

151156
**Next work requires a separate explicit GO. No W/gap, no Stage 1, no new sensitivity, and no Genesis
152157
may be started automatically.** Parked (test-only, separate GO): lock the facit-discipline refusal
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
1+
# BTC Fib Selection-Learning — `cleanliness` artifact-probe RESULTS (2026-06-24)
2+
3+
**Lean Fib Research. Research-only. Selection learning — NOT a behaviour/edge claim, no
4+
backtest/PnL, no Genesis, no auto-fib-as-truth, no label mutation.** First (and only) run of the
5+
cheap-first `cleanliness` artifact-probe, executed exactly per the
6+
[artifact LOCK](btc-fib-selection-learning-artifact-lock-20260624.md) (`b533385`): two existing-data
7+
contrasts (surfacing A2, snapping A3), quarter-block bootstrap (A5), **per-contrast verdict rules A7
8+
fixed blind before any number existed**. Tests the open crux from the
9+
[campaign checkpoint](btc-fib-selection-learning-checkpoint-20260624.md). **No matched-null, no new
10+
candidate universe** (gated, A8). Builds on the [Stage-2 lead](btc-fib-selection-learning-results-20260618.md),
11+
[W-gap](btc-fib-selection-learning-w-gap-results-20260623.md), and
12+
[Stage-1](btc-fib-selection-learning-stage1-results-20260624.md).
13+
14+
> **STATUS — the inflationary detector-artifact hypothesis gets NO support on the 4h primary; both
15+
> contrasts point the OTHER way, but marginally / non-replicating → "investigate, not a finding".**
16+
> Fidelity holds (reached = **0.860**, in band, reproduces the Stage-2 ~0.83 split). **Surfacing:**
17+
> reached legs are *less* clean than unreached (gap **−0.0557**, 95% CI **[−0.1150, −0.00095]** excludes
18+
> 0 **below**, `p(gap≤0)=0.977`) → the locked direction guard **`inverse_surfacing`**: *investigate,
19+
> not a finding* (and **marginal**: CI upper −0.00095, a hair under 0, the mirror image of the
20+
> W-gap k=12 +0.0004 caveat). **Snapping:** snapping to detector pivots *lowers* cleanliness (gap
21+
> **−0.0219**, 95% CI **[−0.0320, −0.0102]**, `p=1.0`) → the locked direction guard
22+
> **`snapping_deflates`**: *investigate, not a finding*. Both guards point **against** the
23+
> inflationary artifact (which predicted reached>unreached and snapped>exact). **But this is NOT
24+
> `artifact_risk_reduced`** — that locked label requires both CIs to *include* 0, which did NOT happen.
25+
> And the snapping effect **does not replicate**: the 1d context cell shows snapping *inflates*
26+
> (+0.0222, CI [0.0007, 0.0492]) — **opposite sign** → TF-dependent / investigate. **No positive claim,
27+
> no lock change, no matched-null.**
28+
29+
## Combined outcome — A7 unregistered (reported verbatim, no new verdict)
30+
31+
**A7 did not pre-register a combined powered direction-guard outcome.** Therefore **no new combined
32+
verdict is assigned.** The correct handling — and the one applied here — is to report the two locked
33+
per-contrast direction guards verbatim as **"investigate, not a finding"**. The harness emits a
34+
**descriptive `meta:` status** (`meta:a7_unregistered_powered_direction_guard`) for this case, **not** a
35+
locked verdict, and explicitly **not** `inconclusive_underpowered` (the cells ARE powered — 314 reached
36+
/ 51 unreached, 2000 effective resamples; that label would be a misnomer). **The lock was not changed.**
37+
38+
## What was built + run
39+
40+
**New module** `src/fibengine/research/selection_learning_artifact.py` (own `--artifact` /
41+
`--artifact-preflight` CLI; **no code added to byte-capped `selection_learning.py`**) + 13 tests. Per
42+
leg: `cleanliness` is the source-bound `core.features._cleanliness` reduced to the endpoints' bar-index
43+
span (A1 — span-only, so inherently causal; the span is pre-`anchor_b`). A leg is **reached** iff both
44+
anchors are ε-reconstructable by the **causal** detector at `anchor_b + k=3` (Stage-2 ε-rule, A2);
45+
**all 365 4h legs are used — unreached are the signal, never filtered.** Snapping (A3) compares
46+
exact-anchor vs ε-matched-detector-pivot cleanliness, **paired, reached only, no imputation**.
47+
Bootstrap = quarter-block (detector-free, A5), 2000 resamples, seed `20260618`, degenerate resamples
48+
skipped (`n_boot_effective` reported). Run on the **frozen** data universe (no `--refresh`; preflight
49+
READY).
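
A minimal sketch of the quarter-block bootstrap described above, not the module's actual code. It assumes that "quarter-block" means resampling whole calendar quarters with replacement, that the CI is a percentile interval, and that a resample counts as degenerate when it leaves either group empty. The function name and the `(quarter, reached, cleanliness)` tuple layout are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def quarter_block_bootstrap(legs, n_boot=2000, seed=20260618):
    """Bootstrap the surfacing gap (mean reached - mean unreached cleanliness).

    legs: iterable of (quarter_key, reached: bool, cleanliness: float).
    Returns (point_gap, (ci_lo, ci_hi), n_boot_effective).
    Assumption: blocks are calendar quarters; the CI is a 95% percentile interval.
    """
    # Group legs into blocks so that whole quarters are resampled together.
    blocks = defaultdict(list)
    for quarter, reached, clean in legs:
        blocks[quarter].append((reached, clean))
    keys = sorted(blocks)
    rng = random.Random(seed)  # fixed seed keeps the run deterministic

    def gap(sample):
        reached = [c for is_reached, c in sample if is_reached]
        unreached = [c for is_reached, c in sample if not is_reached]
        if not reached or not unreached:
            return None  # degenerate resample: one group is empty
        return mean(reached) - mean(unreached)

    point = gap([leg for k in keys for leg in blocks[k]])

    gaps = []
    for _ in range(n_boot):
        drawn = rng.choices(keys, k=len(keys))  # quarters, with replacement
        g = gap([leg for k in drawn for leg in blocks[k]])
        if g is not None:  # degenerate resamples are skipped, not imputed
            gaps.append(g)
    if not gaps:
        raise ValueError("every resample was degenerate")

    gaps.sort()
    lo = gaps[int(0.025 * (len(gaps) - 1))]
    hi = gaps[int(0.975 * (len(gaps) - 1))]
    return point, (lo, hi), len(gaps)
```

Resampling quarters rather than individual legs keeps legs from the same period together, and the third return value plays the role of the reported `n_boot_effective`.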
50+
51+
## Results — surfacing (coverage) and snapping reported separately (LOCK A2/A3)
52+
53+
**4h (primary, both contrasts powered):** reached = 314/365 = **0.860** (fidelity OK, band [0.75, 0.90]).
54+
55+
| contrast | gap | 95% CI | p(gap≤0) | n | locked per-contrast verdict |
56+
|----------|----:|--------|---------:|---|-----------------------------|
57+
| **surfacing** (clean reached − unreached) | **−0.0557** | [−0.1150, **−0.00095**] | 0.977 | 314 / 51 | **`inverse_surfacing`** — direction guard, *investigate* |
58+
| **snapping** (snapped − exact, paired) | **−0.0219** | [−0.0320, −0.0102] | 1.000 | 314 pairs, 0 dropped | **`snapping_deflates`** — direction guard, *investigate* |
59+
60+
- mean cleanliness: reached **0.743** vs unreached **0.799** (reached legs are *less* clean).
61+
- Both guards fire **against** the inflationary artifact; the surfacing one is **marginal** (CI upper
62+
−0.00095).
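
The 4h headline numbers can be cross-checked with plain arithmetic. The snippet below is only a sanity check on the figures quoted in this section (counts, fidelity band, rounded means); the variable names are illustrative and it touches no project code.

```python
n_reached, n_unreached = 314, 51   # 4h counts from the table above
band_lo, band_hi = 0.75, 0.90      # fidelity band quoted above

# Coverage: share of the 365 4h legs that the detector reaches.
reached_frac = n_reached / (n_reached + n_unreached)
fidelity_ok = band_lo <= reached_frac <= band_hi

# Surfacing gap recomputed from the rounded group means.
mean_reached, mean_unreached = 0.743, 0.799
gap_from_means = mean_reached - mean_unreached

print(f"reached_frac = {reached_frac:.3f}, fidelity_ok = {fidelity_ok}")
print(f"gap from rounded means = {gap_from_means:.3f} (reported gap: -0.0557)")
```

The gap recomputed from the three-decimal means agrees with the reported −0.0557 to within rounding.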
63+
64+
**Context (1M/1w/1d, k=3 — surfacing underpowered everywhere; snapping powered on 1w/1d):**
65+
66+
| TF | reached_frac | surfacing | snapping gap | snapping CI | snapping verdict | cell |
67+
|----|----:|-----------|----:|--------|------------------|------|
68+
| 1M | 1.000 (9 legs) | underpowered | | | underpowered | `inconclusive_underpowered` |
69+
| 1w | 0.905 (19/2) | underpowered | −0.0057 | [−0.0252, +0.0076] | `no_snapping_inflation` | `inconclusive_underpowered` |
70+
| 1d | 0.896 (60/7) | underpowered (7<10) | **+0.0222** | [+0.0007, +0.0492] | **`snapping_inflates_cleanliness`** | `detector_artifact_supported` |
71+
72+
- **The snapping effect does not replicate across TFs:** 4h `snapping_deflates` (−0.022) vs 1d
73+
`snapping_inflates_cleanliness` (+0.022) — **opposite sign**. A robust structural fact would hold
74+
sign; the flip says the effect is **TF-dependent / noisy**, which is exactly what the direction guard
75+
("investigate, not a finding") is for. The 1d cell's `detector_artifact_supported` is a **context**
76+
reading, not the primary, and the cross-TF disagreement means **neither sign is claimed**.
77+
- 1M/1w surfacing carry no inferential weight (0 / 2 unreached legs).
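
The per-contrast labels in the two tables follow a simple pattern on the CI's position relative to 0. The sketch below reproduces that pattern only as an illustration; the binding A7 rules live in the lock, not here. The mapping of "CI entirely above 0" to `detector_surfacing_artifact` is inferred from the lock's verdict set, the minimum group size of 10 is inferred from the "7<10" note in the 1d row, and the function and dictionary names are hypothetical.

```python
# Labels by CI position relative to 0 (inferred from the reported outcomes).
SURFACING = {
    "above": "detector_surfacing_artifact",  # inferred from the lock's verdict set
    "inside": "no_surfacing_artifact",
    "below": "inverse_surfacing",            # direction guard: investigate
}
SNAPPING = {
    "above": "snapping_inflates_cleanliness",
    "inside": "no_snapping_inflation",
    "below": "snapping_deflates",            # direction guard: investigate
}


def classify(ci_lo, ci_hi, labels, smallest_group_n=None, min_n=10):
    """Map a 95% CI on the gap to a per-contrast label.

    smallest_group_n: size of the smaller group, if a power check applies.
    min_n=10 is an assumption based on the "7<10" note in the 1d row.
    """
    if smallest_group_n is not None and smallest_group_n < min_n:
        return "underpowered"
    if ci_lo > 0:
        return labels["above"]   # CI excludes 0 above
    if ci_hi < 0:
        return labels["below"]   # CI excludes 0 below
    return labels["inside"]      # CI includes 0


# Reported cells, fed back through the sketch.
print(classify(-0.1150, -0.00095, SURFACING, smallest_group_n=51))  # 4h surfacing
print(classify(-0.0320, -0.0102, SNAPPING))                         # 4h snapping
print(classify(0.0007, 0.0492, SNAPPING))                           # 1d snapping
print(classify(0.0, 0.0, SURFACING, smallest_group_n=7))            # 1d surfacing
```

Because both 4h CIs exclude 0, neither contrast lands on the "inside" label, which is why the combined reading cannot be `artifact_risk_reduced`.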
78+
79+
## Interpretation (honest; no positive claim)
80+
81+
- The Stage-2 `cleanliness` lead being a **detector-inflation** artifact would require the detector to
82+
**surface** cleaner human legs (reached>unreached) and/or **snapping** to **raise** cleanliness
83+
(snapped>exact). On the 4h primary **both go the opposite way.** So the simple "the detector inflates
84+
cleanliness" story gets **no support** on the powered primary cell.
85+
- **This is not a clean win for "genuine signal."** (1) It is **not** `artifact_risk_reduced` — both CIs
86+
*exclude* 0 (the direction guards), not include it. (2) The reversals are **marginal** (surfacing CI
87+
upper −0.00095) or **non-replicating** (snapping flips sign on 1d). (3) The guards were locked
88+
precisely as **"investigate, not a finding"** for this reason. Mechanically plausible: the detector
89+
reconstructs **larger/longer** swings (more intermediate retracement → lower cleanliness), and
90+
snapping **extends** the span to fuller extremes (more path) — both *deflate* cleanliness, neither
91+
proves anything about human selection.
92+
- **Net:** the result **weakens** the inflationary-artifact hypothesis but **claims nothing positive**;
93+
the crux stays **open**, now with a sharper investigate-target (why reached/snapped legs are *less*
94+
clean, and why snapping flips sign by TF).
95+
96+
## Build-time resolution (documented; not a locked decision point)
97+
98+
**Anchor kind** (low/high expected at each anchor) is taken from the leg `direction` when present, else
99+
derived from the price order (`anchor_b_price ≥ anchor_a_price ⇒ a=low, b=high`) — the `direction`
100+
sidecar field can be empty. This only affects which detector-pivot kind an anchor may ε-match; a wrong
101+
assignment would *reduce* matches (lower reached) → conservative, cannot manufacture an artifact signal.
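
The fallback described above can be sketched as follows. This is an illustration under assumptions, not the module's code: the function name is hypothetical, and the `"up"` / `"down"` encoding of `direction` is assumed because the sidecar's actual values are not shown here.

```python
def anchor_kinds(anchor_a_price, anchor_b_price, direction=None):
    """Return (kind_a, kind_b), each either "low" or "high".

    Uses the leg direction when present; otherwise falls back to price order.
    The "up"/"down" values are a hypothetical encoding of the sidecar field.
    """
    if direction == "up":
        return ("low", "high")
    if direction == "down":
        return ("high", "low")
    # direction missing or empty: derive from price order,
    # anchor_b_price >= anchor_a_price  =>  a=low, b=high
    if anchor_b_price >= anchor_a_price:
        return ("low", "high")
    return ("high", "low")


print(anchor_kinds(100.0, 120.0))      # no direction: price order decides
print(anchor_kinds(120.0, 100.0, ""))  # empty sidecar field: same fallback
```

Only the pivot kind an anchor may ε-match depends on this choice, so a wrong assignment can lose matches but cannot create them.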
102+
103+
## Observed / Inferred / Unverified
104+
105+
- **Observed (verified):** the numbers above; 4h reached 0.860 (fidelity OK); surfacing gap −0.0557 CI
106+
[−0.1150, −0.00095]; snapping gap −0.0219 CI [−0.0320, −0.0102]; 1d snapping +0.0222 CI [+0.0007,
107+
+0.0492] (opposite sign); both 4h contrasts powered (314/51), 2000 effective resamples;
108+
cleanliness span-only/causal; 13 unit tests green; run deterministic, resume-safe.
109+
- **Inferred (scoped to 4h / these contrasts):** the inflationary detector-artifact (surfacing/snapping
110+
mechanically raising cleanliness for human-matched legs) is **not supported** on the powered primary —
111+
both mechanisms point against it.
112+
- **Unverified / scope limits (do not claim past these):**
113+
1. **Not `artifact_risk_reduced`** — both CIs exclude 0 (direction guards), not include it.
114+
2. **Marginal / non-replicating** — surfacing CI upper −0.00095; snapping flips sign 4h↔1d →
115+
TF-dependent, investigate, **no sign claimed**.
116+
3. A7 has **no registered combined label** for this case → **no new combined verdict**; the binding
117+
reading is the two per-contrast guards (`meta:` status, not a verdict).
118+
4. The broader "is human-leg cleanliness special vs a matched non-human swing" question is **out of
119+
scope** (matched-null gated, A8 — **not built**, and would need its own separate blind lock).
120+
121+
## Non-claims (LOCK A9 binding)
122+
123+
Not a reproduction of human selection. **No edge / behaviour / PnL / backtest / strategy claim.** This
124+
result does **not** prove `cleanliness` is "human intuition"; it only **weakens one specific mechanical
125+
explanation** (detector inflation) on one powered cell, without a clean verdict. The `cleanliness`-as-
126+
genuine-signal question stays **OPEN**. No Genesis, no auto-fib-as-truth, no label/corpus mutation, no
127+
1H, no ETH, no `data.fetch --refresh` (frozen-data parity — same universe as Stage-2 / W-gap / Stage-1).
128+
**The lock was not changed.**
129+
130+
## Discipline honoured
131+
132+
Per-contrast verdict rules A7 fixed **blind** in the 2026-06-24 lock (`b533385`) before any number was
133+
computed; applied verbatim. The A7-unregistered combined case is reported as a **descriptive `meta:`
134+
status**, not relabelled and not invented as a new locked verdict — **the lock was not changed**.
135+
Frozen-data parity held (no `--refresh`; preflight READY). Coverage (reached) reported separately from
136+
both contrasts. Matched-null **not built** (gated, A8). Artifacts
137+
(`experiments/review/fib_selection_learning/artifact/summary.json` + `…/cells/*.json`) are
138+
**gitignored**, regenerable.
139+
140+
> On the 4h primary, the "detector inflates cleanliness" artifact gets **no support** — surfacing and
141+
> snapping both point the other way — but **marginally** (surfacing) and **non-replicating** (snapping
142+
> flips sign on 1d), so per the locked direction guards this is **"investigate, not a finding"**, **not**
143+
> `artifact_risk_reduced`, and **no positive claim**. The crux stays open. No lock change, no
144+
> matched-null, no edge/behaviour claim.
