merge: sync origin/main (level_events, human review, AGENTS.md)

JohnCCarter · cursoragent · JohnCCarter · commit 784dc1099d81 · 2026-06-01T16:55:46.000+02:00
Resolve conflicts keeping multi-leg labeling and MTF research.
Fix UTF-8 writes in human review index on Windows.
Grandfather human_review_level_events per REPO_POLICY.

Co-authored-by: Cursor <cursoragent@cursor.com>
diff --git a/.gitignore b/.gitignore
@@ -228,3 +228,5 @@ data/screenshots/
 experiments/results/mtf_*.json
 experiments/results/*_compare_*T*.json
 experiments/runs/
+# Generated human-review packs (charts/sheets) are artifacts, not source.
+experiments/review/
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,46 @@
+# AGENTS.md
+
+## Cursor Cloud specific instructions
+
+### Product
+
+**fibengine** is a Python research engine for human-like Fibonacci swing selection (Layer A). There is no web server or database—workflows are CLI modules (`experiment`, `backtest`, `labeling`) plus an optional Matplotlib labeling GUI.
+
+### Dependencies (automatic on VM startup)
+
+The update script runs `uv sync --extra dev`, which creates/updates `.venv` from `pyproject.toml` / `uv.lock`. Python **3.11+** is required (CI and local use 3.12).
+
+### Lint / test / build (match CI)
+
+From repo root:
+
+```bash
+uv run ruff check src tests
+uv run ruff format --check src tests
+uv run pytest -q
+uv build
+```
+
+Optional local gate (same hooks as documented in README): `uv run pre-commit run --all-files`.
+
+### Running the main pipeline (hello-world)
+
+1. **Candles** — `uv run python -m fibengine.data.fetch` caches OHLCV under `data/raw/` (gitignored). This needs outbound HTTPS to Binance via CCXT. If the API is blocked in the VM, either request egress for `api.binance.com` or populate `data/raw/` manually before running pipelines that call `load_candles()`.
+2. **Experiment** — `uv run python -m fibengine.experiment` runs swing selection for all human labels in `data/labels/`, writes plots and `metrics.json` under `experiments/runs/experiment/<date>/<run_id>/`, and appends to `experiments/results/leaderboard.jsonl`.
+3. **Labeling worklist** — `uv run python -m fibengine.labeling.worklist` (no network).
+4. **Interactive labeler** — `uv run python -m fibengine.labeling.tool` needs a display/GUI backend (not typical in headless cloud VMs).
+
+### Services
+
+| Component | Required for | Notes |
+|-----------|----------------|-------|
+| `.venv` via `uv sync` | Everything | No Docker Compose in repo |
+| `pytest` | CI / dev | Uses synthetic fixtures; no network |
+| Binance (CCXT) | Live fetch / fresh caches | Optional if `data/raw/` already populated |
+| Matplotlib GUI | `labeling.tool` | Optional |
+
+### Gotchas
+
+- `load_candles(..., fetch_if_missing=True)` will call the exchange when cache is missing—failures look like CCXT `NetworkError` / SSL errors if egress is blocked.
+- Long timeframes use higher `timeframe_limits` in `config/settings.yaml`; labels can be `out_of_window` if history is too short (see experiment logs).
+- Coverage gate is **60%** via pytest `addopts` in `pyproject.toml`; `labeling/tool.py` is omitted from coverage.
diff --git a/README.md b/README.md
@@ -55,6 +55,7 @@ uv run python -m fibengine.backtest.runner --config config/variants/<profil>.yam
 > Note: automatic weight optimization (Optuna) was deliberately removed; it optimized against
 > the manual labels, which violates the philosophy (labels = reference, not judge).
 > See `premortem/reflections/2026-05-28-remove-optuna.md`. Weights are set on principled grounds.
+> Archived Optuna artifacts remain under `archive/` as history.
 
 ## Pipeline (Layer A)
 
diff --git a/REPO_POLICY.md b/REPO_POLICY.md
@@ -118,6 +118,7 @@ Run locally: `uv run python scripts/check_repo_bounds.py`
 | `src/fibengine/labeling/behavior_facit.py` | ~530 | Split I/O vs validate; grandfathered until split |
 | `scripts/behavior_facit.py` | ~220 | Thin CLI wrapper; grandfathered until split |
 | `scripts/compare_mtf_disambiguation.py` | ~245 | Split argparse vs report; grandfathered until split |
+| `src/fibengine/research/human_review_level_events.py` | ~610 | Split pack writer vs runner; grandfathered (main PR #11) |
 
 Do **not** add functionality to grandfathered files; fix them by splitting.
 
diff --git a/archive/INDEX.md b/archive/INDEX.md
@@ -5,4 +5,4 @@
 | 2026-05-28 | `experiments/ledgers/2026-05-28-pre-results-migration/` | `experiments/*.jsonl` (root) | Moved to `experiments/results/`; old copies are kept |
 | 2026-05-28 | `experiments/label_review/*` (see subfolders) | `experiments/label_review/` (root duplicates) | Canonical source: `experiments/label_review/batches/` |
 | 2026-05-28 | — | `FIB_BACKTEST_PLAN.md` (repo root) | Deleted stub; canonical: `docs/FIB_BACKTEST_PLAN.md` |
-| 2026-05-28 | — | Optuna (code, variants, ledgers, run history) | Deleted; conflicts with the principle that labels are a reference, not an optimization target |
+| 2026-05-28 | `experiments/optuna/`, `config_variants/optuna_2026-05-28_trial31.yaml` | Optuna (code from `src/fibengine/tuning/`, variant, ledgers) | Code removed from operation (conflicts with the principle labels = reference, not optimization target); the artifacts are kept here as history |
diff --git a/docs/LEVEL_EVENTS.md b/docs/LEVEL_EVENTS.md
@@ -0,0 +1,120 @@
+# Fibonacci Level Interaction Events (research-only)
+
+Status: **RESEARCH** — implements issue #8.
+
+Where a swing previously carried a single behavior label per Fibonacci level, this
+overlay records **an event stream per level**: every time price interacts with a level
+it emits a *candidate* event with a timestamp and supporting evidence, for human review.
+
+> To human-validate these candidates on a phone, see
+> [LEVEL_EVENT_HUMAN_REVIEW.md](LEVEL_EVENT_HUMAN_REVIEW.md).
+
+## What it does
+
+For a selected swing, `detect_level_events()` scans the bars **after the leg's end**
+(the retracement window) and, for each Fibonacci level, emits events classified as:
+
+| candidate                | meaning                                                  |
+|--------------------------|----------------------------------------------------------|
+| `continuation_candidate` | broke through the level and continued                    |
+| `rejection_candidate`    | touched the level and rejected back to the approach side |
+| `failure_candidate`      | accepted beyond the level, then reversed back across it  |
+| `reaction_candidate`     | reacted at the level without a clear breakout/rejection  |
+
+Each event records `touch_type` (`wick_below` / `wick_above` / `close_above` /
+`close_below`), `approach_side` (`above` / `below`), and `evidence`
+(`forward_bars`, `closes_beyond`, `closes_back`, `max_penetration_atr`).
+
+## Guardrails
+
+- **Candidates, never facts.** The `*_candidate` naming is deliberate — events are inputs
+  to human review, never auto-accepted.
+- **Look-ahead is intentional.** Classification inspects a forward window of bars after a
+  touch, so this is strictly **post-hoc annotation, never a live trading signal**.
+- **Additive only.** It does not change swing selection, fib anchors/prices, evaluation,
+  recall or promotion. Output goes to a new file; no existing artifacts are mutated.
+
+## Configuration (`config/settings.yaml` → `level_events`)
+
+| key                        | default | meaning                                                       |
+|----------------------------|---------|---------------------------------------------------------------|
+| `levels`                   | `[]`    | fib ratios to scan; empty inherits `fib.levels`               |
+| `touch_tolerance_atr`      | `0.10`  | band half-width around a level = this × ATR at the bar         |
+| `forward_window`           | `5`     | bars after a touch used for classification                    |
+| `acceptance_closes`        | `2`     | closes beyond the level required to count as "accepted"       |
+| `immediate_rejection_bars` | `2`     | window for a quick close back to the approach side            |
+| `debounce_bars`            | `3`     | bars price must leave the band before a new event is counted  |
+
+## Run
+
+```sh
+uv run python -m fibengine.research.level_events                 # single snapshot
+uv run python -m fibengine.research.level_events --mode walk-forward
+uv run python -m fibengine.research.level_events --mode walk-forward --dedupe
+```
+
+**`single`** selects one swing on the full series and detects events after its leg.
+Appends a record to `experiments/results/level_events.jsonl` (`run_id`, config/symbol
+metadata, the selected `swing`, the per-level event streams, and `n_events`).
+
+Note: a single live "as-of-now" run usually picks a leg ending at the present, leaving no
+forward window — so it often reports **0 events**. The interactions the issue cares about
+require a leg that has had time to "live". That is what walk-forward mode provides.
+
+## Walk-forward mode (answers research Q4)
+
+**`walk-forward`** steps the cursor through history (`backtest.warmup_bars` / `backtest.step`),
+selecting swings *causally* (no future leaks into selection), and aggregates level events
+across every distinct **confirmed** leg via `walk_forward_level_events()`. It reuses
+`backtest.stability.walk_forward_selection()`. Output goes to
+`experiments/results/level_events_walkforward.jsonl`:
+
+```json
+{
+  "n_legs": 224, "n_events": 4835, "events_per_leg": 21.58,
+  "per_level": [{"level": "0.382", "events": 964,
+                 "by_candidate": {"continuation": 308, "failure": 113,
+                                  "reaction": 214, "rejection": 329}}, ...],
+  "legs": [{"first_confirmed_t": ..., "start_bar": ..., "end_bar": ...,
+            "direction": ..., "n_events": ...}, ...]
+}
+```
+
+**Caveat — overlapping legs inflate absolute totals.** With `step=1` nearly every bar
+yields a (slightly drifted) confirmed leg, and in the default (`forward`) attribution each
+leg's events are counted over the full forward history, so the same price action is counted
+under many overlapping legs. The absolute `n_events` is then sensitive to `step`.
+
+**Use `--dedupe` (non-overlapping attribution) for the trustworthy census.** Each bar is
+attributed to exactly one leg — the one that was the live confirmed selection at that bar
+(window `[confirmation cursor t, next leg's t)`) — so no event is double-counted. This
+matters: on Kraken BTC/USD daily the `forward` mode shows a misleadingly *flat* per-level
+distribution (~19-22% each, 4835 events), while `--dedupe` reveals the real gradient —
+shallow levels dominate (0.236/0.382 ≈ 28% each) and deep levels are rare
+(0.786 ≈ 10%), across 142 distinct interactions. Prefer `--dedupe` when answering
+"how many events per level".
+
+## Data / running
+
+Candles are fetched on demand and cached locally by `load_candles()` (under `data/raw/`,
+which is **not** versioned — see the repo data policy). The first run for a symbol/timeframe
+needs network; subsequent runs read the local cache. Point the config at any symbol:
+
+```python
+from fibengine.core.config import load_settings
+from fibengine.research.level_events import run_walk_forward_level_events
+
+s = load_settings()
+s = s.model_copy(update={"data": s.data.model_copy(update={
+    "exchange": "kraken", "symbol": "BTC/USD", "timeframe": "1d"})})
+run_walk_forward_level_events(s, non_overlapping=True)
+```
+
+Config is supplied via `LevelEventConfig` (defaults are used unless you pass your own);
+it is intentionally **not** part of canonical `Settings`, so `Settings.config_hash()` and
+the Promotion surface stay untouched.
+
+Note: the repo default exchange is Binance, which is geo-restricted from some hosted
+sandboxes (HTTP 451); Kraken/Coinbase/Bitstamp/Bitfinex are reachable alternatives there.
+On a normal machine the Binance default works as usual. Tests rely only on synthetic data,
+so they need no network.
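
Review note on `docs/LEVEL_EVENTS.md`: the difference between the default `forward` attribution and `--dedupe` described in the walk-forward caveat can be illustrated with a small sketch. This is not the code in `level_events.py`; the function names `attribute_forward` and `attribute_non_overlapping` are hypothetical, and the sketch assumes a simplification in which every leg sees the same list of event bars (in the real detector each leg has its own fib levels and therefore its own events). It only shows why window `[t_i, t_{i+1})` ownership removes double counting.

```python
from bisect import bisect_right
from collections import Counter


def attribute_forward(confirm_bars, event_bars):
    """Forward attribution: each leg counts every event at or after its confirmation bar."""
    return {t: sum(1 for b in event_bars if b >= t) for t in confirm_bars}


def attribute_non_overlapping(confirm_bars, event_bars):
    """Each event bar is owned by the one leg whose window [t_i, t_{i+1}) contains it."""
    starts = sorted(confirm_bars)
    counts = Counter({t: 0 for t in starts})
    for b in event_bars:
        i = bisect_right(starts, b) - 1  # last leg confirmed at or before bar b
        if i >= 0:  # events before the first confirmation belong to no leg
            counts[starts[i]] += 1
    return dict(counts)


# Hypothetical confirmation cursors and event bars, for illustration only.
confirm_bars = [10, 20, 30]
event_bars = [12, 25, 26, 35]

forward = attribute_forward(confirm_bars, event_bars)
deduped = attribute_non_overlapping(confirm_bars, event_bars)

print("forward total:", sum(forward.values()))
print("deduped total:", sum(deduped.values()))
```

In this toy input the forward totals exceed the number of events because later events are counted under every earlier leg, while the non-overlapping totals sum to exactly the number of events.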
diff --git a/docs/LEVEL_EVENT_HUMAN_REVIEW.md b/docs/LEVEL_EVENT_HUMAN_REVIEW.md
@@ -0,0 +1,119 @@
+# Fibonacci Level Event — Human Review (v1)
+
+Research-only workflow that turns the auto-detected Fibonacci *level event
+candidates* (see [LEVEL_EVENTS.md](LEVEL_EVENTS.md)) into a small, mobile-friendly
+package a human can review — including from an iPhone.
+
+## Purpose
+
+The level-event detector emits *candidates*, never facts. Before any of that
+work could ever inform anything downstream, a human needs to confirm: **does the
+auto-detected event actually match what the chart shows?** This workflow makes
+that confirmation cheap. It samples a bounded set of candidates, renders one
+chart per event, and writes a review sheet with blank columns the reviewer fills
+in. No TradingView, no manual chart hunting.
+
+## Mobile-friendly workflow
+
+Each run produces a self-contained folder of PNG charts plus a markdown index
+and a review sheet (CSV + JSONL). The reviewer:
+
+1. Opens `REVIEW_INDEX.md` on their phone — it embeds every chart inline.
+2. For each event, looks at the chart and decides whether the auto label is right.
+3. Fills in three columns for that `review_id` in `review_sample.csv` (or the
+   JSONL): `human_label`, `human_confidence`, `human_note`.
+
+Charts are intentionally simple (close-line by default, ~7×5in, dpi 130) so they
+load fast and read well on a small screen.
+
+## CLI usage
+
+```bash
+# Default: balanced sample of up to 40 events across candidate types & fib levels.
+uv run python -m fibengine.research.human_review_level_events --max-events 40 --seed 7
+
+# Single currently-selected swing instead of all walk-forward legs:
+uv run python -m fibengine.research.human_review_level_events --mode single
+
+# Non-overlapping attribution (each bar counted under one leg):
+uv run python -m fibengine.research.human_review_level_events --dedupe
+
+# Caps & filters:
+uv run python -m fibengine.research.human_review_level_events \
+  --max-per-candidate 10 --max-per-level 8 \
+  --candidate-type continuation_candidate --candidate-type rejection_candidate \
+  --level 0.5 --level 0.618 --seed 7
+
+# Candlesticks instead of close-line:
+uv run python -m fibengine.research.human_review_level_events --candlestick
+```
+
+Flags:
+
+| Flag | Meaning | Default |
+|------|---------|---------|
+| `--mode` | `single` (one selected swing) or `walk-forward` (all confirmed legs) | `walk-forward` |
+| `--dedupe` | Walk-forward non-overlapping attribution | off |
+| `--max-events` | Max sampled events total | `40` |
+| `--max-per-candidate` | Cap per candidate type | none |
+| `--max-per-level` | Cap per fib level | none |
+| `--candidate-type` | Filter to a candidate type (repeatable) | all |
+| `--level` | Filter to a fib level e.g. `0.5` (repeatable) | all |
+| `--seed` | Random seed → reproducible sample | none |
+| `--candlestick` | Candlesticks instead of close-line | off |
+| `--context-before` / `--context-after` | Bars shown around the event | `30` / `15` |
+
+## Artifact structure
+
+```
+experiments/review/fib_level_events/<run_id>/
+    review_sample.csv      # one row per sampled candidate (+ blank human_* cols)
+    review_sample.jsonl    # same rows, one JSON object per line
+    REVIEW_INDEX.md        # instructions + summary + one chart block per event
+    charts/<review_id>.png # one chart per sampled event
+```
+
+`<run_id>` is `review_<UTC timestamp>`. The whole `experiments/review/` tree is
+git-ignored: these are generated artifacts, not committed repo data.
+
+Each review row contains: `review_id, symbol, timeframe, exchange, fib_level,
+fib_price, event_bar, event_time, auto_candidate, touch_type, approach_side,
+note, evidence_forward_bars, evidence_closes_beyond, evidence_closes_back,
+evidence_max_penetration_atr, swing_start_time, swing_end_time, swing_direction,
+swing_start_bar, swing_end_bar, chart_path, human_label, human_confidence,
+human_note`.
+
+## Label schema
+
+`human_label` — pick exactly one:
+
+- `agree` — the `auto_candidate` type matches what the chart shows.
+- `wrong_type` — there is an event here, but it is a different candidate type.
+- `missed_context` — technically a touch, but trend/structure makes it misleading.
+- `noise` — not a meaningful interaction with the level.
+- `unclear` — cannot tell from the chart / ambiguous.
+
+`human_confidence` — pick exactly one: `high`, `medium`, `low`.
+
+`human_note` — free text (optional).
+
+## How to read a chart
+
+- **Dashed blue line** — the fib level price.
+- **Orange marker / vertical line** — the event bar (the touch being judged).
+- **Purple ▲ / ▼** — swing start / end (the leg the fib is drawn from), when in view.
+- **Title** — symbol, timeframe, fib level, `auto_candidate`, event time.
+
+## What this validates
+
+- Whether the detector's per-event classification matches a human's read of the
+  chart, broken down by candidate type and fib level.
+- A labeled dataset of human agreement for later qualitative analysis.
+
+## What this does NOT validate
+
+- Any trading edge, profitability, or signal quality — there is none here.
+- Anything live: the detector looks at a forward window, so labels are strictly
+  **post-hoc annotation**, never a real-time signal.
+- It does not promote, accept, or feed any candidate back into swing selection,
+  fib prices, evaluation, recall, or the canonical config. Purely research.
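
Review note on `docs/LEVEL_EVENT_HUMAN_REVIEW.md`: once a reviewer has filled in `review_sample.csv`, the label schema above lends itself to a simple validation and tally step. The following is a minimal sketch, not part of the PR; `summarize_review` is a hypothetical helper, the sample rows are invented, and only the column names (`review_id`, `auto_candidate`, `human_label`, `human_confidence`, `human_note`) and the allowed values come from the document.

```python
import csv
import io
from collections import Counter

# Allowed values copied from the label schema in LEVEL_EVENT_HUMAN_REVIEW.md.
LABELS = {"agree", "wrong_type", "missed_context", "noise", "unclear"}
CONFIDENCES = {"high", "medium", "low"}


def summarize_review(csv_text):
    """Tally human labels per auto_candidate and collect review_ids of invalid rows."""
    tally = {}
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row["human_label"].strip()
        confidence = row["human_confidence"].strip()
        if not label and not confidence:
            continue  # row has not been reviewed yet
        if label not in LABELS or confidence not in CONFIDENCES:
            problems.append(row["review_id"])
            continue
        tally.setdefault(row["auto_candidate"], Counter())[label] += 1
    return tally, problems


# Invented rows for illustration only; a real sheet carries many more columns.
SAMPLE = """review_id,auto_candidate,human_label,human_confidence,human_note
r1,rejection_candidate,agree,high,
r2,rejection_candidate,noise,low,tiny wick
r3,continuation_candidate,agree,medium,
r4,failure_candidate,maybe,high,
r5,reaction_candidate,,,
"""

tally, problems = summarize_review(SAMPLE)
for candidate, counts in sorted(tally.items()):
    print(candidate, dict(counts))
print("invalid rows:", problems)
```

Unreviewed rows are skipped rather than flagged, so a partially completed sheet can be summarized at any point during review.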
diff --git a/experiments/results/level_events.jsonl b/experiments/results/level_events.jsonl
@@ -0,0 +1 @@
+{"config_hash": "ef0f946bf9a6", "exchange": "kraken", "levels": [{"events": [], "level": "0.236", "price": 78762.7188}, {"events": [], "level": "0.382", "price": 82721.9906}, {"events": [], "level": "0.5", "price": 85921.95}, {"events": [], "level": "0.618", "price": 89121.9094}, {"events": [], "level": "0.786", "price": 93677.7838}], "n_events": 0, "run_id": "levelev_20260530T161718Z", "swing": {"bars": 462, "direction": "down", "end": {"index": 719, "kind": "low", "price": 72362.8, "prominence": 2.8557, "timestamp": "2026-05-29T00:00:00+00:00"}, "features": {"cleanliness": 0.033, "duration": 22.1, "magnitude": 0.8801, "prominence": 0.8702, "recency": 0.9986, "round_number": 0.1561, "scale_confluence": 0.5, "structure_alignment": 1.0}, "price_range": 27118.3, "score": -5.0407, "start": {"index": 257, "kind": "high", "price": 99481.1, "prominence": 2.4802, "timestamp": "2025-02-21T00:00:00+00:00"}, "status": "provisional"}, "symbol": "BTC/USD", "timeframe": "1d", "timestamp": "2026-05-30T16:17:20.870266+00:00"}
diff --git a/experiments/results/level_events_walkforward.jsonl b/experiments/results/level_events_walkforward.jsonl
diff --git a/premortem/reflections/2026-05-28-remove-optuna.md b/premortem/reflections/2026-05-28-remove-optuna.md
@@ -1,9 +1,29 @@
-# 2026-05-28 Remove Optuna
+# 2026-05-28 Remove Optuna (final cleanup)
 
-Hypothesis: Optuna has no legitimate role if weights are to be set on principles and labels are only a reference.
+Follow-up to `2026-05-28-optuna-rollback.md` (which withdrew the runner itself).
+This note cleans up the last traces so the repo no longer pretends that Optuna exists.
+
+Hypothesis:
+- Optuna has no legitimate role if weights are set on principles and labels are only a
+  reference. Remaining references in code/docs create confusion and temptation.
+
+Scope:
+- Subsystem: documentation + repo bookkeeping (no execution).
+- Surfaces: `README.md`, `docs/TRACKS.md`, `REPO_POLICY.md`, `docs/FIB_BACKTEST_PLAN.md`,
+  `archive/INDEX.md`, `config/variants/`.
+
+Observations:
+- `src/fibengine/tuning/` and the `optuna` dependency had already been removed (rollback);
+  what remained was doc references and archive bookkeeping.
+- The Optuna artifacts are kept in `archive/` as history (not deleted); `archive/`
+  exists for superseded/legacy material (REPO_POLICY §1).
+- `config/variants/` is reframed as principle-motivated profiles (no auto-tuning).
 
 Decision:
-- Remove `src/fibengine/tuning/`, the `optuna` dependency, variants, ledgers and run history (no legacy surface).
-- Update README, TRACKS, REPO_POLICY, FIB_BACKTEST_PLAN.
+- Remove remaining Optuna references in README/TRACKS/REPO_POLICY/FIB_BACKTEST_PLAN.
+- Keep archived artifacts; keep `archive/INDEX.md` honest about what remains.
+- Weights are set manually on principled grounds; no optimization against `agreement`/labels.
 
-Next: Validate via stability matrix + pivot recall; manual weight changes with premortem.
+Next steps:
+- Validate via stability matrix + pivot recall; manual weight changes are justified in
+  premortem, never auto-tuned against drawings.
diff --git a/scripts/check_repo_bounds.py b/scripts/check_repo_bounds.py
@@ -26,6 +26,7 @@
     "src/fibengine/labeling/behavior_facit.py": "Schema v3 + I/O; plan: split load/save",
     "scripts/behavior_facit.py": "CLI for behavior facit; plan: thin wrapper",
     "scripts/compare_mtf_disambiguation.py": "Research compare CLI; plan: split argparse vs report",
+    "src/fibengine/research/human_review_level_events.py": "PR #11 review pack; plan: split",
 }
 
 
diff --git a/src/fibengine/research/__init__.py b/src/fibengine/research/__init__.py
@@ -0,0 +1 @@
+"""Research-only overlays. Additive analyses on top of Layer A; never ground truth."""
diff --git a/src/fibengine/research/human_review_level_events.py b/src/fibengine/research/human_review_level_events.py
diff --git a/src/fibengine/research/level_events.py b/src/fibengine/research/level_events.py
diff --git a/tests/research/test_human_review_level_events.py b/tests/research/test_human_review_level_events.py
diff --git a/tests/research/test_level_events.py b/tests/research/test_level_events.py
