
Commit 784dc10

merge: sync origin/main (level_events, human review, AGENTS.md)
Resolve conflicts keeping multi-leg labeling and MTF research. Fix UTF-8 writes in human review index on Windows. Grandfather human_review_level_events per REPO_POLICY. Co-authored-by: Cursor <cursoragent@cursor.com>
2 parents 394a1a2 + eade139 commit 784dc10

16 files changed

Lines changed: 1795 additions & 6 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -228,3 +228,5 @@ data/screenshots/
 experiments/results/mtf_*.json
 experiments/results/*_compare_*T*.json
 experiments/runs/
+# Generated human-review packages (charts/sheets) are artifacts, not source.
+experiments/review/

AGENTS.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# AGENTS.md

## Cursor Cloud specific instructions

### Product

**fibengine** is a Python research engine for human-like Fibonacci swing selection (Layer A). There is no web server or database; workflows are CLI modules (`experiment`, `backtest`, `labeling`) plus an optional Matplotlib labeling GUI.

### Dependencies (automatic on VM startup)

The update script runs `uv sync --extra dev`, which creates/updates `.venv` from `pyproject.toml` / `uv.lock`. Python **3.11+** is required (CI and local use 3.12).

### Lint / test / build (match CI)

From repo root:

```bash
uv run ruff check src tests
uv run ruff format --check src tests
uv run pytest -q
uv build
```

Optional local gate (same hooks as documented in README): `uv run pre-commit run --all-files`.
### Running the main pipeline (hello-world)

1. **Candles:** `uv run python -m fibengine.data.fetch` caches OHLCV under `data/raw/` (gitignored). This needs outbound HTTPS to Binance via CCXT. If the API is blocked in the VM, either request egress for `api.binance.com` or populate `data/raw/` manually before running pipelines that call `load_candles()`.
2. **Experiment:** `uv run python -m fibengine.experiment` runs swing selection for all human labels in `data/labels/`, writes plots and `metrics.json` under `experiments/runs/experiment/<date>/<run_id>/`, and appends to `experiments/results/leaderboard.jsonl`.
3. **Labeling worklist:** `uv run python -m fibengine.labeling.worklist` (no network).
4. **Interactive labeler:** `uv run python -m fibengine.labeling.tool` needs a display/GUI backend (not typical in headless cloud VMs).

### Services

| Component | Required for | Notes |
|-----------|----------------|-------|
| `.venv` via `uv sync` | Everything | No Docker Compose in repo |
| `pytest` | CI / dev | Uses synthetic fixtures; no network |
| Binance (CCXT) | Live fetch / fresh caches | Optional if `data/raw/` already populated |
| Matplotlib GUI | `labeling.tool` | Optional |

### Gotchas

- `load_candles(..., fetch_if_missing=True)` will call the exchange when the cache is missing; if egress is blocked, failures surface as CCXT `NetworkError` / SSL errors.
- Long timeframes use higher `timeframe_limits` in `config/settings.yaml`; labels can be `out_of_window` if history is too short (see experiment logs).
- Coverage gate is **60%** via pytest `addopts` in `pyproject.toml`; `labeling/tool.py` is omitted from coverage.
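To tell blocked egress apart from other failures before a fetch, a small pre-flight check can help. The sketch below is an assumption-laden illustration: `can_reach` is a hypothetical helper that is not part of fibengine, and a successful TCP connection does not rule out an HTTP-level block.

```python
import socket
from typing import Callable


def can_reach(host: str, port: int = 443, timeout: float = 3.0,
              connect: Callable = socket.create_connection) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`.

    `connect` is injectable so the check can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:  # covers DNS failures, refused connections and timeouts
        return False
    conn.close()
    return True
```

Calling `can_reach("api.binance.com")` before a pipeline that uses `load_candles()` gives an early, readable signal that `data/raw/` must be populated manually.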

README.md

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ uv run python -m fibengine.backtest.runner --config config/variants/<profil>.yam
 > Note: automatic weight optimization (Optuna) was deliberately removed; it optimized against
 > the manual labels, which violates the philosophy (labels = reference, not judge).
 > See `premortem/reflections/2026-05-28-remove-optuna.md`. Weights are set on principled grounds.
+> Archived Optuna artifacts remain under `archive/` as history.


 ## Pipeline (Layer A)

REPO_POLICY.md

Lines changed: 1 addition & 0 deletions
@@ -118,6 +118,7 @@ Run locally: `uv run python scripts/check_repo_bounds.py`
 | `src/fibengine/labeling/behavior_facit.py` | ~530 | Split I/O vs validate; grandfathered until split |
 | `scripts/behavior_facit.py` | ~220 | Thin CLI wrapper; grandfathered until split |
 | `scripts/compare_mtf_disambiguation.py` | ~245 | Split argparse vs report; grandfathered until split |
+| `src/fibengine/research/human_review_level_events.py` | ~610 | Split pack writer vs runner; grandfathered (main PR #11) |

 Do **not** add features to grandfathered files; fix by splitting.

archive/INDEX.md

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@
 | 2026-05-28 | `experiments/ledgers/2026-05-28-pre-results-migration/` | `experiments/*.jsonl` (root) | Moved to `experiments/results/`; old copies are kept |
 | 2026-05-28 | `experiments/label_review/*` (see subfolders) | `experiments/label_review/` (root duplicates) | Canonical source: `experiments/label_review/batches/` |
 | 2026-05-28 | | `FIB_BACKTEST_PLAN.md` (repo root) | Deleted stub; canonical: `docs/FIB_BACKTEST_PLAN.md` |
-| 2026-05-28 | | Optuna (code, variants, ledgers, run history) | Deleted: conflicts with the principle that labels are a reference, not an optimization target |
+| 2026-05-28 | `experiments/optuna/`, `config_variants/optuna_2026-05-28_trial31.yaml` | Optuna (code from `src/fibengine/tuning/`, variant, ledgers) | Code taken out of service (conflicts with the principle labels = reference, not optimization target); the artifacts are kept here as history |

docs/LEVEL_EVENTS.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# Fibonacci Level Interaction Events (research-only)

Status: **RESEARCH** (implements issue #8).

Where a swing previously carried a single behavior label per Fibonacci level, this
overlay records **an event stream per level**: every time price interacts with a level
it emits a *candidate* event with a timestamp and supporting evidence, for human review.

> To human-validate these candidates on a phone, see
> [LEVEL_EVENT_HUMAN_REVIEW.md](LEVEL_EVENT_HUMAN_REVIEW.md).

## What it does

For a selected swing, `detect_level_events()` scans the bars **after the leg's end**
(the retracement window) and, for each Fibonacci level, emits events classified as:

| candidate                | meaning                                                  |
|--------------------------|----------------------------------------------------------|
| `continuation_candidate` | broke through the level and continued                    |
| `rejection_candidate`    | touched the level and rejected back to the approach side |
| `failure_candidate`      | accepted beyond the level, then reversed back across it  |
| `reaction_candidate`     | reacted at the level without a clear breakout/rejection  |

Each event records `touch_type` (`wick_below` / `wick_above` / `close_above` /
`close_below`), `approach_side` (`above` / `below`), and `evidence`
(`forward_bars`, `closes_beyond`, `closes_back`, `max_penetration_atr`).

## Guardrails

- **Candidates, never facts.** The `*_candidate` naming is deliberate: events are inputs
  to human review, never auto-accepted.
- **Look-ahead is intentional.** Classification inspects a forward window of bars after a
  touch, so this is strictly **post-hoc annotation, never a live trading signal**.
- **Additive only.** It does not change swing selection, fib anchors/prices, evaluation,
  recall or promotion. Output goes to a new file; no existing artifacts are mutated.
## Configuration (`config/settings.yaml` → `level_events`)

| key                        | default | meaning                                                       |
|----------------------------|---------|---------------------------------------------------------------|
| `levels`                   | `[]`    | fib ratios to scan; empty inherits `fib.levels`               |
| `touch_tolerance_atr`      | `0.10`  | band half-width around a level = this × ATR at the bar        |
| `forward_window`           | `5`     | bars after a touch used for classification                    |
| `acceptance_closes`        | `2`     | closes beyond the level required to count as "accepted"       |
| `immediate_rejection_bars` | `2`     | window for a quick close back to the approach side            |
| `debounce_bars`            | `3`     | bars price must leave the band before a new event is counted  |
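To make the keys above concrete, here is a minimal sketch of how a tolerance band and a forward window could drive classification. It is an illustration under stated assumptions, not the logic of `detect_level_events()`: `SketchConfig`, `in_touch_band`, `classify_touch` and the decision order are all hypothetical, and `debounce_bars` is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in mirroring the defaults in the table above."""
    touch_tolerance_atr: float = 0.10
    forward_window: int = 5
    acceptance_closes: int = 2
    immediate_rejection_bars: int = 2


def in_touch_band(low: float, high: float, level: float, atr: float,
                  cfg: SketchConfig) -> bool:
    """True if the bar's range overlaps level ± touch_tolerance_atr × ATR."""
    half_width = cfg.touch_tolerance_atr * atr
    return low <= level + half_width and high >= level - half_width


def classify_touch(forward_closes: list[float], level: float,
                   approach_side: str, cfg: SketchConfig) -> str:
    """Classify a touch from the closes that follow it (assumed rule order)."""
    window = forward_closes[: cfg.forward_window]
    # "Beyond" is the far side of the level relative to the approach side.
    beyond = [(c < level) if approach_side == "above" else (c > level)
              for c in window]
    if sum(beyond) >= cfg.acceptance_closes:
        # Index of the close that completed acceptance beyond the level.
        accepted_at = [i for i, b in enumerate(beyond) if b][cfg.acceptance_closes - 1]
        if any(not b for b in beyond[accepted_at + 1:]):
            return "failure_candidate"  # accepted, then closed back across
        return "continuation_candidate"
    quick = beyond[: cfg.immediate_rejection_bars]
    if quick and not any(quick):
        return "rejection_candidate"  # never closed beyond in the quick window
    return "reaction_candidate"
```

With the defaults, a level at 100 approached from above and followed by closes `[99, 98, 101, 102, 103]` is accepted after two closes below and then closes back above, so the sketch labels it `failure_candidate`.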
## Run

```sh
uv run python -m fibengine.research.level_events                 # single snapshot
uv run python -m fibengine.research.level_events --mode walk-forward
uv run python -m fibengine.research.level_events --mode walk-forward --dedupe
```

**`single`** selects one swing on the full series and detects events after its leg.
It appends a record to `experiments/results/level_events.jsonl` (`run_id`, config/symbol
metadata, the selected `swing`, the per-level event streams, and `n_events`).

Note: a single live "as-of-now" run usually picks a leg ending at the present, leaving no
forward window, so it often reports **0 events**. The interactions the issue cares about
require a leg that has had time to "live". That is what walk-forward mode provides.
## Walk-forward mode (answers research Q4)

**`walk-forward`** steps the cursor through history (`backtest.warmup_bars` / `backtest.step`),
selecting swings *causally* (no future leaks into selection), and aggregates level events
across every distinct **confirmed** leg via `walk_forward_level_events()`. It reuses
`backtest.stability.walk_forward_selection()`. Output goes to
`experiments/results/level_events_walkforward.jsonl`:

```json
{
  "n_legs": 224, "n_events": 4835, "events_per_leg": 21.58,
  "per_level": [{"level": "0.382", "events": 964,
                 "by_candidate": {"continuation": 308, "failure": 113,
                                  "reaction": 214, "rejection": 329}}, ...],
  "legs": [{"first_confirmed_t": ..., "start_bar": ..., "end_bar": ...,
            "direction": ..., "n_events": ...}, ...]
}
```
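A record in this shape can be turned into per-level shares with a few lines of Python. The sketch below assumes only the keys shown in the example above; `level_summary` is a hypothetical helper, and the embedded record keeps just the one `0.382` entry that the example spells out.

```python
import json

# Hypothetical excerpt: only the fields and the single level shown in the example.
record = json.loads("""
{"n_legs": 224, "n_events": 4835,
 "per_level": [{"level": "0.382", "events": 964,
                "by_candidate": {"continuation": 308, "failure": 113,
                                 "reaction": 214, "rejection": 329}}]}
""")


def level_summary(rec: dict) -> list[dict]:
    """Per level: its share of all events and its candidate-type mix."""
    out = []
    for lvl in rec["per_level"]:
        total = lvl["events"]
        out.append({
            "level": lvl["level"],
            "share_of_all": total / rec["n_events"] if rec["n_events"] else 0.0,
            "candidate_mix": {k: (v / total if total else 0.0)
                              for k, v in lvl["by_candidate"].items()},
        })
    return out


summary = level_summary(record)
events_per_leg = round(record["n_events"] / record["n_legs"], 2)
```

Comparing `share_of_all` between a `forward` record and a `--dedupe` record is one way to see how much overlapping legs flatten the distribution.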
**Caveat: overlapping legs inflate absolute totals.** With `step=1` nearly every bar
yields a (slightly drifted) confirmed leg, and in the default (`forward`) attribution each
leg's events are counted over the full forward history, so the same price action is counted
under many overlapping legs. The absolute `n_events` is then sensitive to `step`.

**Use `--dedupe` (non-overlapping attribution) for the trustworthy census.** Each bar is
attributed to exactly one leg, the one that was the live confirmed selection at that bar
(window `[confirmation cursor t, next leg's t)`), so no event is double-counted. This
matters: on Kraken BTC/USD daily the `forward` mode shows a misleadingly *flat* per-level
distribution (~19-22% each, 4835 events), while `--dedupe` reveals the real gradient:
shallow levels dominate (0.236/0.382 ≈ 28% each) and deep levels are rare
(0.786 ≈ 10%), across 142 distinct interactions. Prefer `--dedupe` when answering
"how many events per level".
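The half-open window rule can be sketched with a binary search over the confirmation cursors. This is an illustration of the attribution idea only; `attribute_events` and its inputs are hypothetical and are not the signature of `walk_forward_level_events()`.

```python
from bisect import bisect_right


def attribute_events(leg_cursors: list[int],
                     event_bars: list[int]) -> dict[int, list[int]]:
    """Assign each event bar to the leg whose window [t_i, t_{i+1}) contains it.

    leg_cursors: confirmation cursor t of each distinct leg, sorted ascending.
    Returns {leg index: [event bars]}; bars before the first cursor are dropped.
    """
    owned: dict[int, list[int]] = {i: [] for i in range(len(leg_cursors))}
    for bar in event_bars:
        i = bisect_right(leg_cursors, bar) - 1  # last leg with t <= bar
        if i >= 0:
            owned[i].append(bar)
    return owned
```

Because every bar maps to at most one leg index, summing the list lengths never counts an event twice, which is the property the dedupe census relies on.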
## Data / running

Candles are fetched on demand and cached locally by `load_candles()` (under `data/raw/`,
which is **not** versioned; see the repo data policy). The first run for a symbol/timeframe
needs network; subsequent runs read the local cache. Point the config at any symbol:

```python
from fibengine.core.config import load_settings
from fibengine.research.level_events import run_walk_forward_level_events

s = load_settings()
# Override only the data source; all other settings keep their loaded values.
s = s.model_copy(update={"data": s.data.model_copy(update={
    "exchange": "kraken", "symbol": "BTC/USD", "timeframe": "1d"})})
# Walk-forward run with non-overlapping attribution.
run_walk_forward_level_events(s, non_overlapping=True)
```

Config is supplied via `LevelEventConfig` (defaults are used unless you pass your own);
it is intentionally **not** part of canonical `Settings`, so `Settings.config_hash()` and
the Promotion surface stay untouched.

Note: the repo default exchange is Binance, which is geo-restricted from some hosted
sandboxes (HTTP 451); Kraken/Coinbase/Bitstamp/Bitfinex are reachable alternatives there.
On a normal machine the Binance default works as usual. Tests rely only on synthetic data,
so they need no network.

docs/LEVEL_EVENT_HUMAN_REVIEW.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
# Fibonacci Level Event: Human Review (v1)

Research-only workflow that turns the auto-detected Fibonacci *level event
candidates* (see [LEVEL_EVENTS.md](LEVEL_EVENTS.md)) into a small, mobile-friendly
package a human can review, including from an iPhone.

## Purpose

The level-event detector emits *candidates*, never facts. Before any of that
work could ever inform anything downstream, a human needs to confirm: **does the
auto-detected event actually match what the chart shows?** This workflow makes
that confirmation cheap. It samples a bounded set of candidates, renders one
chart per event, and writes a review sheet with blank columns the reviewer fills
in. No TradingView, no manual chart hunting.

## Mobile-friendly workflow

Each run produces a self-contained folder of PNG charts plus a markdown index
and a review sheet (CSV + JSONL). The reviewer:

1. Opens `REVIEW_INDEX.md` on their phone; it embeds every chart inline.
2. For each event, looks at the chart and decides whether the auto label is right.
3. Fills in three columns for that `review_id` in `review_sample.csv` (or the
   JSONL): `human_label`, `human_confidence`, `human_note`.

Charts are intentionally simple (close-line by default, ~7×5in, dpi 130) so they
load fast and read well on a small screen.
## CLI usage

```bash
# Default: balanced sample of up to 40 events across candidate types & fib levels.
uv run python -m fibengine.research.human_review_level_events --max-events 40 --seed 7

# Single currently-selected swing instead of all walk-forward legs:
uv run python -m fibengine.research.human_review_level_events --mode single

# Non-overlapping attribution (each bar counted under one leg):
uv run python -m fibengine.research.human_review_level_events --dedupe

# Caps & filters:
uv run python -m fibengine.research.human_review_level_events \
  --max-per-candidate 10 --max-per-level 8 \
  --candidate-type continuation_candidate --candidate-type rejection_candidate \
  --level 0.5 --level 0.618 --seed 7

# Candlesticks instead of close-line:
uv run python -m fibengine.research.human_review_level_events --candlestick
```

Flags:

| Flag | Meaning | Default |
|------|---------|---------|
| `--mode` | `single` (one selected swing) or `walk-forward` (all confirmed legs) | `walk-forward` |
| `--dedupe` | Walk-forward non-overlapping attribution | off |
| `--max-events` | Max sampled events total | `40` |
| `--max-per-candidate` | Cap per candidate type | none |
| `--max-per-level` | Cap per fib level | none |
| `--candidate-type` | Filter to a candidate type (repeatable) | all |
| `--level` | Filter to a fib level e.g. `0.5` (repeatable) | all |
| `--seed` | Random seed → reproducible sample | none |
| `--candlestick` | Candlesticks instead of close-line | off |
| `--context-before` / `--context-after` | Bars shown around the event | `30` / `15` |
## Artifact structure

```
experiments/review/fib_level_events/<run_id>/
  review_sample.csv        # one row per sampled candidate (+ blank human_* cols)
  review_sample.jsonl      # same rows, one JSON object per line
  REVIEW_INDEX.md          # instructions + summary + one chart block per event
  charts/<review_id>.png   # one chart per sampled event
```

`<run_id>` is `review_<UTC timestamp>`. The whole `experiments/review/` tree is
git-ignored: these are generated artifacts, not committed repo data.

Each review row contains: `review_id, symbol, timeframe, exchange, fib_level,
fib_price, event_bar, event_time, auto_candidate, touch_type, approach_side,
note, evidence_forward_bars, evidence_closes_beyond, evidence_closes_back,
evidence_max_penetration_atr, swing_start_time, swing_end_time, swing_direction,
swing_start_bar, swing_end_bar, chart_path, human_label, human_confidence,
human_note`.

## Label schema

`human_label` (pick exactly one):

- `agree`: the `auto_candidate` type matches what the chart shows.
- `wrong_type`: there is an event here, but it is a different candidate type.
- `missed_context`: technically a touch, but trend/structure makes it misleading.
- `noise`: not a meaningful interaction with the level.
- `unclear`: cannot tell from the chart / ambiguous.

`human_confidence` (pick exactly one): `high`, `medium`, `low`.

`human_note`: free text (optional).
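Once a sheet is filled in, the schema above can be checked and tallied with the standard library. This is a minimal sketch: `tally_reviews` is a hypothetical helper, and the sample sheet uses only a subset of the documented columns with made-up rows.

```python
import csv
import io
from collections import Counter

HUMAN_LABELS = {"agree", "wrong_type", "missed_context", "noise", "unclear"}
CONFIDENCES = {"high", "medium", "low"}


def tally_reviews(csv_text: str) -> dict:
    """Validate filled-in rows and count human labels per auto_candidate."""
    per_candidate: dict[str, Counter] = {}
    unreviewed = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row["human_label"].strip()
        confidence = row["human_confidence"].strip()
        if not label:
            unreviewed += 1  # blank human_label means the row is still pending
            continue
        if label not in HUMAN_LABELS or confidence not in CONFIDENCES:
            raise ValueError(
                f"{row['review_id']}: invalid label/confidence "
                f"{label!r}/{confidence!r}")
        per_candidate.setdefault(row["auto_candidate"], Counter())[label] += 1
    return {"per_candidate": per_candidate, "unreviewed": unreviewed}


# Hypothetical three-row sheet; real sheets carry all documented columns.
SAMPLE = """review_id,auto_candidate,human_label,human_confidence,human_note
r1,rejection_candidate,agree,high,
r2,rejection_candidate,wrong_type,medium,looks like a failure
r3,continuation_candidate,,,
"""
result = tally_reviews(SAMPLE)
```

Rejecting unknown labels early keeps typos such as `agreed` from silently shrinking the agreement counts.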
## How to read a chart

- **Dashed blue line**: the fib level price.
- **Orange marker / vertical line**: the event bar (the touch being judged).
- **Purple ▲ / ▼**: swing start / end (the leg the fib is drawn from), when in view.
- **Title**: symbol, timeframe, fib level, `auto_candidate`, event time.

## What this validates

- Whether the detector's per-event classification matches a human's read of the
  chart, broken down by candidate type and fib level.
- A labeled dataset of human agreement for later qualitative analysis.

## What this does NOT validate

- Any trading edge, profitability, or signal quality; there is none here.
- Anything live: the detector looks at a forward window, so labels are strictly
  **post-hoc annotation**, never a real-time signal.
- It does not promote, accept, or feed any candidate back into swing selection,
  fib prices, evaluation, recall, or the canonical config. Purely research.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
{"config_hash": "ef0f946bf9a6", "exchange": "kraken", "levels": [{"events": [], "level": "0.236", "price": 78762.7188}, {"events": [], "level": "0.382", "price": 82721.9906}, {"events": [], "level": "0.5", "price": 85921.95}, {"events": [], "level": "0.618", "price": 89121.9094}, {"events": [], "level": "0.786", "price": 93677.7838}], "n_events": 0, "run_id": "levelev_20260530T161718Z", "swing": {"bars": 462, "direction": "down", "end": {"index": 719, "kind": "low", "price": 72362.8, "prominence": 2.8557, "timestamp": "2026-05-29T00:00:00+00:00"}, "features": {"cleanliness": 0.033, "duration": 22.1, "magnitude": 0.8801, "prominence": 0.8702, "recency": 0.9986, "round_number": 0.1561, "scale_confluence": 0.5, "structure_alignment": 1.0}, "price_range": 27118.3, "score": -5.0407, "start": {"index": 257, "kind": "high", "price": 99481.1, "prominence": 2.4802, "timestamp": "2025-02-21T00:00:00+00:00"}, "status": "provisional"}, "symbol": "BTC/USD", "timeframe": "1d", "timestamp": "2026-05-30T16:17:20.870266+00:00"}

experiments/results/level_events_walkforward.jsonl

Lines changed: 2 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 25 additions & 5 deletions
@@ -1,9 +1,29 @@
-# 2026-05-28 Remove Optuna
+# 2026-05-28 Remove Optuna (final cleanup)

-Hypothesis: Optuna has no legitimate role if weights are to be set on principles and labels are only a reference.
+Follow-up to `2026-05-28-optuna-rollback.md` (which rolled back the runner itself).
+This note cleans up the last traces so the repo no longer pretends that Optuna exists.
+
+Hypothesis:
+- Optuna has no legitimate role if weights are set on principles and labels are only a
+  reference. Remaining references in code/docs create confusion and temptation.
+
+Scope:
+- Subsystem: documentation + repo bookkeeping (no run).
+- Surfaces: `README.md`, `docs/TRACKS.md`, `REPO_POLICY.md`, `docs/FIB_BACKTEST_PLAN.md`,
+  `archive/INDEX.md`, `config/variants/`.
+
+Observations:
+- `src/fibengine/tuning/` and the `optuna` dependency were already removed (rollback);
+  what remained were doc references and archive bookkeeping.
+- The Optuna artifacts are kept in `archive/` as history (not deleted); `archive/`
+  exists for superseded/legacy material (REPO_POLICY §1).
+- `config/variants/` is reframed as principle-motivated profiles (no auto-tuning).

 Decisions:
-- Remove `src/fibengine/tuning/`, the `optuna` dependency, variants, ledgers and run history (no legacy surface).
-- Update README, TRACKS, REPO_POLICY, FIB_BACKTEST_PLAN.
+- Remove remaining Optuna references in README/TRACKS/REPO_POLICY/FIB_BACKTEST_PLAN.
+- Keep archived artifacts; keep `archive/INDEX.md` honest about what remains.
+- Weights are set manually on principled grounds; no optimization against `agreement`/labels.

-Next: Validate via stability matrix + pivot recall; manual weight changes with a premortem.
+Next steps:
+- Validate via stability matrix + pivot recall; manual weight changes are justified in
+  a premortem, never auto-tuned against drawings.
