fibengine-next: repo structure, Optuna cleanup (docs/artifacts), machine labeling #7
Move modules under tests/{core,backtest,data,...} and remove duplicate root test files.
Co-authored-by: Cursor <cursoragent@cursor.com>
Ruff and pytest run on commit (pytest skippable during partial staging) and on GitHub Actions. Co-authored-by: Cursor <cursoragent@cursor.com>
Move backtest roadmap to docs/ and document Research/Validate/Promotion tracks. Co-authored-by: Cursor <cursoragent@cursor.com>
Move jsonl ledgers to experiments/results/, label batches under batches/, and legacy assets to archive/. Co-authored-by: Cursor <cursoragent@cursor.com>
Add hierarchical label store under data/labels/binance/ and remove flat legacy JSON path. Co-authored-by: Cursor <cursoragent@cursor.com>
Consolidate config, models, scoring, and related helpers under src/fibengine/core/ and update imports. Co-authored-by: Cursor <cursoragent@cursor.com>
Hierarchical label paths, multi-timeframe queue, and batch manifest workflow. Co-authored-by: Cursor <cursoragent@cursor.com>
Tune scoring weights against labels; store results under experiments/results/ and config/variants/. Co-authored-by: Cursor <cursoragent@cursor.com>
Switch to fractal pivots and retune scoring weights from manual review. Co-authored-by: Cursor <cursoragent@cursor.com>
Document quality gate links, ignore tmp/, and disable Cursor Pyright for this workspace. Co-authored-by: Cursor <cursoragent@cursor.com>
Optuna optimized weights toward agreement with manual label drawings, which conflicts with the project principle that labels are reference, not an objective. Remove the tuning package, optuna dependency, active variants, and result ledgers; update docs and tracks accordingly. Experiment-run history in leaderboard.jsonl is kept as an audit trail. Co-authored-by: Cursor <cursoragent@cursor.com>
…gate) Co-authored-by: Cursor <cursoragent@cursor.com>
… ground truth
Add an opt-in machine-labeling flow that lowers the cost of filling label
gaps without corrupting the golden set:
- SwingLabel.source ('human' default / 'machine'); list_labels(source=...)
filter. Existing labels stay human.
- Integrity guards: pivot_recall and experiment evaluate human labels only
(machine excluded, so the engine is never scored against its own output);
worklist counts human toward the 20-30 target and lists machine candidates
separately for review.
- labeling.autolabel builds a candidate from the engine's selection, tags it
source=machine, and never overwrites a human label.
- Promotion path: review a candidate in labeling.tool and save -> becomes
source=human.
- Docs (README, data/labels) + reflection. 85 tests green, lint clean.
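The `source` semantics in this commit can be sketched as follows. This is a minimal illustration, not the project's code: it assumes an in-memory list instead of the JSON files under `data/labels/binance/`, so this `list_labels` takes the labels as an argument, and the `promote` helper and every field other than `source` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Optional


@dataclass(frozen=True)
class SwingLabel:
    # Hypothetical identifying fields; the real SwingLabel carries more.
    exchange: str
    symbol: str
    timeframe: str
    # Defaulting to "human" means labels saved before the field existed stay human.
    source: str = "human"


def list_labels(labels: Iterable[SwingLabel], source: Optional[str] = None) -> list[SwingLabel]:
    """Return every label, or only those whose source matches (hypothetical in-memory variant)."""
    return [lbl for lbl in labels if source is None or lbl.source == source]


def promote(candidate: SwingLabel) -> SwingLabel:
    """Model 'review in labeling.tool and save': the saved copy becomes a human label."""
    return replace(candidate, source="human")
```

Because the dataclass is frozen, promotion produces a new record rather than mutating the candidate, which mirrors "save -> becomes source=human" as a write of a new human label.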
Reviewer's Guide

Introduce machine-generated swing labels tagged with a new …

Flow diagram for machine labeling and human-only evaluation

```mermaid
flowchart TD
    Start([Start])
    subgraph Machine_label_generation
        AutolabelMain["Run fibengine.labeling.autolabel.main"]
        RunAutolabel["run_autolabel(settings, exchange, symbols, timeframes, overwrite)"]
        AutolabelOne["autolabel_one(settings, exchange, symbol, timeframe, overwrite)"]
        FindLabel["find_label(exchange, symbol, timeframe)"]
        HasHuman{existing.source == human?}
        BuildLabel["label_from_swing(swing, exchange, symbol, timeframe)\nsource = machine"]
        SaveLabel["save_label(label)"]
    end
    subgraph Evaluation_and_worklist
        PivotRecall["run_pivot_recall(settings)"]
        Experiment["run_experiment(settings)"]
        Worklist["coverage_report(exchange, symbols, timeframes, target)"]
        ListHuman["list_labels(source = human)"]
        ListMachine["list_labels(source = machine)"]
    end
    Start --> AutolabelMain --> RunAutolabel --> AutolabelOne --> FindLabel --> HasHuman
    HasHuman -- Yes --> AutolabelOne
    HasHuman -- No --> BuildLabel --> SaveLabel --> RunAutolabel
    Start --> PivotRecall --> ListHuman
    Start --> Experiment --> ListHuman
    Start --> Worklist
    Worklist --> ListHuman
    Worklist --> ListMachine
```
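The machine-labeling branch of the diagram can be sketched in a few lines. This is an illustration under assumptions, not the real `autolabel_one(settings, exchange, symbol, timeframe, overwrite)`: the store is a plain dict keyed by (symbol, timeframe), swing selection is injected as a callable, and every status string except `"no_swing"` is hypothetical, as is the rule that an existing machine candidate is only regenerated when `overwrite=True`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Label:
    symbol: str
    timeframe: str
    source: str = "human"


def autolabel_one(
    store: dict[tuple[str, str], Label],
    symbol: str,
    timeframe: str,
    select_swing: Callable[[str, str], Optional[object]],
    overwrite: bool = False,
) -> dict:
    key = (symbol, timeframe)
    existing = store.get(key)
    # Human labels are the golden set: never replaced, regardless of `overwrite`.
    if existing is not None and existing.source == "human":
        return {"symbol": symbol, "timeframe": timeframe, "status": "skipped_human"}
    # Assumption: an existing machine candidate is only regenerated with overwrite=True.
    if existing is not None and not overwrite:
        return {"symbol": symbol, "timeframe": timeframe, "status": "skipped_existing"}
    swing = select_swing(symbol, timeframe)
    if swing is None:
        return {"symbol": symbol, "timeframe": timeframe, "status": "no_swing"}
    # Everything written by this path is tagged as a provisional machine candidate.
    store[key] = Label(symbol, timeframe, source="machine")
    return {"symbol": symbol, "timeframe": timeframe, "status": "saved"}
```

Checking the human guard before `overwrite` is the design point: no flag combination can turn a golden-set label into engine output.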
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a18c79fd3e
```python
return {"symbol": symbol, "timeframe": timeframe, "status": "no_swing"}

label = label_from_swing(swing, exchange, symbol, timeframe)
path = save_label(label)
```
Keep machine candidates in the label-tool review queue
When this writes a `source="machine"` file to the normal label path, `labeling.tool` still treats that combination as already labeled, because its queue/title logic only checks `find_label(...) is not None`. After running autolabel over the default matrix, pressing `n` in the tool skips these unreviewed candidates (and the count shows them as labeled), so the intended review-and-promote workflow can silently leave machine outputs unreviewed unless the user goes through the worklist commands one by one.
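One way to address this finding is to make the tool's queue treat a machine-sourced label as still unreviewed. A sketch, assuming `find_label` returns either `None` or an object with a `source` attribute; `needs_review` and `review_queue` are hypothetical names, not existing functions in `labeling.tool`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

Combo = tuple[str, str, str]  # (exchange, symbol, timeframe)


@dataclass(frozen=True)
class Label:
    source: str = "human"


def needs_review(label: Optional[Label]) -> bool:
    """A combination still needs a human pass if it has no label or only a machine candidate."""
    return label is None or label.source == "machine"


def review_queue(
    combos: Iterable[Combo],
    find_label: Callable[[str, str, str], Optional[Label]],
) -> list[Combo]:
    """Combinations the tool should visit, in order (replacing a plain `is None` check)."""
    return [combo for combo in combos if needs_review(find_label(*combo))]
```

The same predicate could drive the title and the labeled count, so machine candidates are no longer reported as labeled.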
Pull request overview
This PR restructures labeling/evaluation around explicit human vs. machine label sources and removes Optuna-driven tuning references in favor of principle-motivated variants.
Changes:
- Adds `source="human"|"machine"` label handling, machine autolabel generation, and worklist review reporting.
- Updates experiment and pivot recall evaluation to use only human labels as ground truth.
- Removes/updates Optuna-related policy, docs, plans, and archived-result references.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/fibengine/labeling/store.py | Adds label source field and source filtering. |
| src/fibengine/labeling/autolabel.py | Adds machine-label generation CLI/workflow. |
| src/fibengine/labeling/worklist.py | Separates human coverage from machine candidates needing review. |
| src/fibengine/experiment.py | Excludes machine labels from agreement evaluation. |
| src/fibengine/evaluation/pivot_recall.py | Excludes machine labels from recall ground truth. |
| tests/... | Adds/updates tests for machine-label behavior and evaluation exclusion. |
| README.md, REPO_POLICY.md, docs/*, config/variants/*, experiments/*, archive/INDEX.md, premortem/reflections/*, data/labels/README.md | Updates documentation and policy around Optuna removal and machine-label semantics. |
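The split attributed to `worklist.py` can be sketched as a single pass over the symbol × timeframe grid. This assumes a precomputed mapping from (symbol, timeframe) to source; the real `coverage_report(exchange, symbols, timeframes, target)` reads the label store itself, and the return keys and the default `target=20` (the low end of the 20-30 target) are hypothetical. Only human labels reduce the remaining count; machine candidates are surfaced for review without counting toward the target.

```python
from typing import Iterable, Mapping


def coverage_report(
    sources: Mapping[tuple[str, str], str],
    symbols: Iterable[str],
    timeframes: Iterable[str],
    target: int = 20,
) -> dict:
    """Split a symbol x timeframe grid into human-labeled, machine-candidate and missing cells."""
    tfs = list(timeframes)  # iterated once per symbol, so materialize it
    human, machine, missing = [], [], []
    for symbol in symbols:
        for tf in tfs:
            src = sources.get((symbol, tf))
            if src == "human":
                human.append((symbol, tf))
            elif src == "machine":
                machine.append((symbol, tf))  # listed for review, not counted toward target
            else:
                missing.append((symbol, tf))
    return {
        "human": len(human),
        "target": target,
        "remaining": max(0, target - len(human)),
        "machine_to_review": machine,
        "missing": missing,
    }
```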
```python
created_at: str = ""
# "human" = manual ground truth (golden set). "machine" = machine-generated candidate
# (provisional, EXCLUDED from recall/agreement; must never become the judge).
source: str = "human"
```
Hey - I've found 3 issues, and left some high level feedback:
- In places like `coverage_report`, `run_pivot_recall`, and `run_experiment` you call `list_labels` multiple times with different `source` values, which re-reads all labels each time; consider loading once and partitioning in memory to avoid repeated disk IO and parsing.
- The `source` values (`"human"` / `"machine"`) are now used across multiple modules (store, worklist, autolabel, evaluation, experiment); introducing shared constants or an enum for these would reduce the risk of typos and keep behavior consistent.
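For the shared-constants suggestion, a `str`-based enum is one option. This sketch is hypothetical (neither `LabelSource` nor `parse_source` exists in the PR): because the enum subclasses `str`, its members compare equal to the existing `"human"`/`"machine"` literals, so call sites could migrate gradually, while constructing it from a raw string rejects typos. On Python 3.11+, `enum.StrEnum` is an equivalent choice.

```python
from enum import Enum


class LabelSource(str, Enum):
    """Allowed values for SwingLabel.source (hypothetical shared definition)."""

    HUMAN = "human"
    MACHINE = "machine"


def parse_source(value: str) -> LabelSource:
    """Validate a raw string read from label JSON; raises ValueError on typos like 'humna'."""
    return LabelSource(value)
```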
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In places like `coverage_report`, `run_pivot_recall`, and `run_experiment` you call `list_labels` multiple times with different `source` values, which re-reads all labels each time; consider loading once and partitioning in memory to avoid repeated disk IO and parsing.
- The `source` values (`"human"` / `"machine"`) are now used across multiple modules (store, worklist, autolabel, evaluation, experiment); introducing shared constants or an enum for these would reduce the risk of typos and keep behavior consistent.
## Individual Comments
### Comment 1
<location path="src/fibengine/evaluation/pivot_recall.py" line_range="123-125" />
<code_context>
- labels = list_labels()
+ # Only the human answer key may act as ground truth. Machine labels are candidates
+ # and are EXCLUDED; otherwise we measure the engine against itself (circular).
+ labels = list_labels(source="human")
+ n_machine = len(list_labels(source="machine"))
+ if n_machine:
+     log.info("Skipping {} machine labels (not ground truth for recall)", n_machine)
</code_context>
<issue_to_address>
**suggestion (performance):** Loading labels twice hits the label store twice; consider partitioning a single load for better efficiency.
This now triggers two full directory scans/JSON loads (`source="human"` and `source="machine"`). As the label set grows this extra I/O can add up. Consider loading once (e.g. `all_labels = list_labels()`) and then partitioning into human vs machine in memory to keep the same logging while avoiding repeated I/O.
</issue_to_address>
### Comment 2
<location path="src/fibengine/experiment.py" line_range="130-132" />
<code_context>
- labels = list_labels()
+ # Only the human answer key may act as ground truth. Machine labels are candidates
+ # and are EXCLUDED; otherwise we measure the engine against itself (circular).
+ labels = list_labels(source="human")
+ n_machine = len(list_labels(source="machine"))
+ if n_machine:
+     log.info("Skipping {} machine labels (not ground truth for recall)", n_machine)
</code_context>
<issue_to_address>
**suggestion (performance):** Same double-load pattern as in `pivot_recall`; you can reuse a single `list_labels()` call and partition in memory.
To avoid repeated I/O and keep this consistent with `pivot_recall`, consider calling `list_labels()` once and splitting the result into human vs. machine labels in memory, especially since experiments may run often over an increasing label set.
Suggested implementation:
```python
# Agreement is measured only against the human answer key; machine labels are excluded
# (candidates, not judges) so we don't accidentally measure the engine against its own output.
all_labels = list_labels()
labels = [lbl for lbl in all_labels if getattr(lbl, "source", None) == "human"]
n_machine = sum(1 for lbl in all_labels if getattr(lbl, "source", None) == "machine")
if n_machine:
    log.info("Skipping {} machine labels in agreement (not ground truth)", n_machine)
results: list[dict] = []
if not labels:
    log.warning("No labels in data/labels/; running prediction on the config symbol only.")
```
If `list_labels()` already returns separate structures or tuples (e.g. `(human_labels, machine_labels)`), or if the label objects expose the source under a different attribute or key than `.source`, adjust the `getattr(lbl, "source", None)` checks accordingly to match the actual API (for example `lbl["source"]` or `lbl.meta.source`). The important part is: keep a single `list_labels()` call and derive `labels` (human) and `n_machine` (machine count) from that in memory.
</issue_to_address>
### Comment 3
<location path="src/fibengine/labeling/autolabel.py" line_range="84-95" />
<code_context>
+ for timeframe in timeframes:
+ try:
+ result = autolabel_one(settings, exchange, symbol, timeframe, overwrite)
+ except Exception as exc: # noqa: BLE001 - report per combination.
+ result = {
+ "symbol": symbol,
+ "timeframe": timeframe,
</code_context>
<issue_to_address>
**suggestion:** The broad exception handling loses the traceback; consider logging the exception with stack trace for easier debugging.
Catching `Exception` here is reasonable to keep processing per (symbol, timeframe), but only storing `str(exc)` in `result` makes deeper issues (e.g. bad config/data) hard to debug. Consider calling `log.exception("Autolabel failed for {} {}", symbol, timeframe)` in the `except` block so you retain full tracebacks in logs while still returning the structured `result`.
```suggestion
for timeframe in timeframes:
    try:
        result = autolabel_one(settings, exchange, symbol, timeframe, overwrite)
    except Exception as exc:  # noqa: BLE001 - report per combination.
        log.exception("Autolabel failed for {} {}", symbol, timeframe)
        result = {
            "symbol": symbol,
            "timeframe": timeframe,
            "status": "error",
            "error": str(exc),
        }
    log.info("{} {} -> {}", symbol, timeframe, result["status"])
    results.append(result)
```
</issue_to_address>
```python
labels = list_labels(source="human")
n_machine = len(list_labels(source="machine"))
if n_machine:
```
**suggestion (performance):** Loading labels twice hits the label store twice; consider partitioning a single load for better efficiency.
This now triggers two full directory scans/JSON loads (`source="human"` and `source="machine"`). As the label set grows this extra I/O can add up. Consider loading once (e.g. `all_labels = list_labels()`) and then partitioning into human vs machine in memory to keep the same logging while avoiding repeated I/O.
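A sketch of the single-load approach, assuming label objects expose `.source`: one pass groups labels by source, and only the human group feeds the metric. The `pivot_index` field, the ±2-bar tolerance, and this recall definition are illustrative assumptions rather than the project's `run_pivot_recall`; the point is that feeding the machine group into the metric would let engine output inflate the score.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class Label:
    pivot_index: int  # hypothetical: bar index of the labeled pivot
    source: str = "human"


def partition_by_source(labels: Iterable[Label]) -> dict[str, list[Label]]:
    """Group labels by source in one pass over a single list_labels() result."""
    groups: dict[str, list[Label]] = defaultdict(list)
    for lbl in labels:
        groups[lbl.source].append(lbl)
    return groups


def pivot_recall(
    human: Sequence[Label], engine_pivots: Sequence[int], tolerance: int = 2
) -> Optional[float]:
    """Share of human pivots matched by an engine pivot within +/- tolerance bars."""
    if not human:
        return None  # no ground truth, no metric
    hits = sum(
        1 for lbl in human if any(abs(p - lbl.pivot_index) <= tolerance for p in engine_pivots)
    )
    return hits / len(human)
```

The machine count for the existing log line is then just `len(groups.get("machine", []))`, with no second read of the store.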
```python
labels = list_labels(source="human")
n_machine = len(list_labels(source="machine"))
if n_machine:
```
**suggestion (performance):** Same double-load pattern as in `pivot_recall`; you can reuse a single `list_labels()` call and partition in memory.
To avoid repeated I/O and keep this consistent with `pivot_recall`, consider calling `list_labels()` once and splitting the result into human vs. machine labels in memory, especially since experiments may run often over an increasing label set.
Suggested implementation:
```python
# Agreement is measured only against the human answer key; machine labels are excluded
# (candidates, not judges) so we don't accidentally measure the engine against its own output.
all_labels = list_labels()
labels = [lbl for lbl in all_labels if getattr(lbl, "source", None) == "human"]
n_machine = sum(1 for lbl in all_labels if getattr(lbl, "source", None) == "machine")
if n_machine:
    log.info("Skipping {} machine labels in agreement (not ground truth)", n_machine)
results: list[dict] = []
if not labels:
    log.warning("No labels in data/labels/; running prediction on the config symbol only.")
```

If `list_labels()` already returns separate structures or tuples (e.g. `(human_labels, machine_labels)`), or if the label objects expose the source under a different attribute or key than `.source`, adjust the `getattr(lbl, "source", None)` checks accordingly to match the actual API (for example `lbl["source"]` or `lbl.meta.source`). The important part is: keep a single `list_labels()` call and derive `labels` (human) and `n_machine` (machine count) from that in memory.
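The `getattr(lbl, "source", None)` guard in this suggestion only matters if some loaded labels lack the attribute. If the store builds `SwingLabel` instances from JSON (an assumption about `store.py`), files written before the field existed pick up the dataclass default, which is what keeps existing labels human. A hypothetical loader illustrating that; `label_from_json` and the reduced field set are not from the PR:

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SwingLabel:
    symbol: str
    timeframe: str
    source: str = "human"


def label_from_json(text: str) -> SwingLabel:
    """Build a label from stored JSON, ignoring unknown keys (hypothetical loader)."""
    data = json.loads(text)
    known = {f.name for f in fields(SwingLabel)}
    # Keys missing from older files, such as "source", fall back to the dataclass defaults.
    return SwingLabel(**{key: value for key, value in data.items() if key in known})
```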
```python
for timeframe in timeframes:
    try:
        result = autolabel_one(settings, exchange, symbol, timeframe, overwrite)
    except Exception as exc:  # noqa: BLE001 - report per combination.
        result = {
            "symbol": symbol,
            "timeframe": timeframe,
            "status": "error",
            "error": str(exc),
        }
    log.info("{} {} -> {}", symbol, timeframe, result["status"])
    results.append(result)
```
**suggestion:** The broad exception handling loses the traceback; consider logging the exception with stack trace for easier debugging.
Catching `Exception` here is reasonable to keep processing per (symbol, timeframe), but only storing `str(exc)` in `result` makes deeper issues (e.g. bad config/data) hard to debug. Consider calling `log.exception("Autolabel failed for {} {}", symbol, timeframe)` in the `except` block so you retain full tracebacks in logs while still returning the structured `result`.
```suggestion
for timeframe in timeframes:
    try:
        result = autolabel_one(settings, exchange, symbol, timeframe, overwrite)
    except Exception as exc:  # noqa: BLE001 - report per combination.
        log.exception("Autolabel failed for {} {}", symbol, timeframe)
        result = {
            "symbol": symbol,
            "timeframe": timeframe,
            "status": "error",
            "error": str(exc),
        }
    log.info("{} {} -> {}", symbol, timeframe, result["status"])
    results.append(result)
```
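The surrounding code logs with `{}` placeholders (`log.info("{} {} -> {}", ...)`), which suggests a loguru-style logger; that is an assumption, since the excerpt does not show how `log` is created. Loguru applies `str.format` to the message when extra arguments are passed, so `%s` placeholders would be written out literally. The helper below only mimics that behavior with the standard library to show the difference:

```python
def format_brace_style(message: str, *args: object) -> str:
    """Mimic loguru-style formatting: str.format is applied only when args are given."""
    return message.format(*args) if args else message


print(format_brace_style("Autolabel failed for {} {}", "BTCUSDT", "1h"))
# -> Autolabel failed for BTCUSDT 1h
print(format_brace_style("Autolabel failed for %s %s", "BTCUSDT", "1h"))
# -> Autolabel failed for %s %s  (str.format silently ignores the unused arguments)
```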
@claude check whether the reviews on this PR are legitimate and, if so, implement them.
- TRACKS.md: restore the Validate stability_gate criteria (drift as a first-class metric + explicit out-of-window recall reporting) that the feature branch dropped; the gate code and README still describe them.
- archive/INDEX.md + README: the Optuna artifacts under archive/ were not actually deleted, so correct the index note (code removed from src; the artifacts are retained as history) and restore the README archive pointer.

Docs only; no code/behavior change. 85 tests green, lint clean.
Review + doc fixes (commit 97a93a9)
Reviewed the PR. The core code (machine labeling) is solid and well tested (85 green, ~76% coverage); no bugs. Fixed two doc/consistency regressions directly on the branch:
Docs only, no code/behavior change.
Remaining review notes (not addressed; your call)
Generated by Claude Code
- leaderboard.jsonl: drop the 4 rows this branch appended (3 of them are label-optimized Optuna trials with off-principle negative weights). This PR should not enshrine label-optimized run records; revert to main's ledger.
- remove-optuna reflection: rewrite to satisfy REPO_POLICY 11.4 (add Scope + Observationer), correct it (tuning code/dep were already gone), note the archived artifacts are retained, and cross-link the optuna-rollback note.

Finding 5 (PR title accuracy) handled via the PR title. Docs/data only.
Review notes 3–5 addressed (commit 73f42ec)
Docs/data only, no code change. 85 tests green, lint clean.
Generated by Claude Code
Summary
- … `source=human|machine`): candidates for review only; excluded from recall/agreement and the 20–30 target.

Test plan
- `uv run pytest`: 85 passed, ~76% coverage
- `uv run ruff check src tests`
- … `main` before merge

Summary by Sourcery
Introduce explicit human vs machine label sources, add machine labeling workflow, and remove Optuna-based tuning artifacts in favor of principle-driven variants.
New Features:
- … `source` field on swing labels to distinguish human ground truth from machine-generated candidates.
- … `labeling.autolabel` CLI to generate provisional machine labels for specified symbol/timeframe grids.

Enhancements:
- … `pivot_recall`) and experiment agreement metrics use only human labels as ground truth, explicitly excluding machine-generated labels.

Documentation:
- … `source` values and integrity rules, in README and label data docs.
- … `config/variants` as principle-driven profiles and remove references to Optuna artifacts and tuning directories.

Tests:

Chores: