Research scope check: original fib-level labeling plan vs validation/edge drift #12

@JohnCCarter

Purpose

Document the original research plan, the research track we were on, and the adjacent research tracks we have started to drift into after PR #7, issue #8, PR #9, and PR #11.

This is a scope-control / research-direction issue, not an implementation request.

Agent roles for this issue

This issue should be readable and actionable by:

  • Cursor — repository/code navigation and local artifact inspection.
  • Copilot — implementation-risk and PR-scope review.
  • Claude — research-design critique and scope-drift analysis.

All agents should treat this issue as non-authorizing. It does not approve new implementation by itself.

Evidence chain

PR #7 — machine labeling foundation

PR #7 introduced explicit source=human|machine semantics and machine-labeling as candidates for review only.

Observed intent:

  • separate human ground truth from machine-generated candidates
  • generate preliminary machine labels
  • surface machine candidates for review
  • exclude machine labels from recall/agreement and labeling targets
  • protect human labels from being overwritten
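The source semantics listed above can be illustrated with a minimal sketch. It is not the code from PR #7: the `Label` class, `ground_truth`, and `upsert` are hypothetical names used only to show the two rules (machine labels stay out of recall/agreement, and a machine candidate never overwrites a human label).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical label record; field names are assumptions, not PR #7's schema."""
    event_id: str
    source: str  # "human" or "machine"
    value: str


def ground_truth(labels):
    """Return only human labels; machine candidates never count toward recall/agreement."""
    return [lb for lb in labels if lb.source == "human"]


def upsert(store, new):
    """Store a label unless it is a machine candidate that would replace a human label.

    Returns True if the label was written, False if it was rejected.
    """
    existing = store.get(new.event_id)
    if existing is not None and existing.source == "human" and new.source == "machine":
        return False  # human labels are protected
    store[new.event_id] = new
    return True
```

Under these assumptions, a machine candidate can still be surfaced for review on an event that has no human label yet, but it never enters the ground-truth set.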

Interpretation:

The project was already moving toward:

machine-assisted labeling

not:

manual labeling at scale

Issue #8 — original fib-level event plan

Issue #8 was created because one Fibonacci level may be visited multiple times over the life of a leg.

Original problem:

0.382
├─ continuation
├─ reaction
├─ rejection
└─ continuation

Original goal:

Automatically detect and annotate significant interactions between price and Fibonacci levels.

Issue #8 was explicitly research-only:

  • no trading logic
  • no promotion logic
  • no change to swing selection
  • no change to fib anchors/prices
  • no change to evaluation
  • no change to recall
  • candidates are not accepted facts

Research questions included:

  1. Can the engine automatically identify meaningful level interactions?
  2. Can auto-detected events approximate human review?
  3. Should behavior be modeled as a single label per level or an event stream per level?
  4. How many events occur per level on HTF fibs viewed on Daily?

Interpretation:

The original #8 plan was:

fib level
→ price interaction
→ machine candidate label
→ human spot-check/review if needed

It was not intended as:

human manually labels every event from scratch
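To make research question 3 concrete, the sketch below contrasts the two models on made-up data. The tuples, bar indices, and function names are illustrative assumptions, not output from the engine; the 0.382 sequence mirrors the example in the original problem statement.

```python
from collections import defaultdict

# Hypothetical candidate events for one leg: (bar_index, fib_level, candidate_label)
events = [
    (12, 0.382, "continuation"),
    (30, 0.382, "reaction"),
    (47, 0.382, "rejection"),
    (63, 0.382, "continuation"),
    (51, 0.618, "rejection"),
]


def event_stream_per_level(events):
    """Group candidate labels by fib level, in chronological order."""
    streams = defaultdict(list)
    for _bar_index, level, label in sorted(events):
        streams[level].append(label)
    return dict(streams)


def single_label_per_level(events):
    """Lossy alternative: keep only the most recent candidate label per level."""
    return {level: labels[-1] for level, labels in event_stream_per_level(events).items()}


def events_per_level(events):
    """Count events per level (research question 4)."""
    return {level: len(labels) for level, labels in event_stream_per_level(events).items()}
```

In this toy data the single-label model reports only "continuation" for 0.382 and drops the intermediate reaction and rejection, which is the information loss that motivated issue #8.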

PR #9 — event stream implementation

PR #9 implemented a research-only event stream per fib level.

Key additions:

  • detect_level_events()
  • walk_forward_level_events()
  • candidate labels: continuation / rejection / reaction / failure
  • touch_type
  • approach_side
  • evidence fields
  • --dedupe non-overlapping attribution

Interpretation:

PR #9 is largely aligned with issue #8. It gives the machine its first ability to propose event-level labels on fib lines.

However, PR #9 also moved slightly beyond narrow detection into:

aggregation / census / per-level distributions

This is useful, but should be treated as observation output, not proof of edge.

PR #11 — human review package

PR #11 added a mobile-friendly human review workflow:

  • balanced sampling
  • PNG chart per sampled event
  • CSV/JSONL review sheets
  • blank human fields
  • markdown index
  • mobile-friendly review artifacts

Interpretation:

PR #11 is useful if treated as:

spot-checking machine labels

It becomes scope drift if treated as:

a new large-scale manual labeling project

Original research track

The original track was:

Machine-assisted Fibonacci level/event labeling

Primary research question:

Can the machine label Fibonacci-level interactions well enough
that the human only needs to review/correct candidates,
instead of manually marking every interaction?

Primary output:

auto_candidate labels + evidence per fib-level event

Human role:

spot-check
correct
identify failure modes

Not:

manual labeling at scale

Tracks we are drifting into

Drift Track A — full human validation workflow

Question:

Can a human visually validate each event candidate?

This is related and useful, but should remain bounded as spot-checking.

Risk:

Human review becomes the main workflow.

rather than the intended state:

Machine-assisted labeling remains the main workflow.

Drift Track B — outcome mapping / event topology

Question:

What happens after each fib-level event?

Examples:

  • rejection at 0.382
  • continuation at 0.618
  • failure at 0.786

This is a later research phase. It is not required to answer issue #8.

Drift Track C — predictive value / edge research

Question:

Do some fib-level event types have predictive value?

This is a separate research track and should not be started until machine-label quality has been inspected.

Drift Track D — HTF/LTF architecture design

Issue #8 does contain the original context of:

HTF fibs viewed on Daily

However, expanding this into a new architecture/design track should wait until we have inspected actual outputs from the current detector/review package.

Current active hypothesis

Before opening more implementation tracks, explicitly choose the active hypothesis.

Possible hypotheses:

Hypothesis A — current primary hypothesis

Can machine-generated fib-level event candidates approximate human review?

Hypothesis B

Is an event-stream-per-level model more useful than a single-label-per-level model?

Hypothesis C

Do certain fib-level event types have predictive value?

Only one hypothesis should be treated as primary at a time.

Recommended current primary hypothesis:

Hypothesis A

Recommended stop-point

Before opening new implementation work:

  1. Run the existing detector/review package on a small real-data sample.
  2. Inspect the machine-generated labels.
  3. Treat the output of PR #11 ("Add bounded Human Review v1 workflow for fib level event candidates") as spot-check artifacts, not a manual labeling campaign.
  4. Record observations:
    • labels that look correct
    • labels with wrong type
    • noisy/meaningless events
    • missing context
    • unclear events
  5. Decide whether label rules need refinement.

Do not start outcome mapping, predictive-value research, or larger HTF/LTF architecture work until this evidence exists.
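The observations in step 4 could be tallied with something as small as the sketch below. The category keys are hypothetical short names for the five bullets in step 4, and the function is an illustration, not part of the existing review package.

```python
from collections import Counter

# Hypothetical keys mirroring the five observation types listed in step 4.
CATEGORIES = ("correct", "wrong_type", "noise", "missing_context", "unclear")


def summarize(observations):
    """Return {category: (count, share)} for a list of per-event observation tags."""
    counts = Counter(observations)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown observation categories: {sorted(unknown)}")
    total = sum(counts.values())
    return {
        c: (counts[c], counts[c] / total if total else 0.0)
        for c in CATEGORIES
    }
```

A summary like this is an observation note about label quality only. It says nothing about predictive value and should not be read as an edge claim.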

Proposed next action

Run a bounded review pass:

Small sample
Real data
Machine labels first
Human spot-check only
No trading logic
No edge claims

Suggested review target:

20–40 sampled events total
balanced across candidate types and fib levels where available
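One way to draw such a sample is a round-robin over (candidate type, fib level) groups, sketched below. This is not the sampler from PR #11; the dictionary keys `label` and `level` and the function name are assumptions made for illustration.

```python
import random
from collections import defaultdict


def balanced_sample(candidates, target, seed=0):
    """Pick up to `target` events, cycling across (label, level) groups.

    `candidates` is a list of dicts with hypothetical keys "label" and "level".
    Groups with fewer events simply run out early ("where available").
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    groups = defaultdict(list)
    for c in candidates:
        groups[(c["label"], c["level"])].append(c)
    for members in groups.values():
        rng.shuffle(members)

    picked = []
    keys = sorted(groups)
    while len(picked) < target and any(groups[k] for k in keys):
        for k in keys:
            if groups[k] and len(picked) < target:
                picked.append(groups[k].pop())
    return picked
```

With four candidate types and three fib levels, a target of 24 yields two events per group when every group has at least two candidates.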

Output should be a short observation note, not a new engine.

Non-goals for this issue

  • No code changes required by this issue.
  • No trading signals.
  • No strategy logic.
  • No Genesis integration.
  • No promotion/readiness claims.
  • No edge claims.
  • No large-scale manual labeling campaign.

Acceptance criteria for resolving this issue

This issue can be closed when the project has documented:

  1. Which hypothesis is currently active.
  2. Whether PR #11 (the bounded Human Review v1 workflow) will be used for spot-checking or for full manual review.
  3. A small evidence plan for inspecting machine-generated fib-level labels.
  4. A decision boundary for when to open outcome mapping / predictive value / HTF-LTF architecture issues.
