Research scope check: original fib-level labeling plan vs validation/edge drift #12

## Purpose

Document the original research plan, the track we were following, and the adjacent tracks we have started to drift into since PR #7, issue #8, PR #9, and PR #11.

This is a **scope-control / research-direction issue**, not an implementation request.

## Agent roles for this issue

This issue should be readable and actionable by:

- **Cursor** — repository/code navigation and local artifact inspection.
- **Copilot** — implementation-risk and PR-scope review.
- **Claude** — research-design critique and scope-drift analysis.

All agents should treat this issue as **non-authorizing**. It does not approve new implementation by itself.

## Evidence chain

### PR #7 — machine labeling foundation

PR #7 introduced explicit `source=human|machine` semantics and machine-labeling as **candidates for review only**.

Observed intent:

- separate human ground truth from machine-generated candidates
- generate preliminary machine labels
- surface machine candidates for review
- exclude machine labels from recall/agreement and labeling targets
- protect human labels from being overwritten
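The intent above can be illustrated with a minimal sketch. It is not the code from PR #7: the `Label` record, `upsert_label`, and `ground_truth` are hypothetical names chosen for illustration, and only the `source=human|machine` distinction comes from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical label record; only `source` mirrors PR #7."""
    event_id: str
    value: str
    source: str  # "human" or "machine"


def upsert_label(store: dict, new: Label) -> bool:
    """Store `new` unless a machine label would replace a human one."""
    current = store.get(new.event_id)
    if current is not None and current.source == "human" and new.source == "machine":
        return False  # human ground truth is never overwritten
    store[new.event_id] = new
    return True


def ground_truth(store: dict) -> list:
    """Labels eligible for recall/agreement: human-sourced only."""
    return [label for label in store.values() if label.source == "human"]
```

Under these assumptions, machine labels can sit in the same store as review candidates while any metric built on `ground_truth()` ignores them.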

Interpretation:

The project was already moving toward:

```text
machine-assisted labeling
```

not:

```text
manual labeling at scale
```

### Issue #8 — original fib-level event plan

Issue #8 was created because one Fibonacci level may be visited multiple times over the life of a leg.

Original problem:

```text
0.382
├─ continuation
├─ reaction
├─ rejection
└─ continuation
```

Original goal:

```text
Automatically detect and annotate significant interactions between price and Fibonacci levels.
```

Issue #8 was explicitly research-only:

- no trading logic
- no promotion logic
- no change to swing selection
- no change to fib anchors/prices
- no change to evaluation
- no change to recall
- candidates are not accepted facts

Research questions included:

1. Can the engine automatically identify meaningful level interactions?
2. Can auto-detected events approximate human review?
3. Should behavior be modeled as a single label per level or an event stream per level?
4. How many events occur per level on HTF fibs viewed on Daily?
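Question 3 can be made concrete with a small data-structure sketch. The class and field names here (`LevelEvent`, `LevelStream`, `bar_index`, `single_label`) are assumptions for illustration, not the schema later implemented in PR #9. The example data is the 0.382 sequence from the original problem statement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LevelEvent:
    bar_index: int   # hypothetical position of the interaction
    candidate: str   # continuation / reaction / rejection


@dataclass
class LevelStream:
    level: float
    events: List[LevelEvent] = field(default_factory=list)

    def single_label(self) -> Optional[str]:
        """Collapse the stream to one label (here: the latest event).

        The collapse rule is arbitrary and lossy, which is the point
        of comparing the two models.
        """
        return self.events[-1].candidate if self.events else None


# The 0.382 example: four visits over the life of one leg.
stream = LevelStream(level=0.382)
for i, name in enumerate(["continuation", "reaction", "rejection", "continuation"]):
    stream.events.append(LevelEvent(bar_index=i, candidate=name))
```

A single-label model would report only `continuation` for this level and hide the intermediate reaction and rejection, while the stream keeps all four events.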

Interpretation:

The original #8 plan was:

```text
fib level
→ price interaction
→ machine candidate label
→ human spot-check/review if needed
```

It was not intended as:

```text
human manually labels every event from scratch
```

### PR #9 — event stream implementation

PR #9 implemented a research-only event stream per fib level.

Key additions:

- `detect_level_events()`
- `walk_forward_level_events()`
- candidate labels: continuation / rejection / reaction / failure
- `touch_type`
- `approach_side`
- evidence fields
- `--dedupe` non-overlapping attribution

Interpretation:

PR #9 is largely aligned with issue #8. It gives the machine its first ability to propose event-level labels on fib lines.

However, PR #9 also moved slightly beyond narrow detection into:

```text
aggregation / census / per-level distributions
```

This is useful, but should be treated as observation output, not proof of edge.
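As a rough illustration of what "observation output" means here, a per-level census is just a count of candidate labels. The event dictionaries and the `census` helper below are hypothetical and do not reflect PR #9's actual output format.

```python
from collections import Counter


def census(events):
    """Count (level, candidate) pairs. Descriptive only: says nothing about outcomes."""
    return Counter((e["level"], e["candidate"]) for e in events)


# Made-up events, for illustration only.
sample_events = [
    {"level": 0.382, "candidate": "rejection"},
    {"level": 0.382, "candidate": "rejection"},
    {"level": 0.618, "candidate": "continuation"},
]
counts = census(sample_events)
```

A count like this describes what the detector proposed. It carries no information about what price did afterwards, so it cannot support an edge claim.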

### PR #11 — human review package

PR #11 added a mobile-friendly human review workflow:

- balanced sampling
- PNG chart per sampled event
- CSV/JSONL review sheets
- blank human fields
- markdown index
- mobile-friendly review artifacts

Interpretation:

PR #11 is useful if treated as:

```text
spot-checking machine labels
```

It becomes scope drift if treated as:

```text
a new large-scale manual labeling project
```

## Original research track

The original track was:

```text
Machine-assisted Fibonacci level/event labeling
```

Primary research question:

```text
Can the machine label Fibonacci-level interactions well enough
that the human only needs to review/correct candidates,
instead of manually marking every interaction?
```

Primary output:

```text
auto_candidate labels + evidence per fib-level event
```

Human role:

```text
spot-check
correct
identify failure modes
```

Not:

```text
manual labeling at scale
```

## Tracks we are drifting into

### Drift Track A — full human validation workflow

Question:

```text
Can a human visually validate each event candidate?
```

This is related and useful, but should remain bounded as spot-checking.

Risk:

```text
Human review becomes the main workflow.
```

instead of:

```text
Machine-assisted labeling remains the main workflow.
```

### Drift Track B — outcome mapping / event topology

Question:

```text
What happens after each fib-level event?
```

Examples:

- rejection at 0.382
- continuation at 0.618
- failure at 0.786

This is a later research phase. It is not required to answer issue #8.

### Drift Track C — predictive value / edge research

Question:

```text
Do some fib-level event types have predictive value?
```

This is a separate research track and should not be started until machine-label quality is inspected.

### Drift Track D — HTF/LTF architecture design

Issue #8 does contain the original context of:

```text
HTF fibs viewed on Daily
```

However, expanding this into a new architecture/design track should wait until we have inspected actual outputs from the current detector/review package.

## Current active hypothesis

Before opening more implementation tracks, explicitly choose the active hypothesis.

Possible hypotheses:

### Hypothesis A — current primary hypothesis

```text
Can machine-generated fib-level event candidates approximate human review?
```

### Hypothesis B

```text
Is an event-stream-per-level model more useful than a single-label-per-level model?
```

### Hypothesis C

```text
Do certain fib-level event types have predictive value?
```

Only one hypothesis should be treated as primary at a time.

Recommended current primary hypothesis:

```text
Hypothesis A
```

## Recommended stop-point

Before opening new implementation work:

1. Run the existing detector/review package on a small real-data sample.
2. Inspect the machine-generated labels.
3. Treat PR #11 output as **spot-check artifacts**, not a manual labeling campaign.
4. Record observations:
   - labels that look correct
   - labels with wrong type
   - noisy/meaningless events
   - missing context
   - unclear events
5. Decide whether label rules need refinement.
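One possible way to record the step 4 observations is a fixed set of verdicts tallied per review pass. The verdict names and the `summarize` helper are assumptions made for this sketch; they are not fields of the PR #11 review sheets.

```python
from collections import Counter

# Hypothetical verdict vocabulary mirroring the observation list above.
VERDICTS = ("correct", "wrong_type", "noise", "missing_context", "unclear")


def summarize(verdicts):
    """Return per-verdict counts and the share marked 'correct'."""
    unknown = set(verdicts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")
    counts = Counter(verdicts)
    total = len(verdicts)
    share_correct = counts["correct"] / total if total else None
    return {v: counts.get(v, 0) for v in VERDICTS}, share_correct
```

With a sample of 20 to 40 events the share is only a rough indicator. The per-verdict counts are more useful for deciding which label rules to refine.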

Do not start outcome mapping, predictive-value research, or larger HTF/LTF architecture work until this evidence exists.

## Proposed next action

Run a bounded review pass:

```text
Small sample
Real data
Machine labels first
Human spot-check only
No trading logic
No edge claims
```

Suggested review target:

```text
20–40 sampled events total
balanced across candidate types and fib levels where available
```
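A bounded, balanced draw could look like the sketch below. It is not the sampler from PR #11: `balanced_sample`, the event dictionaries, and the round-robin strategy are assumptions chosen to show one way of honoring "where available" when some groups are small.

```python
import random
from collections import defaultdict


def balanced_sample(events, target, seed=0):
    """Draw up to `target` events, round-robin across (level, candidate) groups."""
    rng = random.Random(seed)  # fixed seed keeps the review set reproducible
    groups = defaultdict(list)
    for event in events:
        groups[(event["level"], event["candidate"])].append(event)
    for members in groups.values():
        rng.shuffle(members)

    keys = sorted(groups)
    picked = []
    # Take one event per group per round until the target or the data runs out.
    while len(picked) < target and any(groups[k] for k in keys):
        for key in keys:
            if groups[key] and len(picked) < target:
                picked.append(groups[key].pop())
    return picked
```

Small groups are exhausted first and larger groups fill the remainder, so rare combinations still appear in the review set without inflating the total.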

Output should be a short observation note, not a new engine.

## Non-goals for this issue

- No code changes required by this issue.
- No trading signals.
- No strategy logic.
- No Genesis integration.
- No promotion/readiness claims.
- No edge claims.
- No large-scale manual labeling campaign.

## Acceptance criteria for resolving this issue

This issue can be closed when the project has documented:

1. Which hypothesis is currently active.
2. Whether PR #11 will be used as spot-checking or full manual review.
3. A small evidence plan for inspecting machine-generated fib-level labels.
4. A decision boundary for when to open outcome mapping / predictive value / HTF-LTF architecture issues.
