Fix agent/provider field coverage across all enrich pipelines
claude_project_parser.py: add _detect_provider() to infer LLM provider
from model name, set agent="claude-code" and provider on all event types
(thinking, tool_use, text) — previously only tool_use had agent set.
claude_code.py: add agent="claude-code" to hook bridge record_call.
tools/backfill_agent.py: one-shot script to backfill agent/provider on
existing JSONL sessions. Agent coverage 3%→100%, provider 10%→99.8%.
Update Phase 3D corpus baseline: sessions 1315→1351, events 64K→128K.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
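The commit view does not show the body of `_detect_provider()` or of `tools/backfill_agent.py`. As a rough sketch only, provider inference from a model name and a one-pass backfill over JSONL event records could look like the following. The prefix table, the `model` / `agent` / `provider` keys on each event, and the `backfill_events` and `coverage` helpers are assumptions made for illustration, not the repository's actual code.

```python
import json

# Hypothetical prefix-to-provider table; the real mapping used by
# claude_project_parser.py is not shown in this commit view.
_PROVIDER_PREFIXES = (
    ("claude", "anthropic"),
    ("gpt", "openai"),
    ("deepseek", "deepseek"),
    ("doubao", "bytedance"),
)


def detect_provider(model_name):
    """Return a provider label inferred from a model name, or None if unknown."""
    if not model_name:
        return None
    lowered = model_name.lower()
    for prefix, provider in _PROVIDER_PREFIXES:
        if lowered.startswith(prefix):
            return provider
    return None


def backfill_events(jsonl_lines, agent="claude-code"):
    """Fill missing agent/provider fields on JSONL event records.

    Existing values are preserved. The provider is only set when it can be
    inferred from the event's (assumed) "model" field.
    """
    events = []
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines between records
        event = json.loads(line)
        event.setdefault("agent", agent)
        if not event.get("provider"):
            provider = detect_provider(event.get("model"))
            if provider is not None:
                event["provider"] = provider
        events.append(event)
    return events


def coverage(events, field):
    """Fraction of events carrying a non-empty value for `field`."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.get(field)) / len(events)
```

Leaving the provider unset for unrecognized model names, rather than guessing, is one way a backfill could end slightly below full provider coverage while agent coverage reaches 100%.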
docs/research/README.md
+39 lines changed: 39 additions & 0 deletions
@@ -9,6 +9,27 @@ This directory groups the research tracks and branch studies that sit alongside
 - [Phase 3C](phase3c/README.md)
 - [Phase 3D](phase3d/README.md)
 
+## Current Research Status
+
+`causetrace` now has enough corpus scale to support validation-oriented work, but not enough metadata density to support theory finalization or default automation policy.
+
+Current snapshot:
+
+- sessions: `1315`
+- events: `64429`
+- strict research-grade sessions: `157`
+- dominant_chain: `1111`
+- mixed: `195`
+- retry-heavy: `541`
+- branchy sessions: `179`
+- long sessions >=100 events: `53`
+
+The next mainline stage is:
+
+`Phase 3D-T2B: Intervention-aware Acquisition`
+
+This stage continues Tier 2 acquisition while keeping workflow-intervention lanes separate from the native direct-prompt baseline.
+Phase 3D now treats workflow intervention as a separate analysis axis.
+
+| Lane | Meaning | Direct native baseline? |
+| --- | --- | --- |
+| `direct_prompt_native` | User/developer gave the agent a task directly | Yes |
+| `routed_prompt_intervention` | `prompt-routing-skill` selected the posture first | No |
+| `superpowers_workflow_intervention` | A structured workflow plugin changed the execution shape | No |
+| `controlled_prompt_morphology` | Controlled prompt comparison or pilot run | No |
+
+Rules:
+
+- Analyze each lane independently first.
+- Do not merge intervention traces into the native direct-prompt baseline.
+- Cross-lane comparison may report trends only.
+- Intervention-lane findings do not become universal policy without additional validation.
+
 ## Hypothesis Categories
 
 - runtime morphology hypotheses
@@ -69,6 +90,23 @@ These are hypotheses only, not conclusions. Canonical entries live in the regist
 - Record negative results alongside positive ones.
 - Do not treat the hypotheses as ontology.
 - Do not promote external research into conclusions without causetrace corpus evidence.
+- Treat workflow intervention lanes as separate from native direct-prompt analysis.
+- Do not move into Phase 4 theory finalization yet.
+
+## Metadata Density Warning
+
+Current corpus scale is sufficient for validation-oriented work, but metadata density remains too low for stable theory finalization or default automation policy.
+
+Current gap summary:
+
+- runtime missing: `1136`
+- task_type missing: `1150`
+- task_source missing: `1150`
+- success missing: `1153`
+- duration missing: `1315`
+- human_intervention missing: `1219`
+
+The next stage should continue acquisition and lane separation before any universal prompt policy or runtime theory is attempted.
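The gap summary above is a per-field count of sessions that lack a value. A minimal sketch of such a tally follows, assuming each session's metadata is available as a dictionary. The `sessions` structure and the `missing_counts` helper are hypothetical and are not taken from causetrace.

```python
from collections import Counter

# Field names taken from the gap summary; the storage layout is assumed.
GAP_FIELDS = (
    "runtime", "task_type", "task_source",
    "success", "duration", "human_intervention",
)


def missing_counts(sessions, fields=GAP_FIELDS):
    """Count, per field, how many session metadata dicts lack a value.

    A field counts as missing when the key is absent or its value is None.
    False and 0 are treated as present, since success=False is real data.
    """
    counts = Counter({field: 0 for field in fields})
    for meta in sessions:
        for field in fields:
            if meta.get(field) is None:
                counts[field] += 1
    return dict(counts)
```

Checking for `None` rather than truthiness matters here: a session that failed (`success=False`) has a recorded outcome and should not be counted as a gap.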
@@ -25,6 +29,8 @@ It is not a validation result and it is not a conclusion. It is the starting poi
 - Runtime distribution is still uneven, with `anthropic` and `claude-code` accounting for most explicit runtime labels.
 - Task distribution is still uneven, with `debug_test`, `feature_add`, `exploration`, and `review` dominating the labeled subset.
 - Failure and human-intervention coverage remain limited relative to the overall corpus, so failure morphology and intervention morphology should be treated as weaker, later-stage candidates.
+- Agent and provider fields are now populated inline on 100% / 99.8% of events (v0.2.5 parser fix), enabling reliable per-event runtime attribution for the first time.
+- Claude Code sessions predominantly use deepseek-v4-pro as the underlying model (via proxy), with doubao-seed-2.0-code as secondary.
docs/research/phase3d/status.md
+32 -2 lines changed: 32 additions & 2 deletions
@@ -3,6 +3,7 @@
 Phase 3D is active.
 
 It is the hypothesis registry layer for runtime morphology research. It follows the descriptive work in Phase 3A, 3B, and 3C.
+The next mainline stage is `Phase 3D-T2B: Intervention-aware Acquisition`, which continues Tier 2 acquisition while keeping workflow-intervention lanes separate from the native direct-prompt baseline.
 
 ## Current Position
 
@@ -15,14 +16,21 @@ It is the hypothesis registry layer for runtime morphology research. It follows
 
 ## Current Corpus Baseline
 
-- sessions: `981`
-- events: `30024`
+- sessions: `1351`
+- events: `128534`
+- metadata sessions: `981`
+- annotated sessions: `53`
+- explicit runtime sessions: `179`
 - ready: `True`
 - strict research-grade sessions: `157`
 - native strict sessions: `100`
 - data_origin labeled sessions: `981`
 - missing data_origin: `0`
 - data_origin coverage: `100%`
+- agent field coverage: `100%` (inline on events)
+- provider field coverage: `99.8%` (inline on events)
+- model distribution (top): doubao-seed-2.0-code `264`, deepseek-v4-pro `55`, gpt-5.4-mini `13`, gpt-5.5 `13`
 
 ## Phase 3D Documents
 
@@ -53,12 +61,33 @@ It is the hypothesis registry layer for runtime morphology research. It follows
 - Three controlled-benchmark pilot sessions have been labeled with `data_origin=controlled_benchmark` and remain separate from the native lane.
 - Tier 2 remains acquisition-only until the failure / intervention subset grows.
 - Human-intervention acquisition target has been met for the current native lane.
+- Intervention lanes must stay separate from the native direct-prompt baseline.
 
 ## Operating Rule
 
 - Do not turn hypotheses into conclusions without corpus-backed validation.
 - Do not move into prediction, anomaly modeling, or automatic diagnosis.
 - Keep controlled benchmark and external trajectories in separate lanes.
+- Keep routed-prompt and superpowers workflow traces separate from direct-prompt native traces.
+- Do not move into Phase 4 yet.
+
+## Metadata Density Warning
+
+Current corpus scale is sufficient for validation-oriented work, but metadata density remains too low for stable theory finalization or default automation policy.
+
+Current gap summary (explicit sidecar metadata):
+
+- runtime missing: `1172`
+- task_type missing: `1186`
+- task_source missing: `1186`
+- success missing: `1189`
+- duration missing: `1351`
+- human_intervention missing: `1255`
+- model missing: `1331`
+- repo_language missing: `1331`
+- repo_size missing: `1331`
+
+Note: agent and provider fields are now populated inline on all events (100% / 99.8% coverage), distinct from sidecar metadata tracked here.
 
 ## Next Action
 
@@ -71,3 +100,4 @@ Continue Tier 2 acquisition:
 - non-native AskUserQuestion sessions have been marked as human_intervention=true, but they do not alter the native strict gate
 - proxy failure candidates may be reviewed separately, but they do not change the native strict readiness gate
 - follow the acquisition sprint note for the next batch of native samples
+- treat `direct_prompt_native`, `routed_prompt_intervention`, `superpowers_workflow_intervention`, and `controlled_prompt_morphology` as separate lanes in analysis
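One way to keep the four lanes apart in analysis is to partition sessions by lane before computing any statistic, and to reject unknown labels rather than folding them into the native baseline. The sketch below assumes each session record carries a `lane` key. That key and the `partition_by_lane` helper are hypothetical and do not come from the causetrace codebase.

```python
# Lane names as listed in the diff; the "lane" record key is an assumption.
LANES = (
    "direct_prompt_native",
    "routed_prompt_intervention",
    "superpowers_workflow_intervention",
    "controlled_prompt_morphology",
)


def partition_by_lane(sessions):
    """Group session records by lane, keeping every lane separate.

    Raises ValueError on a missing or unknown lane so that unlabeled
    sessions never end up in the native direct-prompt baseline by default.
    """
    groups = {lane: [] for lane in LANES}
    for session in sessions:
        lane = session.get("lane")
        if lane not in groups:
            raise ValueError(f"unknown or missing lane: {lane!r}")
        groups[lane].append(session)
    return groups
```

Per-lane statistics can then be computed on `groups[lane]` independently, with any cross-lane comparison reported as a trend only, in line with the rules stated in the research README.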