
Commit 9c828bd

Your Name and claude committed
Fix agent/provider field coverage across all enrich pipelines
claude_project_parser.py: add _detect_provider() to infer LLM provider from model name, set agent="claude-code" and provider on all event types (thinking, tool_use, text). Previously only tool_use had agent set.

claude_code.py: add agent="claude-code" to hook bridge record_call.

tools/backfill_agent.py: one-shot script to backfill agent/provider on existing JSONL sessions.

Agent coverage 3%→100%, provider 10%→99.8%.

Update Phase 3D corpus baseline: sessions 1315→1351, events 64K→128K.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 93df2db commit 9c828bd

7 files changed

Lines changed: 292 additions & 4 deletions
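The commit message mentions `tools/backfill_agent.py`, but its diff is not shown on this page. As a rough illustration of what a one-shot, idempotent backfill over JSONL sessions can look like, here is a minimal Python sketch. The function name `backfill_file`, the one-event-per-line layout, and the `provider_of` callback (meant to receive the same prefix rules as `_detect_provider()`) are assumptions, not the script's actual interface. The agent label is passed in per file because the runtime cannot generally be inferred from the model name; the baseline notes, for example, that Claude Code sessions mostly run deepseek-v4-pro via a proxy.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def backfill_file(
    path: Path,
    agent: str,
    provider_of: Callable[[Optional[str]], str],
) -> int:
    """Fill missing agent/provider on each JSONL event; return how many lines changed.

    Hypothetical sketch: assumes one JSON object per line with optional
    "agent", "provider", and "model" keys.
    """
    changed = 0
    out_lines = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            out_lines.append(line)  # preserve blank lines untouched
            continue
        event = json.loads(line)
        updated = False
        if not event.get("agent"):
            event["agent"] = agent
            updated = True
        if not event.get("provider"):
            event["provider"] = provider_of(event.get("model"))
            updated = True
        if updated:
            changed += 1
            line = json.dumps(event, ensure_ascii=False)
        out_lines.append(line)  # unchanged events keep their original text
    # Write to a sibling temp file, then swap it in, so a crash mid-write
    # never leaves a truncated session file behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("\n".join(out_lines) + "\n", encoding="utf-8")
    tmp.replace(path)
    return changed
```

Because only empty fields are filled, running the sketch a second time over the same file reports zero changes.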


causetrace/hooks/claude_code.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -72,6 +72,7 @@ def main() -> None:
     parent_event_id=parent_id,
     model=_CC_MODEL,
     provider=_CC_PROVIDER,
+    agent="claude-code",
     duration_ms=duration,
 )
 _save_last_event_id(session_id, event.event_id)
```
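In the hook bridge, `parent_event_id=parent_id` together with the trailing `_save_last_event_id(session_id, event.event_id)` call suggests that the parent link is carried between hook invocations through saved state. Assuming each hook call runs as a separate short-lived process, one minimal way to persist that link is a small file per session. This sketch is hypothetical: `load_last_event_id`, `save_last_event_id`, and the file layout are illustrative names, not causetrace's implementation.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def _state_path(state_dir: Path, session_id: str) -> Path:
    # One small file per session; session IDs are assumed to be filename-safe.
    return state_dir / f"{session_id}.last_event.json"


def load_last_event_id(state_dir: Path, session_id: str) -> Optional[str]:
    """Return the last recorded event ID for a session, or None if there is none."""
    path = _state_path(state_dir, session_id)
    try:
        return json.loads(path.read_text(encoding="utf-8")).get("event_id")
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def save_last_event_id(state_dir: Path, session_id: str, event_id: str) -> None:
    """Persist the newest event ID so the next hook call can use it as parent."""
    state_dir.mkdir(parents=True, exist_ok=True)
    path = _state_path(state_dir, session_id)
    # Write to a temp file in the same directory, then replace the target,
    # so a concurrent reader never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"event_id": event_id}, fh)
    os.replace(tmp, path)
```

A hook would call `load_last_event_id` before recording an event (to fill `parent_event_id`) and `save_last_event_id` afterwards; `os.replace` is atomic on POSIX filesystems when source and target share a directory.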

causetrace/hooks/claude_project_parser.py

Lines changed: 23 additions & 0 deletions
```diff
@@ -108,6 +108,24 @@ def list_sessions() -> List[Dict[str, Any]]:
     return sessions
 
 
+def _detect_provider(model: Optional[str]) -> str:
+    """Infer LLM provider from model name."""
+    if not model:
+        return "anthropic"
+    m = model.lower()
+    if m.startswith("claude"):
+        return "anthropic"
+    if m.startswith("deepseek"):
+        return "deepseek"
+    if m.startswith("ark-") or m.startswith("doubao"):
+        return "bytedance"
+    if m.startswith("gpt-") or m.startswith("o1") or m.startswith("o3"):
+        return "openai"
+    if m.startswith("minimax"):
+        return "minimax"
+    return "anthropic"
+
+
 def parse_session(session_id: str) -> List[ToolEvent]:
     """Parse a Claude Code project session JSONL into causally-linked events.
```
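`_detect_provider()` is a first-match, case-insensitive prefix lookup with `"anthropic"` as the fallback, both for a missing model and for any unrecognized name. The same rules can be written as an ordered table, which keeps the prefix list in one place if more providers are added. This is a sketch of an equivalent refactor, not the code in the commit; `_PROVIDER_PREFIXES` and `detect_provider` are hypothetical names.

```python
from typing import Optional

# Ordered (prefixes, provider) pairs; first match wins, mirroring the if-chain.
_PROVIDER_PREFIXES = (
    (("claude",), "anthropic"),
    (("deepseek",), "deepseek"),
    (("ark-", "doubao"), "bytedance"),
    (("gpt-", "o1", "o3"), "openai"),
    (("minimax",), "minimax"),
)
_DEFAULT_PROVIDER = "anthropic"


def detect_provider(model: Optional[str]) -> str:
    """Infer the LLM provider from a model name (case-insensitive prefix match)."""
    if not model:
        return _DEFAULT_PROVIDER
    m = model.lower()
    for prefixes, provider in _PROVIDER_PREFIXES:
        # str.startswith accepts a tuple and matches if any prefix matches.
        if m.startswith(prefixes):
            return provider
    return _DEFAULT_PROVIDER


if __name__ == "__main__":
    for name in ["deepseek-v4-pro", "doubao-seed-2.0-code", "gpt-5.4-mini", None]:
        print(name, "->", detect_provider(name))
```

One consequence of the fallback: a model from a provider with no listed prefix (for instance a `qwen-` model) is labeled `anthropic` rather than left empty, so a non-empty `provider` does not by itself guarantee a correct attribution.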
```diff
@@ -147,6 +165,8 @@ def parse_session(session_id: str) -> List[ToolEvent]:
     parent_event_id=last_event_id,
     timestamp=obj.get("timestamp"),
     model=msg.get("model"),
+    provider=_detect_provider(msg.get("model")),
+    agent="claude-code",
 )
 events.append(event)
 last_event_id = event.event_id
@@ -162,6 +182,7 @@ def parse_session(session_id: str) -> List[ToolEvent]:
     parent_event_id=last_event_id,
     timestamp=obj.get("timestamp"),
     model=msg.get("model"),
+    provider=_detect_provider(msg.get("model")),
     agent="claude-code",
 )
 events.append(event)
@@ -177,6 +198,8 @@ def parse_session(session_id: str) -> List[ToolEvent]:
     parent_event_id=last_event_id,
     timestamp=obj.get("timestamp"),
     model=msg.get("model"),
+    provider=_detect_provider(msg.get("model")),
+    agent="claude-code",
 )
 events.append(event)
 last_event_id = event.event_id
```
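Each branch of `parse_session()` links the new event to the previous one through `parent_event_id=last_event_id` and then advances `last_event_id`; the fix makes every branch stamp the same `agent` and `provider`. The minimal sketch below reproduces that pattern under stated assumptions: `ToolEvent` here is a hypothetical stand-in whose fields follow the keyword arguments in the diff, and each JSONL line is assumed to carry a `message` object with `model` and a `content` list of `thinking` / `tool_use` / `text` blocks.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToolEvent:
    # Hypothetical stand-in: field names follow the keyword arguments in the diff.
    kind: str
    parent_event_id: Optional[str]
    timestamp: Optional[str]
    model: Optional[str]
    provider: Optional[str]
    agent: Optional[str]
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def parse_lines(
    lines: List[str], provider_of: Callable[[Optional[str]], str]
) -> List[ToolEvent]:
    """Chain content blocks into a parent->child list, tagging every event."""
    events: List[ToolEvent] = []
    last_event_id: Optional[str] = None
    for line in lines:
        if not line.strip():
            continue
        obj = json.loads(line)
        msg = obj.get("message") or {}
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # e.g. summary records without message content
        for block in content:
            kind = block.get("type")
            if kind not in ("thinking", "tool_use", "text"):
                continue
            event = ToolEvent(
                kind=kind,
                parent_event_id=last_event_id,  # causal link to previous event
                timestamp=obj.get("timestamp"),
                model=msg.get("model"),
                # One constructor for all block types, so agent/provider
                # cannot drift between branches.
                provider=provider_of(msg.get("model")),
                agent="claude-code",
            )
            events.append(event)
            last_event_id = event.event_id
    return events
```

Funnelling all three block types through one constructor call is one way to avoid the per-branch drift this commit fixes; whether that fits the real parser depends on fields not visible in this diff.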

docs/research/README.md

Lines changed: 39 additions & 0 deletions
```diff
@@ -9,6 +9,27 @@ This directory groups the research tracks and branch studies that sit alongside
 - [Phase 3C](phase3c/README.md)
 - [Phase 3D](phase3d/README.md)
 
+## Current Research Status
+
+`causetrace` now has enough corpus scale to support validation-oriented work, but not enough metadata density to support theory finalization or default automation policy.
+
+Current snapshot:
+
+- sessions: `1315`
+- events: `64429`
+- strict research-grade sessions: `157`
+- dominant_chain: `1111`
+- mixed: `195`
+- retry-heavy: `541`
+- branchy sessions: `179`
+- long sessions >=100 events: `53`
+
+The next mainline stage is:
+
+`Phase 3D-T2B: Intervention-aware Acquisition`
+
+This stage continues Tier 2 acquisition while keeping workflow-intervention lanes separate from the native direct-prompt baseline.
+
 ## Cross-project Branch Studies
 
 - [Cross-project Prompt Morphology Study](branches/cross_project_prompt_morphology/README.md)
```
```diff
@@ -48,6 +69,24 @@ Rules:
 - Do not merge routed traces into the native direct-prompt baseline.
 - If routed traces are analyzed, label them explicitly as routed.
 
+## Workflow Intervention Lanes
+
+Treat workflow intervention as a separate experimental axis from prompt posture.
+
+| Type | Meaning | Mix into native direct-prompt conclusions? |
+| --- | --- | --- |
+| direct_prompt_native | User/developer gave the agent a task directly | Yes, under native rules |
+| routed_prompt_intervention | `prompt-routing-skill` selected the posture first | No, not directly |
+| superpowers_workflow_intervention | A structured workflow plugin changed the execution shape | No, not directly |
+| controlled_prompt_morphology | Controlled prompt comparison or pilot run | Controlled / intervention lane |
+
+Rules:
+
+- Analyze each workflow lane independently first.
+- Do not merge intervention traces into the native direct-prompt baseline.
+- Cross-lane comparison may report trends only.
+- Intervention-lane findings do not become universal policy without additional validation.
+
 ## Boundary
 
 Branch studies and skills may inform hypotheses and workflow choices, but they do not change the `causetrace` core boundary by themselves.
```
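The lane rules above amount to a partition-then-aggregate discipline: group sessions by lane, compute statistics per lane, and build the native baseline from `direct_prompt_native` alone. A minimal Python sketch, assuming each session record carries a hypothetical `lane` field and a numeric metric such as a hypothetical `event_count`; unlabeled sessions raise an error instead of silently falling into the baseline.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Mapping

LANES = (
    "direct_prompt_native",
    "routed_prompt_intervention",
    "superpowers_workflow_intervention",
    "controlled_prompt_morphology",
)
NATIVE_LANE = "direct_prompt_native"


def split_by_lane(sessions: Iterable[Mapping]) -> Dict[str, List[Mapping]]:
    """Partition sessions by their (hypothetical) 'lane' label; unknown labels are an error."""
    lanes: Dict[str, List[Mapping]] = defaultdict(list)
    for s in sessions:
        lane = s.get("lane")
        if lane not in LANES:
            raise ValueError(f"session {s.get('session_id')!r} has no recognized lane: {lane!r}")
        lanes[lane].append(s)
    return dict(lanes)


def per_lane_mean(sessions: Iterable[Mapping], metric: str) -> Dict[str, float]:
    """Analyze each lane independently: one statistic per lane, never pooled."""
    return {
        lane: mean(s[metric] for s in group)
        for lane, group in split_by_lane(sessions).items()
    }


def native_baseline(sessions: Iterable[Mapping], metric: str) -> float:
    """The native baseline only ever sees direct_prompt_native sessions."""
    native = split_by_lane(sessions).get(NATIVE_LANE, [])
    if not native:
        raise ValueError("no direct_prompt_native sessions to build a baseline from")
    return mean(s[metric] for s in native)
```

Cross-lane comparison then means putting the `per_lane_mean` values side by side as trends, not merging the groups before aggregating.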

docs/research/phase3d/README.md

Lines changed: 38 additions & 0 deletions
```diff
@@ -30,6 +30,9 @@ The registry is not for:
 - taxonomy changes
 - external trajectory ingestion
 
+The next mainline stage is `Phase 3D-T2B: Intervention-aware Acquisition`.
+It keeps Tier 2 acquisition active while separating workflow-intervention lanes from the native direct-prompt baseline.
+
 ## Working Documents
 
 - [Execution summary](execution_summary_v0.2.5.md)
```
```diff
@@ -48,6 +51,24 @@ The registry is not for:
 - [Hypothesis prioritization](hypothesis_prioritization_v0.2.5.md)
 - [Cross-project Prompt Morphology Study](../branches/cross_project_prompt_morphology/README.md)
 
+## Intervention Lanes
+
+Phase 3D now treats workflow intervention as a separate analysis axis.
+
+| Lane | Meaning | Direct native baseline? |
+| --- | --- | --- |
+| `direct_prompt_native` | User/developer gave the agent a task directly | Yes |
+| `routed_prompt_intervention` | `prompt-routing-skill` selected the posture first | No |
+| `superpowers_workflow_intervention` | A structured workflow plugin changed the execution shape | No |
+| `controlled_prompt_morphology` | Controlled prompt comparison or pilot run | No |
+
+Rules:
+
+- Analyze each lane independently first.
+- Do not merge intervention traces into the native direct-prompt baseline.
+- Cross-lane comparison may report trends only.
+- Intervention-lane findings do not become universal policy without additional validation.
+
 ## Hypothesis Categories
 
 - runtime morphology hypotheses
```
```diff
@@ -69,6 +90,23 @@ These are hypotheses only, not conclusions. Canonical entries live in the regist
 - Record negative results alongside positive ones.
 - Do not treat the hypotheses as ontology.
 - Do not promote external research into conclusions without causetrace corpus evidence.
+- Treat workflow intervention lanes as separate from native direct-prompt analysis.
+- Do not move into Phase 4 theory finalization yet.
+
+## Metadata Density Warning
+
+Current corpus scale is sufficient for validation-oriented work, but metadata density remains too low for stable theory finalization or default automation policy.
+
+Current gap summary:
+
+- runtime missing: `1136`
+- task_type missing: `1150`
+- task_source missing: `1150`
+- success missing: `1153`
+- duration missing: `1315`
+- human_intervention missing: `1219`
+
+The next stage should continue acquisition and lane separation before any universal prompt policy or runtime theory is attempted.
 
 ## Upstream Reference
 
```
docs/research/phase3d/baseline_v0.2.5.md

Lines changed: 8 additions & 2 deletions
```diff
@@ -6,16 +6,20 @@ It is not a validation result and it is not a conclusion. It is the starting poi
 
 ## Corpus Snapshot
 
-- sessions: `981`
-- events: `30024`
+- sessions: `1351`
+- events: `128534`
 - ready: `True`
 - strict research-grade sessions: `157`
 - native strict sessions: `100`
 - data_origin labeled sessions: `981`
 - missing data_origin: `0`
 - data_origin coverage: `100%`
+- agent field coverage: `100%` (inline on events)
+- provider field coverage: `99.8%` (inline on events)
 - runtime breadth: `7`
 - task breadth: `9`
+- runtime counts: opencode `1131`, claude-code `179`, codex `29`, aider `2`
+- model counts (top): doubao-seed-2.0-code `264`, deepseek-v4-pro `55`, gpt-5.4-mini `13`, gpt-5.5 `13`
 
 ## Baseline Observations
 
```
```diff
@@ -25,6 +29,8 @@ It is not a validation result and it is not a conclusion. It is the starting poi
 - Runtime distribution is still uneven, with `anthropic` and `claude-code` accounting for most explicit runtime labels.
 - Task distribution is still uneven, with `debug_test`, `feature_add`, `exploration`, and `review` dominating the labeled subset.
 - Failure and human-intervention coverage remain limited relative to the overall corpus, so failure morphology and intervention morphology should be treated as weaker, later-stage candidates.
+- Agent and provider fields are now populated inline on 100% / 99.8% of events (v0.2.5 parser fix), enabling reliable per-event runtime attribution for the first time.
+- Claude Code sessions predominantly use deepseek-v4-pro as the underlying model (via proxy), with doubao-seed-2.0-code as secondary.
 
 ## Tier 1 Hypotheses To Check First
 
```
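The coverage lines above report the share of events with a populated `agent` or `provider`, and the runtime and model lines are frequency counts. A minimal sketch of both computations, assuming events are JSONL objects and treating a field as covered when it is present and non-empty; the exact definition behind the 100% / 99.8% figures is not shown in this commit, and the helper names are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one event per non-blank JSONL line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def field_coverage(events: Iterable[dict], fields: List[str]) -> Dict[str, float]:
    """Percentage (one decimal) of events whose field is present and non-empty."""
    total = 0
    filled: Counter = Counter()
    for ev in events:
        total += 1
        for f in fields:
            if ev.get(f) not in (None, ""):
                filled[f] += 1
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: round(100.0 * filled[f] / total, 1) for f in fields}


def top_counts(values: Iterable[Optional[str]], n: int = 4) -> List[Tuple[str, int]]:
    """Most common labels (e.g. runtime or model per session), ignoring blanks."""
    return Counter(v for v in values if v).most_common(n)
```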
docs/research/phase3d/status.md

Lines changed: 32 additions & 2 deletions
```diff
@@ -3,6 +3,7 @@
 Phase 3D is active.
 
 It is the hypothesis registry layer for runtime morphology research. It follows the descriptive work in Phase 3A, 3B, and 3C.
+The next mainline stage is `Phase 3D-T2B: Intervention-aware Acquisition`, which continues Tier 2 acquisition while keeping workflow-intervention lanes separate from the native direct-prompt baseline.
 
 ## Current Position
 
```
```diff
@@ -15,14 +16,21 @@ It is the hypothesis registry layer for runtime morphology research. It follows
 
 ## Current Corpus Baseline
 
-- sessions: `981`
-- events: `30024`
+- sessions: `1351`
+- events: `128534`
+- metadata sessions: `981`
+- annotated sessions: `53`
+- explicit runtime sessions: `179`
 - ready: `True`
 - strict research-grade sessions: `157`
 - native strict sessions: `100`
 - data_origin labeled sessions: `981`
 - missing data_origin: `0`
 - data_origin coverage: `100%`
+- agent field coverage: `100%` (inline on events)
+- provider field coverage: `99.8%` (inline on events)
+- runtime distribution: opencode `1131`, claude-code `179`, codex `29`, aider `2`
+- model distribution (top): doubao-seed-2.0-code `264`, deepseek-v4-pro `55`, gpt-5.4-mini `13`, gpt-5.5 `13`
 
 ## Phase 3D Documents
 
```
```diff
@@ -53,12 +61,33 @@ It is the hypothesis registry layer for runtime morphology research. It follows
 - Three controlled-benchmark pilot sessions have been labeled with `data_origin=controlled_benchmark` and remain separate from the native lane.
 - Tier 2 remains acquisition-only until the failure / intervention subset grows.
 - Human-intervention acquisition target has been met for the current native lane.
+- Intervention lanes must stay separate from the native direct-prompt baseline.
 
 ## Operating Rule
 
 - Do not turn hypotheses into conclusions without corpus-backed validation.
 - Do not move into prediction, anomaly modeling, or automatic diagnosis.
 - Keep controlled benchmark and external trajectories in separate lanes.
+- Keep routed-prompt and superpowers workflow traces separate from direct-prompt native traces.
+- Do not move into Phase 4 yet.
+
+## Metadata Density Warning
+
+Current corpus scale is sufficient for validation-oriented work, but metadata density remains too low for stable theory finalization or default automation policy.
+
+Current gap summary (explicit sidecar metadata):
+
+- runtime missing: `1172`
+- task_type missing: `1186`
+- task_source missing: `1186`
+- success missing: `1189`
+- duration missing: `1351`
+- human_intervention missing: `1255`
+- model missing: `1331`
+- repo_language missing: `1331`
+- repo_size missing: `1331`
+
+Note: agent and provider fields are now populated inline on all events (100% / 99.8% coverage), distinct from sidecar metadata tracked here.
 
 ## Next Action
 
```
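The gap summary counts sessions whose sidecar metadata lacks each field. In the status numbers, `duration missing: 1351` equals the full session count, so no session currently has a duration recorded. A minimal sketch of the count, assuming sidecars are a mapping from session ID to a metadata dict (a hypothetical layout) and that a session with no sidecar is missing every field. `False` is counted as present, so `success=False` is not mistaken for a gap.

```python
from typing import Dict, Iterable, Mapping

SIDECAR_FIELDS = (
    "runtime", "task_type", "task_source", "success", "duration",
    "human_intervention", "model", "repo_language", "repo_size",
)


def missing_counts(
    session_ids: Iterable[str],
    sidecars: Mapping[str, Mapping],
    fields: Iterable[str] = SIDECAR_FIELDS,
) -> Dict[str, int]:
    """Count sessions whose sidecar lacks each field (no sidecar = missing everything)."""
    fields = tuple(fields)
    counts = {f: 0 for f in fields}
    for sid in session_ids:
        meta = sidecars.get(sid) or {}
        for f in fields:
            # Only absent/None counts as missing; False and 0 are real values.
            if meta.get(f) is None:
                counts[f] += 1
    return counts
```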
```diff
@@ -71,3 +100,4 @@ Continue Tier 2 acquisition:
 - non-native AskUserQuestion sessions have been marked as human_intervention=true, but they do not alter the native strict gate
 - proxy failure candidates may be reviewed separately, but they do not change the native strict readiness gate
 - follow the acquisition sprint note for the next batch of native samples
+- treat `direct_prompt_native`, `routed_prompt_intervention`, `superpowers_workflow_intervention`, and `controlled_prompt_morphology` as separate lanes in analysis
```
