Fix agent/provider field coverage across all enrich pipelines

Your Name · claude · Your Name · commit 9c828bdf8567 · 2026-06-13T10:57:03.000+08:00
claude_project_parser.py: add _detect_provider() to infer LLM provider
from model name, set agent="claude-code" and provider on all event types
(thinking, tool_use, text) — previously only tool_use had agent set.

claude_code.py: add agent="claude-code" to hook bridge record_call.

tools/backfill_agent.py: one-shot script to backfill agent/provider on
existing JSONL sessions. Agent coverage 3%→100%, provider 10%→99.8%.

Update Phase 3D corpus baseline: sessions 1315→1351, events 64K→128K.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/causetrace/hooks/claude_code.py b/causetrace/hooks/claude_code.py
@@ -72,6 +72,7 @@ def main() -> None:
             parent_event_id=parent_id,
             model=_CC_MODEL,
             provider=_CC_PROVIDER,
+            agent="claude-code",
             duration_ms=duration,
         )
         _save_last_event_id(session_id, event.event_id)
diff --git a/causetrace/hooks/claude_project_parser.py b/causetrace/hooks/claude_project_parser.py
@@ -108,6 +108,24 @@ def list_sessions() -> List[Dict[str, Any]]:
     return sessions
 
 
+def _detect_provider(model: Optional[str]) -> str:
+    """Infer LLM provider from model name."""
+    if not model:
+        return "anthropic"
+    m = model.lower()
+    if m.startswith("claude"):
+        return "anthropic"
+    if m.startswith("deepseek"):
+        return "deepseek"
+    if m.startswith("ark-") or m.startswith("doubao"):
+        return "bytedance"
+    if m.startswith("gpt-") or m.startswith("o1") or m.startswith("o3"):
+        return "openai"
+    if m.startswith("minimax"):
+        return "minimax"
+    return "anthropic"
+
+
 def parse_session(session_id: str) -> List[ToolEvent]:
     """Parse a Claude Code project session JSONL into causally-linked events.
 
@@ -147,6 +165,8 @@ def parse_session(session_id: str) -> List[ToolEvent]:
                     parent_event_id=last_event_id,
                     timestamp=obj.get("timestamp"),
                     model=msg.get("model"),
+                    provider=_detect_provider(msg.get("model")),
+                    agent="claude-code",
                 )
                 events.append(event)
                 last_event_id = event.event_id
@@ -162,6 +182,7 @@ def parse_session(session_id: str) -> List[ToolEvent]:
                     parent_event_id=last_event_id,
                     timestamp=obj.get("timestamp"),
                     model=msg.get("model"),
+                    provider=_detect_provider(msg.get("model")),
                     agent="claude-code",
                 )
                 events.append(event)
@@ -177,6 +198,8 @@ def parse_session(session_id: str) -> List[ToolEvent]:
                         parent_event_id=last_event_id,
                         timestamp=obj.get("timestamp"),
                         model=msg.get("model"),
+                        provider=_detect_provider(msg.get("model")),
+                        agent="claude-code",
                     )
                     events.append(event)
                     last_event_id = event.event_id
diff --git a/docs/research/README.md b/docs/research/README.md
@@ -9,6 +9,27 @@ This directory groups the research tracks and branch studies that sit alongside
 - [Phase 3C](phase3c/README.md)
 - [Phase 3D](phase3d/README.md)
 
+## Current Research Status
+
+`causetrace` now has enough corpus scale to support validation-oriented work, but not enough metadata density to support theory finalization or default automation policy.
+
+Current snapshot:
+
+- sessions: `1315`
+- events: `64429`
+- strict research-grade sessions: `157`
+- dominant_chain: `1111`
+- mixed: `195`
+- retry-heavy: `541`
+- branchy sessions: `179`
+- long sessions >=100 events: `53`
+
+The next mainline stage is:
+
+`Phase 3D-T2B: Intervention-aware Acquisition`
+
+This stage continues Tier 2 acquisition while keeping workflow-intervention lanes separate from the native direct-prompt baseline.
+
 ## Cross-project Branch Studies
 
 - [Cross-project Prompt Morphology Study](branches/cross_project_prompt_morphology/README.md)
@@ -48,6 +69,24 @@ Rules:
 - Do not merge routed traces into the native direct-prompt baseline.
 - If routed traces are analyzed, label them explicitly as routed.
 
+## Workflow Intervention Lanes
+
+Treat workflow intervention as a separate experimental axis from prompt posture.
+
+| Type | Meaning | Mix into native direct-prompt conclusions? |
+| --- | --- | --- |
+| direct_prompt_native | User/developer gave the agent a task directly | Yes, under native rules |
+| routed_prompt_intervention | `prompt-routing-skill` selected the posture first | No, not directly |
+| superpowers_workflow_intervention | A structured workflow plugin changed the execution shape | No, not directly |
+| controlled_prompt_morphology | Controlled prompt comparison or pilot run | Controlled / intervention lane |
+
+Rules:
+
+- Analyze each workflow lane independently first.
+- Do not merge intervention traces into the native direct-prompt baseline.
+- Cross-lane comparison may report trends only.
+- Intervention-lane findings do not become universal policy without additional validation.
+
 ## Boundary
 
 Branch studies and skills may inform hypotheses and workflow choices, but they do not change the `causetrace` core boundary by themselves.
diff --git a/docs/research/phase3d/README.md b/docs/research/phase3d/README.md
@@ -30,6 +30,9 @@ The registry is not for:
 - taxonomy changes
 - external trajectory ingestion
 
+The next mainline stage is `Phase 3D-T2B: Intervention-aware Acquisition`.
+It keeps Tier 2 acquisition active while separating workflow-intervention lanes from the native direct-prompt baseline.
+
 ## Working Documents
 
 - [Execution summary](execution_summary_v0.2.5.md)
@@ -48,6 +51,24 @@ The registry is not for:
 - [Hypothesis prioritization](hypothesis_prioritization_v0.2.5.md)
 - [Cross-project Prompt Morphology Study](../branches/cross_project_prompt_morphology/README.md)
 
+## Intervention Lanes
+
+Phase 3D now treats workflow intervention as a separate analysis axis.
+
+| Lane | Meaning | Direct native baseline? |
+| --- | --- | --- |
+| `direct_prompt_native` | User/developer gave the agent a task directly | Yes |
+| `routed_prompt_intervention` | `prompt-routing-skill` selected the posture first | No |
+| `superpowers_workflow_intervention` | A structured workflow plugin changed the execution shape | No |
+| `controlled_prompt_morphology` | Controlled prompt comparison or pilot run | No |
+
+Rules:
+
+- Analyze each lane independently first.
+- Do not merge intervention traces into the native direct-prompt baseline.
+- Cross-lane comparison may report trends only.
+- Intervention-lane findings do not become universal policy without additional validation.
+
 ## Hypothesis Categories
 
 - runtime morphology hypotheses
@@ -69,6 +90,23 @@ These are hypotheses only, not conclusions. Canonical entries live in the regist
 - Record negative results alongside positive ones.
 - Do not treat the hypotheses as ontology.
 - Do not promote external research into conclusions without causetrace corpus evidence.
+- Treat workflow intervention lanes as separate from native direct-prompt analysis.
+- Do not move into Phase 4 theory finalization yet.
+
+## Metadata Density Warning
+
+Current corpus scale is sufficient for validation-oriented work, but metadata density remains too low for stable theory finalization or default automation policy.
+
+Current gap summary:
+
+- runtime missing: `1136`
+- task_type missing: `1150`
+- task_source missing: `1150`
+- success missing: `1153`
+- duration missing: `1315`
+- human_intervention missing: `1219`
+
+The next stage should continue acquisition and lane separation before any universal prompt policy or runtime theory is attempted.
 
 ## Upstream Reference
 
diff --git a/docs/research/phase3d/baseline_v0.2.5.md b/docs/research/phase3d/baseline_v0.2.5.md
@@ -6,16 +6,20 @@ It is not a validation result and it is not a conclusion. It is the starting poi
 
 ## Corpus Snapshot
 
-- sessions: `981`
-- events: `30024`
+- sessions: `1351`
+- events: `128534`
 - ready: `True`
 - strict research-grade sessions: `157`
 - native strict sessions: `100`
 - data_origin labeled sessions: `981`
 - missing data_origin: `0`
 - data_origin coverage: `100%`
+- agent field coverage: `100%` (inline on events)
+- provider field coverage: `99.8%` (inline on events)
 - runtime breadth: `7`
 - task breadth: `9`
+- runtime counts: opencode `1131`, claude-code `179`, codex `29`, aider `2`
+- model counts (top): doubao-seed-2.0-code `264`, deepseek-v4-pro `55`, gpt-5.4-mini `13`, gpt-5.5 `13`
 
 ## Baseline Observations
 
@@ -25,6 +29,8 @@ It is not a validation result and it is not a conclusion. It is the starting poi
 - Runtime distribution is still uneven, with `anthropic` and `claude-code` accounting for most explicit runtime labels.
 - Task distribution is still uneven, with `debug_test`, `feature_add`, `exploration`, and `review` dominating the labeled subset.
 - Failure and human-intervention coverage remain limited relative to the overall corpus, so failure morphology and intervention morphology should be treated as weaker, later-stage candidates.
+- Agent and provider fields are now populated inline on 100% / 99.8% of events (v0.2.5 parser fix), enabling reliable per-event runtime attribution for the first time.
+- Claude Code sessions predominantly use deepseek-v4-pro as the underlying model (via proxy), with doubao-seed-2.0-code as secondary.
 
 ## Tier 1 Hypotheses To Check First
 
diff --git a/docs/research/phase3d/status.md b/docs/research/phase3d/status.md
@@ -3,6 +3,7 @@
 Phase 3D is active.
 
 It is the hypothesis registry layer for runtime morphology research. It follows the descriptive work in Phase 3A, 3B, and 3C.
+The next mainline stage is `Phase 3D-T2B: Intervention-aware Acquisition`, which continues Tier 2 acquisition while keeping workflow-intervention lanes separate from the native direct-prompt baseline.
 
 ## Current Position
 
@@ -15,14 +16,21 @@ It is the hypothesis registry layer for runtime morphology research. It follows
 
 ## Current Corpus Baseline
 
-- sessions: `981`
-- events: `30024`
+- sessions: `1351`
+- events: `128534`
+- metadata sessions: `981`
+- annotated sessions: `53`
+- explicit runtime sessions: `179`
 - ready: `True`
 - strict research-grade sessions: `157`
 - native strict sessions: `100`
 - data_origin labeled sessions: `981`
 - missing data_origin: `0`
 - data_origin coverage: `100%`
+- agent field coverage: `100%` (inline on events)
+- provider field coverage: `99.8%` (inline on events)
+- runtime distribution: opencode `1131`, claude-code `179`, codex `29`, aider `2`
+- model distribution (top): doubao-seed-2.0-code `264`, deepseek-v4-pro `55`, gpt-5.4-mini `13`, gpt-5.5 `13`
 
 ## Phase 3D Documents
 
@@ -53,12 +61,33 @@ It is the hypothesis registry layer for runtime morphology research. It follows
 - Three controlled-benchmark pilot sessions have been labeled with `data_origin=controlled_benchmark` and remain separate from the native lane.
 - Tier 2 remains acquisition-only until the failure / intervention subset grows.
 - Human-intervention acquisition target has been met for the current native lane.
+- Intervention lanes must stay separate from the native direct-prompt baseline.
 
 ## Operating Rule
 
 - Do not turn hypotheses into conclusions without corpus-backed validation.
 - Do not move into prediction, anomaly modeling, or automatic diagnosis.
 - Keep controlled benchmark and external trajectories in separate lanes.
+- Keep routed-prompt and superpowers workflow traces separate from direct-prompt native traces.
+- Do not move into Phase 4 yet.
+
+## Metadata Density Warning
+
+Current corpus scale is sufficient for validation-oriented work, but metadata density remains too low for stable theory finalization or default automation policy.
+
+Current gap summary (explicit sidecar metadata):
+
+- runtime missing: `1172`
+- task_type missing: `1186`
+- task_source missing: `1186`
+- success missing: `1189`
+- duration missing: `1351`
+- human_intervention missing: `1255`
+- model missing: `1331`
+- repo_language missing: `1331`
+- repo_size missing: `1331`
+
+Note: agent and provider fields are now populated inline on all events (100% / 99.8% coverage), distinct from sidecar metadata tracked here.
 
 ## Next Action
 
@@ -71,3 +100,4 @@ Continue Tier 2 acquisition:
 - non-native AskUserQuestion sessions have been marked as human_intervention=true, but they do not alter the native strict gate
 - proxy failure candidates may be reviewed separately, but they do not change the native strict readiness gate
 - follow the acquisition sprint note for the next batch of native samples
+- treat `direct_prompt_native`, `routed_prompt_intervention`, `superpowers_workflow_intervention`, and `controlled_prompt_morphology` as separate lanes in analysis
diff --git a/tools/backfill_agent.py b/tools/backfill_agent.py
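
The body of `tools/backfill_agent.py` is not shown on this page. As a rough illustration of what a one-shot backfill over JSONL events could look like, here is a minimal sketch; it is not the script from this commit. The function name `backfill_lines`, the flat `agent` / `provider` / `model` keys on each event object, and the fill-only-when-missing policy are all assumptions made for the example.

```python
import json
from typing import Callable, Iterable, Iterator, Optional


def backfill_lines(
    lines: Iterable[str],
    detect_provider: Callable[[Optional[str]], str],
    agent: str = "claude-code",
) -> Iterator[str]:
    """Yield JSONL lines with missing agent/provider fields filled in.

    Hypothetical helper: assumes each line is one JSON object with
    optional top-level "agent", "provider", and "model" keys.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        event = json.loads(stripped)
        # Only fill gaps; values that are already present are left untouched.
        if not event.get("agent"):
            event["agent"] = agent
        if not event.get("provider"):
            event["provider"] = detect_provider(event.get("model"))
        yield json.dumps(event, ensure_ascii=False)


# Illustrative input; the event shapes here are made up for the example.
sample = [
    '{"event_id": "e1", "model": "deepseek-v4-pro"}',
    '{"event_id": "e2", "model": "claude-x", "agent": "claude-code", "provider": "anthropic"}',
]


def toy_detect(model: Optional[str]) -> str:
    # Stand-in for the parser's provider detection, reduced to two cases.
    return "deepseek" if (model or "").startswith("deepseek") else "anthropic"


patched = [json.loads(s) for s in backfill_lines(sample, toy_detect)]
```

Passing the detection function in as a parameter keeps the sketch independent of the parser module; a real script would presumably reuse the parser's own detection logic so that backfilled and newly parsed events agree.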
