feat(parsers): add agent-trace.dev v1 sidecar parser #12
Adds support for the open, vendor-neutral agent-trace.dev v1 spec
(https://agent-trace.dev/) as a third parser alongside the existing
Claude Code and Codex/Copilot native log parsers.

Why
---
ai-blame previously had to grow one bespoke parser per agent. The
agent-trace.dev v1 spec is an open, agent-agnostic JSON sidecar format
designed exactly for this attribution use case, and any producer of the
spec (first-party emitter or third-party exporter from native logs) is
unlocked at once.

What
----
- New `src/parsers/agent_trace.rs` reads `.agent-trace/*.json` records
  conforming to the v1 schema and emits one `EditRecord` per
  `files[].conversations[].ranges[]` entry.
- Registered in `ParserRegistry::new()` first so the unambiguous
  `.json` + spec-shaped discriminator wins before the `.jsonl` parsers
  see the file.
- `get_agent_trace_dirs()` discovers `<cwd>/.agent-trace/` and
  `~/.agent-trace/` automatically and feeds them into
  `get_all_trace_dirs()`.
- Smoke-tested end-to-end against 21 real spec records: 126 edits
  across 41 files extracted via `ai-blame stats` / `report`.

Limitations
-----------
agent-trace.dev records carry attribution ranges + `content_hash`
(spec §6.3) rather than raw `old_string` / `new_string` patches, so
`stats` / `timeline` / `report` / `transcript` work fully but
`blame` / `annotate` cannot perform the same patch-walk reconstruction
as the native parsers. Documented in the parser module docstring and in
the README.

Tests
-----
7 new unit tests in `parsers::agent_trace::tests` covering parse,
filter, fallback, can_parse acceptance/rejection, and
`collect_trace_files` extension filtering. Full `cargo test` is green
(22 passing); `cargo fmt --check` and `cargo clippy` clean.

Amp-Thread-ID: https://ampcode.com/threads/T-019e0ee7-99b6-72e7-a9eb-e71bba013ceb
Co-authored-by: Amp <amp@ampcode.com>
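The directory discovery and extension filtering described in this commit could look roughly like the sketch below. It is not the code from `src/parsers/agent_trace.rs`: the function names `agent_trace_dirs` and `has_json_extension` are hypothetical, `cwd` and `home` are passed in as parameters rather than read from the environment, and the real implementation presumably also checks that the directories exist and that the JSON is spec-shaped.

```rust
use std::path::{Path, PathBuf};

/// Candidate sidecar directories: `<cwd>/.agent-trace/` and `<home>/.agent-trace/`.
/// Hypothetical helper; existence checks are left to the caller.
fn agent_trace_dirs(cwd: &Path, home: Option<&Path>) -> Vec<PathBuf> {
    let mut dirs = vec![cwd.join(".agent-trace")];
    if let Some(home) = home {
        let candidate = home.join(".agent-trace");
        // Avoid listing the same directory twice when cwd is the home directory.
        if !dirs.contains(&candidate) {
            dirs.push(candidate);
        }
    }
    dirs
}

/// True only for files whose extension is exactly `json`, so `.jsonl`
/// logs are left for the native parsers.
fn has_json_extension(path: &Path) -> bool {
    path.extension().and_then(|ext| ext.to_str()) == Some("json")
}
```

Taking `cwd` and `home` as arguments keeps the sketch deterministic and easy to unit-test; a real caller would supply the process working directory and the user's home directory.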
67c80d4 to
63a36fd
When the spec's top-level `tool` block is absent (some emitters drop
the whole block because the spec requires both `name` and `version`
when `tool` is present, and they only have a name) we were falling all
the way back to `agent_tool=agent-trace` / `model=unknown`, even
though the conversation URL or related-session URN clearly identified
the agent.
Now we walk a per-conversation derivation chain:
- `agent_tool` ← `tool.name` → conversation URL host (ampcode.com →
amp, cursor.{sh,com} → cursor, claude.ai → claude-code,
{openai,chatgpt}.com → codex, *.block.xyz → goose) → session URN
agent slug (`urn:*:session:<agent>:<id>`) → `agent-trace`.
- `model` ← per-range `contributor.model_id` → conversation
`contributor.model_id` → `<contributor.type> (model unspecified)`
(e.g. `ai (model unspecified)`) → `unknown`.
- `session_id` ← trailing path/URN segment of conversation `url` →
trailing segment of `related[type=session]` → full URL/URN →
`unknown`. Trailing-segment matches what the native parsers use, so
cross-tool correlation works.
Also derives per-conversation rather than per-record so a single
record covering multiple agents (e.g. a hand-off) attributes each
conversation correctly.
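
The derivation chain above can be sketched as a set of `Option` fallbacks. This is an illustrative reconstruction, not the code in the commit: every function name here is hypothetical, inputs are plain `Option<&str>` values instead of the parser's real record types, host extraction ignores ports and userinfo, and allowing subdomains for the named hosts is an assumption.

```rust
/// Exact host match, or a subdomain of `domain` (the subdomain rule is an assumption).
fn host_matches(host: &str, domain: &str) -> bool {
    host == domain
        || host
            .strip_suffix(domain)
            .map_or(false, |prefix| prefix.ends_with('.'))
}

/// Map a conversation URL host to an agent slug, following the host table above.
fn agent_from_url(url: &str) -> Option<&'static str> {
    let (_, rest) = url.split_once("://")?;
    let host = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()?
        .to_ascii_lowercase();
    let host = host.as_str();
    if host_matches(host, "ampcode.com") {
        Some("amp")
    } else if host_matches(host, "cursor.sh") || host_matches(host, "cursor.com") {
        Some("cursor")
    } else if host_matches(host, "claude.ai") {
        Some("claude-code")
    } else if host_matches(host, "openai.com") || host_matches(host, "chatgpt.com") {
        Some("codex")
    } else if host.ends_with(".block.xyz") {
        Some("goose")
    } else {
        None
    }
}

/// Extract `<agent>` from a URN shaped like `urn:*:session:<agent>:<id>`.
fn agent_from_session_urn(urn: &str) -> Option<String> {
    let parts: Vec<&str> = urn.split(':').collect();
    match parts.as_slice() {
        ["urn", _, "session", agent, _id, ..] if !agent.is_empty() => Some((*agent).to_owned()),
        _ => None,
    }
}

/// `tool.name` → URL host → session URN slug → `agent-trace`.
fn derive_agent_tool(tool_name: Option<&str>, url: Option<&str>, urn: Option<&str>) -> String {
    tool_name
        .map(str::to_owned)
        .or_else(|| url.and_then(agent_from_url).map(str::to_owned))
        .or_else(|| urn.and_then(agent_from_session_urn))
        .unwrap_or_else(|| "agent-trace".to_owned())
}

/// Range model → conversation model → `<type> (model unspecified)` → `unknown`.
fn derive_model(range: Option<&str>, conv: Option<&str>, kind: Option<&str>) -> String {
    range
        .or(conv)
        .map(str::to_owned)
        .or_else(|| kind.map(|t| format!("{t} (model unspecified)")))
        .unwrap_or_else(|| "unknown".to_owned())
}

/// Last non-empty segment after splitting on `/` or `:`.
fn trailing_segment(s: &str) -> Option<&str> {
    s.trim_end_matches('/')
        .rsplit(|c: char| c == '/' || c == ':')
        .next()
        .filter(|seg| !seg.is_empty())
}

/// Trailing segment of the URL → of the related session → full URL/URN → `unknown`.
fn derive_session_id(url: Option<&str>, related: Option<&str>) -> String {
    url.and_then(trailing_segment)
        .or_else(|| related.and_then(trailing_segment))
        .or(url)
        .or(related)
        .unwrap_or("unknown")
        .to_owned()
}
```

Calling these once per conversation, rather than once per record, is what lets a hand-off record attribute each conversation to its own agent.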
Verified end-to-end against the same 21 real records: previously every
edit displayed as `agent-trace / unknown`; now they correctly
attribute as `amp / ai (model unspecified)`.
Tests: 4 new (agent_url_host_mapping, agent_session_urn_extraction,
derives_agent_from_amp_url_when_tool_block_absent,
record_tool_name_takes_priority_over_url_sniffing); existing tests
updated for trailing-segment session_id.
cargo test: 26 passing; fmt + clippy clean.
Amp-Thread-ID: https://ampcode.com/threads/T-019e0ee7-99b6-72e7-a9eb-e71bba013ceb
Co-authored-by: Amp <amp@ampcode.com>
🤖 Sent by Joah's AI agent: End-to-end output against 21 real
What
Adds a third parser, `AgentTraceParser`, that consumes the open agent-trace.dev v1 spec (`.agent-trace/*.json` sidecars) and turns each `files[].conversations[].ranges[]` entry into an `EditRecord`. Registered in `ParserRegistry::new()` ahead of the existing Claude / Codex parsers and auto-discovered via a new `get_agent_trace_dirs()` (`<cwd>/.agent-trace/`, `~/.agent-trace/`).
Why
ai-blame today supports Claude Code and Codex/Copilot via bespoke
parsers per agent. The agent-trace.dev v1 spec is an open,
agent-agnostic JSON sidecar format intended exactly for this
attribution use case, so a single parser unlocks support for every
producer of the spec — first-party emitters or third-party exporters
that convert an agent's native logs — instead of growing the
parser-per-agent matrix forever.
End-to-end smoke test
Ran against 21 real spec records validated against the upstream
v1 schema:
Limitations (documented in the parser docstring + README)
agent-trace.dev records carry attribution ranges + `content_hash` (spec §6.3) rather than raw `old_string` / `new_string` patches. As a result:
- `stats`, `timeline`, `report`, `transcript`: work fully.
- `blame`, `annotate`: cannot perform the same patch-walk reconstruction as the native parsers.
A future iteration could use `content_hash` for position-independent attribution; this is left as follow-up.
Mapping
Each spec range becomes one `EditRecord`:
- `file_path`: `files[].path` (already repo-relative per spec).
- `timestamp`: record's top-level `timestamp` (per-revision, not per-edit).
- `model`: `contributor.model_id` (range override → conversation default → `unknown`).
- `session_id`: `conversation.url` → `related[type=session].url` → `unknown`.
- `agent_tool` / `agent_version`: record's top-level `tool` block.
- `is_create` / `change_size`: heuristic. Ranges starting at line 1 are treated as create-shaped; `change_size` is the line span.
Tests
7 new unit tests in `parsers::agent_trace::tests` covering parse output, file-pattern filter, session-id fallback to `related[]`, `can_parse` acceptance + rejection (incl. refusing to steal `.jsonl` files from sibling parsers), and `collect_trace_files` extension filtering.
`cargo fmt --check` and `cargo clippy --lib` are both clean.
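
As a rough illustration of the range-to-`EditRecord` mapping and the `is_create` / `change_size` heuristic described under Mapping, the sketch below flattens `files[].conversations[].ranges[]` into one record per range. The struct names, the `start_line` / `end_line` field names, and the reading of "line span" as an inclusive count are all assumptions made for this example; the real `EditRecord` carries more fields (timestamp, model, session, tool) that are omitted here.

```rust
// Hypothetical, trimmed-down stand-ins for the parsed spec record.
struct LineRange {
    start_line: u32,
    end_line: u32,
}

struct Conversation {
    ranges: Vec<LineRange>,
}

struct TraceFile {
    path: String,
    conversations: Vec<Conversation>,
}

// Only the fields relevant to the heuristic are modelled.
#[derive(Debug, PartialEq)]
struct EditSketch {
    file_path: String,
    is_create: bool,
    change_size: u32,
}

/// Emit one edit per `files[].conversations[].ranges[]` entry.
fn edits_from_files(files: &[TraceFile]) -> Vec<EditSketch> {
    files
        .iter()
        .flat_map(|file| {
            file.conversations
                .iter()
                .flat_map(|conv| conv.ranges.iter())
                .map(move |range| EditSketch {
                    file_path: file.path.clone(),
                    // Heuristic: a range that starts at line 1 is create-shaped.
                    is_create: range.start_line == 1,
                    // Inclusive line span (assumed); saturating to tolerate inverted ranges.
                    change_size: range.end_line.saturating_sub(range.start_line) + 1,
                })
        })
        .collect()
}
```

Because the heuristic only looks at `start_line`, an edit that rewrites the top of an existing file would also be classified as create-shaped, which is worth keeping in mind when reading `stats` output.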