feat(parsers): add agent-trace.dev v1 sidecar parser #12

Closed
joahg wants to merge 2 commits into ai4curation:main from joahg:joah/agent-trace-dev-parser

Conversation

@joahg joahg commented May 10, 2026

What

Adds a third parser, AgentTraceParser, that consumes the open
agent-trace.dev v1 spec (.agent-trace/*.json
sidecars) and turns each files[].conversations[].ranges[] entry into
an EditRecord. Registered in ParserRegistry::new() ahead of the
existing Claude / Codex parsers and auto-discovered via a new
get_agent_trace_dirs() (<cwd>/.agent-trace/, ~/.agent-trace/).
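
The discovery step described above could look roughly like the sketch below. It is not the code from this PR: `agent_trace_dirs` is a hypothetical stand-in for `get_agent_trace_dirs()`, it takes the working and home directories as parameters so it can be exercised without touching the environment, and the existence check and de-duplication are assumptions.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `get_agent_trace_dirs()`.
/// `cwd` and `home` are passed in (rather than read from the environment)
/// so the lookup is easy to test.
fn agent_trace_dirs(cwd: &Path, home: Option<&Path>) -> Vec<PathBuf> {
    let mut bases: Vec<&Path> = vec![cwd];
    if let Some(h) = home {
        bases.push(h);
    }

    let mut dirs: Vec<PathBuf> = Vec::new();
    for base in bases {
        let candidate = base.join(".agent-trace");
        // Assumption: only directories that actually exist are reported,
        // and a repeated path (cwd == home) is listed once.
        if candidate.is_dir() && !dirs.contains(&candidate) {
            dirs.push(candidate);
        }
    }
    dirs
}
```

A caller would pass the process working directory and the user's home directory, then hand the result to whatever aggregates trace directories.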

Why

ai-blame today supports Claude Code and Codex/Copilot via bespoke
parsers per agent. The agent-trace.dev v1 spec is an open,
agent-agnostic JSON sidecar format intended exactly for this
attribution use case, so a single parser unlocks support for every
producer of the spec — first-party emitters or third-party exporters
that convert an agent's native logs — instead of growing the
parser-per-agent matrix forever.

End-to-end smoke test

Ran against 21 real spec records validated against the upstream
v1 schema:

$ ai-blame stats -t .agent-trace
Files with edits (all files): 41
Total successful edits: 126

$ ai-blame report -t .agent-trace
=== Summary ===
File                                               | Edits | First Edit       | Last Edit       
---------------------------------------------------------------------------------------------------
i18n.md                                            |     2 | 2026-04-13 16:28 | 2026-04-13 16:28
SKILL.md                                           |     6 | 2026-04-17 16:54 | 2026-04-17 16:54
verify-nx-required-tasks.test.ts                   |     6 | 2026-05-08 22:30 | 2026-05-08 22:30
…

Limitations (documented in the parser docstring + README)

agent-trace.dev records carry attribution ranges + content_hash
(spec §6.3) rather than raw old_string / new_string patches. As a
result:

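Command                             | Works on agent-trace records?
---------------------------------------------------------------------------------------------------
stats, timeline, report, transcript | ✅ Fully
blame, annotate                     | ⚠️ Marks touched lines but cannot do the patch-walk reconstruction the native parsers do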

A future iteration could use content_hash for position-independent
attribution — left as follow-up.
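
To illustrate what "marks touched lines" can mean without a patch walk, here is a minimal sketch that flattens attribution ranges into a set of line numbers. The `Range` struct and its `start_line` / `end_line` field names are assumptions for illustration, as is treating ranges as 1-based and inclusive; this is not the parser's actual representation.

```rust
use std::collections::BTreeSet;

/// Hypothetical attribution range: 1-based, inclusive on both ends.
struct Range {
    start_line: u32,
    end_line: u32,
}

/// Collect every line number covered by at least one range.
/// Malformed ranges (line 0, or end before start) are skipped.
fn touched_lines(ranges: &[Range]) -> BTreeSet<u32> {
    ranges
        .iter()
        .filter(|r| r.start_line >= 1 && r.start_line <= r.end_line)
        .flat_map(|r| r.start_line..=r.end_line)
        .collect()
}
```

Because the ranges describe positions in the revision the record was written for, a set like this only lines up with the current file if no later edits shifted the lines, which is the gap the `content_hash` follow-up would address.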

Mapping

Each spec range becomes one EditRecord:

  • file_path ← files[].path (already repo-relative per spec).
  • timestamp — record's top-level timestamp (per-revision, not per-edit).
  • model ← contributor.model_id (range override → conversation default → unknown).
  • session_id ← conversation.url → related[type=session].url → unknown.
  • agent_tool / agent_version — record's top-level tool block.
  • is_create / change_size — heuristic: ranges starting at line 1 are treated as create-shaped; change_size is the line span.
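
The `is_create` / `change_size` heuristic from the list above can be sketched as follows. The `EditRecord` here is a cut-down, hypothetical version with only the fields needed for the example, and counting the span inclusively (`end - start + 1`) is an assumption about what "line span" means.

```rust
/// Cut-down, hypothetical version of `EditRecord` for illustration.
#[derive(Debug, PartialEq)]
struct EditRecord {
    file_path: String,
    is_create: bool,
    change_size: usize,
}

/// Turn one spec range into one edit record.
/// Lines are assumed to be 1-based and inclusive; malformed ranges yield None.
fn edit_from_range(file_path: &str, start_line: usize, end_line: usize) -> Option<EditRecord> {
    if start_line == 0 || end_line < start_line {
        return None;
    }
    Some(EditRecord {
        file_path: file_path.to_string(),
        // Heuristic from the PR: a range starting at line 1 is create-shaped.
        is_create: start_line == 1,
        // Assumption: the span counts both endpoints.
        change_size: end_line - start_line + 1,
    })
}
```

The heuristic is deliberately coarse: a later edit that happens to begin at line 1 is also reported as a create, which is consistent with the CREATED-heavy timeline shown further down.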

Tests

7 new unit tests in parsers::agent_trace::tests covering parse
output, file-pattern filter, session-id fallback to related[],
can_parse acceptance + rejection (incl. refusing to steal .jsonl
files from sibling parsers), and collect_trace_files extension
filtering.

$ cargo test --lib
test result: ok. 22 passed; 0 failed; 0 ignored

cargo fmt --check and cargo clippy --lib are both clean.
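
For readers who want a feel for the acceptance and rejection cases those tests cover, here is a rough sketch. The function bodies are assumptions: the real `can_parse` presumably inspects parsed JSON, whereas this version uses a crude substring check for the top-level `timestamp` and `files` keys to stay dependency-free.

```rust
use std::path::{Path, PathBuf};

/// True only for a `.json` extension, so `.jsonl` logs are left
/// to the sibling parsers.
fn has_json_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("json"))
}

/// Sketch of a discriminator: right extension plus a spec-shaped body.
/// Assumption: a substring check stands in for real JSON parsing.
fn can_parse(path: &Path, contents: &str) -> bool {
    has_json_extension(path)
        && contents.contains("\"timestamp\"")
        && contents.contains("\"files\"")
}

/// Keep only candidate sidecar files, dropping everything else.
fn collect_trace_files(candidates: &[PathBuf]) -> Vec<PathBuf> {
    candidates
        .iter()
        .filter(|p| has_json_extension(p))
        .cloned()
        .collect()
}
```

Checking the extension before looking at the contents is what lets this parser sit first in the registry without claiming files that belong to the `.jsonl` parsers.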

@joahg joahg marked this pull request as ready for review May 10, 2026 01:16
Adds support for the open, vendor-neutral agent-trace.dev v1 spec
(https://agent-trace.dev/) as a third parser alongside the existing
Claude Code and Codex/Copilot native log parsers.

Why
---

ai-blame previously had to grow one bespoke parser per agent. The
agent-trace.dev v1 spec is an open, agent-agnostic JSON sidecar format
designed exactly for this attribution use case, and any producer of
the spec (first-party emitter or third-party exporter from native
logs) is unlocked at once.

What
----

- New `src/parsers/agent_trace.rs` reads `.agent-trace/*.json` records
  conforming to the v1 schema and emits one `EditRecord` per
  `files[].conversations[].ranges[]` entry.
- Registered in `ParserRegistry::new()` first so the unambiguous
  `.json` + spec-shaped discriminator wins before the `.jsonl`
  parsers see the file.
- `get_agent_trace_dirs()` discovers `<cwd>/.agent-trace/` and
  `~/.agent-trace/` automatically and feeds them into
  `get_all_trace_dirs()`.
- Smoke-tested end-to-end against 21 real spec records: 126 edits
  across 41 files extracted via `ai-blame stats` / `report`.

Limitations
-----------

agent-trace.dev records carry attribution ranges + `content_hash`
(spec §6.3) rather than raw `old_string` / `new_string` patches, so
`stats` / `timeline` / `report` / `transcript` work fully but
`blame` / `annotate` cannot perform the same patch-walk
reconstruction as the native parsers. Documented in the parser module
docstring and in the README.

Tests
-----

7 new unit tests in `parsers::agent_trace::tests` covering parse,
filter, fallback, can_parse acceptance/rejection, and
`collect_trace_files` extension filtering. Full `cargo test` is
green (22 passing); `cargo fmt --check` and `cargo clippy` clean.

Amp-Thread-ID: https://ampcode.com/threads/T-019e0ee7-99b6-72e7-a9eb-e71bba013ceb
Co-authored-by: Amp <amp@ampcode.com>
@joahg joahg force-pushed the joah/agent-trace-dev-parser branch from 67c80d4 to 63a36fd Compare May 10, 2026 01:19
When the spec's top-level `tool` block was absent, we fell all the
way back to `agent_tool=agent-trace` / `model=unknown`, even though
the conversation URL or related-session URN clearly identified the
agent. Some emitters drop the whole block because the spec requires
both `name` and `version` when `tool` is present, and they only have
a name.

Now we walk a per-conversation derivation chain:

- `agent_tool` ← `tool.name` → conversation URL host (ampcode.com →
  amp, cursor.{sh,com} → cursor, claude.ai → claude-code,
  {openai,chatgpt}.com → codex, *.block.xyz → goose) → session URN
  agent slug (`urn:*:session:<agent>:<id>`) → `agent-trace`.
- `model` ← per-range `contributor.model_id` → conversation
  `contributor.model_id` → `<contributor.type> (model unspecified)`
  (e.g. `ai (model unspecified)`) → `unknown`.
- `session_id` ← trailing path/URN segment of conversation `url` →
  trailing segment of `related[type=session]` → full URL/URN →
  `unknown`. Trailing-segment matches what the native parsers use, so
  cross-tool correlation works.
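
The `agent_tool` part of the chain above can be sketched like this. It is an illustration only, not the code from this commit: the function names are hypothetical, the URL host is extracted by hand to avoid a dependency, and matching subdomains of the listed hosts is an assumption.

```rust
/// Extract the host from an absolute URL without pulling in a URL crate.
fn url_host(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    let authority = rest.split(|c| c == '/' || c == '?' || c == '#').next()?;
    let host = authority.rsplit('@').next()?; // drop any userinfo
    let host = host.split(':').next()?; // drop any port
    if host.is_empty() { None } else { Some(host) }
}

/// Host-to-agent table from the commit message.
/// Assumption: subdomains of the listed hosts also match.
fn agent_from_host(host: &str) -> Option<&'static str> {
    let host = host.to_ascii_lowercase();
    let is = |domain: &str| host == domain || host.ends_with(&format!(".{}", domain));
    if is("ampcode.com") {
        Some("amp")
    } else if is("cursor.sh") || is("cursor.com") {
        Some("cursor")
    } else if is("claude.ai") {
        Some("claude-code")
    } else if is("openai.com") || is("chatgpt.com") {
        Some("codex")
    } else if host.ends_with(".block.xyz") {
        Some("goose")
    } else {
        None
    }
}

/// Pull `<agent>` out of `urn:*:session:<agent>:<id>`.
fn agent_from_session_urn(urn: &str) -> Option<&str> {
    let parts: Vec<&str> = urn.split(':').collect();
    match parts.as_slice() {
        ["urn", _, "session", agent, _id, ..] if !agent.is_empty() => Some(*agent),
        _ => None,
    }
}

/// tool.name, then URL host, then session URN slug, then "agent-trace".
fn derive_agent_tool(tool_name: Option<&str>, url: Option<&str>, urn: Option<&str>) -> String {
    if let Some(name) = tool_name.filter(|n| !n.is_empty()) {
        return name.to_string();
    }
    if let Some(agent) = url.and_then(url_host).and_then(agent_from_host) {
        return agent.to_string();
    }
    if let Some(agent) = urn.and_then(agent_from_session_urn) {
        return agent.to_string();
    }
    "agent-trace".to_string()
}
```

Keeping the explicit `tool.name` first means URL sniffing is only a fallback and never overrides what the emitter declared.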

Also derives per-conversation rather than per-record so a single
record covering multiple agents (e.g. a hand-off) attributes each
conversation correctly.
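
The `session_id` rule from the chain above, applied to one conversation at a time, might look like the following sketch. Function names are hypothetical, and treating both `/` and `:` as segment separators (so the same helper handles URLs and URNs) is an assumption.

```rust
/// Last non-empty segment of a URL path or URN, ignoring any query
/// string, fragment, or trailing slash.
fn trailing_segment(id: &str) -> Option<&str> {
    let without_suffix = id.split(|c| c == '?' || c == '#').next().unwrap_or(id);
    let trimmed = without_suffix.trim_end_matches('/');
    let segment = trimmed.rsplit(|c| c == '/' || c == ':').next()?;
    if segment.is_empty() { None } else { Some(segment) }
}

/// Trailing segment of the conversation URL, then of the related session,
/// then the full URL/URN, then "unknown".
fn derive_session_id(conversation_url: Option<&str>, related_session: Option<&str>) -> String {
    if let Some(seg) = conversation_url.and_then(trailing_segment) {
        return seg.to_string();
    }
    if let Some(seg) = related_session.and_then(trailing_segment) {
        return seg.to_string();
    }
    conversation_url
        .or(related_session)
        .filter(|s| !s.is_empty())
        .unwrap_or("unknown")
        .to_string()
}
```

Calling this once per conversation, rather than once per record, is what allows a hand-off record to carry two different session IDs.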

Verified end-to-end against the same 21 real records: previously every
edit displayed as `agent-trace / unknown`; now they correctly
attribute as `amp / ai (model unspecified)`.

Tests: 4 new (agent_url_host_mapping, agent_session_urn_extraction,
derives_agent_from_amp_url_when_tool_block_absent,
record_tool_name_takes_priority_over_url_sniffing); existing tests
updated for trailing-segment session_id.

cargo test: 26 passing; fmt + clippy clean.

Amp-Thread-ID: https://ampcode.com/threads/T-019e0ee7-99b6-72e7-a9eb-e71bba013ceb
Co-authored-by: Amp <amp@ampcode.com>
joahg commented May 10, 2026

Author

🤖 Sent by Joah's AI agent:

End-to-end output against 21 real .agent-trace/*.json v1 sidecars (126 edits across 41 files):

stats

$ ai-blame stats -t .agent-trace
Trace directory: ".agent-trace"
Trace files: 0          ← legacy .jsonl-only counter
  Session traces: 0
  Agent traces: 0

Files with edits (all files): 41
Total successful edits: 126

timeline -n 0

All 126 entries correctly attributed (representative sample):

=== Timeline of Actions ===
Showing 126 most recent edits

Timestamp            Action     File                                               Model                     Agent
-----------------------------------------------------------------------------------------------------------------------------
2026-05-08 22:30:52  CREATED    .../square-web-platform-stamps/apps/OWNERS.yaml    ai (model unspecified)    amp
2026-05-08 22:30:52  CREATED    .../verify-nx-required-tasks.ts                    ai (model unspecified)    amp
2026-05-08 22:30:52  CREATED    .../square-web-platform-stamps/scripts/ci.sh       ai (model unspecified)    amp
2026-05-08 22:30:52  CREATED    .../package-json-change-detector.spec.ts           ai (model unspecified)    amp
2026-05-08 22:30:52  CREATED    .../platform/infra-project-scaffolder/src/new.ts   ai (model unspecified)    amp
…
2026-04-23 22:49:47  CREATED    .../MenuShow/hooks/useFetchMenuItems.ts            ai (model unspecified)    amp
2026-04-17 16:54:52  CREATED    .agents/skills/saas-experiments/SKILL.md           ai (model unspecified)    amp
2026-04-17 16:54:52  CREATED    .agents/skills/feature-flags/SKILL.md              ai (model unspecified)    amp
2026-04-17 16:54:52  CREATED    .agents/skills/tracking-events/SKILL.md            ai (model unspecified)    amp
2026-04-13 18:27:13  CREATED    libs/trust/shared-ui/stylelint.config.mjs          ai (model unspecified)    amp
2026-04-13 18:27:13  CREATED    .../components/address/{ca,us}/address.module.css  ai (model unspecified)    amp
2026-04-13 17:37:42  CREATED    apps/managerbot/managerbot-e2e/tests/e2e/…         ai (model unspecified)    amp
2026-04-13 17:37:42  CREATED    libs/shared/util-tests/src/market/actions.ts       ai (model unspecified)    amp
2026-04-13 17:10:27  CREATED    MODULES.yaml                                       ai (model unspecified)    amp
2026-04-13 16:28:17  CREATED    .agents/checks/{i18n,testing}.md                   ai (model unspecified)    amp
2026-04-13 14:12:21  CREATED    .ai-usage-marker                                   ai (model unspecified)    amp
2026-04-13 14:10:09  CREATED    libs/shared/util-tests/src/initialize.ts           ai (model unspecified)    amp
2026-04-10 22:02:01  CREATED    libs/shared/types-protos/protogen.config.ts        ai (model unspecified)    amp

Total edits found: 126

report

Per-file summary + output plan + sidecar previews:

$ ai-blame report -t .agent-trace
Scanning traces in: ".agent-trace"

=== Summary ===
File                                               | Edits | First Edit       | Last Edit
---------------------------------------------------------------------------------------------
i18n.md                                            |     2 | 2026-04-13 16:28 | 2026-04-13 16:28
testing.md                                         |     2 | 2026-04-13 16:28 | 2026-04-13 16:28
SKILL.md                                           |     6 | 2026-04-17 16:54 | 2026-04-17 16:54
SKILL.md                                           |     4 | 2026-04-17 16:54 | 2026-04-17 16:54
…41 rows total…
initialize.ts                                      |    10 | 2026-04-13 14:10 | 2026-04-13 14:10
actions.ts                                         |     8 | 2026-04-13 17:37 | 2026-04-13 17:37
verify-owners-file.ts                              |     2 | 2026-04-13 14:12 | 2026-04-13 14:12

=== Output Plan ===
File                                               | Policy     | destination
---------------------------------------------------------------------------------------------
i18n.md                                            | sidecar    | .agents/checks/i18n.history.yaml
SKILL.md                                           | sidecar    | .agents/skills/saas-experiments/SKILL.history.yaml
ci.sh                                              | sidecar    | …/scripts/ci.history.yaml
OWNERS.yaml                                        | append     | in-place
…41 rows total…

=== YAML Preview: .agents/checks/i18n.md ===
edit_history:
- timestamp: 2026-04-13T16:28:17.539Z
  model: ai (model unspecified)
  action: CREATED
  agent_tool: amp
- timestamp: 2026-04-13T16:28:17.539Z
  model: ai (model unspecified)
  action: EDITED
  agent_tool: amp

=== YAML Preview: .agents/skills/ci-analytics/SKILL.md ===
edit_history:
- timestamp: 2026-04-17T16:54:52.894Z
  model: ai (model unspecified)
  action: CREATED
  agent_tool: amp
- timestamp: 2026-04-17T16:54:52.894Z
  model: ai (model unspecified)
  action: EDITED
  agent_tool: amp
…
… and 36 more files (use --show-all to see all)

The model column reads ai (model unspecified) because these particular sidecars set contributor.type: "ai" but no model_id. Records that include a model_id (e.g. anthropic/claude-opus-4-5) populate the column literally — covered by the parses_spec_record_into_edits test.
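
The fallback that produces `ai (model unspecified)` can be sketched as below. This is an illustration under assumptions, not the parser's code: `derive_model` and `non_empty` are hypothetical names, and blank strings are treated the same as missing values.

```rust
/// Treat missing and blank values the same way.
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|v| !v.is_empty())
}

/// Per-range model_id, then conversation model_id, then
/// "<contributor.type> (model unspecified)", then "unknown".
fn derive_model(
    range_model_id: Option<&str>,
    conversation_model_id: Option<&str>,
    contributor_type: Option<&str>,
) -> String {
    if let Some(model) = non_empty(range_model_id).or(non_empty(conversation_model_id)) {
        return model.to_string();
    }
    match non_empty(contributor_type) {
        Some(kind) => format!("{} (model unspecified)", kind),
        None => "unknown".to_string(),
    }
}
```

Surfacing the contributor type in the placeholder keeps AI-authored ranges distinguishable from ranges with no attribution at all, even when the emitter omits `model_id`.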

The Agent column reads amp because the value is recovered from the conversation URL host (ampcode.com), even though these records have no top-level tool block. The derivation chain added in the second commit (agent-trace: derive agent + model from any available signal) is what makes this possible.

@joahg joahg closed this May 28, 2026