feat: v0.22.0 — semantic session diff (compare outcomes between two runs) by Siddhant-K-code · Pull Request #38 · Siddhant-K-code/agent-trace

Siddhant-K-code · 2026-04-11T16:33:28Z

Closes #28

What

Extends diff.py with a --semantic mode that compares two sessions at the outcome level rather than by phase structure.

Output

Semantic diff: a1b2c3d4e5f6 vs b7c8d9e0f1a2
─────────────────────────────────────────────────────────────────────
                                 Session A    Session B    Change
─────────────────────────────────────────────────────────────────────
  Duration                          3m22s        2m14s      -33%
  Cost                            $0.0067      $0.0041      -39%
  Errors                                2            0      -100%
  Tool calls                           18           14       -22%
  LLM requests                          6            5       -17%
  Retries                               3            1       -67%
─────────────────────────────────────────────────────────────────────
  Files read (both)    src/main.py, tests/test_foo.py
  Files written (A only)  dist/bundle.js
  Files written (B only)  dist/bundle.min.js
  Commands (both)      pytest
─────────────────────────────────────────────────────────────────────
  Verdict: B is better
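
For reviewers, a rough sketch of how the Change column and the both / A only / B only rows could be computed. This is not the code in diff.py: the helper names percent_change and partition are hypothetical, and rounding to the nearest whole percent is an assumption.

```python
def percent_change(a: float, b: float) -> str:
    """Relative change from session A to session B as a whole-percent string.

    Hypothetical helper; the rounding rule is an assumption.
    """
    if a == 0:
        return "n/a"  # relative change is undefined when A's value is zero
    return f"{round((b - a) / a * 100)}%"


def partition(a_items, b_items) -> dict:
    """Split two collections into shared, A-only, and B-only members."""
    a, b = set(a_items), set(b_items)
    return {
        "both": sorted(a & b),
        "A only": sorted(a - b),
        "B only": sorted(b - a),
    }


# Values taken from the sample output above.
cost_change = percent_change(0.0067, 0.0041)
written = partition(["dist/bundle.js"], ["dist/bundle.min.js"])
read = partition(
    ["src/main.py", "tests/test_foo.py"],
    ["src/main.py", "tests/test_foo.py"],
)
```

Set operations keep the comparison independent of the order in which each session touched its files.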

Verdict logic

B is better when it wins on more metrics (errors, cost, duration, retries) than A, with no regressions on any of them. The same rule applies in reverse for A. Ties → inconclusive.
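
A minimal sketch of this rule, assuming lower is better for all four metrics and that a mixed result (each session wins somewhere) counts as inconclusive. The Outcome dataclass and the verdict function are hypothetical names for illustration, not the ones used in diff.py.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Per-session totals used for the verdict (hypothetical structure)."""
    errors: int
    cost: float
    duration_s: float
    retries: int


VERDICT_METRICS = ("errors", "cost", "duration_s", "retries")


def verdict(a: Outcome, b: Outcome) -> str:
    # Count the metrics on which each session is strictly lower (better).
    a_wins = sum(getattr(a, m) < getattr(b, m) for m in VERDICT_METRICS)
    b_wins = sum(getattr(b, m) < getattr(a, m) for m in VERDICT_METRICS)

    # A session is better only if it wins somewhere and regresses nowhere.
    if b_wins > 0 and a_wins == 0:
        return "B is better"
    if a_wins > 0 and b_wins == 0:
        return "A is better"
    # All-equal and mixed results are both treated as inconclusive (assumption).
    return "inconclusive"


# Values from the sample output: 3m22s = 202 s, 2m14s = 134 s.
session_a = Outcome(errors=2, cost=0.0067, duration_s=202, retries=3)
session_b = Outcome(errors=0, cost=0.0041, duration_s=134, retries=1)
result = verdict(session_a, session_b)
```

Treating any regression as disqualifying keeps the verdict conservative: a session that is cheaper but produces more errors is never reported as better.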

Eval integration

When --eval-config points to a .agent-evals.yaml, eval scores for both sessions are included in the diff table.

CLI

agent-strace diff <session-a> <session-b> --semantic [--eval-config .agent-evals.yaml]
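
For illustration, one way this command line could be parsed with argparse. The flag and argument names come from the usage line above; the parser layout, the build_parser name, and the defaults are assumptions, not the project's actual CLI code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage line above."""
    parser = argparse.ArgumentParser(prog="agent-strace")
    sub = parser.add_subparsers(dest="command", required=True)

    diff = sub.add_parser("diff", help="compare two sessions")
    diff.add_argument("session_a", metavar="session-a")
    diff.add_argument("session_b", metavar="session-b")
    # Switch from the phase-structure diff to the outcome-level diff.
    diff.add_argument("--semantic", action="store_true")
    # Optional path to an eval config such as .agent-evals.yaml.
    diff.add_argument("--eval-config", default=None)
    return parser


args = build_parser().parse_args(
    ["diff", "a1b2c3d4e5f6", "b7c8d9e0f1a2", "--semantic"]
)
```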

Tests

tests/test_semantic_diff.py — 10 tests.

diff.py gains a --semantic mode that compares two sessions at the outcome level: cost, duration, errors, retries, files read/written, commands run, and optional eval scores. Reports which files/commands were unique to each session and gives a verdict (A is better / B is better / inconclusive) based on errors, cost, duration, and retries.

CLI: agent-strace diff <session-a> <session-b> --semantic [--eval-config]

Closes #28

Co-authored-by: Ona <no-reply@ona.com>

Siddhant-K-code mentioned this pull request Apr 11, 2026

feat: wire policy, dashboard, annotate, token-budget into CLI (v0.13–v0.22) #39

Merged

Siddhant-K-code merged commit 13a916d into main Apr 11, 2026
4 checks passed

Siddhant-K-code deleted the feat/v0.22.0-semantic-diff branch April 11, 2026 17:37

Siddhant-K-code merged 1 commit into main from feat/v0.22.0-semantic-diff
