Skip to content

Releases: Siddhant-K-code/agent-trace

v0.42.1

23 May 02:39
c3fbdaa

Choose a tag to compare

  • docs(agents-md): add AGENTS.md integration guide and repo AGENTS.md (#109)

v0.42.0

23 May 02:37
f7fb67e

Choose a tag to compare

  • feat(anonymize): add trace anonymization for export (#108)

v0.41.0

23 May 02:33
c1054cc

Choose a tag to compare

  • feat(sample): add dataset auto-sampler for regression suite export (#107)

v0.40.0

23 May 02:30
079a57a

Choose a tag to compare

  • feat(retention): add session data retention management (#106)

v0.39.1

23 May 02:26
1f7faf2

Choose a tag to compare

  • fix(replay): add --limit flag and progress indicator for large sessions (#105)

v0.39.0

23 May 02:23
47da174

Choose a tag to compare

  • feat(watch): add --timeout, --budget, --on-death flags with post-mortem JSON (#104)
  • ci: add workflow_dispatch to publish workflow for manual re-runs

VS Code extension v0.2.1

23 May 09:08
11efc5d

Choose a tag to compare

Patch release to update the extension README on Open VSX and VS Marketplace.

  • README rewritten with full v0.2.0 feature coverage: session browser, post-mortem viewer, watchdog integration
  • Complete command table (all 8 commands)
  • Complete settings table (all 6 settings with correct names and defaults)
  • No functional changes to the extension

v0.38.1

20 May 09:29
ebb97d9

Choose a tag to compare

  • Fix latest session selection by start time (#80) | Thanks to @grp06

v0.38.0 — MCP Server: Debug Agent Sessions Conversationally

17 May 13:29

Choose a tag to compare

Query agent traces through Claude Code or Cursor using natural language.

agent-strace mcp starts a JSON-RPC 2.0 MCP server over stdio. No external dependencies — implements the MCP protocol using stdlib only.

Five tools

Tool Description
list_sessions Sessions with metadata, cost estimate, agent name filter
get_session Full event stream with optional event type filter
search_events Filter by tool name, file path, exit code, or error flag across sessions
get_session_summary Plain-English phase breakdown — what the agent did, files touched, retries
diff_sessions Tool call delta, file overlap, cost/token/error delta between two sessions

Claude Code config

{
  "mcpServers": {
    "agent-trace": {
      "command": "agent-strace",
      "args": ["mcp"]
    }
  }
}

Cursor config

{
  "mcpServers": {
    "agent-trace": {
      "command": "agent-strace",
      "args": ["mcp"]
    }
  }
}

Also in this release

  • examples/ci/agent-eval.yml: GitHub Actions eval CI workflow — runs eval ci with baseline comparison on PRs touching agent config files, posts score summary as a PR comment
  • 40 new tests; 740 total across Python 3.10-3.13

v0.37.0 — LLM-as-Judge Scorer, Dataset Auto-Sampling, Eval CI Baseline

23 May 10:38

Choose a tag to compare

Three eval loop features: score sessions with an LLM, auto-populate datasets from signal filters, and track score regressions in CI.

What's new

LLM-as-judge scorer

Score agent sessions using any OpenAI-compatible endpoint:

# .agent-evals.yaml
scorers:
  - type: llm_judge
    threshold: 0.8
    prompt: "Did the agent complete the task without unnecessary file writes?"
    base_url: "http://localhost:11434/v1"
    model: llama3

Dataset auto-sampling

Automatically populate eval datasets from recent sessions using signal-based filters:

agent-strace eval dataset auto --filter has-errors --since-days 7
agent-strace eval dataset auto --filter high-retry
agent-strace eval dataset auto --filter cost-above:1.00
agent-strace eval dataset auto --filter wide-blast
agent-strace eval dataset auto --filter long-duration:300s
agent-strace eval dataset auto --filter low-eval-score:0.5

Eval CI baseline

Track and enforce score regressions across runs:

# Save baseline
agent-strace eval ci --save-baseline .agent-traces/baseline.json

# Fail CI on regression beyond 5%
agent-strace eval ci --baseline .agent-traces/baseline.json --tolerance 0.05

# Write GitHub Actions PR comment summary
agent-strace eval ci --baseline .agent-traces/baseline.json --github-summary

The --github-summary flag writes a Markdown table with score, baseline, delta (e.g. +20%), and pass/fail per scorer — ready to post as a PR comment.