Releases · Siddhant-K-code/agent-trace

README rewritten with full v0.2.0 feature coverage: session browser, post-mortem viewer, watchdog integration
Complete command table (all 8 commands)
Complete settings table (all 6 settings with correct names and defaults)
No functional changes to the extension

Assets 2

20 May 09:29

github-actions

v0.38.1

ebb97d9

v0.38.1

Fix latest session selection by start time (#80) | Thanks to @grp06

Contributors

grp06

Assets 2

17 May 13:29

github-actions

v0.38.0

edb59c3

v0.38.0 — MCP Server: Debug Agent Sessions Conversationally

Query agent traces through Claude Code or Cursor using natural language.

agent-strace mcp starts a JSON-RPC 2.0 MCP server over stdio. No external dependencies — implements the MCP protocol using stdlib only.

Five tools

Tool	Description
`list_sessions`	Sessions with metadata, cost estimate, agent name filter
`get_session`	Full event stream with optional event type filter
`search_events`	Filter by tool name, file path, exit code, or error flag across sessions
`get_session_summary`	Plain-English phase breakdown — what the agent did, files touched, retries
`diff_sessions`	Tool call delta, file overlap, cost/token/error delta between two sessions

Claude Code config

{
  "mcpServers": {
    "agent-trace": {
      "command": "agent-strace",
      "args": ["mcp"]
    }
  }
}

Cursor config

{
  "mcpServers": {
    "agent-trace": {
      "command": "agent-strace",
      "args": ["mcp"]
    }
  }
}

Also in this release

examples/ci/agent-eval.yml: GitHub Actions eval CI workflow — runs eval ci with baseline comparison on PRs touching agent config files, posts score summary as a PR comment
40 new tests; 740 total across Python 3.10-3.13

Assets 2

23 May 10:38

Siddhant-K-code

v0.37.0

08b7fb7

v0.37.0 — LLM-as-Judge Scorer, Dataset Auto-Sampling, Eval CI Baseline

Three eval loop features: score sessions with an LLM, auto-populate datasets from signal filters, and track score regressions in CI.

What's new

LLM-as-judge scorer

Score agent sessions using any OpenAI-compatible endpoint:

# .agent-evals.yaml
scorers:
  - type: llm_judge
    threshold: 0.8
    prompt: "Did the agent complete the task without unnecessary file writes?"
    base_url: "http://localhost:11434/v1"
    model: llama3

Dataset auto-sampling

Automatically populate eval datasets from recent sessions using signal-based filters:

agent-strace eval dataset auto --filter has-errors --since-days 7
agent-strace eval dataset auto --filter high-retry
agent-strace eval dataset auto --filter cost-above:1.00
agent-strace eval dataset auto --filter wide-blast
agent-strace eval dataset auto --filter long-duration:300s
agent-strace eval dataset auto --filter low-eval-score:0.5

Eval CI baseline

Track and enforce score regressions across runs:

# Save baseline
agent-strace eval ci --save-baseline .agent-traces/baseline.json

# Fail CI on regression beyond 5%
agent-strace eval ci --baseline .agent-traces/baseline.json --tolerance 0.05

# Write GitHub Actions PR comment summary
agent-strace eval ci --baseline .agent-traces/baseline.json --github-summary

The --github-summary flag writes a Markdown table with score, baseline, delta (e.g. +20%), and pass/fail per scorer — ready to post as a PR comment.

Assets 2

Uh oh!

Releases: Siddhant-K-code/agent-trace

v0.42.1

Uh oh!

v0.42.0

Uh oh!

v0.41.0

Uh oh!

v0.40.0

Uh oh!

v0.39.1

Uh oh!

v0.39.0

Uh oh!

VS Code extension v0.2.1

Uh oh!

v0.38.1

Contributors

Uh oh!

v0.38.0 — MCP Server: Debug Agent Sessions Conversationally

Five tools

Claude Code config

Cursor config

Also in this release

Uh oh!

v0.37.0 — LLM-as-Judge Scorer, Dataset Auto-Sampling, Eval CI Baseline

What's new

LLM-as-judge scorer

Dataset auto-sampling

Eval CI baseline

Uh oh!