Releases: Siddhant-K-code/agent-trace
v0.42.1
v0.42.0
v0.41.0
v0.40.0
v0.39.1
v0.39.0
VS Code extension v0.2.1
Patch release to update the extension README on Open VSX and VS Marketplace.
- README rewritten with full v0.2.0 feature coverage: session browser, post-mortem viewer, watchdog integration
- Complete command table (all 8 commands)
- Complete settings table (all 6 settings with correct names and defaults)
- No functional changes to the extension
v0.38.1
v0.38.0 — MCP Server: Debug Agent Sessions Conversationally
Query agent traces through Claude Code or Cursor using natural language.
agent-strace mcp starts a JSON-RPC 2.0 MCP server over stdio. No external dependencies — implements the MCP protocol using stdlib only.
Five tools
| Tool | Description |
|---|---|
| list_sessions | Sessions with metadata, cost estimate, agent name filter |
| get_session | Full event stream with optional event type filter |
| search_events | Filter by tool name, file path, exit code, or error flag across sessions |
| get_session_summary | Plain-English phase breakdown: what the agent did, files touched, retries |
| diff_sessions | Tool call delta, file overlap, cost/token/error delta between two sessions |
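As a rough illustration of what an MCP client sends to the server over stdio, the sketch below frames a JSON-RPC 2.0 tools/call request for list_sessions as one newline-terminated JSON message. The `agent` argument name is hypothetical: the notes only say an agent name filter exists, so the real parameter names should be taken from the tool's input schema.

```python
import json


def frame_request(request_id: int, method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    # json.dumps without indent emits a single line; the MCP stdio
    # transport delimits messages with newlines.
    return json.dumps(message) + "\n"


def tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Frame an MCP tools/call request for the named tool."""
    return frame_request(request_id, "tools/call", {"name": tool, "arguments": arguments})


# "agent" is a hypothetical argument name used only for illustration.
line = tool_call(1, "list_sessions", {"agent": "claude-code"})
print(line, end="")
```

A real client would complete the MCP initialize handshake before sending this; Claude Code and Cursor handle that automatically once the server is configured.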
Claude Code config
{
  "mcpServers": {
    "agent-trace": {
      "command": "agent-strace",
      "args": ["mcp"]
    }
  }
}
Cursor config
{
  "mcpServers": {
    "agent-trace": {
      "command": "agent-strace",
      "args": ["mcp"]
    }
  }
}
Also in this release
- examples/ci/agent-eval.yml: GitHub Actions eval CI workflow. Runs eval ci with baseline comparison on PRs touching agent config files and posts a score summary as a PR comment
- 40 new tests; 740 total across Python 3.10-3.13
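Returning to the client configs in this release: Claude Code and Cursor take the same mcpServers entry, so adding it to an existing config is a small dictionary merge. The sketch below is one way to do that without disturbing other servers; the `other-server` entry is a hypothetical placeholder, and where each client stores its config file is left out on purpose.

```python
import json


def add_agent_trace_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the agent-trace entry added."""
    merged = dict(config)
    # Copy the existing server map so the caller's dict is not mutated.
    servers = dict(merged.get("mcpServers", {}))
    servers["agent-trace"] = {"command": "agent-strace", "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


# "other-server" stands in for whatever servers are already configured.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
updated = add_agent_trace_server(existing)
print(json.dumps(updated, indent=2))
```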
v0.37.0 — LLM-as-Judge Scorer, Dataset Auto-Sampling, Eval CI Baseline
Three eval loop features: score sessions with an LLM, auto-populate datasets from signal filters, and track score regressions in CI.
What's new
LLM-as-judge scorer
Score agent sessions using any OpenAI-compatible endpoint:
# .agent-evals.yaml
scorers:
  - type: llm_judge
    threshold: 0.8
    prompt: "Did the agent complete the task without unnecessary file writes?"
    base_url: "http://localhost:11434/v1"
    model: llama3
Dataset auto-sampling
Automatically populate eval datasets from recent sessions using signal-based filters:
agent-strace eval dataset auto --filter has-errors --since-days 7
agent-strace eval dataset auto --filter high-retry
agent-strace eval dataset auto --filter cost-above:1.00
agent-strace eval dataset auto --filter wide-blast
agent-strace eval dataset auto --filter long-duration:300s
agent-strace eval dataset auto --filter low-eval-score:0.5
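The filter values above follow a name or name:value shape. As a sketch of how such filters could be parsed and applied (not agent-strace's actual implementation), the example below handles three of them. The session field names `errors`, `cost_usd`, and `duration_s` are assumptions made for illustration.

```python
from typing import Optional, Tuple


def parse_filter(spec: str) -> Tuple[str, Optional[float]]:
    """Split 'name' or 'name:value' into a filter name and optional number."""
    name, sep, raw = spec.partition(":")
    if not sep:
        return name, None
    # Accept a trailing seconds suffix such as the 's' in '300s'.
    return name, float(raw.rstrip("s"))


def matches(session: dict, spec: str) -> bool:
    """Check a session summary against one filter (hypothetical field names)."""
    name, value = parse_filter(spec)
    if name == "has-errors":
        return session.get("errors", 0) > 0
    if name == "cost-above":
        return session.get("cost_usd", 0.0) > value
    if name == "long-duration":
        return session.get("duration_s", 0.0) > value
    raise ValueError(f"filter not covered by this sketch: {name}")


session = {"errors": 0, "cost_usd": 1.25, "duration_s": 120.0}
filters = ("has-errors", "cost-above:1.00", "long-duration:300s")
selected = [f for f in filters if matches(session, f)]
print(selected)
```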
Eval CI baseline
Track and enforce score regressions across runs:
# Save baseline
agent-strace eval ci --save-baseline .agent-traces/baseline.json
# Fail CI on regression beyond 5%
agent-strace eval ci --baseline .agent-traces/baseline.json --tolerance 0.05
# Write GitHub Actions PR comment summary
agent-strace eval ci --baseline .agent-traces/baseline.json --github-summary
The --github-summary flag writes a Markdown table with score, baseline, delta (e.g. +20%), and pass/fail per scorer — ready to post as a PR comment.
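To make the baseline check concrete, here is a minimal sketch of a regression comparison and the kind of Markdown table described above. It assumes the tolerance is an absolute drop in score and that the delta is shown in percentage points; the release notes do not specify either, and `no_errors` is a hypothetical scorer name.

```python
def compare(scores: dict, baseline: dict, tolerance: float = 0.05) -> list:
    """Return (scorer, score, baseline, delta, passed) rows, sorted by scorer."""
    rows = []
    for scorer, score in sorted(scores.items()):
        base = baseline[scorer]
        delta = score - base
        # Assumption: a scorer fails only when it drops by more than `tolerance`.
        rows.append((scorer, score, base, delta, delta >= -tolerance))
    return rows


def render_summary(rows: list) -> str:
    """Render comparison rows as a Markdown table."""
    lines = ["| Scorer | Score | Baseline | Delta | Result |", "|---|---|---|---|---|"]
    for scorer, score, base, delta, passed in rows:
        result = "pass" if passed else "fail"
        lines.append(f"| {scorer} | {score:.2f} | {base:.2f} | {delta:+.0%} | {result} |")
    return "\n".join(lines)


scores = {"llm_judge": 0.90, "no_errors": 0.70}
baseline = {"llm_judge": 0.70, "no_errors": 0.80}
rows = compare(scores, baseline)
print(render_summary(rows))
```

With these sample numbers, llm_judge improves and passes while no_errors drops by more than the tolerance and fails, so a CI step built on this check would exit non-zero.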