Commit 00da58c

feat: MCP server — expose session traces as queryable tools (#79)
Implements agent-strace mcp: a stdio JSON-RPC MCP server that lets any MCP-compatible client (Claude Code, Cursor) query agent traces conversationally.

Five tools:
- list_sessions: sessions with metadata, cost estimate, agent filter
- get_session: full event stream with optional event type filter
- search_events: cross-session filter by tool name, file path, exit code, error flag
- get_session_summary: plain-English phase breakdown (wraps explain_session)
- diff_sessions: tool call delta, file overlap, cost/token/error delta

No external dependencies — implements MCP JSON-RPC 2.0 over stdio using stdlib only.

Also adds:
- examples/ci/agent-eval.yml: GitHub Actions workflow for eval CI gate with baseline comparison and PR comment posting
- README: Debug with MCP section with Claude Code / Cursor config examples

40 new tests; 740 total, all passing.

Co-authored-by: Ona <no-reply@ona.com>
1 parent 08b7fb7 commit 00da58c
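
The server module itself is not shown in this excerpt, so the following is only a minimal sketch of what "MCP JSON-RPC 2.0 over stdio using stdlib only" can look like, not the commit's implementation. It assumes newline-delimited JSON messages on stdin/stdout and handles only `initialize`, `tools/list`, and `tools/call`. The `TOOLS` registry, the canned session data, the `sketch-server` name, and the fallback protocol version string are all hypothetical placeholders.

```python
import json
import sys

# Hypothetical registry with a single tool; the commit describes five.
TOOLS = {
    "list_sessions": {
        "description": "List captured sessions with metadata",
        "inputSchema": {
            "type": "object",
            "properties": {"limit": {"type": "integer"}},
        },
    },
}


def call_tool(name, arguments):
    """Placeholder handler: returns canned data instead of reading a session store."""
    if name == "list_sessions":
        sessions = [{"session_id": "abc123"}, {"session_id": "def456"}]
        return json.dumps(sessions[: arguments.get("limit", 10)])
    raise KeyError(name)


def error(request_id, code, message):
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


def handle(request):
    """Map one decoded JSON-RPC message to a response dict (None for notifications)."""
    if not isinstance(request, dict):
        return error(None, -32600, "Invalid Request")
    if "id" not in request:
        return None  # notifications get no reply
    method = request.get("method")
    params = request.get("params") or {}
    if method == "initialize":
        result = {
            # Assumption: echo the client's version, falling back to a fixed one.
            "protocolVersion": params.get("protocolVersion", "2024-11-05"),
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "sketch-server", "version": "0.0.0"},
        }
    elif method == "tools/list":
        result = {"tools": [{"name": name, **spec} for name, spec in TOOLS.items()]}
    elif method == "tools/call":
        try:
            text = call_tool(params.get("name"), params.get("arguments") or {})
        except KeyError:
            return error(request["id"], -32602, "Unknown tool")
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return error(request["id"], -32601, "Method not found")
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


def serve(stdin=None, stdout=None):
    """Read one JSON message per line and write one response per line."""
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            response = handle(json.loads(line))
        except json.JSONDecodeError:
            response = error(None, -32700, "Parse error")
        if response is not None:
            stdout.write(json.dumps(response) + "\n")
            stdout.flush()
```

Calling `serve()` with no arguments reads from the process's own stdin, which is how a client that spawns the command would talk to it. Passing file-like objects instead makes the loop easy to exercise in tests.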

5 files changed

Lines changed: 1071 additions & 0 deletions

README.md

Lines changed: 70 additions & 0 deletions
@@ -185,6 +185,7 @@ print(f"Replay with: agent-strace replay {meta.session_id}")
| `inflation` | Token inflation across model versions |
| `curve` | Personal cost-efficiency curve |
| `a2a-tree` | Cross-agent trace correlation (A2A protocol) |
| `mcp` | MCP server — expose traces as queryable tools for a debugging agent |

```
agent-strace setup [--redact] [--global] Generate Claude Code hooks config
@@ -1204,6 +1205,75 @@ agent-strace export <session-id> --format otlp > trace.json
| event_id | span ID |
| parent_id | parent span ID |

## Debug with MCP

`agent-strace mcp` starts an MCP server that exposes your session store as queryable tools. Any MCP-compatible client (Claude Code, Cursor, VS Code Copilot) can then query traces conversationally — the debugging agent reads its own execution history and surfaces what went wrong.

```bash
agent-strace mcp
```

**Claude Code config** (`.claude/settings.json`):

```json
{
  "mcpServers": {
    "agent-trace": {
      "command": "agent-strace",
      "args": ["mcp"]
    }
  }
}
```

**Cursor config** (`.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "agent-trace": {
      "command": "agent-strace",
      "args": ["mcp"]
    }
  }
}
```

Once connected, you can ask the debugging agent questions like:

> "Look at the most recent session and tell me why it called bash three times in a row."
> "Which files did the agent write in session abc123 that it didn't write in def456?"
> "Find all sessions where the agent hit an error after calling npm test."

### MCP tools

| Tool | Description |
|---|---|
| `list_sessions` | List captured sessions with metadata (timestamp, tool calls, cost, tokens) |
| `get_session` | Full event stream for a session, with optional event type filter |
| `search_events` | Filter events by tool name, file path, exit code, or error flag across sessions |
| `get_session_summary` | Plain-English phase breakdown — what the agent did, files touched, retries |
| `diff_sessions` | Compare two sessions: tool call delta, file overlap, cost delta, error delta |

### Example interactions

```
# List recent sessions
list_sessions(limit=5)

# Get all errors from a session
search_events(session_id="abc123", has_error=true)

# Find all sessions where the agent wrote to package-lock.json
search_events(file_path="package-lock.json")

# Compare two sessions after changing AGENTS.md
diff_sessions(session_a="before_change", session_b="after_change")

# Get a plain-English summary of what went wrong
get_session_summary(session_id="abc123")
```

## How it works

### Claude Code hooks
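
The example interactions in the README hunk above are written as shorthand calls. As a rough illustration of what an MCP client would actually send for one of them, the sketch below builds the JSON-RPC `tools/call` message for `search_events(session_id="abc123", has_error=true)`. The helper names `tool_call_request` and `encode` are hypothetical, and the argument names are taken from the README examples on the assumption that they match the server's input schema.

```python
import itertools
import json

# Monotonic request ids; JSON-RPC only requires that each id is unique per connection.
_ids = itertools.count(1)


def tool_call_request(name, **arguments):
    """Build the JSON-RPC 2.0 message for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def encode(message):
    """Serialize one message as a single line for a newline-delimited stdio transport."""
    return json.dumps(message, separators=(",", ":")) + "\n"


# Python's True is serialized as JSON true, matching the README shorthand.
request = tool_call_request("search_events", session_id="abc123", has_error=True)
print(encode(request), end="")
```

In practice the MCP client (Claude Code, Cursor) builds these messages itself, so a sketch like this is mainly useful for poking at the server by hand or in tests.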

examples/ci/agent-eval.yml

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
# Agent eval CI workflow
#
# Runs eval scorers on every PR that touches agent config files.
# Fails the PR if any scorer drops below its threshold.
# Posts a score summary as a PR comment.
#
# Prerequisites:
# 1. Capture at least one session: agent-strace record -- <your-agent-command>
# 2. Save a baseline: agent-strace eval ci --save-baseline .agent-traces/baselines/main.json
# 3. Commit .agent-evals.yaml and .agent-traces/baselines/main.json to the repo

name: Agent eval

on:
  pull_request:
    paths:
      - "AGENTS.md"
      - "CLAUDE.md"
      - ".claude/**"
      - ".agent-evals.yaml"
      - ".agent-traces/datasets/**"

jobs:
  eval:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install agent-strace
        run: pip install agent-strace

      # Score the latest session in the dataset against all configured scorers.
      # Exits 1 if any scorer is below threshold or regresses vs baseline.
      - name: Run eval
        env:
          # Required only if using the llm_judge scorer.
          # Remove if using heuristic scorers only (no_errors, cost_under, etc.)
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          agent-strace eval ci \
            --baseline .agent-traces/baselines/main.json \
            --tolerance 0.05 \
            --github-summary

      # Post the Markdown summary as a PR comment so reviewers see the score delta.
      - name: Post eval summary
        if: always()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const summaryPath = '.agent-traces/eval-summary.md';
            if (!fs.existsSync(summaryPath)) {
              console.log('No eval summary found — skipping comment.');
              return;
            }
            const summary = fs.readFileSync(summaryPath, 'utf8');
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: summary,
            });

  # Optional: update the baseline on every merge to main.
  # Commit the updated baseline back to the repo so future PRs compare against it.
  update-baseline:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install agent-strace
        run: pip install agent-strace

      - name: Save new baseline
        run: |
          mkdir -p .agent-traces/baselines
          agent-strace eval ci \
            --save-baseline .agent-traces/baselines/main.json

      - name: Commit updated baseline
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add .agent-traces/baselines/main.json
          git diff --staged --quiet || git commit -m "chore: update eval baseline [skip ci]"
          git push
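
The workflow comments say the eval step exits 1 if any scorer is below its threshold or regresses against the baseline, with `--tolerance 0.05`. How `agent-strace eval ci` computes this is not shown in the diff, so the sketch below is only one plausible reading. It assumes scores on a 0 to 1 scale, an absolute tolerance, and plain dicts keyed by scorer name. The function name `evaluate_gate` and the data shapes are hypothetical.

```python
def evaluate_gate(scores, thresholds, baseline, tolerance=0.05):
    """Return human-readable failures; an empty list means the gate passes.

    scores, thresholds, and baseline map scorer names to floats (assumed 0..1).
    """
    failures = []
    for name, score in scores.items():
        # Absolute floor: the scorer's own configured threshold.
        threshold = thresholds.get(name)
        if threshold is not None and score < threshold:
            failures.append(f"{name}: {score:.2f} is below threshold {threshold:.2f}")
        # Relative check: allow a drop of up to `tolerance` from the saved baseline.
        previous = baseline.get(name)
        if previous is not None and score < previous - tolerance:
            failures.append(f"{name}: {score:.2f} regressed from baseline {previous:.2f}")
    return failures


failures = evaluate_gate(
    scores={"no_errors": 1.0, "cost_under": 0.70},
    thresholds={"no_errors": 0.9, "cost_under": 0.6},
    baseline={"no_errors": 1.0, "cost_under": 0.80},
)
for failure in failures:
    print(failure)
```

Here `cost_under` clears its threshold but has dropped by more than the tolerance since the baseline, so the gate reports one failure. A CI entry point would typically turn a non-empty list into a non-zero exit code.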

src/agent_trace/cli.py

Lines changed: 14 additions & 0 deletions
@@ -23,6 +23,7 @@
from .hooks import hook_main
from .http_proxy import HTTPProxyServer
from .a2a import cmd_a2a_tree
from .mcp_server import cmd_mcp
from .annotate import cmd_annotate
from .drift import cmd_drift
from .langfuse_export import cmd_export_scores
@@ -747,6 +748,18 @@ def build_parser() -> argparse.ArgumentParser:
    p_standup.add_argument("--no-llm", action="store_true", dest="no_llm",
                           help="structured output only, no LLM narrative (default)")

    # mcp (MCP server — expose traces as queryable tools)
    p_mcp = sub.add_parser(
        "mcp",
        help="start an MCP server that exposes session traces as queryable tools",
    )
    p_mcp.add_argument(
        "--transport",
        choices=["stdio"],
        default="stdio",
        help="transport protocol (default: stdio)",
    )

    # diff --semantic and --eval-config flags (extend existing diff parser)
    p_diff.add_argument("--semantic", action="store_true",
                        help="semantic outcome-level diff (files, cost, errors)")
@@ -806,6 +819,7 @@ def main() -> None:
        "oncall": cmd_oncall,
        "freshness": cmd_freshness,
        "standup": cmd_standup,
        "mcp": cmd_mcp,
    }

    handler = handlers.get(args.command)
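
The three `cli.py` hunks follow a common argparse pattern: register a subparser, then route the parsed command through a dict of handlers. The sketch below reproduces that pattern in a self-contained form so it can be run in isolation. The `cmd_mcp` stand-in and the return strings are hypothetical; the real `cmd_mcp` lives in the `mcp_server` module, which this excerpt does not show.

```python
import argparse


def cmd_mcp(args):
    """Stand-in for the real handler; just reports the chosen transport."""
    return f"serving over {args.transport}"


def build_parser():
    parser = argparse.ArgumentParser(prog="agent-strace")
    sub = parser.add_subparsers(dest="command")
    p_mcp = sub.add_parser("mcp", help="start an MCP server")
    # A single-entry choices list rejects unknown transports today
    # and leaves room to add more later without changing the flag.
    p_mcp.add_argument("--transport", choices=["stdio"], default="stdio")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    handlers = {"mcp": cmd_mcp}
    handler = handlers.get(args.command)
    if handler is None:
        return "no command given"
    return handler(args)
```

With this layout, adding a subcommand touches three places (the import, the parser, and the handlers dict), which matches the shape of the three hunks above.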
