Releases: Siddhant-K-code/agent-trace
v0.36.0 — Langfuse Export and OTLP Behavioral Metrics
Export eval scores and behavioral metrics to external observability backends.
What's new
agent-strace export now supports two backends:
Langfuse — push eval scores as Langfuse trace scores:
agent-strace export --scores --backend langfuse \
--langfuse-host https://cloud.langfuse.com \
--langfuse-public-key pk-... \
--langfuse-secret-key sk-...
OTLP gauge metrics — push behavioral metrics as OpenTelemetry gauges:
agent-strace export --metrics --backend otlp \
--otlp-endpoint http://localhost:4318
Exported metrics: agent.session.tokens, agent.session.duration_ms, agent.session.error_rate, agent.session.tool_calls, agent.session.file_writes, agent.session.cost_usd
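For orientation, the sketch below builds an OTLP/HTTP JSON body carrying a few of these gauges. It is illustrative only and is not taken from the agent-strace exporter: the helper name `build_gauge_payload`, the `session.id` attribute, the scope name, and the sample values are all assumptions. OTLP/HTTP receivers conventionally accept metrics at the `/v1/metrics` path under the endpoint.

```python
import json
import time


def build_gauge_payload(session_id, metrics, service_name="agent-strace"):
    """Build an OTLP/HTTP JSON body with one gauge per metric (illustrative)."""
    now_ns = str(time.time_ns())  # OTLP JSON encodes 64-bit integers as strings
    otlp_metrics = []
    for name, value in metrics.items():
        otlp_metrics.append({
            "name": name,
            "gauge": {
                "dataPoints": [{
                    "asDouble": float(value),
                    "timeUnixNano": now_ns,
                    # Hypothetical attribute; the real exporter may tag points differently.
                    "attributes": [{
                        "key": "session.id",
                        "value": {"stringValue": session_id},
                    }],
                }]
            },
        })
    return {
        "resourceMetrics": [{
            "resource": {
                "attributes": [{
                    "key": "service.name",
                    "value": {"stringValue": service_name},
                }]
            },
            "scopeMetrics": [{
                "scope": {"name": "agent-strace.example"},  # assumed scope name
                "metrics": otlp_metrics,
            }],
        }]
    }


# Hypothetical values for a single session.
payload = build_gauge_payload("abc123", {
    "agent.session.tokens": 48210,
    "agent.session.duration_ms": 91500,
    "agent.session.error_rate": 0.04,
})
body = json.dumps(payload)  # request body for a POST with Content-Type: application/json
```

Gauges fit here because each value is a point-in-time reading for one finished session, not a running total.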
v0.35.0 — Optimize: AGENTS.md Improvement Proposals
Automatically propose improvements to your AGENTS.md based on patterns in trace failures.
What's new
agent-strace optimize scans recent sessions for failure patterns and generates concrete AGENTS.md additions to prevent recurrence.
agent-strace optimize
agent-strace optimize --since-days 14 --output proposals.md
agent-strace optimize --llm --base-url http://localhost:11434/v1 --model llama3
- Heuristic mode (default): no LLM required — detects error-no-change loops, wide blast radius, high retry rates, long stalls
- LLM mode: sends failure summaries to any OpenAI-compatible endpoint for richer proposals
- Output: ready-to-paste AGENTS.md snippets with rationale
v0.34.0 — Eval Trend Dashboard
Track how eval scores change over time with a visual trend dashboard.
What's new
agent-strace dashboard --trend renders an HTML sparkline dashboard showing score trends across sessions for every scorer in your eval config.
agent-strace dashboard --trend
agent-strace dashboard --trend --output trend.html
agent-strace dashboard --annotate SESSION_ID "Fixed retry loop"
- Per-scorer sparklines with pass/fail coloring
- Annotations: attach notes to specific sessions (stored in .agent-traces/annotations.json)
- Terminal summary table alongside the HTML output
- Works with any eval config — no extra setup
v0.33.0 — Behavioral Drift Detection
Detect when an agent's behavior has shifted between sessions without needing an LLM.
What's new
agent-strace drift computes a behavioral fingerprint for each session across 6 dimensions (tool mix, error rate, retry rate, file blast radius, token rate, and duration), then measures the Jensen-Shannon divergence between two sessions, or between a session and a rolling baseline.
agent-strace drift SESSION_A SESSION_B
agent-strace drift --baseline 7d SESSION_ID
- Fingerprints are <2 KB JSON, storable alongside traces
- Drift score 0.0–1.0; configurable alert threshold (--threshold)
- No LLM required: pure statistical comparison
- Output: per-dimension breakdown + overall drift score
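As a rough illustration of the comparison, the sketch below computes Jensen-Shannon divergence for one dimension only (tool mix), treating tool-call counts as a discrete distribution. With base-2 logarithms the result falls in the 0.0–1.0 range. The function name, the tool names, and the counts are hypothetical, and how agent-strace combines the six dimensions into an overall score is not shown here.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs) between two count dictionaries."""
    p_total = sum(p.values())
    q_total = sum(q.values())
    jsd = 0.0
    for key in set(p) | set(q):
        pk = p.get(key, 0) / p_total
        qk = q.get(key, 0) / q_total
        mk = 0.5 * (pk + qk)  # mixture distribution M = (P + Q) / 2
        # Terms with zero probability contribute nothing (0 * log 0 is taken as 0).
        if pk > 0:
            jsd += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            jsd += 0.5 * qk * math.log2(qk / mk)
    return jsd


# Hypothetical tool-call counts for three sessions.
session_a = {"read_file": 10, "edit_file": 5, "bash": 5}
session_b = {"read_file": 10, "edit_file": 5, "bash": 5}
session_c = {"web_search": 20}

same = js_divergence(session_a, session_b)      # identical mixes
disjoint = js_divergence(session_a, session_c)  # no tools in common
```

Identical tool mixes give a divergence of 0, and mixes with no tools in common give the maximum of 1, which matches the bounds of the drift score described above.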
v0.32.1 — Behavioral Drift, Eval Trend Dashboard, Optimize, Langfuse Export, LLM Judge
Batch release covering five features shipped between v0.32.0 and v0.33.0. Each feature is also available as a standalone release (v0.33.0–v0.37.0).
- Behavioral drift detection (agent-strace drift): Jensen-Shannon divergence across 6 behavioral dimensions
- Eval trend dashboard (agent-strace dashboard --trend): sparkline HTML dashboard with annotations
- Optimize (agent-strace optimize): propose AGENTS.md improvements from trace failures
- Langfuse export (agent-strace export --scores --backend langfuse): push eval scores to Langfuse
- OTLP behavioral metrics (agent-strace export --metrics --backend otlp): gauge metrics to any OTLP backend
- LLM-as-judge scorer: score sessions via any OpenAI-compatible endpoint
- Dataset auto-sampling (eval dataset auto): 6 signal filters to build eval datasets from traces
- Eval CI baseline (eval ci --baseline): regression gate with GitHub Actions PR comment output
v0.32.0 — Self-Contained HTML Session Replay Viewer
agent-strace replay --format html generates a single-file HTML viewer for any recorded session. No server, no dependencies — open it in any browser.
agent-strace replay --format html
agent-strace replay --format html --output review.html SESSION_ID
Viewer features:
- Animated event timeline with configurable playback speed (up to 4×)
- Scrubber bar for jumping to any point in the session
- Running cost counter updated as events play
- Click-to-expand event detail (full JSON payload)
- Color-coded event types: tool calls, LLM requests, file ops, errors
- Pause/resume and show-all controls
- Dark theme, zero external dependencies (no CDN, no fonts)
All event data is embedded as a JSON constant in the HTML file. Useful for sharing sessions with teammates or attaching to PR reviews without requiring them to install anything.
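The embedding approach can be sketched as follows. This is a minimal illustration of the single-file idea, not the viewer that agent-strace generates: the function name `render_replay_html` and the event fields `type` and `payload` are assumptions.

```python
import json


def render_replay_html(events):
    """Return one HTML document with the events embedded as a JS constant."""
    # Escape "</" so a payload containing "</script>" cannot close the tag early.
    # "\/" is a valid escape in both JSON and JavaScript string literals.
    data = json.dumps(events).replace("</", "<\\/")
    return (
        "<!doctype html>\n"
        "<html><head><meta charset=\"utf-8\"><title>Session replay</title></head>\n"
        "<body><pre id=\"log\"></pre>\n"
        "<script>\n"
        "const EVENTS = " + data + ";\n"
        "document.getElementById('log').textContent =\n"
        "  EVENTS.map(e => e.type + ' ' + JSON.stringify(e.payload)).join('\\n');\n"
        "</script></body></html>\n"
    )


# Hypothetical events; the second payload deliberately contains a closing script tag.
events = [
    {"type": "tool_call", "payload": {"name": "bash"}},
    {"type": "error", "payload": {"message": "unexpected </script> in output"}},
]
html = render_replay_html(events)
```

Because the data travels inside the file, the result can be attached to a PR or sent to a teammate and opened directly from disk.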
v0.31.0 — Agent Standup Report from Session Trace
agent-strace standup generates a structured standup report from a session trace — no LLM call required.
agent-strace standup
agent-strace standup --session SESSION_ID
Report sections:
What the agent did:
- Files read and modified
- Approaches tried, including abandoned ones (detected from retry patterns)
- New dependencies added (npm install, pip install, etc.)
What it was uncertain about:
- TODO / FIXME / assumption comments written into files
What to review carefully:
- Large changes (>100 lines), new dependencies, auth and migration patterns
Stats: tool calls, context resets, retries, errors
Useful for async teams where the agent runs overnight and a human needs a quick brief before picking up the work.
v0.30.0 — On-Call Readiness Report for Agent-Modified Files
When an agent has been writing code, the human on call may not have read it. agent-strace oncall cross-references agent-modified files from the trace store against git history to surface cognitive gaps before a rotation.
agent-strace oncall --rotation-start 2026-04-25
agent-strace oncall --rotation-start 2026-04-25 --scope "src/payments/**"
For each file the agent has written in the last N days, the report shows:
- How long ago it was modified
- How many lines changed (from git log --numstat)
- Estimated reading time (~200 lines/minute)
- Total catch-up time before rotation
--scope filters to a file glob. --since-days controls how far back to scan sessions (default: 30).
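The reading-time arithmetic can be sketched from `git log --numstat` output, which prints one tab-separated `added deleted path` line per file and `-` in place of counts for binary files. The sketch assumes that "lines changed" means added plus deleted lines; the release notes do not say how agent-strace counts them, and the function names and sample paths are hypothetical.

```python
LINES_PER_MINUTE = 200  # reading-speed estimate quoted in the release notes


def lines_changed(numstat_output):
    """Sum added + deleted lines per path from `git log --numstat` output."""
    totals = {}
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # commit headers and blank lines
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        totals[path] = totals.get(path, 0) + int(added) + int(deleted)
    return totals


def catch_up_minutes(totals):
    """Total estimated reading time across all files, in minutes."""
    return sum(totals.values()) / LINES_PER_MINUTE


# Hypothetical numstat lines covering two commits.
sample = (
    "120\t30\tsrc/payments/charge.py\n"
    "-\t-\tassets/logo.png\n"
    "40\t10\tsrc/payments/charge.py\n"
    "200\t0\tsrc/payments/refund.py\n"
)
totals = lines_changed(sample)
minutes = catch_up_minutes(totals)
```

In this sample, 400 changed lines across two files come to about two minutes of catch-up reading.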
v0.29.0 — Context Freshness Check Before Starting a Session
Before handing a task to an agent, it helps to know how stale its last view of the codebase is. agent-strace freshness compares the current state against what the agent last saw, using git diff between the last session timestamp and HEAD.
agent-strace freshness
agent-strace freshness --since 2026-04-01 --scope "src/**"
Report includes:
- Files changed since the last session (or since the --since date)
- Per-file change type (modified / added / deleted / renamed) and line count
- Freshness score 0–100 (100 = nothing changed since last session)
- Estimated catch-up reading time for in-scope files
Scope is auto-detected from CLAUDE.md / AGENTS.md scope sections, or overridden with --scope. No API calls required.
v0.28.0 — A2A Protocol Support with Cross-Agent Trace Correlation
First-class support for agent-to-agent calls following the Google A2A spec. A2A calls are captured as TOOL_CALL events with event_subtype=a2a_call, so they are backward-compatible with all existing replay and export tooling.
agent-strace a2a-tree
agent-strace a2a-tree SESSION_ID --format json
New capabilities:
- Detects A2A calls by path, header, and body heuristics
- Builds the full agent call graph by following sub_session_id links and parent_session_id back-references
- Renders the call graph as an ASCII tree
- Exports the graph as OTLP-compatible spans for Jaeger, Tempo, or any OpenTelemetry backend
Child sessions are linked via parent_session_id and parent_event_id in session metadata.
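A minimal sketch of how parent_session_id back-references could be turned into an ASCII tree is shown below. It assumes session metadata is available as a plain dictionary keyed by session ID and that the links contain no cycles; the function names, session IDs, and event IDs are hypothetical, and the actual a2a-tree output format may differ.

```python
def build_children(sessions):
    """Map each parent session ID to the list of its child session IDs."""
    children = {}
    for session_id, meta in sessions.items():
        parent = meta.get("parent_session_id")
        if parent is not None:
            children.setdefault(parent, []).append(session_id)
    return children


def render_tree(root, children):
    """Render the call graph below `root` as an ASCII tree (assumes no cycles)."""
    lines = [root]

    def walk(node, prefix):
        kids = sorted(children.get(node, []))
        for index, kid in enumerate(kids):
            last = index == len(kids) - 1
            lines.append(prefix + ("└── " if last else "├── ") + kid)
            walk(kid, prefix + ("    " if last else "│   "))

    walk(root, "")
    return "\n".join(lines)


# Hypothetical session metadata for one root agent and three sub-agents.
sessions = {
    "planner": {},
    "coder": {"parent_session_id": "planner", "parent_event_id": "evt-12"},
    "reviewer": {"parent_session_id": "planner", "parent_event_id": "evt-30"},
    "linter": {"parent_session_id": "coder", "parent_event_id": "evt-4"},
}
tree = render_tree("planner", build_children(sessions))
print(tree)
# planner
# ├── coder
# │   └── linter
# └── reviewer
```

Children are sorted by ID here only to keep the output stable; ordering by the parent event that triggered each call would be an equally reasonable choice.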