v0.32.1 — Behavioral Drift, Eval Trend Dashboard, Optimize, Langfuse Export, LLM Judge
Batch release covering five features shipped between v0.32.0 and v0.33.0. Each feature is also available as a standalone release (v0.33.0–v0.37.0).
- Behavioral drift detection (`agent-strace drift`): Jensen-Shannon divergence across 6 behavioral dimensions
- Eval trend dashboard (`agent-strace dashboard --trend`): sparkline HTML dashboard with annotations
- Optimize (`agent-strace optimize`): propose AGENTS.md improvements from trace failures
- Langfuse export (`agent-strace export --scores --backend langfuse`): push eval scores to Langfuse
- OTLP behavioral metrics (`agent-strace export --metrics --backend otlp`): gauge metrics to any OTLP backend
- LLM-as-judge scorer: score sessions via any OpenAI-compatible endpoint
- Dataset auto-sampling (`eval dataset auto`): 6 signal filters to build eval datasets from traces
- Eval CI baseline (`eval ci --baseline`): regression gate with GitHub Actions PR comment output
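
For readers unfamiliar with the metric behind drift detection, here is a minimal sketch of Jensen-Shannon divergence between two discrete distributions. It is not agent-strace's implementation: the function names, the tool-call example, and the choice of tool usage as a "behavioral dimension" are assumptions made for illustration only.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw counts into a probability distribution (dict of key -> probability)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty count table")
    return {key: value / total for key, value in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no overlapping support.
    """
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL(a || m), skipping zero-probability terms (0 * log 0 := 0)
        return sum(a[k] * math.log2(a[k] / m[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical example: tool-call frequencies in a baseline vs. a recent session.
baseline = normalize(Counter({"read_file": 6, "bash": 3, "edit": 1}))
current = normalize(Counter({"read_file": 2, "bash": 7, "edit": 1}))
drift_score = js_divergence(baseline, current)
print(f"tool-usage drift: {drift_score:.3f} bits")
```

Because the divergence is symmetric and bounded between 0 and 1 (with base-2 logarithms), scores from different dimensions can be compared against a single threshold, which is one plausible reason to prefer it over plain KL divergence for this kind of check.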