v0.32.1 — Behavioral Drift, Eval Trend Dashboard, Optimize, Langfuse Export, LLM Judge

Released by github-actions on 17 May 12:15 · commit f404833 · 26 commits to main since this release

Batch release covering five features shipped between v0.32.0 and v0.33.0. Each feature is also available as a standalone release (v0.33.0–v0.37.0).

  • Behavioral drift detection (agent-strace drift) — Jensen-Shannon divergence across 6 behavioral dimensions
  • Eval trend dashboard (agent-strace dashboard --trend) — sparkline HTML dashboard with annotations
  • Optimize (agent-strace optimize) — propose AGENTS.md improvements from trace failures
  • Langfuse export (agent-strace export --scores --backend langfuse) — push eval scores to Langfuse
  • OTLP behavioral metrics (agent-strace export --metrics --backend otlp) — gauge metrics to any OTLP backend
  • LLM-as-judge scorer — score sessions via any OpenAI-compatible endpoint
  • Dataset auto-sampling (eval dataset auto) — 6 signal filters to build eval datasets from traces
  • Eval CI baseline (eval ci --baseline) — regression gate with GitHub Actions PR comment output
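
For readers unfamiliar with the metric behind drift detection, here is a minimal sketch of Jensen-Shannon divergence between two categorical distributions, such as tool-call counts from a baseline set of sessions and a current set. It is illustrative only and is not agent-strace's implementation: the function names, the example tool names, and the choice of tool-call frequency as the dimension are all assumptions made for this example.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty distribution")
    return {key: value / total for key, value in counts.items()}


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    p = normalize(p_counts)
    q = normalize(q_counts)
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2, defined over the union of keys.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(dist):
        # KL(dist || M); zero-probability terms contribute nothing.
        return sum(v * math.log2(v / m[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical tool-call counts for one behavioral dimension.
baseline = Counter({"read_file": 40, "run_tests": 10})
current = Counter({"read_file": 10, "run_tests": 10, "web_search": 30})
drift = js_divergence(baseline, current)
print(f"drift score: {drift:.3f}")
```

With base-2 logarithms the score is bounded between 0 and 1 and is symmetric in its arguments, which makes it convenient to compare against a fixed threshold. How agent-strace aggregates the score across its six dimensions is not specified here.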