
feat: visual eval trend dashboard (agent-strace dashboard --trend)#74

Merged
Siddhant-K-code merged 1 commit into main from feat/dashboard-trend
May 17, 2026

@Siddhant-K-code
Owner

What

Extends agent-strace dashboard with --trend mode: reads per-session eval scores from eval.json, computes error/retry/cost time-series, and renders a self-contained HTML report with inline SVG charts.

Closes #68

New flags

# Terminal trend summary
agent-strace dashboard --trend --since 30d

# Self-contained HTML report
agent-strace dashboard --trend --since 30d --html trend-report.html

# Add a timeline annotation (appears as a vertical marker on all charts)
agent-strace dashboard annotate --date 2026-05-10 --note "Added retry policy to AGENTS.md"
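A minimal sketch of how a window such as `--since 30d` could be turned into a cutoff timestamp. The helper name `parse_since` and the restriction to whole days are assumptions for illustration; the PR only shows the `Nd` form and does not show the actual parser.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_since(value: str, now: Optional[datetime] = None) -> datetime:
    """Turn a window such as '30d' into an absolute UTC cutoff.

    Hypothetical helper: only the 'Nd' (days) form is handled here.
    """
    match = re.fullmatch(r"(\d+)d", value.strip())
    if match is None:
        raise ValueError(f"expected a window like '30d', got {value!r}")
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=int(match.group(1)))
```

Sessions whose start time is earlier than the returned cutoff would then be left out of the trend.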

What the HTML report shows

Eval quality section (one sparkline per judge, requires agent-strace eval scores):

  • Pass rate trend per judge with color-coded lines
  • Annotation markers showing when config or model changed
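The per-judge pass rate series could be derived roughly as follows. This is a sketch only: the layout of eval.json is not shown in this PR, so the `judge` and `passed` field names and the function `pass_rate_by_judge` are hypothetical.

```python
from collections import defaultdict


def pass_rate_by_judge(sessions):
    """Build one pass-rate series per judge.

    sessions: time-ordered list of (session_id, scores), where scores is a
    list of dicts with assumed keys 'judge' (str) and 'passed' (bool).
    Returns {judge: [(session_id, pass_rate), ...]}.
    """
    series = defaultdict(list)
    for session_id, scores in sessions:
        tally = defaultdict(lambda: [0, 0])  # judge -> [passed, total]
        for score in scores:
            counts = tally[score["judge"]]
            counts[0] += 1 if score["passed"] else 0
            counts[1] += 1
        for judge, (passed, total) in tally.items():
            series[judge].append((session_id, passed / total))
    return dict(series)
```

Each resulting list maps directly onto one sparkline; a session with no scores for a judge simply contributes no point to that judge's series.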

Behavioral metrics section (four sparklines, no eval required):

  • Error rate per session
  • Retry rate per session
  • Estimated cost per session
  • Session duration
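As an illustration of why these metrics need no eval scores, they can be computed from the trace events alone. The event shape below (a `type` string and a `ts` timestamp in seconds) and the names `SessionMetrics` and `session_metrics` are assumptions; the real event schema and the exact definitions of error and retry rate are not shown in this PR.

```python
from dataclasses import dataclass


@dataclass
class SessionMetrics:
    error_rate: float   # share of events that are errors
    retry_rate: float   # share of events that are retries
    duration_s: float   # last timestamp minus first timestamp


def session_metrics(events):
    """Summarise one session from its events (hypothetical event shape)."""
    if not events:
        return SessionMetrics(0.0, 0.0, 0.0)
    total = len(events)
    errors = sum(1 for e in events if e["type"] == "error")
    retries = sum(1 for e in events if e["type"] == "retry")
    timestamps = [e["ts"] for e in events]
    return SessionMetrics(errors / total, retries / total,
                          max(timestamps) - min(timestamps))
```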

Recent sessions table with eval scores inline.

Design constraints met

  • Zero CDN dependencies — all CSS inline, all charts inline SVG
  • No JavaScript libraries — polylines rendered as SVG paths
  • Self-contained: single HTML file, openable in any browser
  • Works without eval scores (falls back to behavioral metrics only)
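The inline SVG approach can be sketched in a few lines of standard-library Python. This is not the PR's `_svg_sparkline()`: the function name, signature, colors, and the use of a `<polyline>` element with index-based markers are assumptions chosen to keep the example short.

```python
from html import escape


def svg_sparkline(values, markers=(), width=200, height=40, pad=2, stroke="#2563eb"):
    """Render values as an inline SVG sparkline string.

    markers is an iterable of (index, note) pairs; each draws a dashed
    vertical line at the x position of values[index].
    """
    head = (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
            f'height="{height}" viewBox="0 0 {width} {height}">')
    if not values:
        return head + "</svg>"

    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # flat series: avoid dividing by zero
    step = (width - 2 * pad) / max(len(values) - 1, 1)

    def x(i):
        return pad + i * step

    def y(v):
        # SVG's y axis grows downward, so larger values get smaller y.
        return height - pad - (v - lo) / span * (height - 2 * pad)

    points = " ".join(f"{x(i):.1f},{y(v):.1f}" for i, v in enumerate(values))
    body = [f'<polyline fill="none" stroke="{stroke}" stroke-width="1.5" '
            f'points="{points}"/>']
    for index, note in markers:
        if 0 <= index < len(values):
            body.append(
                f'<line x1="{x(index):.1f}" y1="0" x2="{x(index):.1f}" '
                f'y2="{height}" stroke="#9ca3af" stroke-dasharray="2,2">'
                f'<title>{escape(note)}</title></line>'
            )
    return head + "".join(body) + "</svg>"
```

Because the result is a plain string, it can be embedded directly in the HTML report with no script or external stylesheet, which is what keeps the file self-contained.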

Files changed

  • src/agent_trace/dashboard.py: TrendPoint, TrendReport, EvalScorePoint, build_trend_report(), format_trend_terminal(), render_html_trend(), save_annotation(), load_annotations(), _svg_sparkline()
  • src/agent_trace/cli.py: --trend, --since, --html flags; dashboard annotate subcommand
  • tests/test_dashboard_trend.py: 25 new tests

Test results

Ran 591 tests — OK (25 new)

Extends dashboard with --trend mode: reads per-session eval scores from
eval.json, computes error/retry/cost time-series, and renders a
self-contained HTML report with inline SVG charts.

- TrendPoint, TrendReport, EvalScorePoint data structures
- build_trend_report(): extracts metrics from events + eval.json
- format_trend_terminal(): terminal summary with pass-rate deltas
- render_html_trend(): self-contained HTML, no CDN, no JS libraries,
  inline SVG sparklines with annotation markers
- save_annotation() / load_annotations(): persist timeline markers
- CLI: --trend, --since Nd, --html FILE flags on dashboard subcommand
- dashboard annotate --date --note subcommand
- 25 new tests covering SVG rendering, annotation storage, trend
  builder, terminal formatting, and HTML output

Closes #68

Co-authored-by: Ona <no-reply@ona.com>
@Siddhant-K-code Siddhant-K-code merged commit 983a1af into main May 17, 2026
4 checks passed
@Siddhant-K-code Siddhant-K-code deleted the feat/dashboard-trend branch May 17, 2026 12:15

Development

Successfully merging this pull request may close these issues.

feat: visual eval trend dashboard — quality scores, cost, and drift over time (agent-strace dashboard --trend)
