Commit 49f33ea
Release v0.10.0: evaluation framework
New commands:
  agent-strace eval run <session-id>           score a session
  agent-strace eval compare <session-a> <b>    side-by-side diff
  agent-strace eval ci <session-id>            CI gate (exits 1 on fail)
  agent-strace eval dataset add|list|export    manage eval datasets
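
Since `eval ci` is described as exiting 1 on failure, a pipeline step can gate on its exit code. The following is a minimal sketch, not part of this commit: `ci_command` and `eval_gate` are hypothetical helper names, and the sketch assumes only the command form and exit-code behaviour listed above.

```python
import subprocess


def ci_command(session_id: str) -> list[str]:
    """Build the argv for the CI gate, using the command form listed above."""
    return ["agent-strace", "eval", "ci", session_id]


def eval_gate(command: list[str]) -> bool:
    """Run a command and treat exit code 0 as a pass, anything else as a fail.

    Hypothetical helper: it only inspects the exit code and makes no
    assumption about what the command prints.
    """
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0
```

A pipeline step could then call `eval_gate(ci_command(session_id))` with a real session id and fail the build when it returns `False`.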
Built-in scorers: no_errors, regex, cost_under, files_scoped,
duration_under, custom. Config via .agent-evals.yaml (stdlib YAML
parser, no PyYAML). Dataset storage is local JSONL.
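
To illustrate what scorers with these names might check, here is a rough sketch. The session layout (`events`, `files_touched`, `cost_usd`, `duration_s`), the function signatures and the thresholds are all assumptions made for this example. The commit shows neither the real trace schema nor the scorer API.

```python
import re
from fnmatch import fnmatch

# Hypothetical session record; field names are assumptions for this sketch.
session = {
    "events": [
        {"type": "tool_call", "text": "edit src/app.py"},
        {"type": "message", "text": "All tests passed"},
    ],
    "files_touched": ["src/app.py", "src/util.py"],
    "cost_usd": 0.42,
    "duration_s": 95.0,
}


def no_errors(s):
    # Pass when no event is marked as an error.
    return all(e["type"] != "error" for e in s["events"])


def regex(s, pattern):
    # Pass when at least one event text matches the pattern.
    return any(re.search(pattern, e["text"]) for e in s["events"])


def cost_under(s, limit_usd):
    # Pass when the recorded cost is strictly below the limit.
    return s["cost_usd"] < limit_usd


def duration_under(s, limit_s):
    # Pass when the recorded duration is strictly below the limit.
    return s["duration_s"] < limit_s


def files_scoped(s, allowed_globs):
    # Pass when every touched file matches at least one allowed glob.
    return all(
        any(fnmatch(path, glob) for glob in allowed_globs)
        for path in s["files_touched"]
    )


results = {
    "no_errors": no_errors(session),
    "regex": regex(session, r"tests passed"),
    "cost_under": cost_under(session, 1.00),
    "duration_under": duration_under(session, 120.0),
    "files_scoped": files_scoped(session, ["src/*"]),
}
passed = all(results.values())
```

In this sketch each scorer is a plain predicate, so a session passes only if every configured check returns `True`. Whether the real framework combines scores this way is not stated in the commit.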
Zero new dependencies. 47 new tests.
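
Local JSONL storage needs nothing beyond the standard library, which fits the zero-dependency claim. The sketch below shows one way to append and list records. The helper names `dataset_add` and `dataset_list`, the file name and the record fields are hypothetical; the actual on-disk layout is not shown in this commit.

```python
import json
import os
import tempfile


def dataset_add(path, record):
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def dataset_list(path):
    """Return all records in file order, skipping blank lines."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Usage with a throwaway directory; the record fields are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    dataset_path = os.path.join(tmp, "example-dataset.jsonl")
    dataset_add(dataset_path, {"session": "s1", "expect": "pass"})
    dataset_add(dataset_path, {"session": "s2", "expect": "fail"})
    loaded = dataset_list(dataset_path)
```

Writing in append mode keeps each `add` cheap and leaves earlier lines untouched, which is the usual appeal of JSONL for local datasets.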
Co-authored-by: Ona <no-reply@ona.com>
1 parent 635247e commit 49f33ea
1 file changed
Lines changed: 1 addition & 1 deletion