Commit c1054cc

feat(sample): add dataset auto-sampler for regression suite export (#107)
Adds agent-strace sample command to surface worst/diverse/random/recent sessions and export them as JSONL for use in eval frameworks. Supports four sampling strategies, deduplication by tool call sequence, and reproducible random sampling via --seed.

Closes #94

Co-authored-by: Ona <no-reply@ona.com>
1 parent 079a57a commit c1054cc
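
The commit message mentions deduplication by tool call sequence and reproducible random sampling via --seed, but the sampler module itself is not shown in this view. The following is a minimal sketch of how those two ideas could work, not the project's implementation. It assumes each session is a plain dict with a hypothetical `tool_calls` list of tool names; the function names are also made up for illustration.

```python
from __future__ import annotations

import random
from typing import Any, Dict, List, Optional, Set, Tuple

Session = Dict[str, Any]  # hypothetical shape, not the project's actual model


def dedupe_by_tool_sequence(sessions: List[Session]) -> List[Session]:
    """Keep the first session seen for each distinct tool call sequence."""
    seen: Set[Tuple[str, ...]] = set()
    unique: List[Session] = []
    for session in sessions:
        # "tool_calls" is an assumed field name
        signature = tuple(session.get("tool_calls", []))
        if signature in seen:
            continue
        seen.add(signature)
        unique.append(session)
    return unique


def sample_random(sessions: List[Session], n: int,
                  seed: Optional[int] = None) -> List[Session]:
    """Pick up to n sessions; the same seed and input give the same pick."""
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    return rng.sample(sessions, min(n, len(sessions)))


sessions = [
    {"id": "a", "tool_calls": ["read", "write"]},
    {"id": "b", "tool_calls": ["read", "write"]},  # same sequence as "a"
    {"id": "c", "tool_calls": ["search"]},
    {"id": "d", "tool_calls": ["read"]},
]
unique = dedupe_by_tool_sequence(sessions)
first = sample_random(unique, 2, seed=42)
second = sample_random(unique, 2, seed=42)
print([s["id"] for s in unique])
print(first == second)
```

Using a dedicated `random.Random(seed)` instance rather than `random.seed()` keeps the sampling reproducible without affecting other code that relies on the global generator.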

5 files changed

Lines changed: 746 additions & 1 deletion

README.md

Lines changed: 24 additions & 0 deletions
@@ -218,6 +218,7 @@ agent-strace token-budget <session-id> Check token usage against model
 agent-strace replay [session-id] [--limit N] Replay a session (--limit caps events shown)
 agent-strace retention status Show session count, size, and what policy would delete
 agent-strace retention clean [--dry-run] Delete sessions that exceed retention limits
+agent-strace sample --strategy worst --n 20 Export worst/diverse/random/recent sessions as JSONL
 agent-strace watch [--timeout DURATION] [--budget $] [--on-death CMD] [--rules file]
 Watch a live session; kill/pause on rule breach
 agent-strace share <session-id> [-o file] Export a self-contained HTML report
@@ -797,6 +798,29 @@ agent-strace dashboard --html report.html # self-contained HTML export

 The terminal view shows total tool calls, errors, tokens, and estimated cost, plus ASCII sparkline charts for each metric over time and a top-tools frequency table. The HTML export is self-contained. No server needed.

+### Dataset auto-sampler
+
+Export the sessions most useful for regression suites and eval datasets — without manual inspection.
+
+```bash
+# Export the 20 worst-performing sessions (highest error/retry/cost)
+agent-strace sample --strategy worst --n 20 --output regression.jsonl
+
+# Export 10 sessions that maximise behavioral variety
+agent-strace sample --strategy diverse --n 10 --output diverse.jsonl
+
+# Export the 5 most recent sessions
+agent-strace sample --strategy recent --n 5 --output recent.jsonl
+
+# Random sample, reproducible with a seed
+agent-strace sample --strategy random --n 15 --seed 42 --output random.jsonl
+
+# Skip sessions with identical tool call sequences
+agent-strace sample --strategy worst --n 20 --deduplicate --output regression.jsonl
+```
+
+Output is JSONL — one session per line — with full event data and a score breakdown. Compatible with LangSmith, Braintrust, and any custom eval framework.
+
 ### Eval trend dashboard

 See whether your agent is getting better or worse over time. Reads eval scores and behavioral metrics from session events, then renders a self-contained HTML report with inline SVG charts.
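
Because the README describes the export as one JSON object per line, it can be read back with the standard library alone. This is a minimal sketch of such a reader. The field names `session_id` and `score` in the stand-in data are assumptions for illustration only, since the actual export schema is not shown in this diff.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Iterator


def iter_sessions(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


# Write a tiny stand-in file so the example runs on its own.
# The keys below are hypothetical, not the tool's documented schema.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "sample.jsonl"
    demo.write_text(
        '{"session_id": "s1", "score": {"errors": 3}}\n'
        '{"session_id": "s2", "score": {"errors": 0}}\n',
        encoding="utf-8",
    )
    loaded = list(iter_sessions(demo))

for session in loaded:
    print(session["session_id"], session["score"])
```

Reading line by line keeps memory use flat for large exports, which is the main practical advantage of JSONL over a single JSON array.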

src/agent_trace/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 """agent-trace: strace for AI agents."""

-__version__ = "0.40.0"
+__version__ = "0.41.0"

src/agent_trace/cli.py

Lines changed: 22 additions & 0 deletions
@@ -46,6 +46,7 @@
 from .share import cmd_share
 from .token_budget import cmd_token_budget
 from .retention import cmd_retention
+from .sample import cmd_sample
 from .watch import cmd_watch
 from .why import cmd_why
 from .models import EventType, SessionMeta, TraceEvent
@@ -773,6 +774,26 @@ def build_parser() -> argparse.ArgumentParser:
         help="transport protocol (default: stdio)",
     )

+    # sample
+    p_sample = sub.add_parser(
+        "sample",
+        help="export worst/diverse/random/recent sessions as a JSONL regression suite",
+    )
+    p_sample.add_argument(
+        "--strategy",
+        choices=["worst", "diverse", "random", "recent"],
+        default="worst",
+        help="sampling strategy (default: worst)",
+    )
+    p_sample.add_argument("--n", type=int, default=20, metavar="N",
+                          help="number of sessions to sample (default: 20)")
+    p_sample.add_argument("--output", "-o", default="sample.jsonl",
+                          help="output JSONL file path (default: sample.jsonl)")
+    p_sample.add_argument("--deduplicate", action="store_true",
+                          help="skip sessions with identical tool call sequences")
+    p_sample.add_argument("--seed", type=int, default=None,
+                          help="random seed for reproducible random sampling")
+
     # retention
     p_ret = sub.add_parser("retention", help="manage session data retention")
     ret_sub = p_ret.add_subparsers(dest="retention_command")
@@ -852,6 +873,7 @@ def main() -> None:
         "standup": cmd_standup,
         "mcp": cmd_mcp,
         "retention": cmd_retention,
+        "sample": cmd_sample,
     }

     handler = handlers.get(args.command)
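
To see how the new flags resolve, the `sample` subparser added in this file can be reproduced in isolation. The sketch below copies only the arguments shown in the diff. The standalone `build_sample_parser` function is hypothetical and exists just for this illustration; the real parser is built inside `build_parser` alongside the other subcommands.

```python
import argparse


def build_sample_parser() -> argparse.ArgumentParser:
    """Standalone copy of the `sample` subcommand arguments from the diff."""
    parser = argparse.ArgumentParser(prog="agent-strace")
    sub = parser.add_subparsers(dest="command")

    p_sample = sub.add_parser(
        "sample",
        help="export worst/diverse/random/recent sessions as a JSONL regression suite",
    )
    p_sample.add_argument(
        "--strategy",
        choices=["worst", "diverse", "random", "recent"],
        default="worst",
        help="sampling strategy (default: worst)",
    )
    p_sample.add_argument("--n", type=int, default=20, metavar="N",
                          help="number of sessions to sample (default: 20)")
    p_sample.add_argument("--output", "-o", default="sample.jsonl",
                          help="output JSONL file path (default: sample.jsonl)")
    p_sample.add_argument("--deduplicate", action="store_true",
                          help="skip sessions with identical tool call sequences")
    p_sample.add_argument("--seed", type=int, default=None,
                          help="random seed for reproducible random sampling")
    return parser


parser = build_sample_parser()

# No flags: every option falls back to its declared default.
defaults = parser.parse_args(["sample"])

# Explicit flags, using the short -o alias for --output.
custom = parser.parse_args(
    ["sample", "--strategy", "random", "--n", "15", "--seed", "42", "-o", "random.jsonl"]
)

print(defaults)
print(custom)
```

Note that `--seed` is accepted for every strategy by the parser; whether it affects strategies other than `random` depends on the handler, which is not shown here.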
