Commit a5feb00
feat: export eval scores and behavioral metrics to Langfuse and OTLP backends (#76)
Adds agent-strace export --scores --backend langfuse|otlp:
Langfuse path (--backend langfuse):
- Sessions exported as Langfuse Traces via /api/public/ingestion
- Tool call/result pairs exported as Spans (type=SPAN)
- LLM request/response pairs exported as Generations with token counts
- eval.json judge scores exported as Langfuse Scores attached to trace
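As a rough illustration of the mapping above, the sketch below builds an ingestion batch containing one trace, one span per tool call, and one score per judge. It is not the code from langfuse_export.py: the function name `build_langfuse_batch`, the input shapes, and the trace name are hypothetical, and the event type strings (`trace-create`, `span-create`, `score-create`) reflect my understanding of the Langfuse ingestion API and should be checked against its documentation. Generations would presumably follow the same pattern with token usage added.

```python
import uuid
from datetime import datetime, timezone


def _now_iso() -> str:
    # ISO 8601 timestamp in UTC for each ingestion event.
    return datetime.now(timezone.utc).isoformat()


def build_langfuse_batch(session_id, tool_calls, judge_scores):
    """Build a batch: one trace, one span per tool call, one score per judge.

    Hypothetical helper; input shapes are assumptions for illustration.
    """
    def event(event_type, body):
        # Each event carries its own id, a type, a timestamp, and a body.
        return {
            "id": str(uuid.uuid4()),
            "type": event_type,
            "timestamp": _now_iso(),
            "body": body,
        }

    batch = [event("trace-create", {"id": session_id, "name": "agent-session"})]

    for call in tool_calls:
        # A tool call/result pair becomes a single span on the trace.
        batch.append(event("span-create", {
            "id": str(uuid.uuid4()),
            "traceId": session_id,
            "name": call["tool"],
            "input": call.get("input"),
            "output": call.get("output"),
        }))

    for judge, value in judge_scores.items():
        # Each judge score is attached to the trace by traceId.
        batch.append(event("score-create", {
            "traceId": session_id,
            "name": judge,
            "value": value,
        }))

    return {"batch": batch}
```

The resulting dictionary would be JSON-encoded and POSTed to /api/public/ingestion.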
OTLP metrics path (--backend otlp):
- Behavioral metrics exported as OTLP gauge metrics to /v1/metrics
- Metrics: cost_usd, error_rate, retry_rate, blast_radius, duration_s,
  tool_calls, eval.score (one per judge)
- Compatible with Datadog, Honeycomb, Grafana, New Relic
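For readers unfamiliar with the OTLP/HTTP JSON encoding, here is a minimal sketch of how a set of behavioral metrics could be wrapped as gauge data points. The function name `build_gauge_payload`, the `session.id` attribute, and the `service.name` value are assumptions made for illustration; only the metric names come from the commit message.

```python
import time


def build_gauge_payload(metrics, session_id, service_name="agent-strace"):
    """Wrap a {name: value} mapping as OTLP JSON gauge metrics.

    Hypothetical helper; attribute keys are illustrative assumptions.
    """
    # OTLP JSON encodes 64-bit integers such as timestamps as strings.
    now_ns = str(time.time_ns())

    def gauge(name, value):
        return {
            "name": name,
            "gauge": {
                "dataPoints": [{
                    "asDouble": float(value),
                    "timeUnixNano": now_ns,
                    "attributes": [
                        {"key": "session.id",
                         "value": {"stringValue": session_id}},
                    ],
                }]
            },
        }

    return {
        "resourceMetrics": [{
            "resource": {
                "attributes": [
                    {"key": "service.name",
                     "value": {"stringValue": service_name}},
                ]
            },
            "scopeMetrics": [{
                "scope": {"name": "agent-strace"},
                "metrics": [gauge(n, v) for n, v in metrics.items()],
            }],
        }]
    }
```

The payload would be JSON-encoded and POSTed to the collector's /v1/metrics endpoint with `Content-Type: application/json`.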
No new dependencies. All HTTP calls use urllib.request.
Credentials via env vars (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY,
OTEL_EXPORTER_OTLP_ENDPOINT) or CLI flags.
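The credential handling and the urllib-only HTTP path could look roughly like the sketch below. The fields of `LangfuseConfig`, the `resolve_config` and `build_request` helpers, the default host, and the precedence (CLI flag first, then environment variable) are all assumptions; the commit message only names the class and the environment variables. Basic authentication with the public key as username and secret key as password is how I understand the Langfuse public API to work.

```python
import base64
import os
import urllib.request
from dataclasses import dataclass


@dataclass
class LangfuseConfig:
    # Field names are hypothetical; the real class may differ.
    public_key: str
    secret_key: str
    host: str = "https://cloud.langfuse.com"  # assumed default


def resolve_config(cli_public=None, cli_secret=None, env=None):
    """Prefer CLI values, fall back to environment variables (assumed order)."""
    env = os.environ if env is None else env
    public = cli_public or env.get("LANGFUSE_PUBLIC_KEY", "")
    secret = cli_secret or env.get("LANGFUSE_SECRET_KEY", "")
    if not public or not secret:
        raise ValueError("Langfuse credentials are missing")
    return LangfuseConfig(public_key=public, secret_key=secret)


def build_request(config, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST to the ingestion endpoint."""
    raw = f"{config.public_key}:{config.secret_key}".encode()
    token = base64.b64encode(raw).decode()
    return urllib.request.Request(
        config.host + "/api/public/ingestion",
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would perform the call; keeping construction separate from sending makes the export easy to test with a mocked `urlopen`.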
- langfuse_export.py: LangfuseConfig, OtlpMetricsConfig, EvalScore,
  export_session_to_langfuse(), export_metrics_to_otlp(), cmd_export_scores()
- cli.py: --scores, --metrics, --backend, --since, --langfuse-* and
  --otlp-* flags added to export subcommand; routes to cmd_export_scores
  when any of these flags are set
- 31 new tests covering config, score loading, trace/observation/score
  building, gauge building, metrics extraction, and mocked HTTP export
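The routing rule described for cli.py (hand off to cmd_export_scores when any of the new flags is set) might be sketched as follows. This is an assumption-laden illustration, not the actual cli.py: the parser layout, the `wants_score_export` helper, and the omission of the `--langfuse-*` and `--otlp-*` flags (whose exact names the commit message elides) are all mine.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with a subset of the flags named in the commit."""
    parser = argparse.ArgumentParser(prog="agent-strace")
    sub = parser.add_subparsers(dest="command")
    export = sub.add_parser("export")
    export.add_argument("--scores", action="store_true")
    export.add_argument("--metrics", action="store_true")
    export.add_argument("--backend", choices=["langfuse", "otlp"])
    export.add_argument("--since")
    return parser


def wants_score_export(args: argparse.Namespace) -> bool:
    # True when any score/metrics flag is present, in which case the
    # export subcommand would hand off to the score exporter.
    return bool(args.scores or args.metrics or args.backend or args.since)
```

With this shape, `agent-strace export --scores --backend langfuse` would take the new path, while a bare `agent-strace export` would keep its previous behavior.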
Closes #71
Co-authored-by: Ona <no-reply@ona.com>
1 parent 5e611d4 commit a5feb00
3 files changed
Lines changed: 991 additions & 1 deletion