Trace analysis is the bridge between raw product telemetry and useful eval work.
live product run
-> TraceEmitter / TraceStore
-> TraceAnalyst investigates trace corpora
-> findings become ASI, failures, replay cases, and release actionsUse TraceAnalyst when you have more than a few traces and need to answer:
- which failure modes are recurring?
- which spans explain a regression?
- did retrieval, integrations, sandbox, or policy block the run?
- are failed runs missing evidence that the optimizer needs?
- which product surfaces deserve the next fix?
Use summary tables and release confidence for promotion decisions. Use TraceAnalyst to explain the evidence behind those decisions.
import {
OtlpFileTraceStore,
analyzeTraces,
} from '@tangle-network/agent-eval'
const result = await analyzeTraces({
question: 'Why did app-runtime holdout runs fail this week?',
}, {
source: new OtlpFileTraceStore({ path: 'traces/otlp.jsonl' }),
ai,
model: 'gpt-4o-2024-11-20',
})
console.log(result.findings)Products can pass any TraceAnalysisStore; they do not need to use the file
store in production.
Every serious product run should include:
runId,projectId,scenarioId,variantId, andlayer- commit, prompt hash, config hash, model fingerprint, and dataset version
- LLM spans with model, inputs, outputs, token counts, and cost
- tool/integration spans with arguments, result summaries, and error codes
- retrieval spans with query, source ids, hit scores, and freshness metadata
- sandbox/build/test/deploy spans with exit codes and log artifacts
- custom events for knowledge readiness and integration gates
- final run outcome with pass/score/failure class
Do not put secrets, raw OAuth tokens, or unredacted PII in traces.
The product loop should not treat traces as a separate debug dump. The intended path is:
- Wrap the real workflow in
runAgentControlLoopor the product runtime. - Emit canonical spans/events while the user task runs.
- Convert the completed run to
FeedbackTrajectoryfor replay. - Convert promotion-grade runs to
RunRecordwithcontrolRunToRunRecord. - Run TraceAnalyst over failure-heavy trace sets.
- Feed findings into
ActionableSideInfo, failure clusters, and release reports.
That makes normal product usage become eval data instead of isolated logs.