agentevals consumes OpenTelemetry traces to evaluate AI agents. This document covers which OTel conventions we support, how we handle the ongoing migration from span events to log-based events, and guidance for instrumenting your own agents.
The GenAI semantic conventions define standard span attributes for LLM interactions. agentevals auto-detects this format when spans contain gen_ai.request.model or gen_ai.input.messages.
This format works with LangChain, Strands, OpenAI instrumentation, Anthropic instrumentation, and any framework that follows the GenAI semantic conventions.
| Attribute | Description |
|---|---|
| `gen_ai.request.model` | Model name (e.g. `gpt-4o`, `claude-sonnet-4-6`) |
| `gen_ai.input.messages` | JSON array of input messages |
| `gen_ai.output.messages` | JSON array of output messages |
| `gen_ai.response.finish_reasons` | Why the model stopped generating |
| `gen_ai.usage.input_tokens` | Input token count |
| `gen_ai.usage.output_tokens` | Output token count |
| Attribute | Description |
|---|---|
| `gen_ai.provider.name` | LLM provider (e.g. `openai`, `anthropic`). Replaces the deprecated `gen_ai.system`. |
| `gen_ai.response.model` | Model name returned in the response |
| `gen_ai.response.id` | Unique response identifier |
| Attribute | Description |
|---|---|
| `gen_ai.request.temperature` | Temperature sampling parameter |
| `gen_ai.request.max_tokens` | Maximum output tokens limit |
| `gen_ai.request.top_p` | Top-P (nucleus) sampling parameter |
| `gen_ai.request.top_k` | Top-K sampling parameter |
| Attribute | Description |
|---|---|
| `gen_ai.usage.cache_creation.input_tokens` | Tokens spent creating a prompt cache entry |
| `gen_ai.usage.cache_read.input_tokens` | Tokens served from an existing cache entry |
These are relevant for providers that support prompt caching (Anthropic, OpenAI). agentevals aggregates these across LLM spans and displays them in the performance summary.
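As an illustration of that aggregation, here is a minimal sketch that sums the two cache counters over a list of spans. The span shape (a dict with an `attributes` mapping) and the helper name `sum_cache_tokens` are assumptions made for this example, not the agentevals internals.

```python
from typing import Any, Iterable, Mapping

CACHE_KEYS = (
    "gen_ai.usage.cache_creation.input_tokens",
    "gen_ai.usage.cache_read.input_tokens",
)


def sum_cache_tokens(spans: Iterable[Mapping[str, Any]]) -> dict:
    """Sum the cache token counters over all spans (hypothetical helper).

    Spans that do not carry a counter contribute 0 to that total.
    """
    totals = {key: 0 for key in CACHE_KEYS}
    for span in spans:
        attrs = span.get("attributes", {})
        for key in CACHE_KEYS:
            totals[key] += int(attrs.get(key, 0))
    return totals


# Assumed span shape: plain dicts with an "attributes" mapping.
spans = [
    {"attributes": {"gen_ai.usage.cache_creation.input_tokens": 1200}},
    {"attributes": {"gen_ai.usage.cache_read.input_tokens": 1200,
                    "gen_ai.usage.input_tokens": 1350}},
    {"attributes": {}},  # e.g. a tool span with no usage data
]
totals = sum_cache_tokens(spans)
```

Treating a missing counter as zero keeps the totals meaningful for providers that report only one of the two attributes.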
| Attribute | Description |
|---|---|
| `gen_ai.agent.id` | Unique agent identifier |
| `gen_ai.agent.description` | Agent description |
| `gen_ai.tool.description` | Tool description |
| `gen_ai.tool.type` | Tool type classification |
The following attributes may contain large payloads and are typically gated behind instrumentation flags:
| Attribute | Description |
|---|---|
| `gen_ai.system_instructions` | System prompt text |
| `gen_ai.tool.definitions` | Tool schema definitions (JSON) |
| `gen_ai.output.type` | Classification of output content |
Google ADK emits spans under the gcp.vertex.agent OTel scope with proprietary attributes (gcp.vertex.agent.llm_request, gcp.vertex.agent.llm_response, etc.). agentevals has a dedicated converter that auto-detects this format. No GenAI semconv configuration is needed.
Format detection is automatic. When a trace contains both ADK and GenAI attributes, ADK takes priority because it provides richer structured data. The detection logic lives in src/agentevals/converter.py (get_extractor()).
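The priority rule can be sketched as follows. This is not the code in `get_extractor()`; the function name `detect_format`, the span dict shape, and the returned labels are assumptions chosen only to show the ordering: any ADK signal wins, GenAI markers are the fallback.

```python
from typing import Any, Iterable, Mapping

ADK_SCOPE = "gcp.vertex.agent"
GENAI_MARKERS = ("gen_ai.request.model", "gen_ai.input.messages")


def detect_format(spans: Iterable[Mapping[str, Any]]) -> str:
    """Return "adk", "genai", or "unknown" for a trace (hypothetical helper)."""
    has_genai = False
    for span in spans:
        attrs = span.get("attributes", {})
        # ADK signal: the span's scope name or any gcp.vertex.agent.* attribute.
        if span.get("scope") == ADK_SCOPE or any(
            key.startswith(ADK_SCOPE + ".") for key in attrs
        ):
            return "adk"  # ADK takes priority as soon as it is seen
        if any(marker in attrs for marker in GENAI_MARKERS):
            has_genai = True
    return "genai" if has_genai else "unknown"


mixed_trace = [
    {"scope": "my.instrumentor", "attributes": {"gen_ai.request.model": "gpt-4o"}},
    {"scope": ADK_SCOPE, "attributes": {"gcp.vertex.agent.llm_request": "{}"}},
]
genai_trace = [
    {"scope": "my.instrumentor", "attributes": {"gen_ai.input.messages": "[]"}},
]
```

With these inputs, `detect_format(mixed_trace)` picks the ADK path even though a GenAI span appears first, which mirrors the priority described above.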
GenAI message content (gen_ai.input.messages, gen_ai.output.messages) can use two JSON schemas. agentevals supports both and normalizes them internally.
Used by OpenAI and LangChain instrumentors (v2):
```json
{"role": "user", "content": "Hello"}
{"role": "assistant", "content": "...", "tool_calls": [{"type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"NYC\"}"}}]}
```

Used by newer instrumentors that follow the GenAI semconv parts schema:

```json
{"role": "user", "parts": [{"type": "text", "content": "Hello"}]}
{"role": "assistant", "parts": [{"type": "tool_call", "name": "get_weather", "arguments": {"city": "NYC"}}]}
```

Both formats are auto-detected per message. Tool calls are normalized to {name, id, arguments} regardless of source format.
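A minimal sketch of per-message normalization, using the two example messages above. The helper name `normalize_tool_calls` is hypothetical, and decoding JSON-string arguments into a dict is an assumption about what "normalized" means here; the actual agentevals behavior may differ.

```python
import json
from typing import Any


def normalize_tool_calls(message: dict) -> list:
    """Return tool calls as {name, id, arguments} dicts (hypothetical helper)."""
    calls = []
    if "parts" in message:
        # Parts-based schema: tool calls are parts with type "tool_call".
        for part in message["parts"]:
            if part.get("type") == "tool_call":
                calls.append({
                    "name": part.get("name"),
                    "id": part.get("id"),
                    "arguments": part.get("arguments") or {},
                })
        return calls
    # Content-based schema: tool calls sit under "tool_calls"[i]["function"].
    for call in message.get("tool_calls") or []:
        function = call.get("function", {})
        arguments: Any = function.get("arguments", {})
        if isinstance(arguments, str):
            try:
                arguments = json.loads(arguments) if arguments else {}
            except ValueError:
                pass  # assumption: keep the raw string if it is not valid JSON
        calls.append({
            "name": function.get("name"),
            "id": call.get("id"),
            "arguments": arguments,
        })
    return calls


content_msg = {"role": "assistant", "content": "...", "tool_calls": [
    {"type": "function",
     "function": {"name": "get_weather", "arguments": "{\"city\": \"NYC\"}"}}]}
parts_msg = {"role": "assistant", "parts": [
    {"type": "tool_call", "name": "get_weather", "arguments": {"city": "NYC"}}]}
```

Because the schema is chosen per message, a single conversation may mix both shapes and still produce a uniform list of tool calls.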
GenAI message content can arrive through three mechanisms. agentevals supports all of them:
Message content is stored directly as span attributes. This is the most straightforward approach and requires no special handling.
Message content is emitted as OTel log records correlated with spans via trace context. This is the pattern used by opentelemetry-instrumentation-openai-v2 and LangChain's GenAI instrumentation.
Requires both OTLPSpanExporter and OTLPLogExporter (or their streaming equivalents). Set OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true to enable content capture.
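To show what correlation via trace context amounts to on the receiving side, the sketch below joins log records to spans on the (trace ID, span ID) pair. It uses only the standard library; the dict fields `trace_id`, `span_id`, and `body` and the helper name `attach_logs_to_spans` are simplifications assumed for illustration, not the OTLP wire format or the agentevals implementation.

```python
from collections import defaultdict


def attach_logs_to_spans(spans: list, logs: list) -> list:
    """Return copies of the spans with their matching log records attached."""
    logs_by_context = defaultdict(list)
    for record in logs:
        logs_by_context[(record["trace_id"], record["span_id"])].append(record)

    merged = []
    for span in spans:
        key = (span["trace_id"], span["span_id"])
        # .get() avoids creating empty entries for spans without logs.
        merged.append({**span, "logs": logs_by_context.get(key, [])})
    return merged


spans = [
    {"trace_id": "t1", "span_id": "s1", "name": "chat gpt-4o"},
    {"trace_id": "t1", "span_id": "s2", "name": "execute_tool get_weather"},
]
logs = [
    {"trace_id": "t1", "span_id": "s1", "body": {"role": "user", "content": "Hello"}},
    {"trace_id": "t1", "span_id": "s1", "body": {"role": "assistant", "content": "Hi"}},
]
merged = attach_logs_to_spans(spans, logs)
```

If only one of the two exporters is configured, one side of this join is missing, which is why both are required for this mechanism.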
Message content is emitted as attributes on span events. agentevals promotes these to span-level attributes during normalization so downstream processing sees a uniform shape.
This promotion happens in three processing layers:
- `streaming/processor.py` for live WebSocket spans
- `api/otlp_routes.py` for OTLP HTTP reception
- `loader/otlp.py` for loading OTLP JSON files
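The promotion step itself can be sketched as below. The helper name `promote_event_attributes`, the span dict shape, the set of promoted keys, and the rule that an existing span-level attribute is not overwritten are all assumptions for this example; the three modules listed above hold the actual logic.

```python
PROMOTED_KEYS = ("gen_ai.input.messages", "gen_ai.output.messages")


def promote_event_attributes(span: dict) -> dict:
    """Copy message attributes from span events onto the span (hypothetical helper)."""
    attributes = dict(span.get("attributes", {}))
    for event in span.get("events", []):
        for key, value in event.get("attributes", {}).items():
            # Assumption: a value already set on the span is kept as-is.
            if key in PROMOTED_KEYS and key not in attributes:
                attributes[key] = value
    return {**span, "attributes": attributes}


span = {
    "name": "chat gpt-4o",
    "attributes": {"gen_ai.request.model": "gpt-4o"},
    "events": [
        {"name": "example.event",  # placeholder event name
         "attributes": {
             "gen_ai.input.messages": '[{"role": "user", "content": "Hello"}]',
             "unrelated.key": "ignored",
         }},
    ],
}
promoted = promote_event_attributes(span)
```

After promotion, code that reads `gen_ai.input.messages` from span attributes works the same way whether the content arrived as an attribute or on an event.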
The OTel community is deprecating the Span Event API (Span.AddEvent, Span.RecordException) in favor of emitting events as log records via the Logs API. The core idea: "events are logs with names," correlated with traces through context.
No immediate action required. Existing instrumentation continues to work. The deprecation is about providing a single recommended path for new code, not about removing support for existing span event data.
For new instrumentation, prefer the logs-based pattern. Configure both OTLPSpanExporter and OTLPLogExporter, and use instrumentation libraries that emit message content as log records.
For existing span-event instrumentation (e.g. Strands with OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental), everything continues to work. When your framework releases a version that migrates to log-based events, update your exporter configuration to include OTLPLogExporter and follow the logs-based pattern.
agentevals already supports both the span-event and the log-based delivery mechanisms. The span event promotion logic will remain for backward compatibility with older instrumentation versions. As frameworks migrate, the log-based path (already fully supported) will become the primary path.
If you maintain an OTel-instrumented agent framework and want to align with the deprecation:
- Emit `gen_ai.input.messages` and `gen_ai.output.messages` as log records instead of span events
- Correlate logs with spans via trace context (the OTel SDK handles this automatically)
- Document that users need both `OTLPSpanExporter` and `OTLPLogExporter`
- Consider an opt-in flag (similar to `OTEL_SEMCONV_EXCEPTION_SIGNAL_OPT_IN`) during the transition
agentevals runs two OTLP receivers:
- gRPC on port 4317 (standard OTLP gRPC port, configurable via `--otlp-grpc-port`)
- HTTP on port 4318 (standard OTLP HTTP port)
Both accept traces and logs and feed into the same session manager.
| Endpoint | Content Types |
|---|---|
| `/v1/traces` | `application/json`, `application/x-protobuf` |
| `/v1/logs` | `application/json`, `application/x-protobuf` |
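For a quick manual check of the HTTP receiver, a request for `/v1/traces` can be built with the standard library alone. The sketch below constructs a one-span OTLP/JSON payload without sending it; the helper name `build_traces_request`, the service name, scope name, IDs, and timestamps are made-up example values.

```python
import json
import urllib.request


def build_traces_request(base_url: str, span_name: str, model: str) -> urllib.request.Request:
    """Build (but do not send) an OTLP/JSON POST for /v1/traces."""
    payload = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "demo-agent"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},
                "spans": [{
                    "traceId": "5b8efff798038103d269b633813fc60c",  # 32 hex chars
                    "spanId": "eee19b7ec3c1b174",                   # 16 hex chars
                    "name": span_name,
                    "kind": 3,  # SPAN_KIND_CLIENT
                    "startTimeUnixNano": "1700000000000000000",
                    "endTimeUnixNano": "1700000001000000000",
                    "attributes": [
                        {"key": "gen_ai.request.model", "value": {"stringValue": model}},
                    ],
                }],
            }],
        }],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_traces_request("http://localhost:4318", "chat gpt-4o", "gpt-4o")
# urllib.request.urlopen(request) would send it; omitted so the sketch
# runs without a receiver listening on port 4318.
```

In practice an OTLP exporter does this for you; the hand-built request is mainly useful for smoke-testing the endpoint.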
The gRPC receiver implements the standard TraceService/Export and LogsService/Export RPCs. Configuration:
| Setting | Default |
|---|---|
| Max message size | 8 MB |
| Max concurrent RPCs | 32 |
| Compression | gzip |
| TLS | off (insecure) |
For HTTP exporters:
```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```

For gRPC exporters:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
```

Traces and logs stream into agentevals automatically. See examples/README.md for zero-code setup instructions.