Commit ac2ff6d

Merge pull request #52 from agentevals-dev/docs/otel-recommendations
Update docs on OTel best practices
2 parents 3f99d96 + ab2a4af commit ac2ff6d

4 files changed

Lines changed: 123 additions & 3 deletions

docs/otel-compatibility.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# OpenTelemetry Compatibility

agentevals consumes OpenTelemetry traces to evaluate AI agents. This document covers which OTel conventions we support, how we handle the ongoing migration from span events to log-based events, and how to instrument your own agents.

## Supported Semantic Conventions

### OTel GenAI Semantic Conventions (recommended)

The [GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) define standard span attributes for LLM interactions. agentevals auto-detects this format when spans contain `gen_ai.request.model` or `gen_ai.input.messages`.

Supported attributes:

| Attribute | Description |
|-----------|-------------|
| `gen_ai.request.model` | Model name (e.g. `gpt-4o`, `claude-sonnet-4-6`) |
| `gen_ai.input.messages` | JSON array of input messages |
| `gen_ai.output.messages` | JSON array of output messages |
| `gen_ai.response.finish_reasons` | Why the model stopped generating |
| `gen_ai.usage.input_tokens` | Input token count |
| `gen_ai.usage.output_tokens` | Output token count |
| `gen_ai.system` | AI system identifier (e.g. `openai`, `anthropic`) |

This format works with LangChain, Strands, OpenAI instrumentation, Anthropic instrumentation, and any framework that follows the GenAI semantic conventions.

### Google ADK (framework-native)

Google ADK emits spans under the `gcp.vertex.agent` OTel scope with proprietary attributes (`gcp.vertex.agent.llm_request`, `gcp.vertex.agent.llm_response`, etc.). agentevals has a dedicated converter that auto-detects this format. No GenAI semconv configuration is needed.

### Format Detection

Format detection is automatic. When a trace contains both ADK and GenAI attributes, ADK takes priority because it provides richer structured data. The detection logic lives in `src/agentevals/converter.py` (`get_extractor()`).
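
The priority rule can be illustrated with a small detection function. This is a hypothetical sketch, not the actual `get_extractor()` implementation: the `detect_format` name, the dict-based span shape (a `scope` name plus an `attributes` dict), and the returned labels are all assumptions made for the example.

```python
from typing import Iterable, Mapping

ADK_SCOPE = "gcp.vertex.agent"
ADK_ATTRIBUTE_PREFIX = "gcp.vertex.agent."
GENAI_MARKERS = ("gen_ai.request.model", "gen_ai.input.messages")


def detect_format(spans: Iterable[Mapping]) -> str:
    """Return "adk", "genai", or "unknown" for a trace (hypothetical helper).

    Each span is assumed to be a dict with a "scope" name and an "attributes" dict.
    """
    spans = list(spans)
    has_adk = any(
        span.get("scope") == ADK_SCOPE
        or any(key.startswith(ADK_ATTRIBUTE_PREFIX) for key in span.get("attributes", {}))
        for span in spans
    )
    has_genai = any(
        any(marker in span.get("attributes", {}) for marker in GENAI_MARKERS)
        for span in spans
    )
    # ADK wins when both formats are present because it carries richer structured data.
    if has_adk:
        return "adk"
    if has_genai:
        return "genai"
    return "unknown"


mixed_trace = [
    {"scope": "gcp.vertex.agent", "attributes": {"gcp.vertex.agent.llm_request": "{}"}},
    {"scope": "some.genai.instrumentation", "attributes": {"gen_ai.request.model": "gpt-4o"}},
]
print(detect_format(mixed_trace))  # adk
```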

## Message Content Delivery

GenAI message content (`gen_ai.input.messages`, `gen_ai.output.messages`) can arrive through three mechanisms, all of which agentevals supports.

### 1. Span attributes (simplest)

Message content is stored directly as span attributes. This is the most straightforward approach and requires no special handling.

### 2. Log records (recommended for new instrumentation)

Message content is emitted as OTel log records correlated with spans via trace context. This is the pattern used by `opentelemetry-instrumentation-openai-v2` and LangChain's GenAI instrumentation.

This mechanism requires both `OTLPSpanExporter` and `OTLPLogExporter` (or their streaming equivalents). Set `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true` to enable content capture.

### 3. Span events (deprecated, supported for backward compatibility)

Message content is emitted as attributes on span events. agentevals promotes these to span-level attributes during normalization so downstream processing sees a uniform shape.

This promotion happens in three processing layers:
- `streaming/processor.py` for live WebSocket spans
- `api/otlp_routes.py` for OTLP HTTP reception
- `loader/otlp.py` for loading OTLP JSON files
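
The promotion step can be sketched over an OTLP/JSON span. This is a hypothetical illustration of the idea, not the code in the three modules listed above: the `promote_event_attributes` name is made up, it assumes existing span-level attributes win over event attributes with the same key, and the event name in the sample data is illustrative.

```python
import copy

PROMOTED_KEYS = ("gen_ai.input.messages", "gen_ai.output.messages")


def promote_event_attributes(span: dict) -> dict:
    """Copy GenAI message attributes from span events up to the span (hypothetical)."""
    promoted = copy.deepcopy(span)  # leave the caller's span untouched
    attributes = promoted.setdefault("attributes", [])
    existing = {attr["key"] for attr in attributes}
    for event in promoted.get("events", []):
        for attr in event.get("attributes", []):
            # Span-level values and earlier events take precedence over later events.
            if attr["key"] in PROMOTED_KEYS and attr["key"] not in existing:
                attributes.append(copy.deepcopy(attr))
                existing.add(attr["key"])
    return promoted


span = {
    "name": "chat gpt-4o",
    "attributes": [{"key": "gen_ai.request.model", "value": {"stringValue": "gpt-4o"}}],
    "events": [
        {
            "name": "gen_ai.client.inference.operation.details",
            "attributes": [
                {"key": "gen_ai.input.messages", "value": {"stringValue": "[]"}},
                {"key": "gen_ai.output.messages", "value": {"stringValue": "[]"}},
            ],
        }
    ],
}
flat = promote_event_attributes(span)
print([attr["key"] for attr in flat["attributes"]])
# ['gen_ai.request.model', 'gen_ai.input.messages', 'gen_ai.output.messages']
```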

## Span Events Deprecation

The OTel community is [deprecating the Span Event API](https://opentelemetry.io/blog/2026/deprecating-span-events/) (`Span.AddEvent`, `Span.RecordException`) in favor of emitting events as log records via the Logs API. The core idea: "events are logs with names," correlated with traces through context.
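
To make "events are logs with names" concrete, the sketch below converts an OTLP/JSON span event into an OTLP/JSON log record: the event name becomes the record's `eventName`, the attributes carry over unchanged, and the span's `traceId` and `spanId` supply the correlation that nesting inside the span used to provide. The `span_event_to_log_record` helper and the sample IDs are hypothetical; `eventName` is the log-record field defined in recent OTLP protocol versions.

```python
def span_event_to_log_record(span: dict, event: dict) -> dict:
    """Turn one span event into a trace-correlated log record (hypothetical helper)."""
    return {
        "timeUnixNano": event["timeUnixNano"],
        "eventName": event["name"],  # "events are logs with names"
        "attributes": list(event.get("attributes", [])),
        # Trace context replaces the parent/child nesting of the span event.
        "traceId": span["traceId"],
        "spanId": span["spanId"],
    }


span = {
    "traceId": "5b8efff798038103d269b633813fc60c",
    "spanId": "eee19b7ec3c1b174",
    "events": [
        {
            "name": "gen_ai.client.inference.operation.details",
            "timeUnixNano": "1700000000000000000",
            "attributes": [{"key": "gen_ai.input.messages", "value": {"stringValue": "[]"}}],
        }
    ],
}
record = span_event_to_log_record(span, span["events"][0])
print(record["eventName"], record["spanId"])
```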

### What this means for agentevals users

**No immediate action required.** Existing instrumentation continues to work. The deprecation is about providing a single recommended path for new code, not about removing support for existing span event data.

**For new instrumentation**, prefer the logs-based pattern. Configure both `OTLPSpanExporter` and `OTLPLogExporter`, and use instrumentation libraries that emit message content as log records.

**For existing span-event instrumentation** (e.g. Strands with `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental`), everything continues to work. When your framework releases a version that migrates to log-based events, update your exporter configuration to include `OTLPLogExporter` and follow the logs-based pattern.

### What this means for agentevals internals

agentevals already supports both event-based delivery mechanisms (span events and log records). The span event promotion logic will remain for backward compatibility with older instrumentation versions. As frameworks migrate, the log-based path (already fully supported) will become the primary path.

### Migration checklist for framework authors

If you maintain an OTel-instrumented agent framework and want to align with the deprecation:

1. Emit `gen_ai.input.messages` and `gen_ai.output.messages` as log records instead of span events
2. Correlate logs with spans via trace context (the OTel SDK handles this automatically)
3. Document that users need both `OTLPSpanExporter` and `OTLPLogExporter`
4. Consider an opt-in flag (similar to `OTEL_SEMCONV_EXCEPTION_SIGNAL_OPT_IN`) during the transition
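
For the opt-in flag, one possible shape is modeled loosely on the `OTEL_SEMCONV_STABILITY_OPT_IN` convention: a comma-separated list of values, with a `/dup` variant that emits both signals during the transition. The variable name `MYFRAMEWORK_GENAI_CONTENT_OPT_IN` and its values are hypothetical; use whatever naming scheme your framework already follows.

```python
import os
from typing import Mapping, Optional, Set

OPT_IN_VAR = "MYFRAMEWORK_GENAI_CONTENT_OPT_IN"  # hypothetical variable name


def content_signals(environ: Optional[Mapping[str, str]] = None) -> Set[str]:
    """Decide where GenAI message content is emitted: span events, logs, or both."""
    environ = os.environ if environ is None else environ
    values = {v.strip() for v in environ.get(OPT_IN_VAR, "").split(",") if v.strip()}
    if "gen_ai_logs/dup" in values:
        return {"span_events", "logs"}  # transition mode: emit both signals
    if "gen_ai_logs" in values:
        return {"logs"}
    return {"span_events"}  # default preserves today's behavior


print(sorted(content_signals({OPT_IN_VAR: "gen_ai_logs/dup"})))  # ['logs', 'span_events']
```

Defaulting to the existing behavior keeps current users unaffected, while the `/dup` mode lets backends such as agentevals be validated against both signals before the span event path is dropped.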

## OTLP Receiver

agentevals runs an OTLP HTTP receiver on port 4318 (the standard OTLP HTTP port) that accepts:

| Endpoint | Content Types |
|----------|--------------|
| `/v1/traces` | `application/json`, `application/x-protobuf` |
| `/v1/logs` | `application/json`, `application/x-protobuf` |

Point your standard OTel exporters at `http://localhost:4318` and traces will stream into agentevals automatically. See [examples/README.md](../examples/README.md) for zero-code setup instructions.
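
Exporters are the usual way to reach the receiver, but because `/v1/traces` also accepts `application/json`, a hand-built OTLP/JSON payload is handy for smoke-testing the endpoint. A stdlib-only sketch, assuming agentevals is running locally; the helper names, service name, and attribute values are illustrative.

```python
import json
import os
import time
import urllib.request


def build_trace_payload(model: str) -> dict:
    """Build a one-span OTLP/JSON trace request (illustrative values)."""
    now = time.time_ns()
    span = {
        "traceId": os.urandom(16).hex(),  # 32 hex characters
        "spanId": os.urandom(8).hex(),  # 16 hex characters
        "name": f"chat {model}",
        "kind": 3,  # SPAN_KIND_CLIENT
        "startTimeUnixNano": str(now - 1_000_000),
        "endTimeUnixNano": str(now),
        "attributes": [{"key": "gen_ai.request.model", "value": {"stringValue": model}}],
    }
    return {
        "resourceSpans": [
            {
                "resource": {
                    "attributes": [{"key": "service.name", "value": {"stringValue": "smoke-test"}}]
                },
                "scopeSpans": [{"scope": {"name": "smoke-test"}, "spans": [span]}],
            }
        ]
    }


def post_traces(payload: dict, base_url: str = "http://localhost:4318") -> int:
    """POST the payload to the OTLP HTTP traces endpoint and return the HTTP status."""
    request = urllib.request.Request(
        f"{base_url}/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

With the receiver running, `post_traces(build_trace_payload("gpt-4o"))` sends a single span; a real agent should still use the standard exporters shown in the examples.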

docs/streaming.md

Lines changed: 16 additions & 0 deletions
```diff
@@ -106,6 +106,22 @@ When a session ends (`session_end` message), the server:
 
 Evaluation is triggered separately from the UI or API.
 
+## GenAI Message Content: Span Events vs. Logs
+
+agentevals supports two mechanisms for receiving GenAI message content (`gen_ai.input.messages`, `gen_ai.output.messages`):
+
+**Log records (recommended).** Instrumentation libraries like `opentelemetry-instrumentation-openai-v2` emit message content as OTel log records correlated with spans via trace context. The server merges these back into spans during session completion (see `log_enrichment.py`).
+
+**Span events (legacy, supported for backward compatibility).** Some frameworks (notably Strands with `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental`) currently emit message content as span event attributes. agentevals promotes these attributes to span-level attributes so downstream converters see a uniform shape; for live WebSocket streaming, the `AgentEvalsStreamingProcessor` performs this step. The promotion happens in three places:
+
+- `streaming/processor.py` for live WebSocket spans
+- `api/otlp_routes.py` for OTLP HTTP reception
+- `loader/otlp.py` for loading OTLP JSON files
+
+The OTel community is [deprecating span events](https://opentelemetry.io/blog/2026/deprecating-span-events/) in favor of log-based events emitted via the Logs API. As frameworks migrate, the log-based path will become the standard. The span event promotion logic will remain for backward compatibility with older instrumentation versions.
+
+For a full overview of supported OTel conventions and migration guidance, see [otel-compatibility.md](./otel-compatibility.md).
+
 ## WebSocket Protocol
 
 ### Endpoint: `/ws/traces`
```
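
The log-record merge described in the diff can be sketched as a join on trace context. This is a hypothetical illustration, not the code in `log_enrichment.py`: the function name, the OTLP/JSON dict shapes, and the rule that existing span attributes win are assumptions. It also assumes message content arrives as log-record attributes named like the span attributes; instrumentation versions that put content in the log body would need extra handling that this sketch omits.

```python
from collections import defaultdict

MESSAGE_KEYS = {"gen_ai.input.messages", "gen_ai.output.messages"}


def enrich_spans_with_logs(spans: list, log_records: list) -> list:
    """Attach GenAI message content from correlated log records to spans (hypothetical)."""
    logs_by_span = defaultdict(list)
    for record in log_records:
        # Trace context (traceId + spanId) is what ties a log record to its span.
        logs_by_span[(record.get("traceId"), record.get("spanId"))].append(record)

    enriched = []
    for span in spans:
        attributes = list(span.get("attributes", []))
        present = {attr["key"] for attr in attributes}
        for record in logs_by_span.get((span["traceId"], span["spanId"]), []):
            for attr in record.get("attributes", []):
                if attr["key"] in MESSAGE_KEYS and attr["key"] not in present:
                    attributes.append(attr)
                    present.add(attr["key"])
        enriched.append({**span, "attributes": attributes})
    return enriched


spans = [{"traceId": "t1", "spanId": "s1", "name": "chat gpt-4o", "attributes": []}]
logs = [
    {"traceId": "t1", "spanId": "s1",
     "attributes": [{"key": "gen_ai.input.messages", "value": {"stringValue": "[]"}}]},
    {"traceId": "t1", "spanId": "other",  # belongs to a different span, so it is ignored
     "attributes": [{"key": "gen_ai.output.messages", "value": {"stringValue": "[]"}}]},
]
result = enrich_spans_with_logs(spans, logs)
print([attr["key"] for attr in result[0]["attributes"]])  # ['gen_ai.input.messages']
```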

examples/README.md

Lines changed: 11 additions & 3 deletions
```diff
@@ -92,12 +92,14 @@ Detection checks for `gen_ai.request.model` / `gen_ai.input.messages` (GenAI sem
 | Example | Framework | LLM Provider | Instrumentation | Content Delivery |
 |---------|-----------|-------------|-----------------|-----------------|
 | [zero-code-examples/langchain/](./zero-code-examples/langchain/) | LangChain | OpenAI | GenAI semconv (logs) | Standard OTLP export |
-| [zero-code-examples/strands/](./zero-code-examples/strands/) | Strands | OpenAI | GenAI semconv (events) | Standard OTLP export |
+| [zero-code-examples/strands/](./zero-code-examples/strands/) | Strands | OpenAI | GenAI semconv (events*) | Standard OTLP export |
 | [zero-code-examples/adk/](./zero-code-examples/adk/) | Google ADK | Gemini | ADK built-in | Standard OTLP export |
 | [langchain_agent](./langchain_agent/) | LangChain | OpenAI | GenAI semconv (logs) | SDK WebSocket |
-| [strands_agent](./strands_agent/) | Strands | OpenAI | GenAI semconv (events) | SDK WebSocket |
+| [strands_agent](./strands_agent/) | Strands | OpenAI | GenAI semconv (events*) | SDK WebSocket |
 | [dice_agent](./dice_agent/) | Google ADK | Gemini | ADK built-in | SDK WebSocket |
 
+*\*Span events are [being deprecated](https://opentelemetry.io/blog/2026/deprecating-span-events/) in favor of log-based events. agentevals supports both. See [docs/otel-compatibility.md](../docs/otel-compatibility.md) for details.*
+
 The zero-code and SDK examples implement the same toy agent (dice rolling + prime checking) so you can compare the two approaches directly.
 
 ## Advanced: GenAI Semantic Convention Patterns
@@ -135,6 +137,9 @@ See [langchain_agent/README.md](./langchain_agent/README.md) for the full walkth
 
 ### Events-Based Content ([strands_agent](./strands_agent/))
 
+> [!NOTE]
+> The OTel community is [deprecating span events](https://opentelemetry.io/blog/2026/deprecating-span-events/) in favor of log-based events emitted via the Logs API. Frameworks currently using span events (like Strands) are expected to migrate to log-based events in future versions. agentevals supports both patterns and will continue to handle span events for backward compatibility.
+
 Used by frameworks that emit message content as **span events** rather than separate log records. The `AgentEvalsStreamingProcessor` automatically promotes `gen_ai.input.messages` and `gen_ai.output.messages` from event attributes to span attributes, so downstream processing sees a uniform shape.
 
 This pattern needs only a `TracerProvider`, no `LoggerProvider` or log processor:
@@ -149,11 +154,14 @@ telemetry.tracer_provider.add_span_processor(processor)
 
 ### Which Pattern Should I Use?
 
+- **For new instrumentation, prefer the logs-based pattern.** The OTel community recommends emitting events as log records rather than span events going forward.
 - **Check your framework/library docs first.** They will tell you whether message content is emitted as logs or span events.
 - If your instrumentation library requires a `LoggerProvider` (like `opentelemetry-instrumentation-openai-v2`), use the **logs-based** pattern.
-- If your framework emits GenAI span events (like Strands with `StrandsTelemetry`), use the **events-based** pattern, it's simpler.
+- If your framework currently emits GenAI span events (like Strands with `StrandsTelemetry`), the **events-based** pattern works today. When the framework migrates to log-based events, switch to the logs-based pattern.
 - If you're using **Google ADK**, skip GenAI semconv entirely. See the next section.
 
+For a detailed overview of OTel compatibility and the ongoing migration, see [docs/otel-compatibility.md](../docs/otel-compatibility.md).
+
 ## Framework-Native Tracing (Google ADK)
 
 Google ADK instruments agents automatically under the `gcp.vertex.agent` OTel scope. It emits proprietary attributes (`gcp.vertex.agent.llm_request`, `gcp.vertex.agent.llm_response`, etc.) directly on spans. agentevals has a dedicated converter for this format.
```

examples/strands_agent/main.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -10,6 +10,12 @@
 events with gen_ai.input.messages / gen_ai.output.messages attributes, which the
 streaming processor promotes from span events to span attributes
 
+Note: Strands currently delivers message content via span events. The OTel community
+is deprecating span events in favor of log-based events (see
+https://opentelemetry.io/blog/2026/deprecating-span-events/). Future Strands versions
+will likely migrate to log-based events, at which point this example should switch to
+the logs-based pattern used in langchain_agent/.
+
 Prerequisites:
 1. Install dependencies:
 $ pip install -r requirements.txt
```
