-
Notifications
You must be signed in to change notification settings - Fork 213
feat: Add OpenTelemetry tracing to AG-UI endpoint #1202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Add OTEL instrumentation to the experimental AG-UI chat endpoint for observability and distributed tracing. Traces are exported to AWS OSIS (OpenSearch Ingestion Service) using SigV4 authentication. Signed-off-by: Megha Goyal <[email protected]>
WalkthroughIntroduces OpenTelemetry tracing infrastructure into the experimental AG-UI server. Adds tracer initialization with optional AWS SigV4 authentication for OSIS endpoints, semantic attribute constants, and integrates tracing into the server's streaming event generator to capture agent runs, tool executions, and error states. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Server as AG-UI Server
participant Tracer as OTEL Tracer
participant Exporter as OTLP/OSIS Backend
Client->>Server: Chat request
activate Server
Server->>Tracer: init_otel_tracer()
activate Tracer
Tracer->>Exporter: Configure OTLP HTTP exporter
Tracer-->>Server: Tracer ready
deactivate Tracer
Server->>Tracer: start_root_span(agent_run)
activate Tracer
note over Tracer: Set REQUEST_ID, CONVERSATION_ID<br/>AGENT_TYPE, MODEL attributes
loop For each tool invocation
Server->>Tracer: start_child_span(tool_execute)
note over Tracer: Set TOOL_NAME, TOOL_CALL_ID<br/>Record TOOL_DURATION_MS<br/>Record TOOL_OUTPUT
Server->>Tracer: end_span()
note over Tracer: Increment tool_call_count
end
alt Success
Server->>Tracer: mark_root_span_success()
note over Tracer: Set RESULT_SUCCESS = true
else Error
Server->>Tracer: set_span_error(exception)
note over Tracer: Record ERROR_MESSAGE<br/>ERROR_TYPE
end
Server->>Tracer: end_root_span()
deactivate Tracer
Tracer->>Exporter: flush_spans()
note over Exporter: Spans exported to backend
Server-->>Client: Chat response
deactivate Server
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Areas requiring extra attention:
Pre-merge checks✅ Passed checks (3 passed)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (9)
experimental/otel/tracing.py (2)
158-159: Debug logging may expose sensitive headers.The Authorization header (containing AWS credentials signature) will be logged. While debug level, this could leak to log files in production if log levels are misconfigured.
🔎 Consider masking sensitive headers:
- # Debug: show signed headers - logging.debug(f"[OTEL SIGN] Headers after signing: {dict(aws_request.headers)}") + # Debug: show signed headers (mask Authorization) + debug_headers = {k: v if k.lower() != 'authorization' else '[MASKED]' + for k, v in aws_request.headers.items()} + logging.debug(f"[OTEL SIGN] Headers after signing: {debug_headers}")
351-360: Use attribute constants for consistency.
set_span_erroruses hardcoded"error.type"and"error.message"whileattributes.pydefinesERROR_TYPEandERROR_MESSAGEconstants. Using the constants ensures consistency.🔎 Apply this diff:
+from experimental.otel.attributes import ERROR_TYPE, ERROR_MESSAGE + def set_span_error(span: trace.Span, error: Exception) -> None: """Set error status and attributes on a span. Args: span: The span to set error on error: The exception that occurred """ span.set_status(Status(StatusCode.ERROR, str(error))) - span.set_attribute("error.type", type(error).__name__) - span.set_attribute("error.message", str(error)) + span.set_attribute(ERROR_TYPE, type(error).__name__) + span.set_attribute(ERROR_MESSAGE, str(error))experimental/otel/attributes.py (1)
55-72: Truncation result exceedsmax_size.The function truncates to
max_sizethen appends"...[TRUNCATED]"(14 chars), so the result ismax_size + 14bytes. If the goal is to enforce a strict size limit, account for the marker length.🔎 Strict size enforcement:
+TRUNCATION_MARKER = "...[TRUNCATED]" + def truncate(value: Optional[str], max_size: int = MAX_ATTRIBUTE_SIZE) -> str: if value is None: return "" if len(value) <= max_size: return value - return value[:max_size] + "...[TRUNCATED]" + return value[:max_size - len(TRUNCATION_MARKER)] + TRUNCATION_MARKERexperimental/ag-ui/server-agui.py (1)
21-29: Consider proper package structure over sys.path manipulation.The
sys.path.insertis fragile and can cause import issues. Since this is experimental code, it's acceptable, but consider adding the experimental directory as a proper package or using relative imports when stabilizing.experimental/otel/test_otel.py (2)
1-9: Consider converting to pytest format.The script works for quick verification but integrating with pytest would provide better CI/CD integration, fixtures for setup/teardown, and consistent test discovery.
113-117: Direct module state manipulation is fragile.Modifying
tracing._initializedand_tracer_providerdirectly works but is brittle. When converting to pytest, consider usingimportlib.reload()or providing a properreset()function in the tracing module for testing.experimental/otel/__init__.py (1)
1-51: Clean package API surface design.This module correctly aggregates and re-exports the public OTEL instrumentation API. The explicit
__all__makes the public interface clear.Optionally, consider sorting
__all__alphabetically for easier maintenance as the list grows (per static analysis hint RUF022).experimental/otel/test_otel_integration.py (2)
111-116: Fragile coupling to private module state.Directly resetting
tracing._initializedandtracing._tracer_providercouples this test to internal implementation details. If the module internals change (e.g., renamed variables, different state management), this test will break silently or require updates.Consider exposing a
reset_tracer()test utility in the tracing module if this pattern is needed elsewhere, or document this coupling with a comment.
255-261: Direct access to private_tracer_provider.Similar to
test_tracer_initialization, this accesses private module state. Consider whethershutdown_otel_tracer()(which is publicly exported) could be used, or expose aforce_flush()wrapper in the public API.The 10-second timeout for
force_flushis reasonable for integration tests.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
experimental/ag-ui/server-agui.py(8 hunks)experimental/otel/__init__.py(1 hunks)experimental/otel/attributes.py(1 hunks)experimental/otel/test_otel.py(1 hunks)experimental/otel/test_otel_integration.py(1 hunks)experimental/otel/tracing.py(1 hunks)pyproject.toml(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
experimental/otel/test_otel.py (3)
experimental/otel/tracing.py (1)
_extract_region_from_endpoint(71-90)experimental/otel/attributes.py (1)
truncate(55-72)holmes/core/tracing.py (2)
start_span(104-105)end(110-111)
experimental/otel/test_otel_integration.py (2)
experimental/otel/tracing.py (3)
init_otel_tracer(241-323)get_tracer(326-340)set_span_error(351-360)experimental/otel/attributes.py (1)
truncate(55-72)
experimental/ag-ui/server-agui.py (3)
experimental/otel/tracing.py (3)
init_otel_tracer(241-323)get_tracer(326-340)set_span_error(351-360)holmes/core/tracing.py (2)
start_span(104-105)end(110-111)experimental/otel/attributes.py (1)
truncate(55-72)
experimental/otel/__init__.py (2)
experimental/otel/tracing.py (4)
init_otel_tracer(241-323)get_tracer(326-340)shutdown_otel_tracer(343-348)set_span_error(351-360)experimental/otel/attributes.py (1)
truncate(55-72)
experimental/otel/tracing.py (1)
holmes/version.py (1)
get_version(48-130)
🪛 Ruff (0.14.8)
experimental/otel/test_otel.py
225-225: Do not catch blind exception: Exception
(BLE001)
experimental/otel/test_otel_integration.py
235-235: Abstract raise to an inner function
(TRY301)
235-235: Avoid specifying long messages outside the exception class
(TRY003)
237-237: Do not catch blind exception: Exception
(BLE001)
264-264: Consider moving this statement to an else block
(TRY300)
265-265: Do not catch blind exception: Exception
(BLE001)
453-453: Do not catch blind exception: Exception
(BLE001)
485-485: Do not catch blind exception: Exception
(BLE001)
experimental/ag-ui/server-agui.py
294-294: Use explicit conversion flag
Replace with conversion flag
(RUF010)
experimental/otel/__init__.py
29-51: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
experimental/otel/tracing.py
88-89: try-except-pass detected, consider logging the exception
(S110)
88-88: Do not catch blind exception: Exception
(BLE001)
106-106: Avoid specifying long messages outside the exception class
(TRY003)
234-234: Consider moving this statement to an else block
(TRY300)
318-318: Consider moving this statement to an else block
(TRY300)
369-369: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (16)
experimental/otel/tracing.py (2)
181-213: Thread-safety concern is well-documented.The environment variable manipulation for profile isolation is a known limitation. The documentation is clear about startup-only usage. Consider adding an assertion or runtime check to prevent accidental concurrent calls if this becomes a concern.
264-272: Initialization semantics are correct.Setting
_initialized = Trueeven when disabled or missing endpoint is intentional — it prevents repeated initialization attempts on subsequentget_tracer()calls. The early returns withFalsecorrectly indicate tracing won't be active.experimental/otel/attributes.py (1)
8-48: Well-structured attribute constants.The constants follow Gen AI semantic conventions and are clearly organized by category. Good documentation linking to AG-UI field mappings.
experimental/ag-ui/server-agui.py (4)
85-91: LGTM - Tracer initialization at startup.Module-level initialization ensures tracing is configured before handling requests. The
get_tracer()call safely returns a no-op tracer when disabled.
142-153: LGTM - Root span with correlation attributes.Creating the span inside the generator ensures it covers the streaming lifecycle. The correlation attributes (REQUEST_ID, CONVERSATION_ID) properly link traces to AG-UI request/thread identifiers.
208-229: LGTM - Tool execution spans with proper parent-child linking.The implementation correctly:
- Links child spans to the root using
trace.set_span_in_context- Calculates duration from tracked start times
- Truncates potentially large tool outputs
- Handles missing start times gracefully with
pop(..., None)
288-298: LGTM - Robust error handling and span lifecycle.The
finallyblock ensures the root span is always ended, preventing resource leaks. Error details are properly recorded on the span before the error event is yielded.experimental/otel/test_otel.py (2)
40-71: Good edge case coverage for truncate.Tests cover all key scenarios: None input, strings within limit, at exact limit, over limit, and empty strings. The assertions include helpful failure messages.
220-228: Test runner correctly aggregates failures.The broad
Exceptioncatch is intentional here to ensure all tests run even if one fails. Consider addingtraceback.print_exc()for better debugging of failures.experimental/otel/test_otel_integration.py (7)
1-28: Good documentation for integration test usage.The docstring clearly documents all required and optional environment variables, usage instructions, and the purpose of the test. This helps developers run the test correctly.
69-73: Good practice: Masking sensitive credentials in output.The password is correctly masked with
"***"to prevent accidental exposure in logs/output.
180-194: Correct span hierarchy and context propagation.Child tool spans are properly created with
trace.set_span_in_context(root_span)ensuring correct parent-child relationships in the trace. Each tool span is correctly ended within the loop.
230-243: Intentional error simulation for testing.The broad
Exceptioncatch and intentionalValueErrorraise are appropriate here—this is a test exercising the error-recording path. The finally block correctly ensures the span is ended regardless of outcome.
344-349: Good retry pattern with exponential backoff potential.The retry loop with configurable
max_retriesandretry_delayis appropriate for handling indexing delays. All HTTP requests correctly usetimeout=10to prevent hangs.
489-500: Good conditional verification with credential check.The OpenSearch verification is correctly gated behind a check for all three required variables (
OPENSEARCH_ENDPOINT,OPENSEARCH_USERNAME,OPENSEARCH_PASSWORD), and failure doesn't break the overall test result—appropriate for optional verification that may have indexing delays.
273-276: Credentials sourced from environment variables.OpenSearch credentials are correctly read from environment variables rather than hardcoded. Empty string defaults are appropriate since the verification is optional.
| search_body = { | ||
| "size": 10, | ||
| "query": { | ||
| "bool": { | ||
| "should": [ | ||
| {"match": {"resource.attributes.gen_ai@request@id": test_id}}, | ||
| {"match": {"attributes.gen_ai@request@id": test_id}}, | ||
| {"match": {"gen_ai.request.id": test_id}}, | ||
| {"wildcard": {"traceId": "*"}}, | ||
| ], | ||
| "minimum_should_match": 1, | ||
| "filter": [{"range": {"startTime": {"gte": "now-5m"}}}], | ||
| } | ||
| }, | ||
| "sort": [{"startTime": {"order": "desc"}}], | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overly broad wildcard query may return unrelated traces.
The {"wildcard": {"traceId": "*"}} clause matches any document with a traceId. Combined with minimum_should_match: 1, this query could return traces unrelated to the test, relying only on the 5-minute time filter for relevance.
Consider removing the wildcard clause or making it more specific—the other match clauses should be sufficient to find the test traces by test_id.
🔎 Suggested fix
search_body = {
"size": 10,
"query": {
"bool": {
"should": [
{"match": {"resource.attributes.gen_ai@request@id": test_id}},
{"match": {"attributes.gen_ai@request@id": test_id}},
{"match": {"gen_ai.request.id": test_id}},
- {"wildcard": {"traceId": "*"}},
],
"minimum_should_match": 1,
"filter": [{"range": {"startTime": {"gte": "now-5m"}}}],
}
},
"sort": [{"startTime": {"order": "desc"}}],
}🤖 Prompt for AI Agents
In experimental/otel/test_otel_integration.py around lines 308 to 323, the
search_body uses a very broad {"wildcard": {"traceId": "*"}} combined with
minimum_should_match: 1 which can return unrelated traces; remove the wildcard
clause (or replace it with a specific pattern tied to test_id if traceId pattern
is known) so the query relies on the explicit match clauses for
resource/attributes/gen_ai request id and the time filter, ensuring results are
scoped to the test.
| opentelemetry-api = "^1.20.0" | ||
| opentelemetry-sdk = "^1.20.0" | ||
| opentelemetry-exporter-otlp-proto-http = "^1.20.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upgrade OpenTelemetry dependencies to a recent version.
The three packages are correctly selected for OTLP HTTP export and the ^1.20.0 constraint format is appropriate. However, current stable releases are 1.39.1, which is 19 minor versions ahead. Since security fixes are only applied to the latest minor version, pinning to 1.20.0 prevents receiving security updates. Update to a recent version like ^1.39.0 to ensure continued maintenance and security patch availability.
🤖 Prompt for AI Agents
In pyproject.toml around lines 58 to 60, the OpenTelemetry package versions are
pinned to ^1.20.0 which is outdated; update opentelemetry-api,
opentelemetry-sdk, and opentelemetry-exporter-otlp-proto-http to a recent
maintained minor version (e.g., ^1.39.0 or the current latest) so the project
receives security and maintenance fixes; change the version specifiers
accordingly and run dependency resolution (poetry lock/install or equivalent) to
ensure compatibility and update any lockfile.
Summary
Add OpenTelemetry (OTEL) instrumentation to the experimental AG-UI chat endpoint for observability and distributed tracing. Traces are exported to AWS OSIS (OpenSearch Ingestion Service) using SigV4 authentication.
Motivation
Key changes:
Environment variables:
Changes
New Module:
experimental/otel/tracing.py: Tracer initialization with AWS SigV4 auth for OSIS endpointsattributes.py: Gen AI semantic convention attribute constants (REQUEST_ID, TOOL_NAME, etc.)__init__.py: Package exportsAG-UI Integration (
server-agui.py)agent.run) created for each chat request with correlation attributestool.execute) for each tool call with duration trackingTests
test_otel.py: Unit tests (no external dependencies required)test_otel_integration.py: Integration tests with real OSIS endpoint + OpenSearch verificationDependencies
opentelemetry-api,opentelemetry-sdk,opentelemetry-exporter-otlp-proto-httpConfiguration
Environment variables (all optional - tracing disabled by default):
OTEL_ENABLEDtrueto enable tracingOTEL_EXPORTER_OTLP_ENDPOINThttps://xxx.us-east-1.osis.amazonaws.com/path/v1/traces)OTEL_AWS_PROFILEOTEL_AWS_REGIONOTEL_SERVICE_NAMEholmesgpt)Testing
Unit Tests
Integration Tests (with real OSIS endpoint)
Notes
Summary by CodeRabbit
New Features
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.