Background
Qwen Code's OpenTelemetry implementation is increasingly complete for the interactive/runtime path, but qwen serve still has a daemon-specific observability gap.
Today the serve daemon process handles HTTP routing, session lifecycle APIs, bridge queueing, ACP child process management, prompt dispatch, cancellation, SSE/EventBus fan-out, and bridge error translation. Most of those daemon-layer operations are not represented in OpenTelemetry. The ACP child can initialize telemetry after loadCliConfig(...) and may emit agent-internal model/tool logs or spans, but that does not cover the full daemon path from HTTP request to bridge to child to response/events.
Current findings:
- qwen serve starts from packages/cli/src/commands/serve.ts and packages/cli/src/serve/runQwenServe.ts; it calls the serve runner directly and does not construct a Config for the daemon process, so initializeTelemetry(...) is not run in the daemon itself.
- Config initializes telemetry from packages/core/src/config/config.ts, so telemetry exists mainly in paths that build a normal runtime config.
- ACP sessions call loadCliConfig(...) in packages/cli/src/acp-integration/acpAgent.ts, so child processes can have telemetry if settings enable it.
- The ACP session path logs user prompts/tool calls, but it does not currently provide the same top-level interaction span coverage as the interactive client.ts path.
- sendBridgeError(...) and the bridge lifecycle are primarily observable through daemon stderr today, not OTel traces/logs.
Related but distinct work:
This issue is narrower than #3731 and #4548: make daemon-mode execution reconstructable as a coherent OpenTelemetry trace/log/metric story.
Problem
When a daemon client sees an error such as POST /session/:id/prompt returning HTTP 500, operators cannot reconstruct the complete path from telemetry alone:
- inbound HTTP request to the daemon
- route validation and client/session lookup
- bridge channel selection or child spawn/reuse
- prompt queue wait and dispatch
- ACP child prompt handling
- model request and tool execution
- SSE/EventBus output fan-out
- cancellation, close, child exit, and error translation
Some lower-level model/tool telemetry may exist in the child, but the parent daemon span, bridge span, queue timing, lifecycle events, and error mapping are missing. This leaves gaps between client-visible HTTP failures and agent-internal telemetry.
There is also a multi-session concern: the current telemetry SDK is process-level, while daemon mode may serve multiple sessions over time. Any daemon/ACP telemetry work must avoid stale session root context and must attribute spans/logs to the correct workspace, session, prompt, and client.
Proposal
Add OpenTelemetry coverage for the qwen serve daemon path.
Suggested scope:
1. Initialize telemetry in the daemon process
- Initialize OTel before the HTTP server starts when telemetry is enabled for the daemon workspace/config.
- Reuse existing exporter, shutdown, diagnostic suppression, resource-attribute, and bounded flush semantics from the core telemetry SDK.
- Ensure the daemon process does not emit exporter diagnostics to stdout/stderr in structured/non-interactive contexts.
- Flush/shutdown telemetry during serve shutdown/drain.
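As an illustration of what "bounded" shutdown means here, the sketch below races a telemetry shutdown against a timer so that daemon drain cannot hang on an unreachable exporter. It is not the core SDK's implementation, which the daemon should reuse; `Shutdownable`, `ShutdownResult`, and `shutdownWithTimeout` are hypothetical names introduced only for this example.

```typescript
// Hypothetical shape: anything exposing an async shutdown(), e.g. a telemetry SDK wrapper.
interface Shutdownable {
  shutdown(): Promise<void>;
}

type ShutdownResult = "flushed" | "timed_out" | "failed";

// Race the shutdown against a timer. Resolves (never rejects) with the outcome
// so the caller can log it and continue draining.
function shutdownWithTimeout(
  sdk: Shutdownable,
  timeoutMs: number,
): Promise<ShutdownResult> {
  // Start shutdown inside a promise chain so a synchronous throw is also mapped to "failed".
  const flush: Promise<ShutdownResult> = Promise.resolve()
    .then(() => sdk.shutdown())
    .then(
      (): ShutdownResult => "flushed",
      (): ShutdownResult => "failed",
    );

  const timeout = new Promise<ShutdownResult>((resolve) => {
    const timer = setTimeout(() => resolve("timed_out"), timeoutMs);
    // Clear the timer once the flush settles so it cannot keep the process alive.
    void flush.then(() => clearTimeout(timer));
  });

  return Promise.race([flush, timeout]);
}
```

A serve shutdown handler could await `shutdownWithTimeout(sdk, someBudgetMs)` after the HTTP server stops accepting requests; the budget value is a configuration decision this issue leaves open.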
2. Add daemon HTTP/request spans
Create a span per relevant daemon request, using route templates rather than raw URLs. At minimum cover:
- POST /session
- POST /session/:id/load
- POST /session/:id/prompt
- POST /session/:id/cancel
- DELETE /session/:id
- GET /workspace/:id/sessions
- SSE/EventBus subscription routes if applicable
Recommended attributes:
- HTTP method, route template, status code
- workspace id/path hash where safe
- session id when known
- prompt id when known
- client id when known
- request duration
- error code/type and sanitized error message for failures
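The attribute list above could be assembled roughly as follows. This is a minimal sketch: the `http.*` and `error.type` keys follow the OpenTelemetry HTTP semantic conventions, while every `qwen.serve.*` key, the `DaemonRequestInfo` type, and both function names are assumptions made for illustration. The sanitizer shown only collapses whitespace and truncates; real sanitization would also need to strip secrets and paths.

```typescript
type AttrValue = string | number | boolean;

// Hypothetical summary of one finished daemon request.
interface DaemonRequestInfo {
  method: string;
  routeTemplate: string; // e.g. "/session/:id/prompt", never the raw URL
  statusCode: number;
  durationMs: number;
  sessionId?: string;
  promptId?: string;
  clientId?: string;
  errorType?: string;
  errorMessage?: string;
}

// Collapse whitespace and cap the length so messages stay single-line and bounded.
function sanitizeErrorMessage(message: string, maxLength: number = 200): string {
  const singleLine = message.replace(/\s+/g, " ").trim();
  return singleLine.length > maxLength
    ? singleLine.slice(0, maxLength) + "..."
    : singleLine;
}

function buildRequestSpan(info: DaemonRequestInfo): {
  name: string;
  attributes: Record<string, AttrValue>;
} {
  const attributes: Record<string, AttrValue> = {
    "http.request.method": info.method,
    "http.route": info.routeTemplate,
    "http.response.status_code": info.statusCode,
    "qwen.serve.request.duration_ms": info.durationMs, // hypothetical key
  };
  // Only attach ids that are actually known at this point in the request.
  if (info.sessionId !== undefined) attributes["qwen.serve.session.id"] = info.sessionId;
  if (info.promptId !== undefined) attributes["qwen.serve.prompt.id"] = info.promptId;
  if (info.clientId !== undefined) attributes["qwen.serve.client.id"] = info.clientId;
  if (info.statusCode >= 500 || info.errorType !== undefined) {
    attributes["error.type"] = info.errorType ?? String(info.statusCode);
  }
  if (info.errorMessage !== undefined) {
    attributes["qwen.serve.error.message"] = sanitizeErrorMessage(info.errorMessage);
  }
  // Span name uses the route template, which keeps span-name cardinality low.
  return { name: `${info.method} ${info.routeTemplate}`, attributes };
}
```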
3. Add bridge and child-process spans/events
Instrument the daemon bridge around operations that are invisible from ACP child telemetry:
- session create/load/close/cancel
- child process spawn/reuse/exit
- bridge channel lookup
- prompt queue wait time
- prompt dispatch duration
- cancel propagation to ACP child
- pending permission cancellation
- EventBus/SSE publish/fan-out failures
- bridge transport close/errors
This should make a prompt trace show where time was spent before the ACP child began model/tool work.
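To make the split between queue wait and dispatch time concrete, here is a small sketch of a timer that a bridge could attach to each queued prompt. `PromptTimer` and `PromptTiming` are hypothetical names, and the injected clock exists only so the behaviour is easy to test; the real bridge may already track some of these timestamps.

```typescript
interface PromptTiming {
  queueWaitMs: number; // time between enqueue and dispatch to the ACP child
  dispatchMs: number;  // time between dispatch and completion
}

class PromptTimer {
  private readonly now: () => number;
  private readonly enqueuedAt: number;
  private dispatchedAt: number | undefined;

  // The clock is injectable; it defaults to wall-clock milliseconds.
  constructor(now: () => number = Date.now) {
    this.now = now;
    this.enqueuedAt = now();
  }

  // Call when the prompt leaves the queue; repeated calls keep the first timestamp.
  markDispatched(): void {
    if (this.dispatchedAt === undefined) {
      this.dispatchedAt = this.now();
    }
  }

  // Call on completion, failure, or cancellation.
  finish(): PromptTiming {
    const end = this.now();
    // A prompt cancelled while still queued has no dispatch phase.
    const dispatched = this.dispatchedAt ?? end;
    return {
      queueWaitMs: dispatched - this.enqueuedAt,
      dispatchMs: end - dispatched,
    };
  }
}
```

The two durations could be recorded as span attributes or events on the bridge span, and `queueWaitMs` is also a natural input for a prompt queue wait metric.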
4. Propagate trace context across daemon and ACP child
Define a W3C trace context boundary between daemon request spans and ACP child work.
Possible approaches:
- pass traceparent/tracestate through an ACP request metadata field if the protocol allows it;
- pass a daemon-generated trace context in an internal envelope field that is not exposed as user prompt content;
- fall back to OTel links if strict parent-child context is unsafe for queued or long-lived work.
The child-side prompt/interaction span should be parented to, or linked from, the daemon prompt/bridge span so the trace is navigable end to end.
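For reference, the first two approaches amount to serializing the daemon span's context into a W3C traceparent string and reading it back in the child. The sketch below handles only version 00 of the header and uses a hypothetical `PromptEnvelope` with a `meta` map; whether ACP has a suitable metadata field is one of the open questions, and a real implementation would more likely use the OpenTelemetry propagation API than hand-rolled parsing.

```typescript
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string;  // 16 lowercase hex characters
  sampled: boolean;
}

// Hypothetical internal envelope; "meta" is not part of any confirmed ACP schema.
interface PromptEnvelope {
  prompt: string;
  meta: Record<string, string>;
}

// version "00" - trace-id - parent-id - trace-flags
const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

function parseTraceparent(header: string): TraceContext | undefined {
  const match = TRACEPARENT_V00.exec(header.trim());
  if (match === null) return undefined;
  const traceId = match[1] ?? "";
  const spanId = match[2] ?? "";
  const flags = match[3] ?? "00";
  // All-zero trace or span ids are invalid in the W3C format.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  // Bit 0 of trace-flags is the "sampled" flag.
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Daemon side: attach the context next to the prompt, not inside prompt content.
function injectTraceContext(envelope: PromptEnvelope, ctx: TraceContext): PromptEnvelope {
  return {
    ...envelope,
    meta: { ...envelope.meta, traceparent: formatTraceparent(ctx) },
  };
}

// Child side: recover the context, or undefined if it is absent or malformed.
function extractTraceContext(envelope: PromptEnvelope): TraceContext | undefined {
  const header = envelope.meta["traceparent"];
  return header === undefined ? undefined : parseTraceparent(header);
}
```

The same extracted context works for either design choice: the child can use it as the parent of its prompt span, or attach it as a span link when the prompt was queued long enough that strict parenting would be misleading.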
5. Align ACP session tracing with interactive tracing
Bring ACP prompt handling closer to the interactive client.ts trace tree:
- create a top-level interaction/prompt span for each ACP prompt;
- ensure child LLM spans and tool spans attach under the correct prompt span;
- preserve existing prompt/tool log events;
- avoid global session-root leakage across multiple sessions in one long-lived process.
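One way to avoid session-root leakage is to look up context per session id instead of reading a single process-wide root. The sketch below is only an illustration of that idea; `SessionContextRegistry`, `SessionContext`, and the `qwen.serve.*` attribute keys are hypothetical and do not describe the existing telemetry SDK.

```typescript
// Hypothetical per-session context; a real one might also hold a root span context.
interface SessionContext {
  sessionId: string;
  workspaceId: string;
}

class SessionContextRegistry {
  private readonly sessions = new Map<string, SessionContext>();

  // Register (or replace) the context when a session is created or loaded.
  open(ctx: SessionContext): void {
    this.sessions.set(ctx.sessionId, ctx);
  }

  // Drop the context on close so later prompts cannot inherit it.
  close(sessionId: string): boolean {
    return this.sessions.delete(sessionId);
  }

  // Attributes for a prompt span. Returns undefined for unknown or closed
  // sessions instead of falling back to some other session's context.
  promptAttributes(
    sessionId: string,
    promptId: string,
  ): Record<string, string> | undefined {
    const ctx = this.sessions.get(sessionId);
    if (ctx === undefined) return undefined;
    return {
      "qwen.serve.session.id": ctx.sessionId,
      "qwen.serve.workspace.id": ctx.workspaceId,
      "qwen.serve.prompt.id": promptId,
    };
  }
}
```

The important property is the explicit miss: a prompt for a closed session yields no context at all, which makes stale attribution visible in tests rather than silently wrong in traces.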
6. Add daemon metrics/log records where useful
Metrics/logs should complement traces without creating high-cardinality explosions.
Useful low-cardinality metrics may include:
- request count/latency by route and status class
- active sessions by workspace
- prompt queue wait duration
- child process spawn/restart count
- bridge error count by error code/type
- cancellation/close count
Log records should include trace/span ids where possible, especially for bridge errors and child stderr correlation.
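The cardinality concern can be illustrated with a tiny in-memory counter keyed by method, route template, and status class. This is a sketch of the keying strategy only, not a proposal to bypass the OTel metrics API; `statusClass` and `RequestCounter` are hypothetical names.

```typescript
// Bucket a status code into "2xx", "4xx", "5xx", etc.
function statusClass(statusCode: number): string {
  if (!Number.isInteger(statusCode) || statusCode < 100 || statusCode > 599) {
    return "unknown";
  }
  return `${Math.floor(statusCode / 100)}xx`;
}

class RequestCounter {
  private readonly counts = new Map<string, number>();

  // Key on the route template and status class, never on raw URLs, session
  // ids, or exact status codes, so the number of series stays bounded.
  record(method: string, routeTemplate: string, statusCode: number): void {
    const key = `${method} ${routeTemplate} ${statusClass(statusCode)}`;
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  get(method: string, routeTemplate: string, cls: string): number {
    return this.counts.get(`${method} ${routeTemplate} ${cls}`) ?? 0;
  }

  // Number of distinct series recorded so far.
  seriesCount(): number {
    return this.counts.size;
  }
}
```

With this keying, the upper bound on series is the number of routes times the number of status classes, regardless of how many sessions or prompts the daemon handles.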
Acceptance criteria
- With telemetry enabled, POST /session/:id/prompt produces a trace that starts at the daemon HTTP route and continues through bridge dispatch into ACP child prompt handling, LLM requests, and tool execution where applicable.
- A generic daemon 500 is marked on the relevant span and emits a correlated log record with route, session id, prompt id if known, and sanitized error details.
- Session lifecycle APIs (create, load, cancel, close, list) emit useful spans/events or metrics.
- Child process spawn/reuse/exit and bridge transport failures are observable.
- Trace context is propagated or linked across the daemon-to-ACP-child boundary.
- Multiple sessions handled by one daemon do not share stale session root context; spans/logs are attributed to the correct session/workspace/prompt.
- Existing interactive/TUI telemetry behavior remains unchanged.
- OTel shutdown is bounded and runs during daemon shutdown/drain.
- Tests cover daemon telemetry initialization, route span attributes, bridge error span status/logging, context propagation/linking, and multi-session attribution.
Non-goals
Open questions
- What config should control daemon telemetry initialization when qwen serve has not created a normal session Config yet?
- Should daemon HTTP route spans be implemented manually, through HTTP/Express instrumentation, or both?
- Should daemon-to-child context use parent/child propagation or OTel links for queued prompt work?
- Should ACP child telemetry be one process per session, one process per workspace, or explicitly multi-session-safe with refreshed session context?