Description
Requested here: https://discord.com/channels/1156433345631232100/1166779411920597002/1480698051319300147
Summary
DBOS SDK unconditionally overwrites the global OpenTelemetry TracerProvider during initialization and couples span creation to its own OTLP export pipeline. This makes it impossible for DBOS to integrate with existing APM solutions (Datadog dd-trace, Grafana Agent, etc.) that register their own TracerProvider via @opentelemetry/api.
We request that DBOS become a well-behaved OpenTelemetry citizen: use an existing global TracerProvider when one is already registered, and only create its own when none exists.
Context: Our Setup
We run a NestJS application on ECS Fargate with dd-trace for APM. The tracing architecture:
- dd-trace (import 'dd-trace/init') auto-instruments NestJS, Fastify, gRPC, Prisma, Redis, and axios. It registers itself as the global OTel TracerProvider, so any library that creates spans via @opentelemetry/api flows through dd-trace into Datadog.
- The Datadog Agent runs as a sidecar container, receiving spans on port 8126 (Datadog's native protocol).
- dd-trace provides Datadog-specific features we depend on: continuous profiling, Database Monitoring (DBM) propagation, Application Security Monitoring, runtime metrics, and automatic log-trace correlation (dd.trace_id/dd.span_id injection).
We recently adopted DBOS for durable workflow execution, but DBOS workflows and steps are invisible to our APM.
The Problem
Issue 1: DBOS Overwrites the Global TracerProvider
When DBOS.launch() is called with enableOTLP: true, it unconditionally replaces the global TracerProvider:
// @dbos-inc/dbos-sdk/dist/src/telemetry/traces.js — installTraceContextManager()
function installTraceContextManager(appName = 'dbos') {
if (!utils_1.globalParams.enableOTLP) {
return;
}
const { BasicTracerProvider } = require('@opentelemetry/sdk-trace-base');
const provider = new BasicTracerProvider({
resource: { attributes: { 'service.name': appName } },
});
trace.setGlobalTracerProvider(provider); // ← Overwrites dd-trace's provider
}

This is called again in the Tracer constructor:
// @dbos-inc/dbos-sdk/dist/src/telemetry/traces.js — Tracer constructor
constructor(telemetryCollector, appName = 'dbos') {
// ...
const tracer = new BasicTracerProvider({
resource: { attributes: { 'service.name': appName } },
});
trace.setGlobalTracerProvider(tracer); // ← Overwrites again
}

Impact: dd-trace's TracerProvider is replaced. Any OTel-based instrumentation (e.g., Prisma's @prisma/instrumentation) that was registered with dd-trace's provider stops correlating with HTTP request traces. dd-trace's own auto-instrumentation (NestJS, gRPC, etc.) continues working because it uses internal references rather than the global provider, but the OTel interop layer is broken.
Issue 2: Span Creation is Coupled to OTLP Export
Every span creation point has this guard:
// @dbos-inc/dbos-sdk/dist/src/telemetry/traces.js — startSpan()
startSpan(name, attributes, inputSpan) {
if (!utils_1.globalParams.enableOTLP) {
return new StubSpan(); // ← No-op when OTLP is disabled
}
// ... create real span
}

The same guard exists in runWithTrace():
function runWithTrace(span, func) {
if (!utils_1.globalParams.enableOTLP) {
return func(); // ← No trace context propagation
}
const { context, trace } = require('@opentelemetry/api');
return context.with(trace.setSpan(context.active(), span), func);
}

Impact: If enableOTLP is false (the default unless DBOS__CLOUD=true), DBOS creates StubSpan no-ops for every workflow and step. There is no way to get real spans without also enabling DBOS's OTLP export pipeline. Even if dd-trace's provider were not overwritten, DBOS would still produce no spans.
Issue 3: Custom Export Pipeline Instead of Standard OTel SpanProcessors
DBOS uses a bespoke TelemetryCollector that batches spans on a 100ms interval and exports them directly via OTLPTraceExporter:
// @dbos-inc/dbos-sdk/dist/src/telemetry/collector.js
constructor(exporter) {
this.exporter = exporter;
this.signalBufferID = setInterval(() => {
void this.processAndExportSignals();
}, this.processAndExportSignalsIntervalMs);
}

// @dbos-inc/dbos-sdk/dist/src/telemetry/exporters.js
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-proto');
this.tracesExporters.push(new OTLPTraceExporter({ url: endpoint }));

Impact: Spans are manually collected and exported outside the standard OTel SDK pipeline (BatchSpanProcessor / SimpleSpanProcessor). This means spans don't flow through whatever SpanProcessor is registered on the global TracerProvider. Even if the global provider is dd-trace's, DBOS's spans bypass it entirely.
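To make the bypass concrete, here is a minimal plain-JavaScript mock (no OpenTelemetry packages; all class and variable names are hypothetical stand-ins) contrasting a span that ends through the provider's processors with one that goes through a side-channel collector:

```javascript
// A provider that fans finished spans out to its registered processors,
// analogous to how the OTel SDK's TracerProvider invokes SpanProcessors.
class MockProvider {
  constructor() { this.processors = []; }
  addSpanProcessor(p) { this.processors.push(p); }
  endSpan(span) { this.processors.forEach((p) => p.onEnd(span)); }
}

// Stand-in for an APM vendor's processor (what dd-trace would register).
const seenByApm = [];
const apmProcessor = { onEnd: (span) => seenByApm.push(span.name) };

const provider = new MockProvider();
provider.addSpanProcessor(apmProcessor);

// Path A: span ended through the provider — the APM processor sees it.
provider.endSpan({ name: 'http.request' });

// Path B: DBOS-style side channel — spans buffered and exported directly,
// never touching the provider's registered processors.
const collectorBuffer = [];
collectorBuffer.push({ name: 'workflow.step' }); // shipped straight to OTLP

console.log(seenByApm); // only the span that went through the provider
```

The APM processor never learns about the side-channel span, which is exactly why DBOS spans are invisible to dd-trace today.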
Net Result
These three issues combined mean DBOS's telemetry is a closed system. The only supported path is:
DBOS → BasicTracerProvider → TelemetryCollector → OTLPTraceExporter → OTLP endpoint
There is no integration point for dd-trace, Grafana Agent, or any other TracerProvider to capture DBOS spans.
Current Workaround
We must run two parallel trace pipelines into the same Datadog backend:
NestJS / Prisma / gRPC / Redis DBOS workflows & steps
│ │
dd-trace @dbos-inc/otel
(Datadog protocol) (OTLP HTTP)
│ │
▼ ▼
DD Agent :8126 DD Agent :4318
│ │
└───────────┐ ┌────────────────┘
▼ ▼
Datadog Backend
This requires:
- Enabling a second receiver (OTLP) on our Datadog Agent sidecar
- Exposing an additional port (4318) on the container
- Installing @dbos-inc/otel and configuring OTLP endpoints
- Two independent trace pipelines for a single service
It works, but it's operationally complex and the two trace trees (dd-trace HTTP spans and DBOS workflow spans) are not correlated — a workflow triggered by an HTTP request appears as two separate traces in Datadog.
Proposed Changes
Change 1: Respect Existing Global TracerProvider
function installTraceContextManager(appName = 'dbos') {
const { context, trace } = require('@opentelemetry/api');
const { AsyncLocalStorageContextManager } = require('@opentelemetry/context-async-hooks');
// Always set up context propagation (needed for parent-child span linking)
if (!context['_getContextManager']()) {
const contextManager = new AsyncLocalStorageContextManager();
contextManager.enable();
context.setGlobalContextManager(contextManager);
}
// Only create a TracerProvider if none exists
const existing = trace.getTracerProvider();
const isNoopProvider = !existing || existing.getTracer('test').startSpan('test').constructor.name === 'NonRecordingSpan';
if (isNoopProvider) {
const { BasicTracerProvider } = require('@opentelemetry/sdk-trace-base');
const provider = new BasicTracerProvider({
resource: { attributes: { 'service.name': appName } },
});
trace.setGlobalTracerProvider(provider);
}
// else: someone (dd-trace, etc.) already registered — use theirs
}

Remove the duplicate setGlobalTracerProvider call from the Tracer constructor.
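The noop-provider probe above hinges on the constructor name of the span returned by an API-only tracer. A dependency-free sketch of just that check, using local stand-in classes whose names mirror (but are not) @opentelemetry/api's:

```javascript
// Local stand-ins: the API's noop tracer returns a NonRecordingSpan,
// while a real SDK tracer returns a recording span.
class NonRecordingSpan { end() {} }
class RecordingSpan { end() {} }

const noopProvider = { getTracer: () => ({ startSpan: () => new NonRecordingSpan() }) };
const realProvider = { getTracer: () => ({ startSpan: () => new RecordingSpan() }) };

// The probe: start a throwaway span and inspect its constructor name.
function isNoopProvider(provider) {
  if (!provider) return true;
  const span = provider.getTracer('probe').startSpan('probe');
  const noop = span.constructor.name === 'NonRecordingSpan';
  span.end(); // end the throwaway span so it is not left dangling
  return noop;
}

console.log(isNoopProvider(noopProvider)); // true  → DBOS installs its own provider
console.log(isNoopProvider(realProvider)); // false → external APM already registered one
```

One caveat worth noting in review: matching on constructor.name is fragile under minification, so an instanceof check or `span.isRecording()` may be a sturdier detection signal.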
Change 2: Decouple Span Creation from OTLP Export
Introduce a separate flag or auto-detect whether span creation should be active:
// Create real spans when ANY TracerProvider is registered (dd-trace, OTel SDK, etc.)
// Export to OTLP only when enableOTLP is true AND endpoints are configured
startSpan(name, attributes, inputSpan) {
if (!this.spanCreationEnabled) {
return new StubSpan();
}
// ... create real span using trace.getTracer('dbos-tracer')
}

Where spanCreationEnabled is true when:
- enableOTLP is explicitly true, OR
- A non-noop global TracerProvider is already registered (indicating an external APM tool)
This way, enableOTLP: false + dd-trace = real spans created, flowing through dd-trace. enableOTLP: false + no provider = StubSpan (current behavior). enableOTLP: true = DBOS creates its own provider and exporter (current behavior).
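Expressed as a pure function (hypothetical naming; hasExternalProvider stands for "a non-noop global TracerProvider is registered"), the rule and the three cases above are:

```javascript
// Proposed spanCreationEnabled rule from Change 2, as a pure function.
function spanCreationEnabled(enableOTLP, hasExternalProvider) {
  return enableOTLP === true || hasExternalProvider;
}

// The three scenarios described above:
console.log(spanCreationEnabled(false, true));  // true  — dd-trace present: real spans
console.log(spanCreationEnabled(false, false)); // false — no provider: StubSpan, as today
console.log(spanCreationEnabled(true, false));  // true  — DBOS's own provider + exporter
```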
Change 3: Use Standard OTel SpanProcessors
When DBOS creates its own BasicTracerProvider, register export via standard BatchSpanProcessor:
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-proto');
const provider = new BasicTracerProvider({ ... });
for (const endpoint of tracesEndpoints) {
provider.addSpanProcessor(
new BatchSpanProcessor(new OTLPTraceExporter({ url: endpoint }))
);
}
trace.setGlobalTracerProvider(provider);

This replaces the custom TelemetryCollector for traces. Benefits:
- Spans flow through the standard OTel pipeline
- Compatible with any SpanProcessor (dd-trace's, custom sampling, etc.)
- When an external provider is used, spans export through that provider's processors automatically — no DBOS-side export needed
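The fan-out this buys can be sketched without any OTel packages (hypothetical mock; the real SDK's TracerProvider does the equivalent when spans end):

```javascript
// Once spans end through the provider, every registered processor sees them.
class MockProvider {
  constructor() { this.processors = []; }
  addSpanProcessor(p) { this.processors.push(p); }
  endSpan(span) { this.processors.forEach((p) => p.onEnd(span)); }
}

const exported = []; // stand-in for a BatchSpanProcessor + OTLP exporter
const sampled = [];  // stand-in for a custom sampling or metrics processor

const provider = new MockProvider();
provider.addSpanProcessor({ onEnd: (s) => exported.push(s.name) });
provider.addSpanProcessor({ onEnd: (s) => sampled.push(s.name) });

// A single DBOS span ends through the provider; both processors receive it.
provider.endSpan({ name: 'dbos.workflow' });
console.log(exported, sampled);
```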
Expected Behavior After Changes
Scenario 1: DBOS + dd-trace (our use case)
import 'dd-trace/init'; // Registers as global TracerProvider
DBOS.setConfig({ name: 'merchant-api', systemDatabaseUrl: '...' });
await DBOS.launch();
// DBOS sees dd-trace's provider → creates real spans via dd-trace
// Workflow and step spans appear in Datadog APM under the same trace as HTTP spans
// No OTLP endpoint needed

Scenario 2: DBOS standalone with OTLP (e.g., Jaeger, Grafana)
DBOS.setConfig({
name: 'my-app',
enableOTLP: true,
otlpTracesEndpoints: ['http://localhost:4318/v1/traces'],
});
await DBOS.launch();
// No existing provider → DBOS creates BasicTracerProvider + BatchSpanProcessor
// Exports to OTLP endpoint (current behavior, unchanged)

Scenario 3: DBOS on DBOS Cloud
DBOS__CLOUD=true → enableOTLP defaults to true → current behavior, unchanged
Impact
This change would make DBOS compatible with the broader OpenTelemetry ecosystem. Any APM vendor that registers a TracerProvider (Datadog, Dynatrace, New Relic, Honeycomb, Grafana, etc.) would automatically capture DBOS workflow and step spans with zero additional configuration.
The key principle: DBOS should be a TracerProvider consumer, not a TracerProvider owner — unless no provider exists.