
feat(llmobs): add support for OpenAI Agents#7808

Open
crysmags wants to merge 27 commits into master from crysmags/openai-agents-test2

Conversation

@crysmags
Collaborator

What does this PR do?

Adds APM tracing and LLMObs instrumentation for the OpenAI Agents SDK (@openai/agents).

Tracing — instruments the following operations via Orchestrion (rewriter-based):

  • run() — agent execution span (openai-agents.run)
  • getResponse() / getStreamedResponse() — model request spans (openai-agents.getResponse, openai-agents.getStreamedResponse)
  • invokeFunctionTool() — tool call spans (openai-agents.invokeFunctionTool)
  • onInvokeHandoff() — agent handoff spans (openai-agents.onInvokeHandoff)
  • runInputGuardrails() / runOutputGuardrails() — guardrail spans

LLMObs — adds an LLMObs plugin (packages/dd-trace/src/llmobs/plugins/openai-agents/) that enriches spans with LLM observability tags (model, provider, token usage, input/output).

Motivation

The OpenAI Agents SDK is a first-party framework from OpenAI for building multi-agent systems in Node.js. It reached a stable API in >=0.7.0. Instrumenting it gives Datadog users distributed tracing and LLM observability for agent workflows without any code changes.

Additional Notes

  • Instrumentation targets @openai/agents-core and @openai/agents-openai (the sub-packages that contain the actual implementation); @openai/agents is the umbrella re-export.
  • Uses the Orchestrion rewriter pattern (same as ai, langchain) rather than traditional addHook — patches are applied at the function level via versionRange: '>=0.7.0'.
  • 14 tracing tests and 6 LLMObs tests added; both suites run with PLUGINS=openai-agents.

@crysmags crysmags changed the title from Crysmags/OpenAI agents test2 to feat(llmobs): add support for OpenAI Agents Mar 17, 2026
@crysmags crysmags force-pushed the crysmags/openai-agents-test2 branch from ab4ce04 to 7e401ba Compare March 17, 2026 16:31
@github-actions
Contributor

github-actions bot commented Mar 17, 2026

Overall package size

Self size: 5.48 MB
Deduped: 6.33 MB
No deduping: 6.33 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.1 | 82.56 kB | 817.39 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

Comment on lines +93 to +103
const usage = result.usage
if (usage) {
if (usage.inputTokens !== undefined) {
span.setTag('openai.response.usage.prompt_tokens', usage.inputTokens)
}
if (usage.outputTokens !== undefined) {
span.setTag('openai.response.usage.completion_tokens', usage.outputTokens)
}
if (usage.totalTokens !== undefined) {
span.setTag('openai.response.usage.total_tokens', usage.totalTokens)
}
Contributor

AFAIK we only tag token usage on the actual llm event spans, and not the apm tracing spans

Collaborator

yeah, we don't need to add really any tags on the APM spans here

Comment on lines +32 to +33
service: ANY_STRING,
resource: ANY_STRING,
Contributor

we should assert the actual values here.

Contributor

as well as the rest of the file for values that are constant

"default": null
}
],
"DD_TRACE_OPENAI_AGENTS_ENABLED": ["A"],
Contributor

seems like our code generator is generating the wrong format for this, should match the below structure.

Comment on lines +81 to +100
/**
* For streaming, the span finishes before stream iteration begins.
* Output data is not available, so we only tag inputs and metadata.
*
* @param {{ currentStore?: { span: object }, arguments?: Array<*> }} ctx
*/
setLLMObsTags (ctx) {
const span = ctx.currentStore?.span
if (!span) return

const request = ctx.arguments?.[0]
const inputMessages = extractInputMessages(request)

// Streaming spans finish before iteration; output is not available
this._tagger.tagLLMIO(span, inputMessages, [{ content: '', role: '' }])

const metadata = extractMetadata(request)
metadata.stream = true
this._tagger.tagMetadata(span, metadata)
}
Contributor

it seems to have skipped tagging streaming output, which we should collect by wrapping the returned async iterator.
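A hedged sketch of what wrapping the returned async iterator could look like; `wrapStreamedResponse` and `onComplete` are illustrative names, not the plugin's actual API:

```javascript
'use strict'

// Sketch only: wrap an async iterable so the accumulated chunks can be tagged
// on the LLMObs span once the stream has been consumed. `onComplete` stands in
// for whatever tagger call the plugin would actually make.
function wrapStreamedResponse (stream, onComplete) {
  return (async function * () {
    const chunks = []
    try {
      for await (const chunk of stream) {
        chunks.push(chunk)
        yield chunk
      }
    } finally {
      // Fires whether the consumer finishes the stream or breaks out early
      onComplete(chunks)
    }
  })()
}

module.exports = { wrapStreamedResponse }
```

The consumer iterates the wrapper exactly as it would the original stream, so instrumentation stays transparent to user code.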

Comment on lines +110 to +112
if (baseURL.includes('azure')) return 'azure_openai'
if (baseURL.includes('deepseek')) return 'deepseek'
return 'openai'
Contributor

can we parse the url instead of hardcoding?
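A minimal sketch of the suggested change, using the WHATWG URL API so that a path or query segment containing "azure" cannot cause a false match; the function name and hostname suffixes are illustrative:

```javascript
'use strict'

// Sketch: derive the provider from the parsed hostname instead of
// substring-matching the raw baseURL string.
function getModelProviderFromBaseURL (baseURL) {
  let hostname
  try {
    hostname = new URL(baseURL).hostname
  } catch {
    return 'openai' // unparseable URL: fall back to the default provider
  }
  if (hostname.endsWith('.azure.com')) return 'azure_openai'
  if (hostname === 'deepseek.com' || hostname.endsWith('.deepseek.com')) return 'deepseek'
  return 'openai'
}

module.exports = { getModelProviderFromBaseURL }
```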

Comment on lines +132 to +158
for (const item of input) {
if (item.type === 'message') {
const role = item.role
if (!role) continue

let content = ''
if (Array.isArray(item.content)) {
const textParts = item.content
.filter(c => c.type === 'input_text' || c.type === 'text')
.map(c => c.text)
content = textParts.join('')
} else if (typeof item.content === 'string') {
content = item.content
}

if (content) {
messages.push({ role, content })
}
} else if (item.type === 'function_call') {
let args = item.arguments
if (typeof args === 'string') {
try {
args = JSON.parse(args)
} catch {
args = {}
}
}
Contributor

can we break this function into some helpers? Would help to improve readability.
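One possible decomposition, sketched with hypothetical helper names rather than the PR's actual code:

```javascript
'use strict'

// Illustrative refactor of the loop above into small, testable helpers.
// extractTextContent and parseFunctionArguments are hypothetical names.

// Flatten a message item's content (string or array of text parts) to a string.
function extractTextContent (content) {
  if (typeof content === 'string') return content
  if (Array.isArray(content)) {
    return content
      .filter(c => c.type === 'input_text' || c.type === 'text')
      .map(c => c.text)
      .join('')
  }
  return ''
}

// Parse function-call arguments, tolerating malformed JSON.
function parseFunctionArguments (args) {
  if (typeof args !== 'string') return args
  try {
    return JSON.parse(args)
  } catch {
    return {}
  }
}

// The message branch of the original loop, now a one-screen function.
function extractMessages (input) {
  const messages = []
  for (const item of input) {
    if (item.type === 'message' && item.role) {
      const content = extractTextContent(item.content)
      if (content) messages.push({ role: item.role, content })
    }
  }
  return messages
}

module.exports = { extractTextContent, parseFunctionArguments, extractMessages }
```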

@PROFeNoM
Contributor

PROFeNoM commented Mar 20, 2026

At first glance, looking at packages/dd-trace/test/llmobs/plugins/openai-agents/index.spec.js, I can see that only llm LLMObs span kinds are being generated. Looking at tests/contrib/openai_agents/test_openai_agents_llmobs.py in dd-trace-py, the Python integration generates workflow, agent, llm, tool, and task span kinds. Are we missing observability for some operations?

More importantly, the Python integration achieves this by hooking into the SDK's TracingProcessor interface via add_trace_processor(), which gives it the full semantic span tree. The JS SDK has the same infrastructure: addTraceProcessor() and a TracingProcessor interface with onTraceStart/onTraceEnd/onSpanStart/onSpanEnd.
⚠️ I strongly believe this is the approach to use. ⚠️

@wconti27
Contributor

At first glance, looking at packages/dd-trace/test/llmobs/plugins/openai-agents/index.spec.js, I can see that only llm LLMObs span kinds are being generated. Looking at tests/contrib/openai_agents/test_openai_agents_llmobs.py in dd-trace-py, the Python integration generates workflow, agent, llm, tool, and task span kinds. Are we missing observability for some operations?

More importantly, the Python integration achieves this by hooking into the SDK's TracingProcessor interface via add_trace_processor(), which gives it the full semantic span tree. The JS SDK has the same infrastructure: addTraceProcessor() and a TracingProcessor interface with onTraceStart/onTraceEnd/onSpanStart/onSpanEnd. ⚠️ I strongly believe this is the approach to use. ⚠️

We actually had a few meetings about how to handle these cases: one with @sabrenner, another with our larger IDM team, and another with the Node core engineers. Basically, we decided to use these types of trace processors only when the processor is OTel compatible. Given that the tracing provided by the package is not OTel but some form of internal tracing, which can change on a whim and leave our instrumentation broken, we decided not to go that route for these types of cases.

@PROFeNoM
Contributor

which can change on a whim leaving our instrumentation broken

Hmm... I'm not totally convinced by that, tbh. By that I mean that tracing specific internal methods is inherently more brittle than relying on an interface that should follow semver. It's not guaranteed, I agree, but the odds are at least better.

Regardless of the approach chosen, it still seems to me that there is quite some difference between the current Python integration and the proposed Node.js one (only llm LLMObs span kinds are being generated). Is there a plan to align both integrations?

@datadog-datadog-prod-us1-2

datadog-datadog-prod-us1-2 bot commented Mar 25, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 27.36%
Overall Coverage: 68.15% (-0.37%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 34073c2 | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!

@codecov

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 26.21723% with 197 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.90%. Comparing base (e68f386) to head (7fa17a3).
⚠️ Report is 80 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| ...dd-trace/src/llmobs/plugins/openai-agents/utils.js | 1.85% | 106 Missing ⚠️ |
| ...dd-trace/src/llmobs/plugins/openai-agents/index.js | 20.17% | 91 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7808      +/-   ##
==========================================
- Coverage   80.45%   73.90%   -6.56%     
==========================================
  Files         748      776      +28     
  Lines       32411    36335    +3924     
==========================================
+ Hits        26076    26853     +777     
- Misses       6335     9482    +3147     
Flag Coverage Δ
aiguard-macos 39.44% <33.33%> (+0.24%) ⬆️
aiguard-ubuntu 39.56% <33.33%> (+0.24%) ⬆️
aiguard-windows 39.23% <33.33%> (+0.17%) ⬆️
apm-capabilities-tracing-macos 49.26% <20.07%> (+0.35%) ⬆️
apm-capabilities-tracing-ubuntu 49.18% <20.07%> (+0.23%) ⬆️
apm-capabilities-tracing-windows 49.03% <20.07%> (+0.35%) ⬆️
apm-integrations-child-process 38.74% <33.33%> (+0.23%) ⬆️
apm-integrations-couchbase-18 37.52% <33.33%> (+0.09%) ⬆️
apm-integrations-couchbase-eol 38.04% <33.33%> (+0.15%) ⬆️
apm-integrations-oracledb 37.87% <33.33%> (+0.14%) ⬆️
appsec-express 55.38% <33.33%> (+0.12%) ⬆️
appsec-fastify 51.71% <33.33%> (+0.10%) ⬆️
appsec-graphql 51.86% <33.33%> (+0.06%) ⬆️
appsec-kafka 44.48% <33.33%> (+0.11%) ⬆️
appsec-ldapjs 44.11% <33.33%> (+0.10%) ⬆️
appsec-lodash 43.70% <33.33%> (+0.07%) ⬆️
appsec-macos 58.08% <33.33%> (-0.11%) ⬇️
appsec-mongodb-core 48.89% <33.33%> (+0.10%) ⬆️
appsec-mongoose 49.55% <33.33%> (+0.10%) ⬆️
appsec-mysql 51.07% <33.33%> (+0.21%) ⬆️
appsec-node-serialize 43.29% <33.33%> (+0.10%) ⬆️
appsec-passport 47.75% <33.33%> (+0.11%) ⬆️
appsec-postgres 50.70% <33.33%> (+0.11%) ⬆️
appsec-sourcing 42.54% <33.33%> (-0.07%) ⬇️
appsec-stripe 44.73% <33.33%> (?)
appsec-template 43.45% <33.33%> (+0.10%) ⬆️
appsec-ubuntu 58.17% <33.33%> (-0.09%) ⬇️
appsec-windows 57.91% <33.33%> (-0.14%) ⬇️
instrumentations-instrumentation-bluebird 32.33% <33.33%> (-0.02%) ⬇️
instrumentations-instrumentation-body-parser 40.63% <33.33%> (+0.11%) ⬆️
instrumentations-instrumentation-child_process 38.07% <33.33%> (+0.24%) ⬆️
instrumentations-instrumentation-cookie-parser 34.36% <33.33%> (+0.04%) ⬆️
instrumentations-instrumentation-express 34.67% <33.33%> (+0.04%) ⬆️
instrumentations-instrumentation-express-mongo-sanitize 34.49% <33.33%> (+0.04%) ⬆️
instrumentations-instrumentation-express-session 40.27% <33.33%> (+0.11%) ⬆️
instrumentations-instrumentation-fs 32.01% <33.33%> (+0.05%) ⬆️
instrumentations-instrumentation-generic-pool 29.41% <50.00%> (-0.11%) ⬇️
instrumentations-instrumentation-http 39.99% <33.33%> (+0.19%) ⬆️
instrumentations-instrumentation-knex 32.39% <33.33%> (+0.05%) ⬆️
instrumentations-instrumentation-mongoose 33.51% <33.33%> (+0.04%) ⬆️
instrumentations-instrumentation-multer 40.38% <33.33%> (+0.11%) ⬆️
instrumentations-instrumentation-mysql2 38.40% <33.33%> (+0.12%) ⬆️
instrumentations-instrumentation-passport 44.16% <33.33%> (+0.11%) ⬆️
instrumentations-instrumentation-passport-http 43.84% <33.33%> (+0.11%) ⬆️
instrumentations-instrumentation-passport-local 44.37% <33.33%> (+0.11%) ⬆️
instrumentations-instrumentation-pg 37.84% <33.33%> (+0.12%) ⬆️
instrumentations-instrumentation-promise 32.26% <33.33%> (-0.02%) ⬇️
instrumentations-instrumentation-promise-js 32.26% <33.33%> (-0.02%) ⬇️
instrumentations-instrumentation-q 32.31% <33.33%> (-0.02%) ⬇️
instrumentations-instrumentation-url 32.23% <33.33%> (-0.02%) ⬇️
instrumentations-instrumentation-when 32.28% <33.33%> (-0.02%) ⬇️
llmobs-ai 41.37% <33.33%> (-0.89%) ⬇️
llmobs-anthropic 40.84% <33.33%> (+0.55%) ⬆️
llmobs-bedrock 39.32% <33.33%> (+0.08%) ⬆️
llmobs-google-genai 39.87% <33.33%> (-0.04%) ⬇️
llmobs-langchain 39.45% <33.33%> (-0.58%) ⬇️
llmobs-openai 44.12% <33.33%> (+0.15%) ⬆️
llmobs-vertex-ai 40.13% <33.33%> (+0.09%) ⬆️
platform-core 31.47% <ø> (ø)
platform-esbuild 34.42% <ø> (ø)
platform-instrumentations-misc 34.19% <100.00%> (-14.22%) ⬇️
platform-shimmer 37.56% <ø> (ø)
platform-unit-guardrails 32.89% <ø> (ø)
platform-webpack 19.88% <83.33%> (?)
plugins-azure-durable-functions 25.86% <100.00%> (+0.11%) ⬆️
plugins-azure-event-hubs 26.02% <100.00%> (+0.11%) ⬆️
plugins-azure-service-bus 25.38% <100.00%> (+0.11%) ⬆️
plugins-bullmq 43.60% <33.33%> (-0.60%) ⬇️
plugins-cassandra 38.02% <33.33%> (+0.25%) ⬆️
plugins-cookie 27.08% <100.00%> (+0.11%) ⬆️
plugins-cookie-parser 26.86% <100.00%> (+0.11%) ⬆️
plugins-crypto 26.73% <ø> (ø)
plugins-dd-trace-api 38.43% <33.33%> (+0.11%) ⬆️
plugins-express-mongo-sanitize 27.01% <100.00%> (+0.11%) ⬆️
plugins-express-session 26.82% <100.00%> (+0.11%) ⬆️
plugins-fastify 42.36% <33.33%> (+0.12%) ⬆️
plugins-fetch 38.51% <33.33%> (+0.18%) ⬆️
plugins-fs 38.75% <33.33%> (+0.14%) ⬆️
plugins-generic-pool 26.06% <100.00%> (+0.11%) ⬆️
plugins-google-cloud-pubsub 45.68% <33.33%> (+0.25%) ⬆️
plugins-grpc 41.01% <33.33%> (+0.10%) ⬆️
plugins-handlebars 27.05% <100.00%> (+0.11%) ⬆️
plugins-hapi 40.27% <33.33%> (+0.12%) ⬆️
plugins-hono 40.60% <33.33%> (+0.19%) ⬆️
plugins-ioredis 38.60% <33.33%> (+0.18%) ⬆️
plugins-knex 26.68% <100.00%> (+0.11%) ⬆️
plugins-langgraph 37.99% <33.33%> (-0.47%) ⬇️
plugins-ldapjs 24.55% <100.00%> (+0.11%) ⬆️
plugins-light-my-request 26.42% <100.00%> (+0.11%) ⬆️
plugins-limitd-client 32.61% <33.33%> (-0.01%) ⬇️
plugins-lodash 26.15% <100.00%> (+0.11%) ⬆️
plugins-mariadb 39.61% <33.33%> (+0.15%) ⬆️
plugins-memcached 38.34% <33.33%> (+0.20%) ⬆️
plugins-microgateway-core 39.41% <33.33%> (+0.19%) ⬆️
plugins-moleculer 40.63% <33.33%> (+0.12%) ⬆️
plugins-mongodb 39.27% <33.33%> (+0.11%) ⬆️
plugins-mongodb-core 39.11% <33.33%> (+0.12%) ⬆️
plugins-mongoose 38.92% <33.33%> (+0.08%) ⬆️
plugins-multer 26.82% <100.00%> (+0.11%) ⬆️
plugins-mysql 39.45% <33.33%> (+0.29%) ⬆️
plugins-mysql2 39.40% <33.33%> (+0.15%) ⬆️
plugins-node-serialize 27.12% <100.00%> (+0.11%) ⬆️
plugins-openai-agents 34.96% <26.21%> (?)
plugins-opensearch 37.74% <33.33%> (+0.15%) ⬆️
plugins-passport-http 26.87% <100.00%> (+0.11%) ⬆️
plugins-postgres 35.54% <33.33%> (-0.03%) ⬇️
plugins-process 26.73% <ø> (ø)
plugins-pug 27.08% <100.00%> (+0.11%) ⬆️
plugins-redis 39.04% <33.33%> (+0.16%) ⬆️
plugins-router 43.22% <33.33%> (+0.12%) ⬆️
plugins-sequelize 25.66% <100.00%> (+0.11%) ⬆️
plugins-test-and-upstream-amqp10 38.61% <33.33%> (+0.12%) ⬆️
plugins-test-and-upstream-amqplib 44.36% <33.33%> (+0.50%) ⬆️
plugins-test-and-upstream-apollo 39.23% <33.33%> (+0.13%) ⬆️
plugins-test-and-upstream-avsc 38.69% <33.33%> (+0.07%) ⬆️
plugins-test-and-upstream-bunyan 33.94% <33.33%> (+0.06%) ⬆️
plugins-test-and-upstream-connect 40.93% <33.33%> (+0.12%) ⬆️
plugins-test-and-upstream-graphql 40.27% <33.33%> (+0.17%) ⬆️
plugins-test-and-upstream-koa 40.52% <33.33%> (+0.12%) ⬆️
plugins-test-and-upstream-protobufjs 38.92% <33.33%> (+0.07%) ⬆️
plugins-test-and-upstream-rhea 44.39% <33.33%> (+0.35%) ⬆️
plugins-undici 39.36% <33.33%> (+0.27%) ⬆️
plugins-url 26.73% <ø> (ø)
plugins-valkey 38.31% <33.33%> (+0.22%) ⬆️
plugins-vm 26.73% <ø> (ø)
plugins-winston 34.26% <33.33%> (+0.19%) ⬆️
plugins-ws 42.12% <33.33%> (+0.26%) ⬆️
profiling-macos 40.65% <33.33%> (+0.09%) ⬆️
profiling-ubuntu 40.77% <33.33%> (-0.32%) ⬇️
profiling-windows 42.29% <33.33%> (+0.45%) ⬆️
serverless-azure-functions-client 25.74% <100.00%> (+0.11%) ⬆️
serverless-azure-functions-eventhubs 25.74% <100.00%> (+0.11%) ⬆️
serverless-azure-functions-servicebus 25.74% <100.00%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@pr-commenter

pr-commenter bot commented Mar 25, 2026

Benchmarks

Benchmark execution time: 2026-04-08 18:09:48

Comparing candidate commit 34073c2 in PR branch crysmags/openai-agents-test2 with baseline commit 2bac203 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 227 metrics, 33 unstable metrics.

@crysmags crysmags marked this pull request as ready for review March 25, 2026 23:05
@crysmags crysmags requested review from a team as code owners March 25, 2026 23:05
@crysmags crysmags requested review from ida613 and removed request for a team March 25, 2026 23:05
@crysmags
Collaborator Author

which can change on a whim leaving our instrumentation broken

Hmm... I'm not totally convinced by that, tbh. By that I mean that tracing specific internal methods is inherently more brittle than relying on an interface that should follow semver. It's not guaranteed, I agree, but the odds are at least better.

Regardless of the approach chosen, it still seems to me that there is quite some difference between the current Python integration and the proposed Node.js one (only llm LLMObs span kinds are being generated). Is there a plan to align both integrations?

I took a look at what the internal traces from OpenAI Agents were covering to see what the span kinds actually represented, then compared that to what we are covering. For tracing we had the same instrumentation points; for LLM Observability, however, we were only creating plugins that covered span_kind:llm. I added the additional plugins to match tracing, which gives us the same level of coverage that you see in Python.

@PROFeNoM
Contributor

PROFeNoM commented Apr 1, 2026

Hey 👋 I was testing the integration with a minimal demo app and ran into two issues. I spent quite some time debugging so I wanted to share what I found.

Setup

Simple app with a single agent, run against @openai/agents@^0.7.0, with LLMObs enabled and dd-trace loaded via --import dd-trace/initialize.mjs.

I tested three configurations:

| Test | App file | package.json | Module system | dd-trace init |
|------|----------|--------------|---------------|---------------|
| 1 | app.mjs | "type": "module" | ESM | --import dd-trace/initialize.mjs |
| 2 | app.js | (no type field) | CJS | --require dd-trace/init |
| 3 | app.js | "type": "module" | ESM | --import dd-trace/initialize.mjs |

Tests 1 and 3 are equivalent (both ESM). Test 3 is the modern recommended approach (.js + "type": "module").

The app itself is minimal:

import { Agent, run } from '@openai/agents'

const agent = new Agent({
  name: 'Simple Agent',
  instructions: 'You are a helpful assistant.',
  model: 'gpt-4o'
})

const result = await run(agent, 'What is the capital of France?')

Traces from the test runs:

[screenshot of the resulting traces]

Issue 1: ESM apps don't generate any openai-agents spans

In tests 1 and 3 (ESM), I only see the vanilla openai LLM span OpenAI.createResponse. No workflow, no agent, no tool spans. The openai-agents integration doesn't fire at all.

I added some logging to the rewriter to see what files it was processing:

[RW] checking: @openai/agents-core dist/run.mjs        -> no transformer
[RW] checking: @openai/agents-openai dist/openaiResponsesModel.mjs  -> no transformer
[RW] checking: @openai/agents-core dist/tool.mjs        -> no transformer
[RW] checking: @openai/agents-core dist/handoff.mjs     -> no transformer

The rewriter sees .mjs files, but the instrumentation config targets .js:

{
  module: {
    name: '@openai/agents-core',
    filePath: 'dist/run.js',       // <-- .js
  },
}

The matcher does strict equality (filePath === file_path), so dist/run.mjs !== dist/run.js: no match, no transformation.

It appears the @openai/agents packages ship dual format. The exports field in @openai/agents-core maps "require" to .js and "import" to .mjs. Any ESM app using import triggers the .mjs path. Only CJS apps using require() would get .js.
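For reference, a conditional exports map of roughly this shape (abridged and illustrative, not the package's exact file) is what produces the split:

```json
{
  "exports": {
    ".": {
      "require": "./dist/index.js",
      "import": "./dist/index.mjs"
    }
  }
}
```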

Issue 2: CJS works, but missing agent span compared to Python

In test 2 (CJS), the rewriter matches .js files and the integration fires. But the trace structure differs from what Python produces for the same scenario (simple agent, "What is the capital of France?"):

Python (4 LLMObs spans):

Workflow "Agent workflow"
  -> Agent "Simple Agent"
       -> LLM "Simple Agent (LLM)"
            -> LLM "OpenAI.createResponse"

NodeJS CJS (3 LLMObs spans):

Workflow "openai-agents.run"
  -> LLM "openai-agents.getResponse"
       -> LLM "OpenAI.createResponse"

The JS integration is missing the agent span. In Python, the agent span sits between the workflow and the LLM call and carries the agent name ("Simple Agent"). In the JS integration, the workflow span directly parents the LLM span with no agent in between.

This also means in multi-agent handoff scenarios, there's no clear boundary between which agent is running; the handoff tool span fires but there's no parent agent span to anchor it to.

A few other differences I noticed on the CJS spans:

  • Span names use internal function names (openai-agents.run, openai-agents.getResponse) rather than user-facing names (Agent workflow, Simple Agent)
  • The openai-agents.getResponse LLM span is missing metadata that Python's equivalent "Simple Agent (LLM)" span includes (text, tool_choice, truncation)

Questions

  1. Was the integration tested against ESM apps using the published npm package?
Wondering if maybe I'm doing something wrong (but then, any customer could do something wrong, so it doesn't really matter)

  2. Is the missing agent span intentional? I saw that the rewriter targets run, getResponse, invokeFunctionTool, onInvokeHandoff, and guardrails...but there's no hook for agent invocation itself (Python wraps _run_single_turn for this). Curious if this is planned or if I'm missing something.

For issue 1, I see we're already handling this in the anthropic integration, which loops over both extensions:

const extensions = ['js', 'mjs']
for (const extension of extensions) {
  addHook({
    name: '@anthropic-ai/sdk',
    file: `resources/messages.${extension}`,
    versions: ['>=0.14.0 <0.33.0'],
  }, exports => { ... })
}

The rewriter instrumentation config would need something similar - registering both .js and .mjs variants for each target file. Not sure if the rewriter supports that pattern directly or if the matcher would need to be adjusted.
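As a rough sketch (not the rewriter's actual API), assuming each config entry carries a module.filePath ending in .js, the twin registration could be generated like this:

```javascript
'use strict'

// Sketch: expand each .js instrumentation entry into a .js/.mjs pair so the
// strict-equality matcher hits both build outputs. The entry shape mirrors the
// config snippet above but is otherwise illustrative.
function withMjsTwins (entries) {
  return entries.flatMap(entry => ['js', 'mjs'].map(ext => ({
    ...entry,
    module: {
      ...entry.module,
      filePath: entry.module.filePath.replace(/\.js$/, `.${ext}`)
    }
  })))
}

module.exports = { withMjsTwins }
```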

Collaborator

@sabrenner sabrenner left a comment

did a first pass - as @PROFeNoM referenced in his comment, we can try and guide the toolkit to follow the existing tracing results (producing the same spans) that the Python integration has.

Comment on lines +3 to +7
// TODO: Add agent-level LLMObs span (kind: 'agent') wrapping per-agent async execution.
// Python achieves this via add_trace_processor(LLMObsTraceProcessor) which hooks
// Span.start() / Span.end() on the SDK's internal Span class (dist/tracing/spans.js).
// The equivalent here would be hooking Span.prototype.start / Span.prototype.end via
// orchestrion. Requires team sign-off before implementation.
Collaborator

does this comment still apply? looks like we patch the run method below, this should allow us to capture the agent-level spans i think (but, correct me if i'm wrong, and i'll also run through this locally after giving a first review)

Collaborator Author

For the simple single-agent case, run() already has everything we need to emit an agent span — we know the agent name, input, and output, and the hook wraps the full execution. The gap is multi-agent handoff scenarios: run() only gives us the starting agent, so we can't derive per-agent execution boundaries mid-run without hooking something lower-level like prepareAgentArtifacts.

That said, onInvokeHandoff is already instrumented separately, so the combination of run() + onInvokeHandoff covers most handoff observability. The missing piece is strictly the parent relationship — having the agent span wrap its own LLM calls in a handoff chain.

For the simple case, would it make sense to emit an agent span from the existing run() hook and update the TODO to note the handoff limitation? Or are you looking for full Python parity on the parent hierarchy, which would require a new hook point?

if (baseURL) {
const host = this.getHostFromBaseURL(baseURL)
if (host) {
tags['out.host'] = host
Collaborator

we typically haven't done stuff like this for the APM side of llm-type or agentic integrations, any reason we're including it here? maybe we're good to just tag model name and provider

const tags = {
component: 'openai-agents',
'span.kind': 'client',
'ai.request.model_provider': 'openai',
Collaborator

i actually don't think we set any APM tags for the openai-agents package in the Python integration, we're probably good to just not set any tags


getTags (ctx) {
const tags = super.getTags(ctx)
tags['openai.request.stream'] = 'true'
Collaborator

i think this is maybe the one tag we might wanna keep, but we could also just get rid of it too. all tagging/metadata can just be done on the LLMObs spans


const TracingPlugin = require('../../dd-trace/src/plugins/tracing')

class BaseOpenaiAgentsInternalPlugin extends TracingPlugin {
Collaborator

i think we should not give the static prefixes below in this base class, and instead let all implementers define them (for example, the RunPlugin below would define these fields, as the other implementing Plugins here do).

* @param {string} baseURL - The base URL of the OpenAI client
* @returns {string} The model provider name
*/
function getModelProvider (baseURL) {
Collaborator

we just landed a change which updates this logic elsewhere: a7de9c0

i wonder if we can refactor both here and that instance into a shared getModelProviderFromOpenAIBaseUrl function, or something like that, so that any logic updates are shared.

Collaborator

i think for all of the inlined-functions here, we can move them to a util.js file in this folder

if (savedAgentUrl !== undefined) process.env.DD_TRACE_AGENT_URL = savedAgentUrl
if (savedAgentPort !== undefined) process.env.DD_TRACE_AGENT_PORT = savedAgentPort
})

Collaborator

i believe this setup should not be needed; it works fine with all other llmobs tests without this change. are we able to remove these blocks?

Comment on lines +42 to +46
for (const key of Object.keys(require.cache)) {
if (key.includes('@openai/agents')) {
delete require.cache[key]
}
}
Collaborator

this approach also isn't needed for the langchain or langgraph suites, which also use orchestrion. can we try getting rid of this and follow the same patterns we use in those test suites?

crysmags and others added 23 commits April 8, 2026 13:29
Instruments the OpenAI Agents SDK with Datadog APM tracing. Adds span
coverage for agent runs, model calls (getResponse, getStreamedResponse),
tool invocations, and handoffs with full semantic tag support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Align LLMObs plugin and test directory names with the integration/plugin
ID (openai-agents) rather than the npm sub-package name (openai-agents-core).

Both test suites now run with the same PLUGINS=openai-agents value:
  tracing: PLUGINS=openai-agents yarn test:plugins
  llmobs:  PLUGINS=openai-agents yarn test:llmobs:plugins

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Accidentally committed during workflow test run; not a source file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Registers @openai/agents-core and @openai/agents-openai with their
version ranges so yarn services correctly handles them and withVersions
picks them up for the test matrix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The versions/package.json latests file is read-only by install_plugin_modules.js
and does not need its deps resolved in the root yarn.lock.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ming

- Rewriter: generate .mjs twins via flatMap so ESM apps get instrumented
- LLM spans: name as `{modelName} (LLM)` instead of internal method name
- Workflow spans: use run() options.workflowName, default 'Agent workflow'
- Handoff spans: name as `transfer_to_{agentName}` (Python parity)
- Metadata: map camelCase modelSettings to snake_case keys (top_p, max_tokens, etc.)
- Metadata: include request.tools list
- Metrics: capture reasoning_tokens from outputTokensDetails
- Workflow: extract agent manifest into metadata._dd.agent_manifest
- TODO: agent-level span (requires Span.start/end hook, needs team approval)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
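The camelCase-to-snake_case metadata mapping mentioned in the commit above can be sketched roughly as follows; the function names and the exact key set are illustrative assumptions, not the plugin's actual code.

```javascript
// Hypothetical sketch: convert camelCase modelSettings keys to the
// snake_case metadata keys (top_p, max_tokens, ...) described above.
function toSnakeCase (key) {
  return key.replace(/([a-z0-9])([A-Z])/g, '$1_$2').toLowerCase()
}

function mapModelSettings (modelSettings = {}) {
  const metadata = {}
  for (const [key, value] of Object.entries(modelSettings)) {
    if (value !== undefined) metadata[toSnakeCase(key)] = value
  }
  return metadata
}

// mapModelSettings({ topP: 0.9, maxTokens: 256 })
// → { top_p: 0.9, max_tokens: 256 }
```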
… them

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…plugins/index.js

The facade package has no hooks or rewriter entries — only @openai/agents-core
and @openai/agents-openai are actually instrumented.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ns/index.js

@openai/agents-openai depends on @openai/agents-core, so the plugin is always
registered when @openai/agents-core loads first. The second entry is a no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…om hooks.js

@openai/agents-openai depends on @openai/agents-core, so the instrumentation
file is already loaded (and the shims for both packages registered) before
@openai/agents-openai ever loads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Refactor APM plugins: switch from ClientPlugin/CompositePlugin object
  spread to TracingPlugin with individual static prefix/spanName per class;
  export as arrays
- Remove all model/provider/host/usage tags from APM spans (LLMObs-only)
- Extract LLMObs helpers into utils.js (getModelProvider, extractAgentManifest,
  extractInputMessages, extractOutputMessages, etc.) for testability
- Fix getModelProvider to fall back to 'unknown' instead of empty string
- Fix TypeScript definition comment for openai-agents integration
- Restore accidentally-dropped supported-configurations.json entries
- Add DD_TRACE_OPENAI_AGENTS_ENABLED to supported-configurations.json
- Fix test-setup.js: use versioned absolute paths for @openai/agents-openai
  and openai resolution; fix module.Agent → clientModule.Agent references
- Fix LLMObs spec: use withVersions() wrapper, fix openai require path,
  add metadata: MOCK_NOT_NULLISH assertions for run() workflow spans

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rovider utility

- Extract getOpenAIModelProvider() into a shared plugins/utils.js, used by
  both the openai and openai-agents LLMObs plugins (eliminates duplicate logic
  and incorporates the 'unknown' fallback for unrecognised base URLs)
- Convert index.js plugin registration to object-keyed accumulation pattern,
  consistent with the langgraph plugin
- Add unique static id to each tracing plugin subclass (required for the
  object-keyed pattern)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
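A minimal sketch of what a shared provider lookup with an 'unknown' fallback might look like; the hostnames checked and the function shape are assumptions for illustration, not the actual utils.js implementation.

```javascript
// Illustrative provider detection from a client base URL. Unrecognised
// hosts fall back to 'unknown' rather than an empty string, per the
// commit above.
function getModelProvider (baseURL) {
  if (!baseURL) return 'openai' // default client targets the OpenAI API
  try {
    const host = new URL(baseURL).hostname
    if (host.endsWith('openai.azure.com')) return 'azure_openai'
    if (host.endsWith('openai.com')) return 'openai'
  } catch (err) {
    // unparseable URL: fall through to the fallback below
  }
  return 'unknown'
}
```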
…onse

Implement full streaming support using AsyncIterator orchestrion pattern:
- Switch getStreamedResponse instrumentation to kind: 'AsyncIterator'
- Add GetStreamedResponseNextPlugin (APM) to keep span open until iterator
  exhaustion, fixing premature span close via traceSync end() side-effect
- Add GetStreamedResponseNextLLMObsPlugin (LLMObs) to accumulate
  response_done event and tag span with full I/O, metrics, and metadata
  once the stream completes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
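The AsyncIterator pattern described above can be sketched generically: wrap the returned iterable so a completion callback (e.g. finishing the span) fires only once the consumer exhausts the stream, not when the instrumented method returns. This is a simplified illustration, not the plugin's code; `onIteratorDone` is a hypothetical helper name.

```javascript
// Wrap an async iterable so onDone runs exactly when iteration finishes
// (normally or via early exit/error), keeping e.g. a span open until the
// last chunk is consumed.
function onIteratorDone (iterable, onDone) {
  return {
    async * [Symbol.asyncIterator] () {
      try {
        yield * iterable
      } finally {
        onDone() // e.g. span.finish() after stream exhaustion
      }
    }
  }
}
```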
@crysmags crysmags force-pushed the crysmags/openai-agents-test2 branch from 934e578 to 45b42c3 on April 8, 2026 17:32
crysmags and others added 4 commits April 8, 2026 13:34
…n in llmobs workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…test action

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…date

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wconti27 wconti27 added the apm-integration-toolkit PR Generated by APM AI Integration Toolkit label Apr 9, 2026
Labels: apm-integration-toolkit (PR Generated by APM AI Integration Toolkit), semver-minor