feat(llmobs): add support for OpenAI Agents#7808
Conversation
Overall package size. Self size: 5.48 MB

Dependency sizes:

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.1 | 82.56 kB | 817.39 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe
```js
const usage = result.usage
if (usage) {
  if (usage.inputTokens !== undefined) {
    span.setTag('openai.response.usage.prompt_tokens', usage.inputTokens)
  }
  if (usage.outputTokens !== undefined) {
    span.setTag('openai.response.usage.completion_tokens', usage.outputTokens)
  }
  if (usage.totalTokens !== undefined) {
    span.setTag('openai.response.usage.total_tokens', usage.totalTokens)
  }
```
AFAIK we only tag token usage on the actual llm event spans, and not the apm tracing spans
yeah, we don't really need to add any tags on the APM spans here
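To make the comments above concrete: instead of individual setTag calls on the APM span, the usage fields would be collected into a metrics object destined for the LLMObs span. This is an illustrative sketch only; the snake_case metric key names and the usageToMetrics helper are assumptions, not this PR's actual API.

```javascript
// Hypothetical helper: map the SDK's camelCase usage fields onto
// snake_case metric keys for an LLMObs span. Key names are assumed,
// not taken from the dd-trace codebase.
function usageToMetrics (usage) {
  const metrics = {}
  if (!usage) return metrics
  if (usage.inputTokens !== undefined) metrics.input_tokens = usage.inputTokens
  if (usage.outputTokens !== undefined) metrics.output_tokens = usage.outputTokens
  if (usage.totalTokens !== undefined) metrics.total_tokens = usage.totalTokens
  return metrics
}
```

The resulting object would then be handed to whatever metrics-tagging method the LLMObs tagger exposes, rather than producing APM span tags.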
```js
service: ANY_STRING,
resource: ANY_STRING,
```
we should assert the actual values here.
as well as the rest of the file for values that are constant
```json
      "default": null
    }
  ],
  "DD_TRACE_OPENAI_AGENTS_ENABLED": ["A"],
```
seems like our code generator is generating the wrong format for this, should match the below structure.
```js
/**
 * For streaming, the span finishes before stream iteration begins.
 * Output data is not available, so we only tag inputs and metadata.
 *
 * @param {{ currentStore?: { span: object }, arguments?: Array<*> }} ctx
 */
setLLMObsTags (ctx) {
  const span = ctx.currentStore?.span
  if (!span) return

  const request = ctx.arguments?.[0]
  const inputMessages = extractInputMessages(request)

  // Streaming spans finish before iteration; output is not available
  this._tagger.tagLLMIO(span, inputMessages, [{ content: '', role: '' }])

  const metadata = extractMetadata(request)
  metadata.stream = true
  this._tagger.tagMetadata(span, metadata)
}
```
it seems to have skipped tagging streaming output, which we should collect by wrapping the returned async iterator.
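A generic way to implement that suggestion is to wrap the returned async iterable so chunks pass through to the consumer untouched while being collected, with a completion callback firing once iteration ends. This is a sketch of the technique only; the SDK's real stream and chunk shapes, and the wrapStream/onComplete names, are illustrative.

```javascript
// Sketch: wrap an async iterable so the consumer's iteration is
// observed. Chunks are passed through untouched; once the stream is
// exhausted (or the consumer breaks/errors out), onComplete receives
// everything collected so the span can be tagged and finished late.
function wrapStream (stream, onComplete) {
  return {
    async * [Symbol.asyncIterator] () {
      const chunks = []
      try {
        for await (const chunk of stream) {
          chunks.push(chunk)
          yield chunk
        }
      } finally {
        // Runs on normal exhaustion, early break, and errors alike.
        onComplete(chunks)
      }
    }
  }
}
```

In the plugin this would mean deferring span.finish() and the output tagging into onComplete, instead of finishing the span before iteration begins.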
```js
if (baseURL.includes('azure')) return 'azure_openai'
if (baseURL.includes('deepseek')) return 'deepseek'
return 'openai'
```
can we parse the url instead of hardcoding?
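A hedged sketch of what URL parsing could look like, using Node's built-in URL parser so that only the hostname is matched (a substring check over the whole URL would also match paths and query strings). The hostname-suffix checks and the 'openai' fallback are assumptions carried over from the snippet above, not a final design.

```javascript
// Sketch: derive the provider from the parsed hostname rather than a
// substring match over the entire base URL.
function getModelProvider (baseURL) {
  let hostname
  try {
    hostname = new URL(baseURL).hostname
  } catch {
    return 'openai' // unparseable base URL: fall back to the default
  }
  if (hostname.endsWith('azure.com')) return 'azure_openai'
  if (hostname.endsWith('deepseek.com')) return 'deepseek'
  return 'openai'
}
```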
```js
for (const item of input) {
  if (item.type === 'message') {
    const role = item.role
    if (!role) continue

    let content = ''
    if (Array.isArray(item.content)) {
      const textParts = item.content
        .filter(c => c.type === 'input_text' || c.type === 'text')
        .map(c => c.text)
      content = textParts.join('')
    } else if (typeof item.content === 'string') {
      content = item.content
    }

    if (content) {
      messages.push({ role, content })
    }
  } else if (item.type === 'function_call') {
    let args = item.arguments
    if (typeof args === 'string') {
      try {
        args = JSON.parse(args)
      } catch {
        args = {}
      }
    }
```
can we break this function into some helpers? Would help to improve readability.
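As one possible shape for that refactor, the two inner branches could become small pure helpers, which are also easy to unit-test. The helper names below are suggestions, not the PR's final API.

```javascript
// Suggested helper: flatten a message item's content to plain text.
function extractTextContent (content) {
  if (typeof content === 'string') return content
  if (Array.isArray(content)) {
    return content
      .filter(c => c.type === 'input_text' || c.type === 'text')
      .map(c => c.text)
      .join('')
  }
  return ''
}

// Suggested helper: parse function-call arguments defensively;
// malformed JSON becomes an empty object, as in the original loop.
function parseFunctionArguments (args) {
  if (typeof args !== 'string') return args
  try {
    return JSON.parse(args)
  } catch {
    return {}
  }
}
```

The main loop would then reduce to dispatching on item.type and calling these helpers.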
packages/dd-trace/test/llmobs/plugins/openai-agents/index.spec.js
At first glance, looking at packages/dd-trace/test/llmobs/plugins/openai-agents/index.spec.js, I can see that only

More importantly,

The Python integration achieves this by hooking into the SDK's TracingProcessor interface via add_trace_processor(), which gives it the full semantic span tree. The JS SDK has the same infrastructure: addTraceProcessor() and a TracingProcessor interface with
We actually had a few meetings about how to handle these cases: one with @sabrenner, another with our larger IDM team, and another with the node core engineers. Basically, we decided to utilize these types of trace processors when the processor is OTel compatible. Given that the tracing provided by the package is not OTel, and is some form of internal tracing which can change on a whim, leaving our instrumentation broken, we decided not to go that route for these types of cases.
Hummm... I'm not totally convinced by that, tbh. By that I mean that tracing specific internal methods is inherently more brittle than relying on an interface that should follow semver. It's not guaranteed, I agree, but it at least has better odds. Regardless of the approach chosen, it still seems to me that there's quite some difference between the current Python integration and the proposed NodeJS one (only
✅ Tests: 🎉 All green! ❄️ No new flaky tests detected. 🎯 Code Coverage (details). 🔗 Commit SHA: 34073c2
Codecov Report: ❌ Patch coverage is

```
@@ Coverage Diff @@
## master #7808 +/- ##
==========================================
- Coverage 80.45% 73.90% -6.56%
==========================================
Files 748 776 +28
Lines 32411 36335 +3924
==========================================
+ Hits 26076 26853 +777
- Misses 6335 9482 +3147
```

Flags with carried forward coverage won't be shown.
Benchmarks. Benchmark execution time: 2026-04-08 18:09:48. Comparing candidate commit 34073c2 in PR branch. Found 0 performance improvements and 0 performance regressions! Performance is the same for 227 metrics, 33 unstable metrics.
I took a look at what the internal traces from OpenAI Agents were covering to see what the span kinds actually represented. Then I compared them to what we are covering: for tracing we had the same instrumentation points; however, for LLM Observability we were only creating plugins which covered
Hey 👋 I was testing the integration with a minimal demo app and ran into two issues. I spent quite some time debugging so I wanted to share what I found.

Setup

Simple app with a single agent, run against

I tested three configurations:
1. ESM (`.mjs` files)
2. CJS (`require()`)
3. ESM (`.js` files + `"type": "module"` in package.json)
Tests 1 and 3 are equivalent (both ESM). Test 3 is the modern recommended approach (`.js` + `"type": "module"`). The app itself is minimal:

```js
import { Agent, run } from '@openai/agents'

const agent = new Agent({
  name: 'Simple Agent',
  instructions: 'You are a helpful assistant.',
  model: 'gpt-4o'
})

const result = await run(agent, 'What is the capital of France?')
```

Traces from the test runs:
Issue 1: ESM apps don't generate any openai-agents spans

In tests 1 and 3 (ESM), I only see the vanilla

I added some logging to the rewriter to see what files it was processing. The rewriter sees:

```js
{
  module: {
    name: '@openai/agents-core',
    filePath: 'dist/run.js', // <-- .js
  },
}
```

The matcher does strict equality: It appears the

Issue 2: CJS works, but missing agent span compared to Python

In test 2 (CJS), the rewriter matches

Python (4 LLMObs spans):

NodeJS CJS (3 LLMObs spans):

The JS integration is missing the agent span. In Python, the agent span sits between the workflow and the LLM call and carries the agent name ("Simple Agent"). In the JS integration, the workflow span directly parents the LLM span with no agent in between. This also means in multi-agent handoff scenarios, there's no clear boundary between which agent is running; the handoff tool span fires but there's no parent agent span to anchor it to.

A few other differences I noticed on the CJS spans:
Questions
For issue 1, I see we're already handling this in the anthropic integration, which loops over both extensions:

```js
const extensions = ['js', 'mjs']
for (const extension of extensions) {
  addHook({
    name: '@anthropic-ai/sdk',
    file: `resources/messages.${extension}`,
    versions: ['>=0.14.0 <0.33.0'],
  }, exports => { ... })
}
```

The rewriter instrumentation config would need something similar - registering both
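For the rewriter config, the equivalent of that extension loop could be a small expansion step that emits a `.mjs` twin for every `.js` entry. This is a sketch only: everything beyond the logged `{ module: { name, filePath } }` shape is an assumption, and the withEsmTwins name is hypothetical.

```javascript
// Sketch: expand each rewriter entry that targets a .js file into a
// pair covering both .js and .mjs, so ESM builds of the SDK match too.
function withEsmTwins (entries) {
  return entries.flatMap(entry => {
    const { filePath } = entry.module
    if (!filePath.endsWith('.js')) return [entry]
    const twin = {
      ...entry,
      module: { ...entry.module, filePath: filePath.replace(/\.js$/, '.mjs') }
    }
    return [entry, twin]
  })
}
```

Note that `.mjs` paths do not end with `.js`, so already-expanded entries pass through unchanged.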
```js
// TODO: Add agent-level LLMObs span (kind: 'agent') wrapping per-agent async execution.
// Python achieves this via add_trace_processor(LLMObsTraceProcessor) which hooks
// Span.start() / Span.end() on the SDK's internal Span class (dist/tracing/spans.js).
// The equivalent here would be hooking Span.prototype.start / Span.prototype.end via
// orchestrion. Requires team sign-off before implementation.
```
does this comment still apply? looks like we patch the run method below, this should allow us to capture the agent-level spans i think (but, correct me if i'm wrong, and i'll also run through this locally after giving a first review)
For the simple single-agent case, run() already has everything we need to emit an agent span — we know the agent name, input, and output, and the hook wraps the full execution. The gap is multi-agent handoff scenarios: run() only gives us the starting agent, so we can't derive per-agent execution boundaries mid-run without hooking something lower-level like prepareAgentArtifacts.
That said, onInvokeHandoff is already instrumented separately, so the combination of run() + onInvokeHandoff covers most handoff observability. The missing piece is strictly the parent relationship — having the agent span wrap its own LLM calls in a handoff chain.
For the simple case, would it make sense to emit an agent span from the existing run() hook and update the TODO to note the handoff limitation? Or are you looking for full Python parity on the parent hierarchy, which would require a new hook point?
```js
if (baseURL) {
  const host = this.getHostFromBaseURL(baseURL)
  if (host) {
    tags['out.host'] = host
```
we typically haven't done stuff like this for the APM side of llm-type or agentic integrations, any reason we're including it here? maybe we're good to just tag model name and provider
```js
const tags = {
  component: 'openai-agents',
  'span.kind': 'client',
  'ai.request.model_provider': 'openai',
```
i actually don't think we set any APM tags for the openai-agents package in the Python integration, we're probably good to just not set any tags
```js
getTags (ctx) {
  const tags = super.getTags(ctx)
  tags['openai.request.stream'] = 'true'
```
i think this is maybe the one tag we might wanna keep, but we could also just get rid of it too. all tagging/metadata can just be done on the LLMObs spans
```js
const TracingPlugin = require('../../dd-trace/src/plugins/tracing')

class BaseOpenaiAgentsInternalPlugin extends TracingPlugin {
```
i think we should not give the static prefixes below in this base class, and instead let all implementers define them (for example, the RunPlugin below would define these fields, as the other implementing Plugins here do)
```js
 * @param {string} baseURL - The base URL of the OpenAI client
 * @returns {string} The model provider name
 */
function getModelProvider (baseURL) {
```
we just landed a change which updates this logic elsewhere: a7de9c0
i wonder if we can refactor both this instance and that one into a shared getModelProviderFromOpenAIBaseUrl function, or something like that, so that any logic updates are shared.
i think for all of the inlined-functions here, we can move them to a util.js file in this folder
```js
if (savedAgentUrl !== undefined) process.env.DD_TRACE_AGENT_URL = savedAgentUrl
if (savedAgentPort !== undefined) process.env.DD_TRACE_AGENT_PORT = savedAgentPort
})
```
i believe this setup should not be needed, it works fine with all other llmobs tests without this change. are we able to remove these blocks?
```js
for (const key of Object.keys(require.cache)) {
  if (key.includes('@openai/agents')) {
    delete require.cache[key]
  }
}
```
this approach also isn't needed for the langchain or langgraph suites which also use orchestrion, can we try getting rid of this and follow the same patterns we use in those test suites?
Instruments the OpenAI Agents SDK with Datadog APM tracing. Adds span coverage for agent runs, model calls (getResponse, getStreamedResponse), tool invocations, and handoffs with full semantic tag support. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Align LLMObs plugin and test directory names with the integration/plugin ID (openai-agents) rather than the npm sub-package name (openai-agents-core). Both test suites now run with the same PLUGINS=openai-agents value: tracing: PLUGINS=openai-agents yarn test:plugins llmobs: PLUGINS=openai-agents yarn test:llmobs:plugins Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Accidentally committed during workflow test run; not a source file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Registers @openai/agents-core and @openai/agents-openai with their version ranges so yarn services correctly handles them and withVersions picks them up for the test matrix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The versions/package.json latests file is read-only by install_plugin_modules.js and does not need its deps resolved in the root yarn.lock. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ming
- Rewriter: generate .mjs twins via flatMap so ESM apps get instrumented
- LLM spans: name as `{modelName} (LLM)` instead of internal method name
- Workflow spans: use run() options.workflowName, default 'Agent workflow'
- Handoff spans: name as `transfer_to_{agentName}` (Python parity)
- Metadata: map camelCase modelSettings to snake_case keys (top_p, max_tokens, etc.)
- Metadata: include request.tools list
- Metrics: capture reasoning_tokens from outputTokensDetails
- Workflow: extract agent manifest into metadata._dd.agent_manifest
- TODO: agent-level span (requires Span.start/end hook, needs team approval)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… them Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…plugins/index.js The facade package has no hooks or rewriter entries — only @openai/agents-core and @openai/agents-openai are actually instrumented. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ns/index.js @openai/agents-openai depends on @openai/agents-core, so the plugin is always registered when @openai/agents-core loads first. The second entry is a no-op. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…om hooks.js @openai/agents-openai depends on @openai/agents-core, so the instrumentation file is already loaded (and shimmers for both packages registered) before @openai/agents-openai ever loads. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Refactor APM plugins: switch from ClientPlugin/CompositePlugin object spread to TracingPlugin with individual static prefix/spanName per class; export as arrays
- Remove all model/provider/host/usage tags from APM spans (LLMObs-only)
- Extract LLMObs helpers into utils.js (getModelProvider, extractAgentManifest, extractInputMessages, extractOutputMessages, etc.) for testability
- Fix getModelProvider to fall back to 'unknown' instead of empty string
- Fix TypeScript definition comment for openai-agents integration
- Restore accidentally-dropped supported-configurations.json entries
- Add DD_TRACE_OPENAI_AGENTS_ENABLED to supported-configurations.json
- Fix test-setup.js: use versioned absolute paths for @openai/agents-openai and openai resolution; fix module.Agent → clientModule.Agent references
- Fix LLMObs spec: use withVersions() wrapper, fix openai require path, add metadata: MOCK_NOT_NULLISH assertions for run() workflow spans

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rovider utility

- Extract getOpenAIModelProvider() into a shared plugins/utils.js, used by both the openai and openai-agents LLMObs plugins (eliminates duplicate logic and incorporates the 'unknown' fallback for unrecognised base URLs)
- Convert index.js plugin registration to object-keyed accumulation pattern, consistent with the langgraph plugin
- Add unique static id to each tracing plugin subclass (required for the object-keyed pattern)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…onse

Implement full streaming support using AsyncIterator orchestrion pattern:
- Switch getStreamedResponse instrumentation to kind: 'AsyncIterator'
- Add GetStreamedResponseNextPlugin (APM) to keep span open until iterator exhaustion, fixing premature span close via traceSync end() side-effect
- Add GetStreamedResponseNextLLMObsPlugin (LLMObs) to accumulate response_done event and tag span with full I/O, metrics, and metadata once the stream completes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n in llmobs workflow Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…test action Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…date Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

What does this PR do?
Adds APM tracing and LLMObs instrumentation for the OpenAI Agents SDK (@openai/agents).
Tracing — instruments the following operations via Orchestrion (rewriter-based):
LLMObs — adds an LLMObs plugin (packages/dd-trace/src/llmobs/plugins/openai-agents/) that enriches spans with LLM observability tags (model, provider, token usage, input/output).
Motivation
The OpenAI Agents SDK is a first-party framework from OpenAI for building multi-agent systems in Node.js. It reached a stable API in >=0.7.0. Instrumenting it gives Datadog users distributed tracing and LLM observability for agent workflows without any code changes.
Additional Notes