feat(graphql): migrate shimmer to orchestrion instrumentation#7757
feat(graphql): migrate shimmer to orchestrion instrumentation#7757
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7757 +/- ##
==========================================
+ Coverage 74.25% 74.27% +0.01%
==========================================
Files 765 767 +2
Lines 35786 35816 +30
==========================================
+ Hits 26574 26602 +28
- Misses 9212 9214 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Overall package sizeSelf size: 5.46 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | import-in-the-middle | 3.0.0 | 81.15 kB | 815.98 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
|
✅ Tests 🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage (details) 🔗 Commit SHA: 297e8ac | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback! |
BenchmarksBenchmark execution time: 2026-04-03 22:07:08 Comparing candidate commit 297e8ac in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 229 metrics, 31 unstable metrics. |
Replaces shimmer-based graphql instrumentation with orchestrion's tracingChannel approach. All 219 tests passing. - graphql.js: switch from addHook/shimmer to orchestrion channels - rewriter/instrumentations/graphql.js: orchestrion config for execute, parse, validate, resolveField across graphql versions - execute.js: use tracePromise with asyncEnd lifecycle - parse.js: use traceSync with document/source caching via WeakMap - validate.js: use traceSync with error tag on validation failures - resolve.js: use bindStart/end with field info from exeContext - state.js: shared WeakMap state for cross-plugin span coordination Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cking
- Add `hooks.resolve` user callback for graphql.resolve spans, called with
the span and a FieldContext object (fieldName, path, error, result)
- Add `FieldContext` TypeScript interface and `resolve?` to hooks types
- Fix latent AppSec blocking bug: align channel name (startGraphqlResolver)
and publish payload ({ abortController }) with handler expectations;
previously `context?.abortController?.abort()` was always a no-op
- Add test coverage for hooks.resolve
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
827acf3 to
6c4c132
Compare
The IAST plugin subscribes to 'apm:graphql:resolve:start' to taint
resolver args when they originate from a tainted query source (e.g.
hardcoded query literals like books(title: "ls")). The orchestrion
resolve.js was only publishing to the AppSec channel, so IAST never
received field args for hardcoded arguments — only variable args were
tainted via HTTP request body tracking.
Publish { rootCtx, args, info, path, pathString } to the IAST channel
from bindStart to restore parity with the shimmer instrumentation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The orchestrion migration published a reconstructed resolverArgs object to 'apm:graphql:resolve:start'. The IAST subscriber calls taintObject() on the published args, which replaces string properties with new tainted string objects. Since the reconstructed args are a different object from what graphql passes to the actual resolver, the taint never reached the resolver body — so hardcoded literal arguments (e.g. books(title: "ls")) were never detected as COMMAND_INJECTION sources. Variable arguments were unaffected because they flow through variableValues, which is tainted during HTTP body parsing before graphql execution begins. Fix: when the IAST channel has subscribers, temporarily replace fieldDef.resolve with a thin wrapper that fires the channel with the actual args object graphql constructs via getArgumentValues. The wrapper restores fieldDef.resolve synchronously as its first action — safe because there is no async gap between bindStart and the resolver call inside executeField. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
28daea8 to
297e8ac
Compare
What does this PR do?
Replaces the shimmer/hook-wrapping approach in the graphql plugin with Orchestrion's
tracingChannel-based instrumentation. Instead of dynamically wrapping individual field resolvers at schema execution time, the rewriter instruments the internalexecute,executeField/resolveField,parse, andvalidatefunctions directly — giving a single stable intercept point across all graphql versions.Additionally includes improvements ported from PR #7604 (graphql plugin refactor), a fix for a latent AppSec blocking bug, and a fix for an IAST taint tracking regression introduced by the orchestrion migration.
Orchestrion migration
How the old shimmer approach worked
graphql.jsinstrumentation usedshimmer.wrapto patchgraphql.execute, then walked the schema's type map at execution time to wrap every field resolver individuallyexecute()call as schemas can be dynamicfieldcontext object assembled by the shimmer layer, withinfo,rootCtx,args, andparentFieldalready extractedHow the orchestrion approach works
rewriter/instrumentations/graphql.js: Orchestrion config that instrumentsexecute,executeField/resolveField,parse, andvalidateat the module level for bothgraphqland@graphql-tools/executorgraphql.js(instrumentation): Stripped down toaddHookstubs that trigger module loading, plus hooks that cacheprinter,visitor, andutilitiesonddGlobalfor the signature toolsexecute.js: Subscribes totracing:orchestrion:graphql:apm:graphql:execute. Handles both sync and asyncexecute()returns (Promise vs plain result). Also subscribes to the@graphql-tools/executorchannel so graphql-yoga produces spansresolve.js: Subscribes totracing:orchestrion:graphql:apm:graphql:resolve(maps toexecuteField/resolveField). Reconstructs aninfo-like object from rawexecuteFieldarguments (exeContext,parentType,fieldNodes,path). Uses aWeakMapkeyed onexeContextto track per-execution state (parent spans, source text, collapsed paths)parse.js: Subscribes totracing:orchestrion:graphql:apm:graphql:parse. Caches document→source mappings so execute spans can showgraphql.sourcevalidate.js: Subscribes totracing:orchestrion:graphql:apm:graphql:validatestate.js(new): Shared WeakMap state for cross-plugin span coordination — defers resolve span finishing until the parent execute span finishes, so all spans flush together in one trace payload instead of triggering separate partial flushesImprovements from PR #7604
PR #7604 is an open refactor of the shimmer-based graphql plugin. Its commits cannot be cherry-picked cleanly (entirely different architecture), but two specific features were ported by hand:
hooks.resolveuser callbackPR #7604 added a
resolvehook to match the existingexecute,parse, andvalidatehooks. This lets users attach custom logic to each resolver span:Files changed:
packages/datadog-plugin-graphql/src/index.js— addedresolve: hooks?.resolve ?? nooptogetHooks()packages/datadog-plugin-graphql/src/resolve.js— callsthis.config.hooks.resolve(span, { fieldName, path, error, result })inend()before deferring span finishFieldContextTypeScript interfaceAdded the
FieldContextinterface toindex.d.tsdescribing the second argument passed tohooks.resolve:AppSec blocking bug fix
Both this branch and PR #7604 had a latent bug where GraphQL blocking never actually aborted a resolve. The
datadog:graphql:resolver:startchannel was published with one key but the AppSec handler destructured a different key:{ ctx: rootCtx, resolverInfo }{ context, resolverInfo }context?.abortController?.abort()→ always no-op{ abortController: rootCtx.abortController }{ context, resolverInfo }context?.abortController?.abort()→ always no-op{ abortController, resolverInfo }{ abortController, resolverInfo }abortController?.abort()→ worksFiles changed:
packages/dd-trace/src/appsec/channels.js— renamed exportstartGraphqlResolve→startGraphqlResolver(cosmetic alignment)packages/dd-trace/src/appsec/graphql.js— handler now destructures{ abortController }and callsabortController?.abort()directlypackages/datadog-plugin-graphql/src/resolve.js— publish now sends{ abortController: new AbortController(), resolverInfo }packages/dd-trace/test/appsec/graphql.spec.js— updated all test payloads to match new publish shapeIAST taint tracking fix
The orchestrion migration introduced a regression in IAST detection for GraphQL queries with hardcoded literal arguments (e.g.
books(title: "ls")).Root cause
In the old shimmer approach, each field resolver was wrapped individually. The resolver wrapper had access to the actual
argsobject graphql passes to the resolver, so when it publishedapm:graphql:resolve:start, the IAST subscriber'staintObject()call mutated the same object the resolver would use — taint propagated correctly.The orchestrion approach wraps
executeFieldinstead. An initial fix published a reconstructedresolverArgsobject built from AST nodes.taintObject()replaces string properties with new tainted string objects and assigns them back to the reconstructed object — but graphql'sgetArgumentValues()creates its own separate args object, so the taint never reached the resolver body.Variable arguments were unaffected because they flow through
variableValues, which is already tainted during HTTP body parsing before graphql execution.Fix
When
apm:graphql:resolve:starthas subscribers,bindStarttemporarily replacesfieldDef.resolvewith a wrapper that fires the IAST channel with the actual args object graphql constructs internally. The wrapper restoresfieldDef.resolvesynchronously as its first action — safe because there is no async gap betweenbindStartand the resolver call insideexecuteField.Test results