feat(graphql): migrate shimmer to orchestrion instrumentation#7757

Open: crysmags wants to merge 5 commits into master from crysmags/graphql-migration

crysmags (Collaborator) commented Mar 12, 2026

What does this PR do?

Replaces the shimmer/hook-wrapping approach in the graphql plugin with Orchestrion's tracingChannel-based instrumentation. Instead of dynamically wrapping individual field resolvers at schema execution time, the rewriter instruments the internal execute, executeField/resolveField, parse, and validate functions directly — giving a single stable intercept point across all graphql versions.

Additionally includes improvements ported from PR #7604 (graphql plugin refactor), a fix for a latent AppSec blocking bug, and a fix for an IAST taint tracking regression introduced by the orchestrion migration.


Orchestrion migration

How the old shimmer approach worked

  • graphql.js instrumentation used shimmer.wrap to patch graphql.execute, then walked the schema's type map at execution time to wrap every field resolver individually
  • Required re-wrapping on each execute() call as schemas can be dynamic
  • The resolve plugin received a pre-built field context object assembled by the shimmer layer, with info, rootCtx, args, and parentField already extracted
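
To make the cost of the old approach concrete, here is a minimal sketch of per-resolver wrapping. It is not the removed plugin code: `typeMap`, `wrapResolvers`, `patched`, and `resolved` are hypothetical names, and a plain object stands in for a real graphql schema.

```javascript
// Hypothetical stand-in for a schema type map: type name -> fields -> resolver.
const typeMap = {
  Query: {
    fields: {
      hello: { resolve: () => 'world' },
      answer: { resolve: () => 42 }
    }
  }
}

const patched = new WeakSet() // field definitions that are already wrapped
const resolved = []           // field names recorded by the wrappers

// Walk every type and wrap each resolver once. The walk has to be repeated on
// every execute() call because fields can be added to a schema at any time.
function wrapResolvers (types) {
  for (const type of Object.values(types)) {
    for (const [fieldName, field] of Object.entries(type.fields)) {
      if (patched.has(field) || typeof field.resolve !== 'function') continue
      const original = field.resolve
      field.resolve = function (...args) {
        resolved.push(fieldName)
        return original.apply(this, args)
      }
      patched.add(field)
    }
  }
}

wrapResolvers(typeMap)
// A field added later is only instrumented after the next walk.
typeMap.Query.fields.late = { resolve: () => 'added later' }
wrapResolvers(typeMap)

const values = Object.values(typeMap.Query.fields).map(field => field.resolve())
```

The second `wrapResolvers` call is the part the migration removes: with a module-level intercept there is no schema walk to repeat.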

How the orchestrion approach works

  • rewriter/instrumentations/graphql.js: Orchestrion config that instruments execute, executeField/resolveField, parse, and validate at the module level for both graphql and @graphql-tools/executor
  • graphql.js (instrumentation): Stripped down to addHook stubs that trigger module loading, plus hooks that cache printer, visitor, and utilities on ddGlobal for the signature tools
  • execute.js: Subscribes to tracing:orchestrion:graphql:apm:graphql:execute. Handles both sync and async execute() returns (Promise vs plain result). Also subscribes to the @graphql-tools/executor channel so graphql-yoga produces spans
  • resolve.js: Subscribes to tracing:orchestrion:graphql:apm:graphql:resolve (maps to executeField/resolveField). Reconstructs an info-like object from raw executeField arguments (exeContext, parentType, fieldNodes, path). Uses a WeakMap keyed on exeContext to track per-execution state (parent spans, source text, collapsed paths)
  • parse.js: Subscribes to tracing:orchestrion:graphql:apm:graphql:parse. Caches document→source mappings so execute spans can show graphql.source
  • validate.js: Subscribes to tracing:orchestrion:graphql:apm:graphql:validate
  • state.js (new): Shared WeakMap state for cross-plugin span coordination — defers resolve span finishing until the parent execute span finishes, so all spans flush together in one trace payload instead of triggering separate partial flushes
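
As a rough illustration of the channel mechanics the plugins rely on, the sketch below uses Node's built-in `diagnostics_channel.tracingChannel` (available in recent Node releases) rather than the tracer's own plumbing. The channel name comes from the list above; the fake `parse`, the `instrumentedParse` wrapper, and the `{ arguments: [...] }` context shape are assumptions made for the example, not the rewriter's actual output.

```javascript
const { tracingChannel } = require('node:diagnostics_channel')

// Subscribers see this channel as `tracing:<name>:start`, `tracing:<name>:end`,
// and so on.
const parseChannel = tracingChannel('orchestrion:graphql:apm:graphql:parse')

const events = []

parseChannel.subscribe({
  start (ctx) { events.push(`start:${ctx.arguments[0]}`) },
  end (ctx) { events.push(`end:${ctx.result?.kind}`) },
  error (ctx) { events.push(`error:${ctx.error.message}`) }
})

// Hypothetical stand-in for graphql's parse().
function parse (source) {
  return { kind: 'Document', source }
}

// Assumed shape of what a rewriter would produce: the original function body
// runs inside traceSync, which publishes start/end (and error) around it.
function instrumentedParse (source) {
  return parseChannel.traceSync(parse, { arguments: [source] }, null, source)
}

const doc = instrumentedParse('{ hello }')
```

Because the intercept lives on the function itself, a subscriber sees every call regardless of which schema or server framework triggered it.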

Improvements from PR #7604

PR #7604 is an open refactor of the shimmer-based graphql plugin. Its commits cannot be cherry-picked cleanly (entirely different architecture), but two specific features were ported by hand:

hooks.resolve user callback

PR #7604 added a resolve hook to match the existing execute, parse, and validate hooks. This lets users attach custom logic to each resolver span:

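// Assumes dd-trace is installed; init() must run before graphql is loaded.
const tracer = require('dd-trace').init()

tracer.use('graphql', {
  hooks: {
    // Called for each graphql.resolve span with the span and a FieldContext.
    resolve (span, field) {
      span.setTag('app.field', field.fieldName)
    }
  }
})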

Files changed:

  • packages/datadog-plugin-graphql/src/index.js — added resolve: hooks?.resolve ?? noop to getHooks()
  • packages/datadog-plugin-graphql/src/resolve.js — calls this.config.hooks.resolve(span, { fieldName, path, error, result }) in end() before deferring span finish

FieldContext TypeScript interface

Added the FieldContext interface to index.d.ts describing the second argument passed to hooks.resolve:

interface FieldContext {
  fieldName: string;   // e.g. 'hello'
  path: string;        // dot-separated path, e.g. 'user.address.city'
  error: Error | null; // resolver error, or null on success
  result: unknown;     // sync resolver return value; undefined for async
}

AppSec blocking bug fix

Both this branch and PR #7604 had a latent bug where GraphQL blocking never actually aborted a resolve. The datadog:graphql:resolver:start channel was published with one key but the AppSec handler destructured a different key:

| | Published payload | AppSec handler destructures | Abort call |
|---|---|---|---|
| Before (this branch) | { ctx: rootCtx, resolverInfo } | { context, resolverInfo } | context?.abortController?.abort() → always no-op |
| Before (PR #7604) | { abortController: rootCtx.abortController } | { context, resolverInfo } | context?.abortController?.abort() → always no-op |
| After (this PR) | { abortController, resolverInfo } | { abortController, resolverInfo } | abortController?.abort() → works |
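
The mismatch can be reproduced in isolation with Node's `diagnostics_channel`. This is a simplified sketch, not the AppSec code: `oldHandler` and `newHandler` abort unconditionally (the real handler only aborts when a request should be blocked), and the `resolverInfo` value is illustrative.

```javascript
const dc = require('node:diagnostics_channel')

const channelName = 'datadog:graphql:resolver:start'
const resolverStart = dc.channel(channelName)

// Handler shape before the fix: it expects a `context` key.
function oldHandler ({ context, resolverInfo }) {
  context?.abortController?.abort()
}

// Handler shape after the fix: it reads `abortController` directly.
function newHandler ({ abortController, resolverInfo }) {
  abortController?.abort()
}

const resolverInfo = { books: { title: 'ls' } } // illustrative payload only

// Before: the publisher sent `ctx`, so `context` was undefined in the handler
// and the optional chain silently did nothing.
const rootCtx = { abortController: new AbortController() }
dc.subscribe(channelName, oldHandler)
resolverStart.publish({ ctx: rootCtx, resolverInfo })
dc.unsubscribe(channelName, oldHandler)
const abortedBefore = rootCtx.abortController.signal.aborted

// After: publisher and handler agree on the key, so the abort takes effect.
const abortController = new AbortController()
dc.subscribe(channelName, newHandler)
resolverStart.publish({ abortController, resolverInfo })
dc.unsubscribe(channelName, newHandler)
const abortedAfter = abortController.signal.aborted
```

The optional chaining is what kept the bug latent: a missing key produces no error, only a skipped abort.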

Files changed:

  • packages/dd-trace/src/appsec/channels.js — renamed export startGraphqlResolve → startGraphqlResolver (cosmetic alignment)
  • packages/dd-trace/src/appsec/graphql.js — handler now destructures { abortController } and calls abortController?.abort() directly
  • packages/datadog-plugin-graphql/src/resolve.js — publish now sends { abortController: new AbortController(), resolverInfo }
  • packages/dd-trace/test/appsec/graphql.spec.js — updated all test payloads to match new publish shape

IAST taint tracking fix

The orchestrion migration introduced a regression in IAST detection for GraphQL queries with hardcoded literal arguments (e.g. books(title: "ls")).

Root cause

In the old shimmer approach, each field resolver was wrapped individually. The resolver wrapper had access to the actual args object graphql passes to the resolver, so when it published apm:graphql:resolve:start, the IAST subscriber's taintObject() call mutated the same object the resolver would use — taint propagated correctly.

The orchestrion approach wraps executeField instead. An initial fix published a reconstructed resolverArgs object built from AST nodes. taintObject() replaces string properties with new tainted string objects and assigns them back to the reconstructed object — but graphql's getArgumentValues() creates its own separate args object, so the taint never reached the resolver body.

Variable arguments were unaffected because they flow through variableValues, which is already tainted during HTTP body parsing before graphql execution.
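
The object-identity problem can be shown without graphql or IAST. In this sketch `fakeTaintObject` and `isTainted` are hypothetical stand-ins for the real taint tracking (which is not implemented with a WeakSet), and the two args objects are built by hand to mimic the reconstructed copy and the one graphql creates.

```javascript
const tainted = new WeakSet()

// Hypothetical stand-in for taintObject(): swaps each string property for a
// new String object that is tracked as tainted.
function fakeTaintObject (obj) {
  for (const [key, value] of Object.entries(obj)) {
    if (typeof value !== 'string') continue
    const marked = new String(value) // eslint-disable-line no-new-wrappers
    tainted.add(marked)
    obj[key] = marked
  }
  return obj
}

const isTainted = value =>
  typeof value === 'object' && value !== null && tainted.has(value)

// The instrumentation rebuilt args from AST nodes...
const reconstructedArgs = { title: 'ls' }
// ...while graphql builds its own object for the resolver.
const actualArgs = { title: 'ls' }

const resolver = args => isTainted(args.title)

// Tainting the reconstructed copy leaves the resolver's object untouched.
fakeTaintObject(reconstructedArgs)
const seenWithReconstructed = resolver(actualArgs)

// Tainting the object the resolver actually receives is what propagates.
fakeTaintObject(actualArgs)
const seenWithActual = resolver(actualArgs)
```

Here `seenWithReconstructed` is `false` and `seenWithActual` is `true`, which mirrors why only the real args object is a usable taint target.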

Fix

When apm:graphql:resolve:start has subscribers, bindStart temporarily replaces fieldDef.resolve with a wrapper that fires the IAST channel with the actual args object graphql constructs internally. The wrapper restores fieldDef.resolve synchronously as its first action — safe because there is no async gap between bindStart and the resolver call inside executeField.
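
A minimal sketch of the one-shot wrapper pattern described above, under stated assumptions: `installOneShotWrapper`, `publishResolveStart`, and the fake `executeField` are hypothetical names, and the sketch ignores the case where a field has no `resolve` and graphql falls back to its default resolver.

```javascript
const seenArgs = []

// Hypothetical stand-in for publishing on the IAST channel.
function publishResolveStart (args) {
  seenArgs.push(args)
}

// Swap in a wrapper that removes itself on first use.
function installOneShotWrapper (fieldDef) {
  const original = fieldDef.resolve
  fieldDef.resolve = function (source, args, contextValue, info) {
    fieldDef.resolve = original // restore first, before anything can throw
    publishResolveStart(args)   // the same object the resolver receives
    return original.call(this, source, args, contextValue, info)
  }
}

// Hypothetical stand-in for executeField: it builds args and then calls the
// resolver synchronously, with no await in between.
function executeField (fieldDef) {
  const args = { title: 'ls' }
  return { args, result: fieldDef.resolve({}, args, {}, {}) }
}

const originalResolve = (source, args) => `books:${args.title}`
const fieldDef = { resolve: originalResolve }

installOneShotWrapper(fieldDef)
const { args, result } = executeField(fieldDef)

const restored = fieldDef.resolve === originalResolve
const sameObject = seenArgs[0] === args
```

Restoring before publishing keeps the shared field definition clean even if a subscriber throws, at the cost of relying on the install and the resolver call happening in the same synchronous stretch.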


Test results

  • 223 graphql plugin tests passing, 0 failing (across graphql v15 and v16, CJS and ESM)
  • 9/9 AppSec graphql unit tests passing
  • 8/8 IAST graphql source tests passing (hardcoded + variable args, all apollo-server-express versions)


codecov bot commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 93.35938% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.27%. Comparing base (e90f4e5) to head (297e8ac).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| packages/datadog-plugin-graphql/src/state.js | 62.50% | 12 Missing ⚠️ |
| packages/datadog-plugin-graphql/src/execute.js | 92.15% | 4 Missing ⚠️ |
| packages/datadog-plugin-graphql/src/resolve.js | 99.28% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7757      +/-   ##
==========================================
+ Coverage   74.25%   74.27%   +0.01%     
==========================================
  Files         765      767       +2     
  Lines       35786    35816      +30     
==========================================
+ Hits        26574    26602      +28     
- Misses       9212     9214       +2     
Flag Coverage Δ
aiguard-macos 39.42% <100.00%> (-0.10%) ⬇️
aiguard-ubuntu 39.54% <100.00%> (-0.10%) ⬇️
aiguard-windows 39.20% <100.00%> (-0.10%) ⬇️
apm-capabilities-tracing-macos 48.89% <7.56%> (-0.31%) ⬇️
apm-capabilities-tracing-ubuntu 48.80% <7.56%> (-0.31%) ⬇️
apm-capabilities-tracing-windows 48.65% <7.56%> (-0.32%) ⬇️
apm-integrations-child-process 38.74% <100.00%> (-0.10%) ⬇️
apm-integrations-couchbase-18 37.52% <100.00%> (-0.09%) ⬇️
apm-integrations-couchbase-eol 38.02% <100.00%> (-0.12%) ⬇️
apm-integrations-oracledb 38.01% <100.00%> (+0.05%) ⬆️
appsec-express 55.39% <75.00%> (-0.07%) ⬇️
appsec-fastify 51.72% <75.00%> (-0.07%) ⬇️
appsec-graphql 51.98% <83.59%> (+0.03%) ⬆️
appsec-kafka 44.49% <100.00%> (+0.02%) ⬆️
appsec-ldapjs 44.11% <100.00%> (-0.08%) ⬇️
appsec-lodash 43.71% <100.00%> (-0.08%) ⬇️
appsec-macos 58.15% <100.00%> (-0.06%) ⬇️
appsec-mongodb-core 48.90% <100.00%> (-0.07%) ⬇️
appsec-mongoose 49.55% <100.00%> (-0.07%) ⬇️
appsec-mysql 51.08% <75.00%> (-0.07%) ⬇️
appsec-node-serialize 43.30% <100.00%> (-0.08%) ⬇️
appsec-passport 47.76% <75.00%> (-0.08%) ⬇️
appsec-postgres 50.71% <75.00%> (-0.07%) ⬇️
appsec-sourcing 42.55% <100.00%> (-0.08%) ⬇️
appsec-stripe 44.74% <75.00%> (-0.08%) ⬇️
appsec-template 43.46% <100.00%> (-0.08%) ⬇️
appsec-ubuntu 58.24% <100.00%> (-0.07%) ⬇️
appsec-windows 57.97% <100.00%> (-0.07%) ⬇️
instrumentations-instrumentation-bluebird 32.33% <100.00%> (-0.10%) ⬇️
instrumentations-instrumentation-body-parser 40.64% <100.00%> (-0.09%) ⬇️
instrumentations-instrumentation-child_process 38.08% <100.00%> (-0.10%) ⬇️
instrumentations-instrumentation-cookie-parser 34.36% <100.00%> (-0.09%) ⬇️
instrumentations-instrumentation-express 34.68% <100.00%> (-0.09%) ⬇️
instrumentations-instrumentation-express-mongo-sanitize 34.49% <100.00%> (-0.09%) ⬇️
instrumentations-instrumentation-express-session 40.27% <100.00%> (-0.09%) ⬇️
instrumentations-instrumentation-fs 32.01% <100.00%> (-0.10%) ⬇️
instrumentations-instrumentation-generic-pool 29.47% <100.00%> (+0.01%) ⬆️
instrumentations-instrumentation-http 39.99% <100.00%> (-0.09%) ⬇️
instrumentations-instrumentation-knex 32.40% <100.00%> (-0.10%) ⬇️
instrumentations-instrumentation-mongoose 33.51% <100.00%> (-0.09%) ⬇️
instrumentations-instrumentation-multer 40.39% <100.00%> (-0.09%) ⬇️
instrumentations-instrumentation-mysql2 38.41% <100.00%> (-0.09%) ⬇️
instrumentations-instrumentation-passport 44.17% <50.00%> (-0.08%) ⬇️
instrumentations-instrumentation-passport-http 43.84% <50.00%> (-0.08%) ⬇️
instrumentations-instrumentation-passport-local 44.38% <50.00%> (-0.08%) ⬇️
instrumentations-instrumentation-pg 37.85% <100.00%> (-0.10%) ⬇️
instrumentations-instrumentation-promise 32.26% <100.00%> (-0.10%) ⬇️
instrumentations-instrumentation-promise-js 32.26% <100.00%> (-0.10%) ⬇️
instrumentations-instrumentation-q 32.31% <100.00%> (-0.10%) ⬇️
instrumentations-instrumentation-url 32.23% <100.00%> (-0.10%) ⬇️
instrumentations-instrumentation-when 32.28% <100.00%> (-0.10%) ⬇️
llmobs-ai 41.61% <100.00%> (-0.09%) ⬇️
llmobs-anthropic 40.84% <100.00%> (+0.19%) ⬆️
llmobs-bedrock 39.32% <100.00%> (-0.08%) ⬇️
llmobs-google-genai 39.88% <100.00%> (-0.08%) ⬇️
llmobs-langchain 39.34% <100.00%> (-0.09%) ⬇️
llmobs-openai 44.13% <100.00%> (-0.05%) ⬇️
llmobs-vertex-ai 40.14% <100.00%> (-0.09%) ⬇️
platform-core 31.47% <ø> (ø)
platform-esbuild 34.42% <ø> (ø)
platform-instrumentations-misc 34.19% <100.00%> (+0.07%) ⬆️
platform-shimmer 37.56% <ø> (ø)
platform-unit-guardrails 32.89% <ø> (ø)
platform-webpack 20.06% <100.00%> (+0.09%) ⬆️
plugins-azure-durable-functions 25.86% <100.00%> (+0.11%) ⬆️
plugins-azure-event-hubs 26.02% <100.00%> (+0.11%) ⬆️
plugins-azure-service-bus 25.38% <100.00%> (+0.11%) ⬆️
plugins-bullmq 43.61% <100.00%> (-0.10%) ⬇️
plugins-cassandra 38.02% <100.00%> (-0.09%) ⬇️
plugins-cookie 27.08% <100.00%> (+0.11%) ⬆️
plugins-cookie-parser 26.86% <100.00%> (+0.11%) ⬆️
plugins-crypto 26.73% <ø> (ø)
plugins-dd-trace-api 38.43% <100.00%> (-0.10%) ⬇️
plugins-express-mongo-sanitize 27.01% <100.00%> (+0.11%) ⬆️
plugins-express-session 26.82% <100.00%> (+0.11%) ⬆️
plugins-fastify 42.36% <100.00%> (-0.09%) ⬇️
plugins-fetch 38.52% <100.00%> (-0.09%) ⬇️
plugins-fs 38.76% <100.00%> (-0.10%) ⬇️
plugins-generic-pool 26.06% <100.00%> (+0.11%) ⬆️
plugins-google-cloud-pubsub 45.69% <100.00%> (-0.06%) ⬇️
plugins-grpc 41.02% <100.00%> (-0.09%) ⬇️
plugins-handlebars 27.05% <100.00%> (+0.11%) ⬆️
plugins-hapi 40.28% <100.00%> (-0.09%) ⬇️
plugins-hono 40.61% <100.00%> (-0.09%) ⬇️
plugins-ioredis 38.60% <100.00%> (-0.10%) ⬇️
plugins-knex 26.68% <100.00%> (+0.11%) ⬆️
plugins-langgraph 37.99% <100.00%> (-0.09%) ⬇️
plugins-ldapjs 24.55% <100.00%> (+0.11%) ⬆️
plugins-light-my-request 26.42% <100.00%> (+0.11%) ⬆️
plugins-limitd-client 32.61% <100.00%> (-0.10%) ⬇️
plugins-lodash 26.15% <100.00%> (+0.11%) ⬆️
plugins-mariadb 39.62% <100.00%> (-0.10%) ⬇️
plugins-memcached 38.34% <100.00%> (-0.10%) ⬇️
plugins-microgateway-core 39.34% <100.00%> (-0.09%) ⬇️
plugins-moleculer 40.64% <100.00%> (-0.09%) ⬇️
plugins-mongodb 39.28% <100.00%> (-0.09%) ⬇️
plugins-mongodb-core 39.15% <100.00%> (-0.07%) ⬇️
plugins-mongoose 38.93% <100.00%> (-0.09%) ⬇️
plugins-multer 26.82% <100.00%> (+0.11%) ⬆️
plugins-mysql 39.46% <100.00%> (-0.10%) ⬇️
plugins-mysql2 39.41% <100.00%> (-0.09%) ⬇️
plugins-node-serialize 27.12% <100.00%> (+0.11%) ⬆️
plugins-opensearch 37.75% <100.00%> (-0.09%) ⬇️
plugins-passport-http 26.87% <100.00%> (+0.11%) ⬆️
plugins-postgres 35.57% <100.00%> (-0.08%) ⬇️
plugins-process 26.73% <ø> (ø)
plugins-pug 27.08% <100.00%> (+0.11%) ⬆️
plugins-redis 39.04% <100.00%> (-0.10%) ⬇️
plugins-router 43.37% <100.00%> (-0.09%) ⬇️
plugins-sequelize 25.66% <100.00%> (+0.11%) ⬆️
plugins-test-and-upstream-amqp10 38.62% <100.00%> (-0.10%) ⬇️
plugins-test-and-upstream-amqplib 44.37% <100.00%> (-0.10%) ⬇️
plugins-test-and-upstream-apollo 39.21% <15.01%> (-0.11%) ⬇️
plugins-test-and-upstream-avsc 38.70% <100.00%> (-0.10%) ⬇️
plugins-test-and-upstream-bunyan 33.94% <100.00%> (-0.10%) ⬇️
plugins-test-and-upstream-connect 40.94% <100.00%> (-0.09%) ⬇️
plugins-test-and-upstream-graphql 40.33% <88.53%> (-0.05%) ⬇️
plugins-test-and-upstream-koa 40.52% <100.00%> (-0.09%) ⬇️
plugins-test-and-upstream-protobufjs 38.92% <100.00%> (-0.10%) ⬇️
plugins-test-and-upstream-rhea 44.40% <100.00%> (-0.07%) ⬇️
plugins-undici 39.37% <100.00%> (-0.09%) ⬇️
plugins-url 26.73% <ø> (ø)
plugins-valkey 38.31% <100.00%> (+0.04%) ⬆️
plugins-vm 26.73% <ø> (ø)
plugins-winston 34.27% <100.00%> (+0.05%) ⬆️
plugins-ws 42.12% <100.00%> (-0.09%) ⬇️
profiling-macos 40.66% <100.00%> (-0.09%) ⬇️
profiling-ubuntu 40.78% <100.00%> (-0.09%) ⬇️
profiling-windows 42.30% <100.00%> (-0.09%) ⬇️
serverless-azure-functions-client 25.74% <100.00%> (+0.11%) ⬆️
serverless-azure-functions-eventhubs 25.74% <100.00%> (+0.11%) ⬆️
serverless-azure-functions-servicebus 25.74% <100.00%> (+0.11%) ⬆️



github-actions bot commented Mar 12, 2026

Overall package size

Self size: 5.46 MB
Deduped: 6.3 MB
No deduping: 6.3 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.0 | 81.15 kB | 815.98 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe


datadog-prod-us1-6 bot commented Mar 12, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 82.17%
Overall Coverage: 68.68% (-0.04%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 297e8ac | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!


pr-commenter bot commented Mar 12, 2026

Benchmarks

Benchmark execution time: 2026-04-03 22:07:08

Comparing candidate commit 297e8ac in PR branch crysmags/graphql-migration with baseline commit e90f4e5 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 229 metrics, 31 unstable metrics.

crysmags changed the title from "This is a draft made with AIT - feat(graphql): migrate shimmer to orchestrion instrumentation" to "feat(graphql): migrate shimmer to orchestrion instrumentation" Apr 3, 2026
crysmags and others added 2 commits April 3, 2026 12:15
Replaces shimmer-based graphql instrumentation with orchestrion's
tracingChannel approach. All 219 tests passing.

- graphql.js: switch from addHook/shimmer to orchestrion channels
- rewriter/instrumentations/graphql.js: orchestrion config for
  execute, parse, validate, resolveField across graphql versions
- execute.js: use tracePromise with asyncEnd lifecycle
- parse.js: use traceSync with document/source caching via WeakMap
- validate.js: use traceSync with error tag on validation failures
- resolve.js: use bindStart/end with field info from exeContext
- state.js: shared WeakMap state for cross-plugin span coordination

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cking

- Add `hooks.resolve` user callback for graphql.resolve spans, called with
  the span and a FieldContext object (fieldName, path, error, result)
- Add `FieldContext` TypeScript interface and `resolve?` to hooks types
- Fix latent AppSec blocking bug: align channel name (startGraphqlResolver)
  and publish payload ({ abortController }) with handler expectations;
  previously `context?.abortController?.abort()` was always a no-op
- Add test coverage for hooks.resolve

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@crysmags crysmags force-pushed the crysmags/graphql-migration branch from 827acf3 to 6c4c132 Compare April 3, 2026 16:21
crysmags and others added 3 commits April 3, 2026 12:46
The IAST plugin subscribes to 'apm:graphql:resolve:start' to taint
resolver args when they originate from a tainted query source (e.g.
hardcoded query literals like books(title: "ls")). The orchestrion
resolve.js was only publishing to the AppSec channel, so IAST never
received field args for hardcoded arguments — only variable args were
tainted via HTTP request body tracking.

Publish { rootCtx, args, info, path, pathString } to the IAST channel
from bindStart to restore parity with the shimmer instrumentation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The orchestrion migration published a reconstructed resolverArgs object
to 'apm:graphql:resolve:start'. The IAST subscriber calls taintObject()
on the published args, which replaces string properties with new tainted
string objects. Since the reconstructed args are a different object from
what graphql passes to the actual resolver, the taint never reached the
resolver body — so hardcoded literal arguments (e.g. books(title: "ls"))
were never detected as COMMAND_INJECTION sources.

Variable arguments were unaffected because they flow through
variableValues, which is tainted during HTTP body parsing before
graphql execution begins.

Fix: when the IAST channel has subscribers, temporarily replace
fieldDef.resolve with a thin wrapper that fires the channel with the
actual args object graphql constructs via getArgumentValues. The wrapper
restores fieldDef.resolve synchronously as its first action — safe
because there is no async gap between bindStart and the resolver call
inside executeField.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@crysmags crysmags force-pushed the crysmags/graphql-migration branch from 28daea8 to 297e8ac Compare April 3, 2026 21:59
@crysmags crysmags marked this pull request as ready for review April 3, 2026 22:13
@crysmags crysmags requested review from a team as code owners April 3, 2026 22:13
@crysmags crysmags added the ai-generated PR created with AI assistance label Apr 6, 2026
Labels: ai-generated (PR created with AI assistance), semver-minor
