Releases: apollographql/router

v2.14.0

28 Apr 17:19
1c91fc9

🚀 Features

Add expand_json_string_values option to JSON log formatter (PR #9156)

When expand_json_string_values: true is set on a stdout or file JSON log formatter, string attribute values that contain valid JSON objects or arrays are emitted as native JSON instead of quoted strings. This enables log aggregators like Splunk to index sub-fields such as errors{}.extensions.code.

This is useful when telemetry selectors like response_errors: "$[*]" produce structured data: OpenTelemetry's attribute model serializes objects to JSON strings, but log formatters can now expand those strings back to native JSON at emit time. OTLP exporters are unaffected.
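As a rough illustration of the behavior described above, the following Python sketch models the expansion step on a flat dictionary of log attributes. The function name expand_json_string_values mirrors the option but is hypothetical; this is a model of the stated rule, not the router's implementation.

```python
import json
from typing import Any, Dict


def expand_json_string_values(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical model: replace string values that contain a JSON object
    or array with the parsed value; leave every other value untouched."""
    expanded: Dict[str, Any] = {}
    for key, value in attributes.items():
        if isinstance(value, str) and value.lstrip().startswith(("{", "[")):
            try:
                parsed = json.loads(value)
            except json.JSONDecodeError:
                parsed = None  # not valid JSON: keep the original string
            if isinstance(parsed, (dict, list)):
                expanded[key] = parsed
                continue
        # Scalars such as "500" are valid JSON but are deliberately not expanded.
        expanded[key] = value
    return expanded


record = {
    "message": "request failed",
    "response_errors": '[{"message": "boom", "extensions": {"code": "INTERNAL"}}]',
    "status": "500",
}
print(json.dumps(expand_json_string_values(record)))
```

In the emitted line, response_errors is a native array rather than a quoted string, so a log aggregator can index a sub-field such as its extensions.code.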

By @carodewig in #9156

Normalize supergraph.path to support queries with and without trailing slashes (/) (PR #8860)

Trailing slashes in supergraph.path are now normalized so that both /graphql and /graphql/ are supported. The router strips any trailing / from both the configured path and the incoming request path before comparing them, so the two match regardless of whether the configuration or the request includes a trailing slash.
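The comparison can be pictured with this minimal Python sketch. normalize_path and supergraph_path_matches are hypothetical names, and stripping every trailing slash with str.rstrip is an assumption; the note above only says that trailing / is stripped. Other routing details are ignored.

```python
def normalize_path(path: str) -> str:
    """Hypothetical helper: drop trailing slashes so "/graphql/" becomes "/graphql"."""
    return path.rstrip("/")


def supergraph_path_matches(configured: str, incoming: str) -> bool:
    # Normalize both sides, so the trailing slash may appear in either one.
    return normalize_path(configured) == normalize_path(incoming)


for configured, incoming in [("/graphql", "/graphql/"), ("/graphql/", "/graphql"), ("/graphql", "/graph")]:
    print(configured, incoming, supergraph_path_matches(configured, incoming))
```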

By @Jephuff in #8860

Accept JWTs without exp on a per-JWKS basis while still rejecting expired tokens (Issue #8910, PR #8911)

Adds a per-JWKS allow_missing_exp configuration option to Router JWT authentication. When enabled for a JWKS entry, tokens without an exp claim are accepted for that JWKS. Tokens that include an exp claim continue to be validated and rejected if expired.

This is useful for deployments that rely on long-lived machine-to-machine or service tokens that omit exp, without relaxing expiry validation globally.
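The per-JWKS decision reduces to a small rule, sketched here in Python. exp_claim_ok is a hypothetical name; signature verification, clock-skew leeway, and the other JWT claims are deliberately left out. Following RFC 7519, a token with an exp claim is valid only while the current time is before exp.

```python
import time
from typing import Any, Mapping, Optional


def exp_claim_ok(claims: Mapping[str, Any], allow_missing_exp: bool,
                 now: Optional[float] = None) -> bool:
    """Hypothetical model of the exp check for a token matched to one JWKS entry."""
    now = time.time() if now is None else now
    if "exp" not in claims:
        # Missing exp: accepted only when the matching JWKS entry opts in.
        return allow_missing_exp
    # exp present: always enforced, regardless of allow_missing_exp.
    return now < float(claims["exp"])
```

Because the exp branch never consults allow_missing_exp, enabling the option for one JWKS entry does not relax expiry checks for tokens that do carry exp.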

By @fernando-apollo in #8911

Add selective body field filtering for coprocessor responses (Issue #5020)

Adds the ability to selectively send only specific parts of GraphQL response bodies (data, errors, or extensions) to the coprocessor, instead of the entire response body. This reduces serialization/deserialization overhead and network payload size when the coprocessor only needs to inspect certain fields.

Previously, the body configuration was a boolean that sent either the entire response body or nothing. Now it supports selective field filtering:

coprocessor:
  url: http://127.0.0.1:8081

  # Supergraph responses
  supergraph:
    response:
      body:
        data: false
        errors: true        # Only send errors
        extensions: true    # and extensions

  # Execution responses
  execution:
    response:
      body:
        data: true
        errors: false
        extensions: false   # Only send data

  # Subgraph responses
  subgraph:
    all:
      response:
        body:
          data: false
          errors: true      # Only send errors
          extensions: false

The boolean syntax (body: true or body: false) continues to work for backward compatibility. When using selective filtering, the coprocessor can only modify the fields that were sent to it. Other fields are preserved from the original response.
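The send-and-merge behavior can be modeled as below. This Python sketch uses hypothetical helper names and only mirrors the rules stated above: unselected fields are never sent and are always preserved from the original response. How the router treats a selected field that the coprocessor omits from its reply is not stated; the sketch keeps the original value in that case, which is an assumption.

```python
import copy
from typing import Any, Dict, Mapping

BODY_FIELDS = ("data", "errors", "extensions")


def select_body(body: Mapping[str, Any], selection: Mapping[str, bool]) -> Dict[str, Any]:
    """Build the (hypothetical) payload sent to the coprocessor: only enabled fields."""
    return {f: copy.deepcopy(body[f]) for f in BODY_FIELDS if selection.get(f) and f in body}


def merge_body(original: Mapping[str, Any], selection: Mapping[str, bool],
               returned: Mapping[str, Any]) -> Dict[str, Any]:
    """Apply coprocessor changes to enabled fields only; keep the rest from the original."""
    merged = copy.deepcopy(dict(original))
    for f in BODY_FIELDS:
        if selection.get(f) and f in returned:
            merged[f] = returned[f]
    return merged


original = {"data": {"me": {"id": "1"}}, "errors": [{"message": "partial"}], "extensions": {}}
selection = {"data": False, "errors": True, "extensions": True}
sent = select_body(original, selection)   # contains no "data" key
reply = {"errors": [], "data": None}      # the "data" change is ignored
print(merge_body(original, selection, reply))
```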

This feature is available for the supergraph, execution, and subgraph response stages.

By @zachfettersmoore in #9019

Emit apollo.router.operations.rhai.duration histogram metric for Rhai script callbacks (PR #9072)

A new apollo.router.operations.rhai.duration histogram metric (unit: s, value type: f64) is now emitted for every Rhai script callback execution across all pipeline stages. This mirrors the existing apollo.router.operations.coprocessor.duration metric.

Attributes on each datapoint:

  • rhai.stage: the pipeline stage (e.g. RouterRequest, SubgraphResponse)
  • rhai.succeeded: true if the callback returned without throwing an error
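To make the attribute semantics concrete, here is a Python sketch of wrapping a callback so that exactly one duration datapoint is recorded per execution. The list-based histogram and the run_timed_callback name are hypothetical stand-ins, not the router's metrics API.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for a histogram instrument: (seconds, attributes).
Recorded = List[Tuple[float, Dict[str, Any]]]


def run_timed_callback(points: Recorded, stage: str,
                       callback: Callable[..., Any], *args: Any) -> Any:
    """Run `callback`, then record its duration in seconds with rhai.* attributes."""
    start = time.perf_counter()
    succeeded = False
    try:
        result = callback(*args)
        succeeded = True
        return result
    finally:
        # Recorded whether the callback returned or raised.
        points.append((time.perf_counter() - start,
                       {"rhai.stage": stage, "rhai.succeeded": succeeded}))
```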

By @theJC in #9072

Add intern_strings configuration option for the Rhai plugin (PR #9070)

The Rhai plugin now exposes an intern_strings option that controls Rhai's internal string interning. Under high concurrency, threads encountering new strings must acquire a write lock, which can serialize Rhai execution across concurrent requests.

Setting intern_strings: false disables interning, eliminating the lock:

rhai:
  scripts: ./rhai
  main: main.rhai
  intern_strings: false

String interning can reduce memory allocations and make string equality checks slightly faster. For deployments serving many concurrent requests, the cost of lock contention likely outweighs those benefits, so we recommend experimenting with intern_strings: false and observing whether it improves performance.

The default (true) preserves the existing behavior.
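The trade-off can be illustrated with this simplified Python model of an interner. It is not Rhai's implementation: it only shows why each new string requires taking a shared lock, while repeat lookups return a single shared copy. StringInterner is a hypothetical name.

```python
import threading
from typing import Dict


class StringInterner:
    """Hypothetical interner: unlocked lookups for known strings, a lock for new ones."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}
        self._lock = threading.Lock()

    def intern(self, s: str) -> str:
        existing = self._table.get(s)  # fast path: string already interned
        if existing is not None:
            return existing
        # Slow path: every thread that sees a new string queues on this lock.
        with self._lock:
            return self._table.setdefault(s, s)

    def __len__(self) -> int:
        return len(self._table)
```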

By @theJC in #9070

Add request_duration router selector (PR #9187)

Adds a new request_duration selector for the router service that returns the total elapsed time from when the router received the request. The unit is configurable:

  • seconds (float)
  • milliseconds (integer)
  • nanoseconds (integer)

The selector can be used as a custom instrument attribute or combined with conditions to filter based on request duration. For example, to count requests that complete in under 10 seconds:

telemetry:
  instrumentation:
    instruments:
      router:
        my.short.requests:
          type: counter
          value: unit
          unit: reqs
          description: "Requests completing in under 10 seconds"
          condition:
            lt:
              - request_duration: seconds
              - 10
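The three units relate as shown in this Python sketch. request_duration here is an illustrative function rather than the router's code, and truncating (instead of rounding) when producing integer milliseconds is an assumption.

```python
def request_duration(elapsed_ns: int, unit: str):
    """Hypothetical conversion of an elapsed time in nanoseconds to the selector's units."""
    if unit == "seconds":
        return elapsed_ns / 1_000_000_000   # float
    if unit == "milliseconds":
        return elapsed_ns // 1_000_000      # integer (truncated)
    if unit == "nanoseconds":
        return elapsed_ns                   # integer
    raise ValueError(f"unsupported unit: {unit}")


elapsed_ns = 2_500_000_000  # e.g. a request that took 2.5 s
print(request_duration(elapsed_ns, "seconds"), request_duration(elapsed_ns, "milliseconds"))
# Equivalent of the `lt: [request_duration: seconds, 10]` condition above:
print(request_duration(elapsed_ns, "seconds") < 10)
```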

By @carodewig in #9187

Add subscription and defer observability: end reason span attributes and termination metrics (PR #8858)

Adds new span attributes and metrics to improve observability of streaming responses.

Span attributes:

  • apollo.subscription.end_reason: Records the reason a subscription was terminated. Possible values are server_close, subgraph_error, heartbeat_delivery_failed, client_disconnect, schema_reload, and config_reload.
  • apollo.defer.end_reason: Records the reason a deferred query ended. Possible values are completed (all deferred chunks were delivered successfully) and client_disconnect (the client disconnected before all deferred data was delivered).

Both attributes are added dynamically to router spans only when relevant (i.e., only on requests that actually use subscriptions or @defer), instead of being present on every router span.

Metrics:

A single counter is emitted when a subscription terminates:

  • apollo.router.operations.subscriptions.terminated.client (default attributes: reason, subgraph.name): Incremented once per client connection when a subscription stream ends. The reason attribute indicates why (possible values: server_close, subgraph_error, client_disconnect, heartbeat_delivery_failed, schema_reload, config_reload). The subgraph.name attribute is populated if available. When deduplication is enabled, a single subgraph WebSocket closure produces one terminated event per deduplicated client sharing that connection (each with reason=server_close).

    Attributes for this metric are configurable. By default, reason and subgraph.name are enabled. You can also enable client.name via configuration:

    telemetry:
      instrumentation:
        instruments:
          router:
            apollo.router.operations.subscriptions.terminated.client:
              attributes:
                reason: true
                subgraph.name: true
                client.name: true

The following counter is emitted when a subscription request is rejected:

  • apollo.router.operations.subscriptions.rejected (attributes: reason, subgraph.name): A subscription request was rejected. The reason attribute indicates why: max_opened_subscriptions_limit_reached (the router has reached its max_opened_subscriptions limit) or subgraph (the subgraph WebSocket connection failed, e.g. connection refused, protocol error, or failed subscription handshake). The subgraph.name attribute is populated when available, and defaults to an empty string otherwise.

The following counter is emitted when a subgraph ends a subscription:

  • apollo.router.operations.subscriptions.terminated.subgraph (attributes: subgraph.name): Incremented once per subgraph WebSocket closure. Each deduplicated client sharing that connection will also emit a corresponding apollo.router.operations.subscriptions.terminated.client event with reason=server_close.
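The relationship between the two termination counters under deduplication can be sketched as follows. The Counter keyed by metric name and attribute pairs is a hypothetical stand-in for real instruments; it only demonstrates the counting rule stated above.

```python
from collections import Counter

SUBGRAPH_METRIC = "apollo.router.operations.subscriptions.terminated.subgraph"
CLIENT_METRIC = "apollo.router.operations.subscriptions.terminated.client"


def on_subgraph_websocket_closed(counters: Counter, subgraph: str,
                                 deduplicated_clients: int) -> None:
    """Hypothetical model: one subgraph closure fans out to every client sharing it."""
    counters[(SUBGRAPH_METRIC, ("subgraph.name", subgraph))] += 1
    for _ in range(deduplicated_clients):
        counters[(CLIENT_METRIC, ("reason", "server_close"), ("subgraph.name", subgraph))] += 1


counters: Counter = Counter()
on_subgraph_websocket_closed(counters, "reviews", deduplicated_clients=3)
print(counters)
```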

By @rohan-b99 in #8858

🐛 Fixes

Recognize 204 (No Content) responses without Content-Length header in connectors (PR #9141)

Connectors now correctly handle HTTP 204 (No Content) responses from spec-compliant servers that don't include a Content-Length header.

Previously, empty body detection relied on the presence of a Content-Length: 0 header. Because the HTTP spec explicitly forbids including this header in 204 responses, connectors would fail to recognize empty bodies from compliant servers. The fix checks body.is_empty() directly, with Content-Length: 0 kept as a fallback for non-compliant servers.
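The corrected decision can be modeled in a few lines of Python. is_empty_body is a hypothetical name, and lower-cased header names are an assumption of the sketch; it mirrors the description above, where the body itself is checked first and Content-Length: 0 is only a fallback.

```python
from typing import Mapping


def is_empty_body(headers: Mapping[str, str], body: bytes) -> bool:
    """Hypothetical model of the fixed check. Header names are assumed lower-cased."""
    if len(body) == 0:  # primary check: covers compliant 204s with no Content-Length
        return True
    # Fallback kept for servers that send Content-Length: 0.
    return headers.get("content-length", "").strip() == "0"
```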

By @apollo-mateuswgoettems in #9141

Retry JWKS candidates on issuer/audience mismat...

v2.14.0-rc.2

24 Apr 15:21

Pre-release
2.14.0-rc.2

v2.14.0-rc.1

23 Apr 18:12

Pre-release
2.14.0-rc.1

v2.14.0-rc.0

23 Apr 09:05

Pre-release
2.14.0-rc.0

v2.13.1

03 Apr 18:43
d104ef3

🐛 Fixes

Fix spurious REQUEST_RATE_LIMITED errors when no rate limiting is configured (PR #9034)

Under sustained load, the router could return REQUEST_RATE_LIMITED (429) errors even when no rate limiting was configured. An internal queue had an implicit limit that could trigger load shedding, even if the queue was not actually overloaded.

This fix removes that implicit limit, so requests are shed only when the queue is genuinely full. The queue still has explicit limits to ensure quality of service.

By @jhrldev in #9034

v2.13.1-rc.0

01 Apr 04:56

Pre-release
2.13.1-rc.0

v2.13.0

31 Mar 18:58
5a27601

🚀 Features

Add context_id selector for telemetry to expose unique per-request identifier (PR #8899)

A new context_id selector is now available for router, supergraph, subgraph, and connector telemetry instrumentation. This selector exposes the unique per-request context ID, which you can use to reliably correlate and debug requests in traces, logs, and custom events.

The context ID was previously accessible in Rhai scripts as request.id but had no telemetry selector. You can now include context_id: true in your telemetry configuration to add the context ID to spans, logs, and custom events.

Example configuration:

telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          "request.id":
            context_id: true
      supergraph:
        attributes:
          "request.id":
            context_id: true
      subgraph:
        attributes:
          "request.id":
            context_id: true
      connector:
        attributes:
          "request.id":
            context_id: true

By @BobaFetters in #8899

Enable Unix Domain Socket paths (PR #8894)

Enables Unix Domain Socket (UDS) paths for both coprocessors and subgraphs. The path must be supplied with the ?path= query parameter, for example: unix:///tmp/some.sock?path=some_path
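To show how such a URL decomposes, this Python sketch uses the standard library's urllib.parse. split_uds_url is hypothetical; it only illustrates the URL's structure and says nothing about how the router uses the path value internally.

```python
from urllib.parse import parse_qs, urlsplit


def split_uds_url(url: str):
    """Hypothetical helper: return (socket file, value of the ?path= parameter)."""
    parts = urlsplit(url)
    if parts.scheme != "unix":
        raise ValueError("expected a unix:// URL")
    path_param = parse_qs(parts.query).get("path", [None])[0]
    return parts.path, path_param


print(split_uds_url("unix:///tmp/some.sock?path=some_path"))
```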

By @aaronArinder in #8894

Add configurable pool_idle_timeout for HTTP client connection pools (PR #9014)

Adds a new pool_idle_timeout configuration option to the HTTP client used by subgraphs, connectors, and coprocessors. This controls how long idle keep-alive connections remain in the connection pool before being evicted. The default is 15 seconds (up from the previous hardcoded 5 seconds). Setting it to null disables the idle eviction interval entirely, meaning pooled connections are never evicted due to idleness.

The option is available at every level where HTTP client configuration applies:

traffic_shaping:
  all:
    pool_idle_timeout: 30s      # applies to all subgraphs
  subgraphs:
    products:
      pool_idle_timeout: 60s    # per-subgraph override
  connector:
    all:
      pool_idle_timeout: 30s    # applies to all connectors
    sources:
      my_source:
        pool_idle_timeout: 60s  # per-source override

coprocessor:
  url: http://localhost:8081
  client:
    pool_idle_timeout: 30s      # coprocessor client
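Conceptually, idle eviction behaves like the Python sketch below, with the timeout expressed in seconds. evict_idle and its data layout are hypothetical; the real pool lives inside the router's HTTP client, and this only models which connections would be dropped.

```python
from typing import Dict, List, Optional


def evict_idle(idle_since: Dict[str, float], now: float,
               pool_idle_timeout: Optional[float]) -> List[str]:
    """Hypothetical model: remove and return connections idle longer than the timeout.

    A timeout of None mirrors `pool_idle_timeout: null`: nothing is evicted for idleness.
    """
    if pool_idle_timeout is None:
        return []
    expired = [conn for conn, since in idle_since.items() if now - since > pool_idle_timeout]
    for conn in expired:
        del idle_since[conn]
    return expired
```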

By @aaronArinder in #9014

Add persisted query ID context key (PR #8959)

Adds a context key for the persisted query ID in the router. The PersistedQueryLayer now stores the persisted query ID in the request context, and the Rhai engine can access it via that key.

By @faisalwaseem in #8959

Add retry layer for push metrics exporters (PR #9036)

Adds a RetryMetricExporter layer to the apollo and otlp named metrics exporters. Failed exports are retried up to three times with jittered exponential backoff.
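The retry policy can be sketched in Python as follows. The base delay, the "full jitter" strategy, and reading "up to three times" as three retries after the initial attempt (four attempts in total) are all assumptions; export_with_retry is a hypothetical stand-in for the RetryMetricExporter layer.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def export_with_retry(export: Callable[[], T], max_retries: int = 3, base_delay: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Hypothetical model: call `export`, retrying failures with full-jitter backoff."""
    rng = rng or random.Random()
    attempt = 0
    while True:
        try:
            return export()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            # Full jitter: sleep a random time in [0, base_delay * 2**attempt].
            sleep(rng.uniform(0, base_delay * 2 ** attempt))
            attempt += 1
```

Injecting sleep and rng keeps the policy deterministic in tests while the defaults use real time and randomness.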

By @rohan-b99 in #9036

🐛 Fixes

Support more types of nullable elements in response/entity cache keys (PR #8923)

PR #8767 (released in Router v2.11.0) changed the entity and response caching keys to support nullable elements. The implementation covered the case of a field explicitly being set to null, but didn't cover the following cases:

  • Nullable field being missing
  • Nullable list items

This change adds support for those cases.

By @carodewig in #8923

Pin transitive h2 dependency at minimum v0.4.13 to pick up critical flow-control, deadlock, and tracing fixes (PR #9033)

h2 0.4.13 (released January 5, 2026) contains three fixes directly relevant to the router, which uses h2 exclusively as a client when connecting to subgraphs:

  • Capacity deadlock under concurrent streams (#860) — high relevance: Under concurrent load with max_concurrent_streams limits in effect, flow-control capacity could be assigned to streams still in pending_open state. Those streams could never consume the capacity, starving already-open streams and permanently freezing all outgoing traffic on the connection with no error surfaced. This is directly triggerable in the router: any subgraph behind Envoy or a gRPC backend advertises a max_concurrent_streams limit (Envoy defaults to 100), and under production load the router will routinely queue more concurrent requests than that limit allows.

  • OTel tracing span lifetime leak (#868) — high relevance: The h2 Connection object captured the active tracing span at connection creation time as its parent, keeping that span alive for the entire lifetime of the connection. Since the router wraps every subgraph request in an OpenTelemetry span and connections are pooled, affected spans could linger indefinitely under sustained traffic — never being exported to the tracing backend and accumulating in memory.

  • Flow-control stall on padded DATA frames (#869) — lower relevance for typical subgraphs, higher for connectors: Padding bytes in DATA frames were not being returned to the flow-control window, causing the connection window to drain to zero and permanently stalling downloads with no error. Typical GraphQL/gRPC subgraphs do not send padded frames, but router connectors calling arbitrary HTTP APIs (e.g., Google Cloud Storage or CDN-backed endpoints) can encounter this.

By @theJC in #9033

Return 503 for rate limit traffic shaping (PR #9013)

Reverts PR #8765.

When the router's rate limit or buffer capacity is exceeded, it now returns HTTP 503 (Service Unavailable) instead of HTTP 429 (Too Many Requests).

HTTP 429 implies that a specific client has sent too many requests and should back off. HTTP 503 more accurately reflects the situation: the router is temporarily unable to handle the request due to overall service load, not because of the behavior of any individual client.

This change affects both router-level and subgraph-level rate limiting. Documentation has been updated to reflect the new status code.

By @carodewig in #9013

Set Cache-Control: no-store when the response cache returns GraphQL errors (PR #8933)

When using the response cache plugin, if a query spans multiple subgraphs and one returns an error or times out, the final HTTP response was still carrying the successful subgraph's Cache-Control header (e.g. max-age=1800, public). This allowed intermediate caches (CDNs, reverse proxies) to cache and serve incomplete or stale partial responses to other clients.

If the response cache plugin is enabled and was going to set a Cache-Control header, but the response contains any GraphQL errors, it now sets Cache-Control: no-store instead of the merged subgraph cache control value.
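The rule reduces to the Python sketch below. final_cache_control is a hypothetical name, and the merging of subgraph Cache-Control values is assumed to have happened already.

```python
from typing import Optional


def final_cache_control(merged_value: Optional[str], has_graphql_errors: bool) -> Optional[str]:
    """Hypothetical model of the header decision made by the response cache plugin."""
    if merged_value is None:
        return None  # the plugin was not going to set Cache-Control at all
    return "no-store" if has_graphql_errors else merged_value
```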

By @carodewig in #8933

Apply entity-less subgraph errors to the nearest parent instead of every entity

When entity resolution failed (for example, because the path returned by the subgraph was malformed), the router applied every error to every item in the list of expected entities. For example, if 2000 entities were expected but 2000 errors were returned instead, each of the 2000 errors was applied to each of the 2000 entities, producing 4,000,000 errors. This explosion of errors caused significant memory allocations that could lead to OOMKills.

When the router can't determine where an error should be applied, it now applies it to the most immediate parent of the targeted entity — for a list of users, it applies to the list itself rather than to each index of that list.
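The difference in growth is easy to see in a small Python model. Both functions and the error shape are hypothetical simplifications of GraphQL errors carrying a path; with 2000 errors and 2000 entities the first function would produce 4,000,000 entries, while the second produces 2000.

```python
from typing import Any, Dict, List


def errors_per_entity(errors: List[Dict[str, Any]], entity_paths: List[list]) -> List[Dict[str, Any]]:
    """Old behavior (modeled): copy every error onto every expected entity."""
    return [{**err, "path": path} for err in errors for path in entity_paths]


def errors_on_parent(errors: List[Dict[str, Any]], parent_path: list) -> List[Dict[str, Any]]:
    """New behavior (modeled): attach each error once, to the nearest parent."""
    return [{**err, "path": parent_path} for err in errors]


errors = [{"message": f"entity resolution failed ({i})"} for i in range(3)]
entity_paths = [["users", i] for i in range(4)]
print(len(errors_per_entity(errors, entity_paths)))  # grows as errors x entities
print(len(errors_on_parent(errors, ["users"])))      # grows as errors
```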

By @aaronArinder in #8962

Report http.client.response.body.size and http.server.response.body.size consistently when content-length is absent or compression is used (PR #8972)

Reporting these metrics previously relied on either the Content-Length header or the size_hint of the body, which reports the uncompressed size. OpenTelemetry semantic conventions recommend reporting the compressed size.

The router now consistently reports the compressed size when compression is used, even when Content-Length is absent, for:

  • Router → client responses
  • Subgraph → router responses
  • Connector → router responses
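The distinction between the two sizes can be shown with Python's gzip module: counting the bytes that actually go over the wire yields the compressed size, whereas the length of the original payload corresponds to the uncompressed size. count_body_bytes is hypothetical and is not the router's instrumentation code.

```python
import gzip
from typing import Iterable


def count_body_bytes(chunks: Iterable[bytes]) -> int:
    """Hypothetical model: measure the body as it is streamed, without Content-Length."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
    return total


payload = b'{"data":{"items":[' + b'{"id":1},' * 500 + b'{"id":1}]}}'
compressed = gzip.compress(payload)
# Stream the compressed body in 64-byte chunks, as a transport might.
chunks = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]
print(len(payload), count_body_bytes(chunks))
```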

By @rohan-b99 in #8972

Ensure query planning allocation stats are still recorded if cooperative cancellation is not enabled (PR #8902)

The metric apollo.router.query_planner.memory unintentionally reported only allocations made during the query_parsing compute job when cooperative cancellation for query planning was not enabled. Allocations for both query_parsing and query_planning are now recorded.

By @rohan-b99 in #8902

Align ServiceMonitor naming with other chart resources using the router.fullname helper ([Issue #TSH-22160](https...

v2.13.0-rc.0

27 Mar 14:29

Pre-release
2.13.0-rc.0

v2.12.1

24 Mar 16:49

🔒 Security

Note

For more information on the impact of the fix in this release and how your deployment might be affected or remediated, see the corresponding GitHub Security Advisory (GHSA) linked below. Updating to a patched Router version will resolve any vulnerabilities.

Reject GET requests with a non-application/json Content-Type header (GHSA-hff2-gcpx-8f4p)

The router now rejects GraphQL GET requests that include a Content-Type header with a value other than application/json (with optional parameters such as ; charset=utf-8). Any other value is rejected with a 415 status code.

GET requests without a Content-Type header continue to be allowed (subject to the router's existing CSRF prevention check), since GET requests have no body and therefore technically do not require this header.
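The acceptance rule can be sketched in Python as follows. check_get_content_type is hypothetical; comparing the media type case-insensitively follows RFC 9110, but whether the router does so, and how it handles an empty header value, are assumptions.

```python
from typing import Optional


def check_get_content_type(content_type: Optional[str]) -> Optional[int]:
    """Hypothetical model: return 415 to reject, or None to let the request continue."""
    if content_type is None:
        return None  # no header: allowed (the CSRF prevention check still applies)
    media_type = content_type.split(";", 1)[0].strip().lower()
    return None if media_type == "application/json" else 415
```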

This improvement makes the router's CSRF prevention more resistant to browsers that implement CORS in non-spec-compliant ways. Apollo is aware of one browser that, as of March 2026, has a bug allowing an attacker to circumvent the router's CSRF prevention and carry out read-only XS-Search-style attacks. The browser vendor is in the process of patching this vulnerability; upgrading to this version of the router mitigates it.

If your graph uses cookies (or HTTP Basic Auth) for authentication, Apollo encourages you to upgrade to this version.

This is technically a backwards-incompatible change. Apollo is not aware of any GraphQL clients that provide non-empty Content-Type headers on GET requests with types other than application/json. If your use case requires such requests, please contact support, and we may add more configurability in a follow-up release.

By @glasser and @carodewig in GHSA-hff2-gcpx-8f4p

v2.12.1-rc.0

24 Mar 15:22

Pre-release
2.12.1-rc.0