Releases: apollographql/router
v2.10.4
🐛 Fixes
Preserve null propagation when multiple fragments select the same non-null field (PR #9032)
When a query uses multiple fragment spreads on the same parent type and a subgraph response is missing a required non-null field on a union member, the router now correctly returns null for the affected field rather than a partial object like {"__typename": "A"}.
The GraphQL specification requires that a non-null violation propagates null upward to the nearest nullable parent. Previously, if one fragment nullified a field, a subsequent fragment on the same parent could overwrite that null with a partial result — producing a spec-incorrect response.
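The sticky-null behavior described above can be sketched in a few lines. This is an illustrative model only, not the router's implementation: `apply_fragment` and `merge_fragments` are hypothetical names, the schema is reduced to a set of non-null field names, and the parent field is assumed to be nullable.

```python
def apply_fragment(raw, fields, non_null):
    """Select `fields` from a subgraph object; None signals a non-null violation."""
    selected = {}
    for name in fields:
        value = raw.get(name)
        if value is None and name in non_null:
            # Violation: the enclosing (nullable) parent must become null.
            return None
        selected[name] = value
    return selected


def merge_fragments(raw, fragments, non_null):
    """Merge several fragment selections on the same parent, keeping null sticky."""
    merged = {}
    for fields in fragments:
        part = apply_fragment(raw, fields, non_null)
        if part is None:
            # Once one fragment nullifies the parent, no other fragment
            # may replace that null with a partial object.
            return None
        merged.update(part)
    return merged
```

With a subgraph object that lacks the required `id` field, `merge_fragments({"__typename": "A"}, [["__typename", "id"], ["__typename"]], {"id"})` yields `None` rather than `{"__typename": "A"}`.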
Reject invalid values for client library name and version (PR #8934)
Rejects invalid values (validated against a regex) for library name and version provided in headers or operation extensions, which are used for the Client Awareness feature and telemetry.
Only delete coprocessor context keys from those that were sent in a given stage (PR #9519)
Addresses a race condition where context keys added by concurrent parallel subgraph stages could unintentionally be deleted.
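A minimal sketch of the idea, assuming the merge deletes a key only when it was sent to the coprocessor and is absent from what the coprocessor returned. The function and parameter names are hypothetical and do not reflect the router's internal API.

```python
def merge_coprocessor_context(router_context, sent_keys, returned_context):
    """Apply one coprocessor stage's returned context to the router context."""
    # Delete only keys that were sent to this stage and that the
    # coprocessor dropped. Keys added concurrently by other stages were
    # never sent, so they are left alone.
    for key in sent_keys:
        if key not in returned_context:
            router_context.pop(key, None)
    # Insert or update everything the coprocessor returned.
    router_context.update(returned_context)
    return router_context
```

If `c` was added by a parallel subgraph stage after `a` and `b` were sent, a response containing only `a` removes `b` and keeps `c`.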
By @rohan-b99 in #9519
Normalize supergraph.path to support queries with and without trailing slashes (/) (PR #8860)
Normalize trailing / for supergraph.path to support /graphql and /graphql/. This works by stripping trailing / from both the configured path and the incoming query path to ensure they match, regardless of whether the config or query includes a trailing slash.
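The normalization can be sketched as follows. The helper names are hypothetical, and two details are assumptions not stated above: the root path "/" is kept intact, and repeated trailing slashes are all stripped.

```python
def normalize_path(path):
    """Strip trailing slashes; keep the root path "/" intact (assumption)."""
    stripped = path.rstrip("/")  # also removes repeated trailing slashes
    return stripped or "/"


def path_matches(configured, incoming):
    """Compare the configured path and the request path after normalization."""
    return normalize_path(configured) == normalize_path(incoming)
```

Both `path_matches("/graphql", "/graphql/")` and `path_matches("/graphql/", "/graphql")` evaluate to `True`.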
v2.10.4-rc.0
Release v2.10.4-rc.0
v2.10.3
🐛 Fixes
Support non-ASCII (UTF-8) WebSocket header values (Issue #1485, PR #9051)
The router can now handle WebSocket connections with UTF-8 encoded header values, including non-ASCII characters like "Montréal". Previously, such connections failed because of serialization issues in the underlying tungstenite library.
The fix comes from updating tokio-tungstenite from v0.28.0 to v0.29.0.
By @BobaFetters in #9051
Handle both deprecated enum values when merging coprocessor context (PR #8913)
A change to coprocessor context merges in Router v2.10 caused keys to be deleted when context: true is used as the coprocessor context selector in the router configuration file.
The workaround was to pass context: deprecated instead. This change brings parity when context: true is provided.
By @carodewig in #8913
🛠 Maintenance
Pin transitive h2 dependency at minimum v0.4.13 to pick up critical flow-control, deadlock, and tracing fixes (PR #9033)
h2 0.4.13 (released January 5, 2026) contains three fixes directly relevant to the router, which uses h2 exclusively as a client when connecting to subgraphs:
- Capacity deadlock under concurrent streams (#860), high relevance: Under concurrent load with max_concurrent_streams limits in effect, flow-control capacity could be assigned to streams still in pending_open state. Those streams could never consume the capacity, starving already-open streams and permanently freezing all outgoing traffic on the connection with no error surfaced. This is directly triggerable in the router: any subgraph behind Envoy or a gRPC backend advertises a max_concurrent_streams limit (Envoy defaults to 100), and under production load the router will routinely queue more concurrent requests than that limit allows.
- OTel tracing span lifetime leak (#868), high relevance: The h2 Connection object captured the active tracing span at connection creation time as its parent, keeping that span alive for the entire lifetime of the connection. Since the router wraps every subgraph request in an OpenTelemetry span and connections are pooled, affected spans could linger indefinitely under sustained traffic, never being exported to the tracing backend and accumulating in memory.
- Flow-control stall on padded DATA frames (#869), lower relevance for typical subgraphs, higher for connectors: Padding bytes in DATA frames were not being returned to the flow-control window, causing the connection window to drain to zero and permanently stalling downloads with no error. Typical GraphQL/gRPC subgraphs do not send padded frames, but router connectors calling arbitrary HTTP APIs (e.g., Google Cloud Storage or CDN-backed endpoints) can encounter this.
v2.15.0
🚀 Features
Add ignore_auth_context option to subscription deduplication config (PR #9078)
When the router's JWT authentication plugin validates a token, it decodes the claims and stores them internally on the request — before any subgraph request is built. The router then factors those stored claims into its check for whether two subscriptions are identical, separately from any HTTP headers it may forward downstream.
This means that on any router with JWT authentication enabled, every authenticated user effectively gets their own subgraph WebSocket connection — even if the subscription data is identical for all users, and even if the Authorization header is never forwarded to the subgraph at all. Adding authorization to ignored_headers doesn't help here, because it only affects HTTP headers; the decoded claims live in a different layer that ignored_headers never touches.
Two new capabilities are added to the deduplication config block:
- ignore_auth_context: bool (default: false). When true, the router skips stored JWT claims when checking subscription identity, allowing all authenticated users to share a single subgraph WebSocket connection when the subscription data is truly non-personalized (e.g., product price updates, stock price feeds).
- Per-subgraph deduplication control via all: / subgraphs:. Deduplication settings can now be set globally with a default and overridden per subgraph by name, using the standard SubgraphConfiguration<T> pattern already used elsewhere in the router config.
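To illustrate why ignored_headers alone is not enough, here is a hypothetical model of a deduplication key. The router's real key computation is not described in these notes; `dedup_key` and its parameters are assumptions made purely for illustration.

```python
import hashlib
import json


def dedup_key(subgraph, operation, variables, headers, auth_claims,
              ignored_headers=(), ignore_auth_context=False):
    """Build a key; two subscriptions share a connection only if keys match."""
    ignored = {h.lower() for h in ignored_headers}
    kept_headers = {
        k.lower(): v for k, v in headers.items() if k.lower() not in ignored
    }
    parts = {
        "subgraph": subgraph,
        "operation": operation,
        "variables": variables,
        "headers": kept_headers,
        # Decoded JWT claims live outside the headers, so ignored_headers
        # never touches them; only ignore_auth_context removes them.
        "claims": None if ignore_auth_context else auth_claims,
    }
    encoded = json.dumps(parts, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()
```

In this model, two users with different claims get different keys even when `authorization` is listed in `ignored_headers`; the keys only coincide once `ignore_auth_context=True`.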
subscription:
  deduplication:
    all:
      enabled: true
    subgraphs:
      stocks:
        ignore_auth_context: true
Add include_cache_control_header_on_router_response to suppress Cache-Control on client responses (PR #9002)
The response cache plugin now supports a include_cache_control_header_on_router_response boolean config option (defaults to true). When set to false, the router omits the Cache-Control header from supergraph responses sent to clients, while all internal caching behavior — Redis storage, TTL enforcement, cache key computation, and the cache debugger — remains unchanged.
This is useful when the router sits behind a CDN or reverse proxy that manages its own caching headers, or when you want to prevent clients from caching responses locally while keeping server-side caching active.
response_cache:
  enabled: true
  include_cache_control_header_on_router_response: false # default: true
  subgraph:
    all:
      enabled: true
      redis:
        urls: ["redis://..."]
Add per-subgraph and per-connector HTTP response size limits (PR #9160)
The router can now cap the number of bytes it reads from subgraph and connector HTTP response bodies, protecting against out-of-memory conditions when a downstream service returns an unexpectedly large payload.
The limit is enforced as the response body streams in — the router stops reading and returns a GraphQL error as soon as the limit is exceeded, without buffering the full body first.
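Streaming enforcement of this kind can be sketched as below. This is a generic illustration rather than the router's code; `read_with_limit` and `ResponseTooLarge` are hypothetical names, and the body is modeled as an iterator of byte chunks.

```python
class ResponseTooLarge(Exception):
    """Raised when a response body exceeds the configured byte limit."""


def read_with_limit(chunks, max_bytes):
    """Accumulate chunks, aborting as soon as the running total exceeds max_bytes."""
    total = 0
    body = bytearray()
    for chunk in chunks:
        total += len(chunk)
        if total > max_bytes:
            # Stop reading immediately; later chunks are never pulled
            # from the stream.
            raise ResponseTooLarge(f"response exceeded {max_bytes} bytes")
        body.extend(chunk)
    return bytes(body)
```

Because the check runs per chunk, an oversized body is rejected after reading at most one chunk beyond the limit instead of after buffering everything.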
Configure a global default and optional per-subgraph or per-source overrides:
limits:
  subgraph:
    all:
      http_max_response_size: 10MB # 10 MB for all subgraphs
    subgraphs:
      products:
        http_max_response_size: 20MB # 20 MB override for 'products'
  connector:
    all:
      http_max_response_size: 5MB # 5 MB for all connector sources
    sources:
      products.rest:
        http_max_response_size: 10MB # 10 MB override for 'products.rest'
There is no default limit; responses are unrestricted unless you configure this option.
When a response is aborted due to the limit, the router:
- Returns a GraphQL error to the client with extension code SUBREQUEST_HTTP_ERROR
- Increments the apollo.router.limits.subgraph_response_size.exceeded or apollo.router.limits.connector_response_size.exceeded counter
- Records apollo.subgraph.response.aborted: "response_size_limit" or apollo.connector.response.aborted: "response_size_limit" on the relevant span
Configuration migration: Existing limits fields (previously at the top level of limits) are now nested under limits.router. A configuration migration is included that updates your config file automatically.
By @carodewig in #9160
Add apollo.router.connection.acquire.duration metric for TCP/TLS connection timing (PR #9309)
Adds a new histogram metric, apollo.router.connection.acquire.duration, that records how long it takes to establish a new TCP or Unix socket connection to a downstream service (subgraph, connector, or coprocessor). The metric fires only when the connection pool opens a new connection — pool hits are not recorded.
This metric is useful for diagnosing connection establishment latency. For example, if a subgraph shows elevated overall response latency, a high connection.acquire.duration indicates the delay is in TCP/TLS setup; a near-zero value (or no data) points to post-connection causes like slow server responses.
Attributes:
- network.transport: tcp for HTTP connections, unix for Unix socket connections
- subgraph.name: name of the subgraph (for subgraph connections)
- connector.source.name: name of the connector source (for connector connections)
- coprocessor: true (for coprocessor connections)
By @carodewig in #9309
Add max_lifetime configuration for subscriptions (PR #9216)
Adds a new max_lifetime field to the subscription configuration block, allowing operators to set a maximum duration for how long a subscription can remain open. After the configured duration the subscription is closed and the client receives a terminal error with extension code SUBSCRIPTION_MAX_LIFETIME_EXCEEDED.
subscription:
  enabled: true
  max_lifetime: 10m # close subscriptions after 10 minutes
  mode:
    callback:
      public_url: "https://my-router.example.com/subscription/callback"
By default (max_lifetime unset) there is no lifetime limit, preserving the existing behaviour.
By @BobaFetters in #9216
Add operation_body_timeout for file upload requests (PR #9243)
Adds a new operation_body_timeout limit to the file uploads plugin, allowing operators to set a tight deadline on reading the operations field (GraphQL query + variables) from multipart request bodies, independently of the overall router timeout.
File uploads is the only router flow where the request body is read as a stream in the plugin layer: the multipart body must be parsed to extract the operations field before query planning can begin. This means a slow or stalled client can hold a connection open until the global router timeout fires. The new operation_body_timeout lets you set a tighter deadline specifically for that body-reading phase.
If operation_body_timeout is not set, no additional body-read timeout is applied — the overall router timeout remains the only bound.
preview_file_uploads:
  enabled: true
  protocols:
    multipart:
      enabled: true
      limits:
        operation_body_timeout: 5s # optional; no default
When the timeout fires, the router returns a 504 Gateway Timeout response with extension code GATEWAY_TIMEOUT.
By @carodewig in #9243
🐛 Fixes
Resolve @connect field values when a root query alias is combined with field-level aliases (Issue #9347)
Queries that aliased both the root query field and one or more of its subfields on a @connect-backed type returned null for every aliased subfield, which could cascade into null propagation for non-nullable types. Either alias in isolation worked correctly — only the combination of both triggered the bug.
Given this query:
{
  items: search_items(query: "test") {
    results {
      id
      link: viewUri
    }
  }
}
Before
Aliased fields returned null, and null propagation bubbled up through non-nullable types until the entire result was nullified:
{
  "data": { "items": null },
  "extensions": {
    "valueCompletion": [
      { "message": "Null value found for non-nullable type String", "path": ["items", "results", 0] },
      { "message": "Null value found for non-nullable type Item", "path": ["items", "results", 0] },
      { "message": "Null value found for non-nullable type [Item!]", "path": ["items", "results"] }
    ]
  }
}
After
Root and field aliases now work together as expected:
{
  "data": {
    "items": {
      "results": [
        { "id": "1", "link": "https://example.com/docs/001" }
      ]
    }
  }
}
Record apollo.router.operations.coprocessor.duration even when the coprocessor call times out (PR #9296)
apollo.router.operations.coprocessor.duration is now recorded even when a coprocessor call is cut short by a router timeout. Previously, the metric was only emitted when the call completed normally, leaving timeout latencies invisible in the histogram.
By [@conwuegb](https://github.c...
v2.10.3-rc.0
Release v2.10.3-rc.0
v2.15.0-rc.0
Release v2.15.0-rc.0
v2.14.2
🐛 Fixes
Recover the Redis-backed caches after cluster events and honor required_to_start: true on startup
The router's Redis-backed caches (query planner, entity cache, APQ, response cache) could silently stall after a network event involving Redis replicas or the full cluster, accumulating queued commands, command timeouts, latency, and memory pressure until the router was redeployed. The router now detects when the underlying Redis client has given up reconnecting, drains the connection pool, and rebuilds it on the next request. In deployments where the broadcast cluster topology contains nodes that aren't routable from the router's network position (for example, internal IPs reserved for replica promotion), a new replica filter screens those nodes out before they enter the routing table.
The required_to_start: true flag — available on each cache under supergraph.query_planning.cache.redis, apq.router.cache.redis, preview_entity_cache.subgraph.all.redis, and experimental_response_cache.subgraph.all.redis — now actually fails the router's startup fast when Redis is unreachable, instead of hanging indefinitely or silently returning success under broadcast overflow.
The router also now supports required_to_start: false, allowing the router to start when Redis is unavailable at boot and to begin caching once Redis becomes reachable.
For more technical internal details, see PR #9023 and PR #9418. For more details on configuring the router's Redis-backed caches, see Response Cache Customization and the related caching docs.
By @aaronArinder in #9023 and #9418
v2.14.2-rc.0
Release v2.14.2-rc.0
v2.14.1
🐛 Fixes
Avoid spurious "meter provider after shutdown" error during router shutdown (PR #9248)
The router no longer emits the spurious cannot use meter provider after shutdown error during shutdown. The metrics aggregation layer now returns a noop instrument in that path instead of panicking.
@rohan-b99 in #9248
Use lazy idle eviction in connection pool to avoid inter-request TCP closes (PR #9308)
When pool_idle_timeout was introduced in v2.13.0, the router unconditionally enabled a background timer that proactively closed idle connections exceeding the timeout. In some network environments, the TCP close sent by this background task raced with a new connection attempt and caused significant latency spikes on the next request.
The router now uses lazy eviction: connections are only closed at checkout time, when a request finds a pooled connection that has exceeded pool_idle_timeout. No TCP closes are sent between requests. This matches router behavior before v2.13.0.
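Lazy eviction can be modeled with a small pool that checks connection age only at checkout. This is a simplified sketch under stated assumptions, not the router's pool: `LazyPool`, its `connect` factory, and the injectable `clock` are hypothetical, and "closing" a connection is represented by recording it in a list.

```python
import time
from collections import deque


class LazyPool:
    """Connection pool that evicts idle connections only at checkout time."""

    def __init__(self, idle_timeout, connect, clock=time.monotonic):
        self._idle_timeout = idle_timeout
        self._connect = connect
        self._clock = clock
        self._idle = deque()  # entries: (connection, time it became idle)
        self.closed = []      # connections closed so far, for inspection

    def checkout(self):
        while self._idle:
            conn, idle_since = self._idle.pop()  # most recently used first
            if self._clock() - idle_since <= self._idle_timeout:
                return conn
            # Stale: close it now, while a request is already waiting,
            # rather than from a background timer between requests.
            self.closed.append(conn)
        return self._connect()

    def checkin(self, conn):
        self._idle.append((conn, self._clock()))
```

No timer runs between requests in this model, so an idle connection that has outlived the timeout stays open until the next `checkout` discovers it.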
@carodewig in #9308
Include URL and failure category in JWKS fetch error logs (PR #9258)
When the JWKS server is unreachable, the router now logs a specific, actionable message including the URL and the failure category (timed out, connection failed, or generic failure) — replacing the previous vague "could not get url" message.
@carodewig in #9258
Pick up hickory 0.26.1 to close two upstream DNS DoS advisories (RUSTSEC-2026-0119, RUSTSEC-2026-0120)
The router's DNS resolver (via hickory-resolver) inherits two upstream advisories in hickory-proto / hickory-net 0.26.0. Both are fixed in 0.26.1, which is now pinned in Cargo.lock.
Source-built consumers were already insulated by caret-semver dependency declarations; this change picks up the fix in Apollo's pre-built binaries and Docker images.
@carodewig in #9321
Return a single JSON response for unsupported defer-with-batch queries (PR #9311)
Batched queries that use @defer are not supported by the router. Previously these requests produced a malformed multipart response; they now return a single JSON response with errors that explicitly indicates the lack of support.
@rohan-b99 in #9311
Apply http_max_request_bytes only to the operations field, not file streams (PR #9226, PR #9327)
Previously, limits.http_max_request_bytes (default 2 MB) was applied to the entire multipart body of file upload requests, causing large file uploads to be rejected even when preview_file_uploads.protocols.multipart.limits.max_file_size was configured to allow them.
The limit now applies only to the GraphQL operations field (the query and variables). File data is bounded separately by max_file_size, enforced by the multer parser.
@carodewig in #9226 and #9327
🛠 Maintenance
Instrument experimental config features with OTLP gauges (PR #9330)
Adds apollo.router.config.experimental_* OTLP gauge metrics for all customer-facing experimental config flags, using the existing populate_config_instrument! pattern in configuration/metrics.rs. This enables Apollo to track adoption of experimental features so we can inform decisions about which to promote or remove in future releases.
Features now instrumented:
- experimental_chaos
- experimental_type_conditioned_fetching
- experimental_hoist_orphan_errors
- experimental_log_on_broken_pipe
- experimental_plans_limit
- experimental_paths_limit
- experimental_reuse_query_plans
- experimental_cooperative_cancellation
- experimental_prewarm_query_plan_cache
- experimental_local_field_metrics
- experimental_response_trace_id
- experimental_otlp_endpoint
- experimental_otlp_tracing_protocol
- experimental_otlp_metrics_protocol
- experimental_http2
- experimental_http2_keep_alive_interval
- experimental_http2_keep_alive_timeout
- experimental_mock_subgraphs
- experimental.expose_query_plan (recorded as apollo.router.config.experimental_expose_query_plan)
The mandatory experimental_diagnostics plugin is intentionally excluded because it is loaded on every router and would always report adoption as 100%.
Avoid unnecessary clones on subgraph requests (PR #9266)
The router now avoids some unnecessary memory allocations when making subgraph requests, particularly on the APQ (Automatic Persisted Queries) path.
@carodewig in #9266
Improve query plan cache throughput with an in-memory fast path (PR #9279)
Every query plan cache lookup — including cache hits — previously acquired the wait_map mutex before checking whether the value was in memory. On a warm cache this was pure overhead: the mutex was locked twice, a broadcast::Sender was allocated, and a cleanup task was spawned, all to be immediately discarded.
A fast path now checks the in-memory cache before acquiring the mutex. On a hit the value is returned immediately; the wait_map path is only entered on a miss, which is the only case where deduplication is needed.
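The shape of this change can be sketched with a deduplicating cache in Python. This is an analogy, not the router's Rust code: `DedupCache` is a hypothetical class, a `threading.Event` stands in for the broadcast channel, and the lock-free read relies on CPython dictionary reads being safe under the GIL.

```python
import threading


class DedupCache:
    """Cache with an in-memory fast path and a deduplicating slow path."""

    def __init__(self, compute):
        self._compute = compute
        self._values = {}
        self._wait_map = {}               # key -> Event for in-flight computations
        self._wait_lock = threading.Lock()
        self.slow_path_entries = 0        # how often the wait_map lock was needed

    def get(self, key):
        # Fast path: a hit returns immediately without touching the lock.
        try:
            return self._values[key]
        except KeyError:
            pass
        # Slow path: only misses need deduplication.
        with self._wait_lock:
            self.slow_path_entries += 1
            if key in self._values:       # filled while waiting for the lock
                return self._values[key]
            event = self._wait_map.get(key)
            leader = event is None
            if leader:
                event = self._wait_map[key] = threading.Event()
        if leader:
            try:
                self._values[key] = self._compute(key)
            finally:
                with self._wait_lock:
                    self._wait_map.pop(key, None)
                event.set()               # wake any followers
            return self._values[key]
        event.wait()
        return self.get(key)              # retries if the leader failed
```

On a warm cache every call takes the first `return`, so the lock, the wait entry, and the cleanup are skipped entirely.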
v2.14.1-rc.0
2.14.1-rc.0