feat(limits): add per-subgraph and per-connector response size limits#9160
Adds `limits.subgraph.all.http_max_response_bytes` and per-subgraph overrides under `limits.subgraph.subgraphs.<name>` to cap how many bytes are read from a subgraph response body, preventing OOM when a subgraph sends an unexpectedly large payload. Enforcement happens in `do_fetch()` via `http_body_util::Limited`, which rejects the body mid-stream as each chunk arrives rather than buffering everything first. The limit is signaled through the request `Context` via a new `SubgraphResponseSizeLimit` extension type set by `LimitsPlugin::subgraph_service()`. This is a breaking config change: all existing `limits.*` fields move under `limits.router.*`. A new migration (2045) handles automatic upgrade of old configs. The `limits.subgraph` section is additive.
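A minimal std-only sketch of how the config resolution and the `Context` signaling described above could fit together. It assumes a per-subgraph value replaces the `all` value, and that a subgraph with neither gets no limit. `SubgraphLimitsConfig`, `resolve_limit`, and `Extensions` are hypothetical stand-ins, not the router's types. Only the `SubgraphResponseSizeLimit` name comes from this PR. The type-keyed map imitates how an extension newtype can be looked up by its type.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical mirror of `limits.subgraph`: a default under `all`,
/// plus optional per-subgraph overrides under `subgraphs.<name>`.
#[derive(Default)]
struct SubgraphLimitsConfig {
    all: Option<u64>,
    subgraphs: HashMap<String, Option<u64>>,
}

/// Assumption: a per-subgraph value wins; otherwise fall back to `all`.
fn resolve_limit(config: &SubgraphLimitsConfig, subgraph: &str) -> Option<u64> {
    config
        .subgraphs
        .get(subgraph)
        .copied()
        .flatten()
        .or(config.all)
}

/// Newtype carried in the request context (name taken from the PR).
#[derive(Clone, Copy, Debug, PartialEq)]
struct SubgraphResponseSizeLimit(u64);

/// Hypothetical type-keyed extension map standing in for the request `Context`.
#[derive(Default)]
struct Extensions {
    map: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl Extensions {
    fn insert<T: Any + Send + Sync>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get<T: Any + Send + Sync>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|v| (**v).downcast_ref::<T>())
    }
}

fn main() {
    let mut config = SubgraphLimitsConfig {
        all: Some(10 * 1024 * 1024),
        ..Default::default()
    };
    config.subgraphs.insert("products".to_string(), Some(1024));

    // What a `subgraph_service()` hook might do: resolve once, then stash in the context.
    let mut ext = Extensions::default();
    if let Some(limit) = resolve_limit(&config, "products") {
        ext.insert(SubgraphResponseSizeLimit(limit));
    }

    // What the fetch path might do: look the limit up by type.
    let limit = ext.get::<SubgraphResponseSizeLimit>();
    println!("{limit:?}"); // Some(SubgraphResponseSizeLimit(1024))
}
```

Keying by type means the fetch path only needs to know the newtype, not the layout of the limits config.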
This reverts commit 1f86a96.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
✅ Docs preview ready: The preview is ready to be viewed.
File Changes: 0 new, 3 changed, 0 removed
Build ID: 728911f536b13046ca6ce7d5
URL: https://www.apollographql.com/docs/deploy-preview/728911f536b13046ca6ce7d5
Applied suggestions from AI review 20260409-911566dd-9160-9cb68e9860742752:
- docs/source/routing/security/request-limits.mdx:445: Strings within code font should not be surrounded by quotes.; Use the present te...
- docs/source/routing/security/request-limits.mdx:402: Use imperative verbs for instructions.; Use an authoritative voice to guide the ...
- docs/source/routing/security/request-limits.mdx:400: Do not surround strings in code font with quotes.; Use active voice to describe ...
- docs/source/routing/security/request-limits.mdx:397: Avoid using "When" if it can be replaced by "After" to indicate sequence.
- docs/source/routing/security/request-limits.mdx:383: Prescribe the specific tool and method for measuring response sizes to provide a...
- docs/source/routing/observability/router-telemetry-otel/enabling-telemetry/standard-instruments.mdx:284: Introduce the list item with a colon instead of a hyphen to separate the term fr...
- docs/source/routing/security/request-limits.mdx:414: Use active voice for the identification step.
- docs/source/routing/security/request-limits.mdx:430: Unordered lists should be introduced with a sentence or fragment that ends in a ...

Review: #9160
Triggered by: caroline.rodewig@apollographql.com
aaronArinder left a comment
Would be super nice to use `ByteSize` so folks don't have to do the math for bytes, but otherwise LGTM!
```yaml
limits:
  subgraph:
    all:
      http_max_response_bytes: 10485760 # 10 MB for all subgraphs
```
Why not use the friendly `ByteSize` to get something like `10mb`?
Good idea! I just cribbed from `router.http_max_request_bytes`, but `ByteSize` would be much friendlier.
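For reference on the `ByteSize` suggestion: the value 10485760 in the YAML snippet is 10 × 1024 × 1024 bytes, i.e. 10 MiB, whereas an SI "10 MB" is 10,000,000 bytes. A friendlier size type would have to pick one convention. The following is a minimal hypothetical parser that makes the difference explicit. It is not the `ByteSize` API, and the suffix semantics (`kb`/`mb` decimal, `kib`/`mib` binary, case-insensitive) are assumptions for illustration.

```rust
/// Hypothetical size parser; suffix semantics are an assumption for illustration.
fn parse_size(input: &str) -> Option<u64> {
    let s = input.trim().to_ascii_lowercase();
    // Split into the numeric prefix and the unit suffix.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, unit) = s.split_at(split);
    let value: u64 = digits.parse().ok()?;
    let multiplier: u64 = match unit.trim() {
        "" | "b" => 1,
        "kb" => 1_000,
        "mb" => 1_000_000,
        "kib" => 1024,
        "mib" => 1024 * 1024,
        _ => return None,
    };
    // Overflowing input yields None instead of wrapping.
    value.checked_mul(multiplier)
}

fn main() {
    println!("{:?}", parse_size("10mib")); // Some(10485760)
    println!("{:?}", parse_size("10mb")); // Some(10000000)
}
```

Using `checked_mul` means an oversized input is rejected rather than silently wrapping to a small limit.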
```rust
Some(SubgraphResponseSizeLimit(limit)) => {
    router::body::into_bytes_limited(body, limit)
        .instrument(tracing::debug_span!("aggregate_response_data"))
        .await
        .map_err(|err| {
            tracing::error!(fetch_error = ?err);
            let reason = if err.downcast_ref::<LengthLimitError>().is_some() {
                u64_counter!(
                    "apollo.router.limits.subgraph_response_size.exceeded",
                    "Number of subgraph responses aborted because they exceeded the configured response size limit",
                    1,
                    subgraph.name = service_name.to_string()
                );
                tracing::Span::current()
                    .record("apollo.subgraph.response.aborted", "response_size_limit");
                format!("subgraph response body exceeded limit of {limit} bytes")
            } else {
                err.to_string()
            };
            FetchError::SubrequestHttpError {
                status_code: Some(parts.status.as_u16()),
                service: service_name.to_string(),
                reason,
            }
        })
}
```
I take this (and the logic for `into_bytes_limited`) to be the meat of this PR, and it looks solid to me! Flagging because I only glanced over all the config shuffling.
Yep, that's accurate! The shuffling was needed to nest the config, but it definitely clouds the real changes.
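The two pieces singled out in the review, limited collection and error classification, can be sketched together in std-only Rust. This is an illustration under assumptions, not the router's code. Chunks arrive as an iterator instead of an async `http_body::Body`. `collect_limited`, `classify`, `EXCEEDED`, and the local `LengthLimitError` are hypothetical stand-ins for `into_bytes_limited`, the snippet's branching, the exceeded counter, and `http_body_util::LengthLimitError`. In this sketch a body of exactly `limit` bytes passes, and collection stops as soon as the running total would go over.

```rust
use std::error::Error;
use std::fmt;
use std::sync::atomic::{AtomicU64, Ordering};

type BoxError = Box<dyn Error + Send + Sync>;

/// Local stand-in for a length-limit error type.
#[derive(Debug)]
struct LengthLimitError;

impl fmt::Display for LengthLimitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("length limit exceeded")
    }
}

impl Error for LengthLimitError {}

/// Stand-in for the "exceeded" counter.
static EXCEEDED: AtomicU64 = AtomicU64::new(0);

/// Collects chunks, rejecting the body as soon as the running total passes `limit`;
/// chunks after the offending one are never pulled from the iterator.
fn collect_limited<I>(chunks: I, limit: u64) -> Result<Vec<u8>, BoxError>
where
    I: IntoIterator<Item = Result<Vec<u8>, BoxError>>,
{
    let mut body = Vec::new();
    for chunk in chunks {
        let chunk = chunk?; // transport errors pass through unchanged
        let total = (body.len() + chunk.len()) as u64;
        if total > limit {
            return Err(Box::new(LengthLimitError));
        }
        body.extend_from_slice(&chunk);
    }
    Ok(body)
}

/// Mirrors the snippet's branching: only limit errors bump the counter.
fn classify(err: &BoxError, limit: u64) -> String {
    if err.downcast_ref::<LengthLimitError>().is_some() {
        EXCEEDED.fetch_add(1, Ordering::Relaxed);
        format!("subgraph response body exceeded limit of {limit} bytes")
    } else {
        err.to_string()
    }
}

fn main() {
    let chunks: Vec<Result<Vec<u8>, BoxError>> = vec![Ok(vec![0u8; 6]), Ok(vec![0u8; 6])];
    let err = collect_limited(chunks, 10).unwrap_err();
    println!("{}", classify(&err, 10)); // subgraph response body exceeded limit of 10 bytes
}
```

Downcasting the boxed error, as the PR snippet does, keeps the limit case distinguishable without changing the error type that flows through the fetch path.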
Summary
- `limits.subgraph.all.http_max_response_bytes` and `limits.subgraph.subgraphs.<name>.http_max_response_bytes` config to cap the number of bytes read from subgraph HTTP response bodies, protecting against OOM from unexpectedly large payloads
- `limits.connector.all.http_max_response_bytes` and `limits.connector.sources.<source>.http_max_response_bytes` config for the same protection on connector responses
- Streaming enforcement (`http_body_util::Limited`): the router stops reading as soon as the limit is exceeded, without buffering the full body
- Existing `limits.*` fields moved under `limits.router.*`; a config migration entry handles the breaking change transparently

New metrics and span attributes
- `apollo.router.limits.subgraph_response_size.exceeded` counter (attribute: `subgraph.name`)
- `apollo.router.limits.connector_response_size.exceeded` counter (attribute: `connector.source`)
- `apollo.subgraph.response.aborted` span attribute set to `"response_size_limit"` when a subgraph response is aborted
- `apollo.connector.response.aborted` span attribute set to `"response_size_limit"` when a connector response is aborted

Test plan
- `plugins::limits`: all config resolution and service injection behavior
- `services::subgraph_service`: limit enforced and error returned; no counter increment for non-limit errors
- `plugins::connectors::handle_responses`: limit enforced for connector responses
- `services::router::body`: `into_bytes_limited` behavior
- Docs: `routing/security/request-limits.mdx`, `routing/observability/router-telemetry-otel/enabling-telemetry/standard-instruments.mdx`

🤖 Generated with Claude Code