fix(api): cap executor payload response size in FetchPayloadsHTTP #1281

Open
ouicate wants to merge 2 commits into gonka-ai:main from ouicate:fix/validator-payload-fetch-size-limit

@ouicate ouicate commented May 30, 2026

Summary

FetchPayloadsHTTP (decentralized-api/internal/validation/payload_retrieval.go) decoded the HTTP body returned by an executor with json.NewDecoder(resp.Body).Decode(&payloadResp) and no size cap. The payloadRetrievalClient 30-second timeout bounds duration but not bytes. A malicious in-group executor that streams a multi-gigabyte JSON body, pacing the bytes to fit inside 30 s, therefore forces every validating slot that fetches from it to allocate the whole body and then materialize two large []byte fields (PromptPayload, ResponsePayload) while decoding. The attacker is a member of the group rather than an external party, and the victims are the honest validators in that group, so this is an insider liveness/OOM primitive; every slot is hit independently. The non-2xx error branch had the same defect: an unbounded io.ReadAll(resp.Body) whose result was then embedded into the returned error string.

Root Cause

if resp.StatusCode != http.StatusOK {
    body, _ := io.ReadAll(resp.Body)                                // unbounded
    return nil, fmt.Errorf("executor returned status %d: %s", resp.StatusCode, string(body))
}

var payloadResp PayloadResponse
if err := json.NewDecoder(resp.Body).Decode(&payloadResp); err != nil {  // unbounded
    return nil, fmt.Errorf("failed to decode response: %w", err)
}

There is no io.LimitReader, no http.MaxBytesReader, and no chunked-size check anywhere on the response path. The codebase already enforces a 10 MiB cap on inbound request bodies (MaxRequestBodySize in post_chat_handler.go), and the devshard transport uses http.MaxBytesReader (devshard/transport/server.go), so the missing cap on peer responses fetched on this path is simply an oversight.

Fix

  • Introduce MaxPayloadResponseSize int64 = 32 * 1024 * 1024 (32 MiB) as a package-level var so tests can lower the cap; production code must not mutate it. The value is sized to comfortably cover two raw payloads at the inbound MaxRequestBodySize (10 MiB each) with base64 overhead (~33%, since []byte fields are base64-encoded in JSON) and JSON wrapping — i.e. it derives from existing codebase limits, not a guess.
  • Replace json.NewDecoder(resp.Body).Decode(...) with io.ReadAll(io.LimitReader(resp.Body, MaxPayloadResponseSize+1)) followed by an explicit len(body) > MaxPayloadResponseSize check that returns "executor payload response exceeds maximum size of N bytes" before json.Unmarshal ever sees the buffer. The +1 byte distinguishes "exactly at the cap" from "over the cap" without silently truncating a body that happens to decode to valid-looking JSON.
  • Bound the non-2xx error path with io.LimitReader(resp.Body, maxPayloadErrorBodySize) where maxPayloadErrorBodySize = 4 * 1024 (constant; the body here is only used to format an error string).
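The sizing argument in the first bullet can be checked with a few lines of arithmetic. The sketch below is illustrative only: the lowercase constants are local copies of the values quoted in this PR, encodedPayloadBytes is a hypothetical helper, and the calculation ignores the JSON wrapping and the other PayloadResponse fields, whose sizes are not given here.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

const (
	// Local copies of the limits quoted in the PR description (assumed values).
	maxRequestBodySize     = 10 * 1024 * 1024 // inbound request cap
	maxPayloadResponseSize = 32 * 1024 * 1024 // proposed response cap
)

// encodedPayloadBytes returns the base64-encoded size of two raw payloads of
// rawEach bytes. encoding/json encodes []byte fields as standard base64.
func encodedPayloadBytes(rawEach int) int {
	return 2 * base64.StdEncoding.EncodedLen(rawEach)
}

func main() {
	enc := encodedPayloadBytes(maxRequestBodySize)
	fmt.Printf("two encoded payloads: %d bytes (%.2f MiB)\n", enc, float64(enc)/(1<<20))
	fmt.Printf("headroom under the cap: %d bytes\n", maxPayloadResponseSize-enc)
}
```

Two maximum-size payloads encode to roughly 26.7 MiB, which leaves a little over 5 MiB of the 32 MiB cap for the remaining fields and JSON syntax.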

The fix is local to FetchPayloadsHTTP. The 30 s client timeout is unchanged; it now operates alongside an explicit byte cap rather than as the only defense.
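A minimal sketch of the read-then-check pattern described in the Fix list, assuming a hypothetical helper readCapped and a hypothetical sentinel errTooLarge (the PR itself describes a formatted error message, and its actual code may be structured differently):

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errTooLarge is a hypothetical sentinel used only in this sketch.
var errTooLarge = errors.New("response exceeds maximum size")

// readCapped reads from r through a LimitReader set to limit+1 bytes. A body
// of exactly limit bytes is accepted; anything longer yields limit+1 bytes
// and is rejected instead of being silently truncated.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	body, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, fmt.Errorf("failed to read response: %w", err)
	}
	if int64(len(body)) > limit {
		return nil, errTooLarge
	}
	return body, nil
}

// readCappedString is a convenience wrapper for the demo below.
func readCappedString(s string, limit int64) ([]byte, error) {
	return readCapped(strings.NewReader(s), limit)
}

func main() {
	for _, s := range []string{"12345678", "123456789"} {
		_, err := readCappedString(s, 8)
		fmt.Printf("%d-byte body, cap 8: err=%v\n", len(s), err)
	}
}
```

Reading limit+1 bytes rather than limit is what lets the caller tell an at-cap body apart from an over-cap one; with a plain limit-byte reader, an oversized body would look identical to a truncated but complete read.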

Why This Closes The Vulnerability

The exploit required exactly one condition: an honest validator reads an arbitrary number of bytes from an attacker-controlled HTTP peer before any size check. This PR removes that condition. After the change the validating slot reads at most MaxPayloadResponseSize + 1 bytes per fetch (the io.LimitReader ceiling), which bounds the memory a single fetch can consume, and fails fast with an explicit error that the calling retry/abandon logic can treat the same way it treats any other fetch failure. A malicious executor can still serve a junk response that gets rejected, but it can no longer use that response to OOM the process. The chain-side validation surface and signing/verification flow are untouched; this is a narrowly scoped resource-control fix at the HTTP-client boundary.

Test plan

  • go test ./internal/validation/... in decentralized-api (Linux + cgo toolchain — the package transitively pulls in wasmvm and blst, so cross-compiling on Windows/no-cgo is not sufficient).
  • TestFetchPayloadsHTTP_RejectsOversizedResponse (new) — httptest server returns a 200 with a body larger than the test-shrunk MaxPayloadResponseSize; assert error contains "exceeds maximum size" and *PayloadResponse is nil. No json.Unmarshal runs.
  • TestFetchPayloadsHTTP_AcceptsResponseUnderLimit (new) — httptest server returns a valid PayloadResponse under the cap; assert decode succeeds and all four fields round-trip.
  • TestFetchPayloadsHTTP_BoundsErrorBody (new) — httptest server returns 502 with a 1 MiB body; assert error contains "executor returned status 502" and the error string length is bounded by maxPayloadErrorBodySize + 256 (proving the error body was capped, not the full 1 MiB).
  • Existing tests in payload_retrieval_test.go continue to pass (signature/URL/hash helpers are untouched).
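The three new cases can be approximated outside the repository with httptest. The sketch below is not the PR's test code: fetchCapped is a hypothetical stand-in for FetchPayloadsHTTP that returns raw bytes instead of a decoded PayloadResponse, and errTooLarge, errBodyLimit, and serveAndFetch are names invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// errTooLarge is a hypothetical sentinel; the PR asserts on message text instead.
var errTooLarge = errors.New("executor payload response exceeds maximum size")

// errBodyLimit mirrors the 4 KiB maxPayloadErrorBodySize described in the PR.
const errBodyLimit = 4 * 1024

// fetchCapped fetches url and applies both caps: errBodyLimit on non-200
// bodies and limit (checked via limit+1) on successful bodies.
func fetchCapped(url string, limit int64) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		snippet, _ := io.ReadAll(io.LimitReader(resp.Body, errBodyLimit))
		return nil, fmt.Errorf("executor returned status %d: %s", resp.StatusCode, snippet)
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(body)) > limit {
		return nil, errTooLarge
	}
	return body, nil
}

// serveAndFetch starts a throwaway server that replies with the given status
// and a body of bodyLen bytes, then fetches it under the given cap.
func serveAndFetch(status, bodyLen int, limit int64) ([]byte, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
		io.WriteString(w, strings.Repeat("x", bodyLen))
	}))
	defer srv.Close()
	return fetchCapped(srv.URL, limit)
}

func main() {
	_, err := serveAndFetch(http.StatusOK, 2048, 1024)
	fmt.Println("oversized 200:", err)
	_, err = serveAndFetch(http.StatusBadGateway, 1<<20, 1024)
	fmt.Println("502 error string length:", len(err.Error()))
}
```

Shrinking the cap to 1 KiB in the sketch plays the same role as the test-shrunk MaxPayloadResponseSize in the plan above: it keeps the oversized case cheap to run.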

ouicate added 2 commits May 30, 2026 15:02
FetchPayloadsHTTP decoded the executor response with
"json.NewDecoder(resp.Body).Decode(&payloadResp)" and no size cap. The
30s payloadRetrievalClient timeout bounds duration, not bytes, so a
malicious in-group executor can stream an arbitrarily large JSON body
to an honest validating slot and force it to OOM while allocating the
two []byte payloads. Each validating slot in the group can be hit
independently. The non-2xx error branch had the same problem via an
unbounded io.ReadAll on resp.Body.

Cap reads with io.LimitReader before json.Unmarshal:

- MaxPayloadResponseSize (32 MiB, package var so tests can lower it):
  covers two raw payloads at the inbound MaxRequestBodySize (10 MiB
  each) with base64 (~33%) and JSON overhead. ReadAll on
  io.LimitReader(resp.Body, MaxPayloadResponseSize+1) followed by an
  explicit "exceeds maximum size" length check before decode.
- maxPayloadErrorBodySize (4 KiB const): the non-2xx body is only
  used to format an error string; bound it with io.LimitReader.

Tests:
- TestFetchPayloadsHTTP_RejectsOversizedResponse: oversized 200 body
  returns "exceeds maximum size" and never decodes.
- TestFetchPayloadsHTTP_AcceptsResponseUnderLimit: well-formed
  under-cap body decodes normally (no regression).
- TestFetchPayloadsHTTP_BoundsErrorBody: 1 MiB error body produces
  an error string bounded by maxPayloadErrorBodySize + prefix.

@a-kuprin a-kuprin left a comment

Collaborator

Tiny valid fix
