fix(api): cap executor payload response size in FetchPayloadsHTTP#1281
Open
ouicate wants to merge 2 commits into
Open
fix(api): cap executor payload response size in FetchPayloadsHTTP#1281ouicate wants to merge 2 commits into
ouicate wants to merge 2 commits into
Conversation
FetchPayloadsHTTP decoded the executor response with "json.NewDecoder(resp.Body).Decode(&payloadResp)" and no size cap. The 30s payloadRetrievalClient timeout bounds duration, not bytes, so a malicious in-group executor can stream an arbitrarily large JSON body to an honest validating slot and force it to OOM while allocating the two []byte payloads. Each validating slot in the group can be hit independently. The non-2xx error branch had the same problem via an unbounded io.ReadAll on resp.Body. Cap reads with io.LimitReader before json.Unmarshal: - MaxPayloadResponseSize (32 MiB, package var so tests can lower it): covers two raw payloads at the inbound MaxRequestBodySize (10 MiB each) with base64 (~33%) and JSON overhead. ReadAll on io.LimitReader(resp.Body, MaxPayloadResponseSize+1) followed by an explicit "exceeds maximum size" length check before decode. - maxPayloadErrorBodySize (4 KiB const): the non-2xx body is only used to format an error string; bound it with io.LimitReader. Tests: - TestFetchPayloadsHTTP_RejectsOversizedResponse: oversized 200 body returns "exceeds maximum size" and never decodes. - TestFetchPayloadsHTTP_AcceptsResponseUnderLimit: well-formed under-cap body decodes normally (no regression). - TestFetchPayloadsHTTP_BoundsErrorBody: 1 MiB error body produces an error string bounded by maxPayloadErrorBodySize + prefix.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
FetchPayloadsHTTP(decentralized-api/internal/validation/payload_retrieval.go) decoded the HTTP body returned by an executor withjson.NewDecoder(resp.Body).Decode(&payloadResp)and no size cap. ThepayloadRetrievalClient30-second timeout bounds duration but not bytes, so a malicious in-group executor that streams a multi-gigabyte JSON body — pacing the bytes to fit inside 30 s — forces every validating slot that fetches from it to allocate the whole body and then materialize two large[]bytefields (PromptPayload,ResponsePayload) while decoding. The victims are the honest validators in the group, not an external attacker, so this is an insider liveness/OOM primitive — every slot is hit independently. The non-2xx error branch had the same defect via an unboundedio.ReadAll(resp.Body)that was then embedded into the returned error string.Root Cause
There is no
io.LimitReader, nohttp.MaxBytesReader, no chunked-size check anywhere on the response path. The codebase already enforces a 10 MiB cap on inbound request bodies (MaxRequestBodySizeinpost_chat_handler.go), and the devshard transport useshttp.MaxBytesReader(devshard/transport/server.go) — outbound peer responses on this path are simply an oversight.Fix
MaxPayloadResponseSize int64 = 32 * 1024 * 1024(32 MiB) as a package-levelvarso tests can lower the cap; production code must not mutate it. The value is sized to comfortably cover two raw payloads at the inboundMaxRequestBodySize(10 MiB each) with base64 overhead (~33%, since[]bytefields are base64-encoded in JSON) and JSON wrapping — i.e. it derives from existing codebase limits, not a guess.json.NewDecoder(resp.Body).Decode(...)withio.ReadAll(io.LimitReader(resp.Body, MaxPayloadResponseSize+1))followed by an explicitlen(body) > MaxPayloadResponseSizecheck that returns"executor payload response exceeds maximum size of N bytes"beforejson.Unmarshalever sees the buffer. The+1byte distinguishes "exactly at the cap" from "over the cap" without silently truncating a body that happens to decode to valid-looking JSON.io.LimitReader(resp.Body, maxPayloadErrorBodySize)wheremaxPayloadErrorBodySize = 4 * 1024(constant; the body here is only used to format an error string).The fix is local to
FetchPayloadsHTTP. The 30 s client timeout is unchanged; it now operates alongside an explicit byte cap rather than as the only defense.Why This Closes The Vulnerability
The exploit required exactly one condition: an honest validator reads an arbitrary number of bytes from an attacker-controlled HTTP peer before any size check. This PR removes that condition. After the change the validating slot allocates at most
MaxPayloadResponseSize + 1bytes per fetch (theio.LimitReaderceiling) and fails fast with a typed error message that the calling retry/abandon logic can treat the same way it treats any other fetch failure. A malicious executor can still serve a junk response that gets rejected, but it can no longer use that response to OOM the process. The chain-side validation surface and signing/verification flow are untouched; this is a narrowly scoped resource-control fix at the HTTP-client boundary.Test plan
go test ./internal/validation/...indecentralized-api(Linux + cgo toolchain — the package transitively pulls inwasmvmandblst, so cross-compiling on Windows/no-cgo is not sufficient).TestFetchPayloadsHTTP_RejectsOversizedResponse(new) —httptestserver returns a 200 with a body larger than the test-shrunkMaxPayloadResponseSize; assert error contains"exceeds maximum size"and*PayloadResponseis nil. Nojson.Unmarshalruns.TestFetchPayloadsHTTP_AcceptsResponseUnderLimit(new) —httptestserver returns a validPayloadResponseunder the cap; assert decode succeeds and all four fields round-trip.TestFetchPayloadsHTTP_BoundsErrorBody(new) —httptestserver returns502with a 1 MiB body; assert error contains"executor returned status 502"and the error string length is bounded bymaxPayloadErrorBodySize + 256(proving the error body was capped, not the full 1 MiB).payload_retrieval_test.gocontinue to pass (signature/URL/hash helpers are untouched).