Json: Fixes unbounded recursion in binary value-length decoder to prevent DoS #5909
tvaron3 wants to merge 6 commits into
JsonBinaryEncoding.ValueLengths.GetValueLength recursively decoded Arr1 (0xE1) and Obj1 (0xE9) type markers with no depth check. A crafted binary payload (first byte 0x80) of ~78 KB nesting Arr1 or Obj1 markers could exhaust the CLR stack and crash the host with an unrecoverable StackOverflowException.
Changes:
- Add a depth counter to the internal GetValueLength overload and throw JsonMaxNestingExceededException when depth reaches JsonObjectState.JsonMaxNestingDepth (256), matching the streaming reader's existing nesting policy.
- Have JsonObjectState.Push also throw JsonMaxNestingExceededException (was InvalidOperationException) so both depth-cap paths can be caught with a single internal exception type.
- Widen JsonBinaryEncoding.TryGetValueLength's catch from JsonParseException to Exception so malformed payloads honor the Try-pattern contract (returns false) instead of leaking IndexOutOfRangeException / ArgumentOutOfRangeException / etc.
- Document the new exception on JsonNavigator.Create and both JsonReader.Create overloads.
- Add regression tests that build deeply nested Arr1 / Obj1 payloads and verify the depth guard fires through JsonBinaryEncoding, JsonNavigator.Create and JsonReader.Create.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
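The depth cap and the widened Try-pattern catch described in this commit can be sketched on a toy format. This is a minimal illustration, not the SDK's code: `ToyLengthDecoder`, `MaxNestingExceededException`, and the one-byte "wrapper" marker are hypothetical stand-ins, and only the cap value (256) and the `depth >= cap` comparison are taken from the PR text.

```csharp
using System;

// Hypothetical stand-in for the SDK's internal nesting exception.
public sealed class MaxNestingExceededException : Exception
{
    public MaxNestingExceededException(string message) : base(message) { }
}

public static class ToyLengthDecoder
{
    // Same cap value the PR names for JsonObjectState.JsonMaxNestingDepth.
    public const int MaxNestingDepth = 256;

    // Toy marker (assumption): a wrapper byte followed by exactly one nested value.
    private const byte Wrapper = 0xE1;

    // Public entry point starts the depth counter at zero.
    public static int GetValueLength(ReadOnlySpan<byte> buffer) =>
        GetValueLength(buffer, depth: 0);

    private static int GetValueLength(ReadOnlySpan<byte> buffer, int depth)
    {
        // Depth check comes first, so hostile input fails before the stack is at risk.
        if (depth >= MaxNestingDepth)
        {
            throw new MaxNestingExceededException(
                $"Nesting depth {depth} reached the cap of {MaxNestingDepth}.");
        }

        if (buffer.IsEmpty)
        {
            throw new FormatException("Truncated value.");
        }

        // Any non-wrapper byte is treated as a one-byte scalar in this toy format.
        if (buffer[0] != Wrapper)
        {
            return 1;
        }

        // One marker byte plus the nested value; each level passes depth + 1.
        return 1 + GetValueLength(buffer.Slice(1), depth + 1);
    }

    // Try-pattern wrapper: any decoding failure becomes 'false'.
    public static bool TryGetValueLength(ReadOnlySpan<byte> buffer, out int length)
    {
        try
        {
            length = GetValueLength(buffer);
            return true;
        }
        catch (Exception)
        {
            length = 0;
            return false;
        }
    }
}
```

In this sketch a payload of 255 wrapper bytes followed by one scalar byte still decodes, while one more wrapper byte trips the cap, which is the same boundary shape the PR's tests describe.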
# Conflicts: # Microsoft.Azure.Cosmos/src/Json/JsonReader.cs # changelog.md
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Test-coverage follow-ups
Going back through this with the test-coverage lens: fix is correct, security-clean, and the binary 🟡 Major
…of decoder depth guard
Layers additional defenses on top of the GetValueLength depth cap landed
earlier in this PR, addressing review feedback:
- JsonBinaryWriter.ForceRewriteRawJsonValue: new RewriteResolvedReferenceString
helper bounds-checks the resolved target offset (unsigned compare so a
  negative StrR4 offset is rejected) and asserts the target byte is a real
  plain (non-reference) string marker. This matches the writer's
  own FixReferenceStringOffsets invariant and rejects malformed
reference-to-reference chains, cycles, and non-string targets at hop 1
with JsonInvalidTokenException. Plus EnsureSufficientExecutionStack as a
layered backstop at the top of ForceRewriteRawJsonValue.
- Recursive walkers on materialized CosmosElement graphs now call
RuntimeHelpers.EnsureSufficientExecutionStack so deeply-nested
length-prefixed payloads (ArrL1/L2/L4, ObjL1/L2/L4 -- not covered by the
decoder depth guard) raise a catchable InsufficientExecutionStackException
instead of an unrecoverable StackOverflowException:
* DistinctHash.CosmosElementHasher.Visit(CosmosArray|CosmosObject)
* CosmosArray.Equals/GetHashCode, CosmosObject.Equals/GetHashCode
* CosmosElementToQueryLiteral.Visit(CosmosArray|CosmosObject)
* JsonSerializer.DeserializationVisitor.Visit(CosmosArray|CosmosObject)
* CosmosElementToSqlScalarExpressionVisitor.Visit(CosmosArray|CosmosObject)
Tests: 9 new StrR coverage cases in JsonBinaryReferenceStringTests, plus
DeeplyNested* regression tests for each walker that exercises the
end-to-end attack chain (hand-crafted ArrL4 binary payload --> deserializer
--> guard fires) on an explicitly small-stack (256 KB) thread for
cross-host determinism.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
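The reference-string check described in this commit (one unsigned compare for the range check, then a type check on the target byte) can be sketched as follows. This is an illustration under stated assumptions, not the SDK's RewriteResolvedReferenceString: `ToyReferenceResolver`, `ValidateReferenceTarget`, and both marker values are hypothetical, and `FormatException` stands in for the SDK's own exception type.

```csharp
using System;

public static class ToyReferenceResolver
{
    // Toy marker values (assumptions, not the real binary encoding).
    public const byte PlainString = 0x01;
    public const byte ReferenceString = 0x02;

    // Validates an offset read from untrusted input and returns it if the
    // target is a plain string; throws FormatException otherwise.
    public static int ValidateReferenceTarget(ReadOnlySpan<byte> buffer, int offset)
    {
        // Reinterpreting as uint folds "offset < 0" and "offset >= Length"
        // into one compare: a negative int becomes a very large uint.
        if (unchecked((uint)offset) >= (uint)buffer.Length)
        {
            throw new FormatException($"Reference offset {offset} is out of range.");
        }

        // Rejecting every non-plain target at the first hop rules out
        // reference-to-reference chains and cycles without a hop counter.
        if (buffer[offset] != PlainString)
        {
            throw new FormatException(
                $"Reference target at offset {offset} is not a plain string.");
        }

        return offset;
    }
}
```

Because the type check fails on the first hop, no chain is ever followed, so the check costs one byte read regardless of how the payload is crafted.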
…json-recursion # Conflicts: # changelog.md
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Adds 7 tests exercising the EnsureSufficientExecutionStack guards on:
- DistinctHash.Visit(CosmosObject)
- CosmosArray.GetHashCode / CosmosObject.GetHashCode
- CosmosArray.Equals / CosmosObject.Equals
- CosmosElementToSqlScalarExpressionVisitor.Visit (array + object)
All run on a small-stack worker thread so the budget guard triggers without depending on host stack size.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/azp run dotnet-v3-ci
Azure Pipelines successfully started running 1 pipeline(s).
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
ananth7592 left a comment
Overall, LGTM, Tomas: well-crafted, easy to verify. One small note on the changelog wording below.
The current entry reads as if every new exception type replaces a process-fatal StackOverflowException. That is true for the new decoder/walker/StrR guards, but slightly off for the JsonObjectState.Push change. That path was already catchable today as InvalidOperationException; it now throws JsonMaxNestingExceededException. Realistic blast radius is essentially zero (almost nobody writes 256+ nested levels), but a customer who only reads "DoS fix… instead of terminating the host" won't realize their existing catch (InvalidOperationException) on a writer call site just stopped catching. Would something like this work?
Note: the existing 256-deep writer nesting check (JsonObjectState.Push) now throws JsonMaxNestingExceededException instead of InvalidOperationException, so a single catch covers both reader and writer paths.
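The behavioural change behind this note can be shown with stand-in types. The sketch below is hypothetical throughout: `ParseExceptionStandIn` and `MaxNestingExceededStandIn` only mirror the inheritance described in the PR (the nesting exception derives from the parse exception), and `PushOld` / `PushNew` are toy versions of the writer's nesting check, not SDK methods.

```csharp
using System;

// Stand-ins mirroring the hierarchy described in the PR (assumption).
public class ParseExceptionStandIn : Exception
{
    public ParseExceptionStandIn(string message) : base(message) { }
}

public sealed class MaxNestingExceededStandIn : ParseExceptionStandIn
{
    public MaxNestingExceededStandIn(string message) : base(message) { }
}

public static class NestingCatchDemo
{
    private const int MaxNestingDepth = 256;

    // Behaviour before the change: the writer check throws InvalidOperationException.
    public static void PushOld(int depth)
    {
        if (depth >= MaxNestingDepth)
        {
            throw new InvalidOperationException("Max nesting depth exceeded.");
        }
    }

    // Behaviour after the change: the same check throws the nesting exception.
    public static void PushNew(int depth)
    {
        if (depth >= MaxNestingDepth)
        {
            throw new MaxNestingExceededStandIn("Max nesting depth exceeded.");
        }
    }

    // Reports which handler, if any, caught the failure.
    public static string Classify(Action action)
    {
        try
        {
            action();
            return "ok";
        }
        catch (InvalidOperationException)
        {
            return "old-handler";
        }
        catch (ParseExceptionStandIn)
        {
            return "parse-handler";
        }
    }
}
```

A call site that only has the `InvalidOperationException` handler would let the new exception propagate, which is the case the changelog note is meant to flag.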
POSITIVES
Commenting throughout is excellent. The RewriteResolvedReferenceString XML doc, in particular, does real work — it spells out the writer invariant, the attack shape, and explains why the type check is strictly cheaper than a depth counter. The TryGetValueLength comment enumerating the five expected exception types and explicitly justifying the [AggressiveInlining] removal is the kind of "future-me thank-you" that prevents the inevitable review nit next year. And the BuildObj1ChainPayload helper's "do NOT reuse this for shallow Obj1 tests" warning is the kind of small detail that prevents the next person from misusing it.
Test coverage is impeccable. A few specifics worth calling out:
Boundary tests at exactly the cap and cap−1 — the contract is precise, not vague.
The dedicated maxStackSize: 256 * 1024 thread pattern, with the Linux-main vs ThreadPool vs macOS-pthread comment, makes the walker tests deterministic across hosts. Would have been easy to ship a flaky CI-only version instead.
The end-to-end DeeplyNestedDeserializationVisitorThrowsCatchableException going through JsonSerializer.Monadic.Deserialize proves the actual MitM attack chain is blocked, not just the unit-level guard.
Exception-type pinning (rather than just "any exception") means a future regression that surfaces a different type will fail loudly.
Per Ananth's review feedback, calls out that the existing 256-deep writer nesting check now throws JsonMaxNestingExceededException instead of InvalidOperationException so customers can use a single catch across reader + writer paths.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Hardens the Microsoft.Azure.Cosmos binary JSON code paths against malformed / hostile payloads that could crash the host process with an unrecoverable StackOverflowException. The originally reported attack is JsonBinaryEncoding.ValueLengths.GetValueLength recursively decoding Arr1 (0xE1) and Obj1 (0xE9) type markers with no depth check: a ~78 KB payload starting with 0x80 was enough to abort the process with SIGABRT.
Note: this requires the attacker to control the response bytes (MitM, rogue / hijacked endpoint, misconfigured emulator URL), so it only matters after auth. Severity is correspondingly low, but the fix is straightforward and brings the binary decoder in line with the streaming reader's existing 256-level nesting policy.
While auditing the rest of the binary path for the same shape of bug, a few related gaps were found and fixed in the same PR:
- JsonBinaryWriter.ForceRewriteRawJsonValue recursively re-serializes nested arrays / objects with no stack-budget check, and follows resolved-reference-string (StrR1–StrR4) offsets read straight from the input buffer without validating that the target is in range or is itself a plain string. A crafted payload could either stack-overflow the writer or cause an out-of-range / cycle traversal.
- CosmosElement walker stack-budget guards (defense-in-depth). The decoder cap is 256 nested levels, which is the right cap to let legitimate payloads through. But on a small-stack worker thread (ASP.NET request thread, Azure Functions, container default ~512 KB, etc.), a tree the decoder correctly lets through at depth 256 can still exhaust the stack inside a downstream recursive walker, which is also an unrecoverable crash. RuntimeHelpers.EnsureSufficientExecutionStack() is budget-based rather than depth-based, so adding it at the top of each recursive Visit / Equals / GetHashCode converts the crash into a catchable InsufficientExecutionStackException without imposing a hard depth limit that would break customers with legitimately deep data.
Of the walkers covered:
- DeserializationVisitor (e.g. ReadItemAsync<T> / query-result materialization) is the only one with a realistic standalone MitM attack chain: it walks attacker-influenced bytes on the caller's thread.
- The rest (CosmosElementToQueryLiteral, CosmosElementToSqlScalarExpressionVisitor, DistinctHash, CosmosArray / CosmosObject Equals / GetHashCode) are defense-in-depth: they're cheap (a handful of nanoseconds per call) and they compose. The walkers call into each other, so guarding only the entry point would leave gaps as the code evolves.
Type of change
Changes
Decoder (original report)
- JsonBinaryEncoding.ValueLengths.cs: add a depth parameter to the private GetValueLength overload and throw JsonMaxNestingExceededException when depth >= JsonObjectState.JsonMaxNestingDepth (256). The public single-arg entry point delegates with depth: 0; the Arr1 / Obj1 recursive calls pass depth + 1.
- JsonObjectState.cs: promote JsonMaxNestingDepth from private const to internal const and have Push() throw JsonMaxNestingExceededException instead of InvalidOperationException, so both depth-cap paths share one internal exception type. JsonMaxNestingExceededException already derives from JsonParseException, so external callers still see the normal SDK exception hierarchy.
- JsonBinaryEncoding.cs::TryGetValueLength: widen the catch from JsonParseException to Exception so malformed payloads honor the Try-pattern contract by returning false instead of leaking IndexOutOfRangeException / ArgumentOutOfRangeException / etc. GetValueLength is a pure function over a ReadOnlySpan<byte> with no side effects, so swallowing is safe. (Also removed the now-misleading [MethodImpl(AggressiveInlining)]: methods with try/catch can't be inlined by the JIT.)
- JsonNavigator.cs / JsonReader.cs: document the new exception on the public Create overloads via <exception cref="JsonMaxNestingExceededException">.
Binary writer / StrR reference-string hardening
- JsonWriter.JsonBinaryWriter.cs::ForceRewriteRawJsonValue: call RuntimeHelpers.EnsureSufficientExecutionStack() at the top so deep nested arrays / objects throw a catchable InsufficientExecutionStackException instead of crashing the process. (EnsureSufficientExecutionStack is budget-based; the decoder-side JsonMaxNestingExceededException cap at 256 remains the primary depth guard.)
- RewriteResolvedReferenceString replaces the four open-coded StrR1 / StrR2 / StrR3 / StrR4 call sites. It does an unsigned in-range bounds check on the resolved offset and rejects any target whose type marker is not IsString && !IsReferenceString, matching the writer's own invariant (StrR redirects to a plain string, never to another StrR). This prevents both out-of-range follows and StrR→StrR cycles in attacker-crafted payloads.
CosmosElement walker stack-budget guards
RuntimeHelpers.EnsureSufficientExecutionStack() at the top of each recursive Visit / Equals / GetHashCode so a deep CosmosArray / CosmosObject tree converts to a catchable exception instead of a process abort on small-stack worker threads:
- JsonSerializer.cs::DeserializationVisitor.Visit(CosmosArray, Type) and Visit(CosmosObject, Type): realistic MitM chain via ReadItemAsync<T> / query-result deserialization.
- CosmosElementToQueryLiteral.cs: both Visit overloads (continuation tokens, diagnostic log strings). Defense-in-depth.
- CosmosElementToSqlScalarExpressionVisitor.cs: both Visit overloads (LINQ → SQL translation of in-memory constants). Defense-in-depth.
- DistinctHash.cs::Visit(CosmosArray) and Visit(CosmosObject): DISTINCT query hashing over result items. Defense-in-depth.
- CosmosArray.cs and CosmosObject.cs: Equals and GetHashCode. Indirect; called by DistinctHash, partition-key equality, hash-based caches.
Test coverage
- Microsoft.Azure.Cosmos.Tests FullyQualifiedName~Json filter: 424 passed, 0 failed, 2 skipped locally after merge with upstream/main.
- JsonBinaryEncodingValueLengthsTests.cs:
  * Arr1 / Obj1 / ArrL4 payloads throw JsonMaxNestingExceededException / InsufficientExecutionStackException (instead of stack-overflowing), exercised through JsonBinaryEncoding.GetValueLength, JsonNavigator.Create, JsonReader.Create, CosmosArray.GetHashCode, CosmosElementToQueryLiteral, and DeserializationVisitor.
  * Payload at JsonMaxNestingDepth throws; payload at JsonMaxNestingDepth - 1 still decodes (boundary).
  * TryGetValueLength returns false for deep / empty / truncated payloads.
  * CosmosElementToQueryLiteral deep-nesting case.
  * Walker tests run on a maxStackSize: 256 * 1024 thread so the budget-based guard is deterministic regardless of host main-thread stack size.
- JsonBinaryReferenceStringTests.cs (9 tests): each StrR1–StrR4 variant with bad offset, target-is-StrR, and target-out-of-bounds; all surface as catchable parse exceptions.
- JsonWriterTests.cs: push tests at the nesting cap to confirm the writer still accepts depth-256 nesting.
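The small-stack test pattern and the budget-based walker guard described above can be sketched together. This is a minimal illustration rather than the PR's test code: `Node`, `GuardedWalker`, `BuildChain`, and `RunOnSmallStack` are hypothetical names, while `RuntimeHelpers.EnsureSufficientExecutionStack` and the `Thread(ThreadStart, int maxStackSize)` constructor are the standard .NET APIs the PR refers to.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical singly nested tree standing in for a deep CosmosArray / CosmosObject.
public sealed class Node
{
    public Node Child;
}

public static class GuardedWalker
{
    // Recursive walker with the budget-based guard at the top of each frame.
    public static int Depth(Node node)
    {
        // Throws InsufficientExecutionStackException while the stack can still unwind.
        RuntimeHelpers.EnsureSufficientExecutionStack();
        return node == null ? 0 : 1 + Depth(node.Child);
    }

    // Builds the chain iteratively so construction itself cannot overflow.
    public static Node BuildChain(int depth)
    {
        Node head = null;
        for (int i = 0; i < depth; i++)
        {
            head = new Node { Child = head };
        }

        return head;
    }

    // Runs the action on a dedicated thread with an explicit stack size and
    // returns the exception it threw, or null if it completed.
    public static Exception RunOnSmallStack(Action action, int maxStackSize = 256 * 1024)
    {
        Exception captured = null;
        var thread = new Thread(() =>
        {
            try
            {
                action();
            }
            catch (Exception e)
            {
                captured = e;
            }
        }, maxStackSize);

        thread.Start();
        thread.Join();
        return captured;
    }
}
```

Pinning the stack size on a dedicated thread keeps the outcome independent of the host's main-thread stack: a shallow chain completes, and a chain far deeper than the stack can hold is expected to surface `InsufficientExecutionStackException` instead of terminating the process.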