Skip to content

ReadMany: Adds point-read fast path for single-tuple partitions#5905

Open
ananth7592 wants to merge 8 commits into
mainfrom
users/amudumba/readmany-single-physical-partition
Open

ReadMany: Adds point-read fast path for single-tuple partitions#5905
ananth7592 wants to merge 8 commits into
mainfrom
users/amudumba/readmany-single-physical-partition

Conversation

@ananth7592
Copy link
Copy Markdown
Member

Summary

Adds the per-physical-partition point-read fast path to ReadManyItems[Stream]Async. When a request resolves to a physical partition that contains exactly one (id, partitionKey) tuple, the SDK now issues a point read for that tuple instead of a parameterized SELECT * FROM c WHERE c.id IN (...) (or OR-of-AND for non-/id PK) query. The decision is made independently per PartitionKeyRange, so a single ReadMany call may dispatch a mix of point reads and queries.

Closes #4369.

Why

Point reads on small (<32 KB) documents are ~1 RU; the query path pays a fixed cover charge (~2.82 RU minimum) on top of the per-item cost. For workloads that read a few items spread across many partitions (a common cache-fill / fan-out shape), this is a >2x RU and latency win.

Cross-SDK references

SDK File Method Status
Java RxDocumentClientImpl.java pointReadsForReadMany / queryForReadMany / getRangeQueryMap Already shipped via Azure/azure-sdk-for-java#31723; 404 swallowing tuned by PRs #34966 and #35513.
Python _read_items_helper.py _execute_query_chunk_worker (if len(items_for_query) == 1:) Shipped.
.NET ReadManyQueryHelper.cs ReadManyTaskHelperAsync Previously missing — addressed in this PR.

What changed

  • Microsoft.Azure.Cosmos/src/ReadManyQueryHelper.cs
    • ReadManyTaskHelperAsync: branch on entry.Value.Count == 1 to dispatch a point read instead of building a QueryDefinition.
    • New GeneratePointReadResponseForPartitionAsync: issues ContainerCore.ReadItemStreamAsync, swallows 404/0 as "item not present" (preserving the RU charge in the aggregate), short-circuits other failures via QueryResponse.CreateFailure, and wraps successes as a synthetic single-element QueryResponse so the existing CombineStreamsFromQueryResponses / CombineFeedResponseFromQueryResponses<T> aggregation works unchanged.
    • Tracing: new child trace scope "Point read for a partitionkeyrange" (Query / Info), paralleling the existing "Execute query for a partitionkeyrange".
  • Microsoft.Azure.Cosmos/src/ReadManyRequestOptions.cs
    • New internal ItemRequestOptions ConvertToItemRequestOptions() mirroring the existing ConvertToQueryRequestOptions(). Maps ConsistencyLevel, ReadConsistencyStrategy, SessionToken, IfMatchEtag, IfNoneMatchEtag, Properties, AddRequestHeaders, ExcludeRegions.

Tests added

In Microsoft.Azure.Cosmos.EmulatorTests/CosmosReadManyItemsTests.cs:

  • ReadMany_AllSingleTuplePartitions_UsesPointReadsOnWire — uses a counting RequestHandler to assert zero queries and at least N point reads are issued when every PK is distinct.
  • ReadMany_MixedPartitionLayout_UsesBothPaths — asserts both branches execute and the merged result is correct when a request mixes single- and multi-tuple partitions.
  • ReadMany_SingleTuplePartitionMissingItem_SilentlyOmitted — asserts a 404 on a point-read branch is treated as missing-item (matches Java pointReadsForReadMany + Java PRs #34966/#35513) and does not fail the overall call.

Verification

  • dotnet build Microsoft.Azure.Cosmos/src/Microsoft.Azure.Cosmos.csproj -c Release — 0 warnings, 0 errors.
  • dotnet build Microsoft.Azure.Cosmos/tests/Microsoft.Azure.Cosmos.Tests/... — 0 warnings, 0 errors.
  • dotnet build Microsoft.Azure.Cosmos/tests/Microsoft.Azure.Cosmos.EmulatorTests/... — 0 warnings, 0 errors.
  • Unit-test run: 2781 passed / 16 pre-existing failures (all BinaryEncodingOverTheWireTests socket errors from external dataset downloads + one unrelated parsing-baseline test). None ReadMany-related.
  • Emulator tests not run locally (emulator unavailable on the build agent); will run in PR CI.

Out of scope

  • Doc-size-based dispatch heuristic (Java/Python don't do this either).
  • Public API changes — no new methods or option flags.
  • Python's additional "all tuples share one logical PK -> pk = @p AND c.id IN (...)" optimization (Java lacks it too — tracking separately if needed).
  • Trace baseline XML for EndToEndTraceWriterBaselineTests.ReadManyAsync — a one-time regeneration (under review) may be needed to capture the new child trace nodes; flagged for follow-up commit if CI surfaces a diff.

Risk / behavior change

Customer-observable: RU and latency profile improves on the sparse-multi-partition workload. The only externally-visible semantic change is on a path that previously returned a query failure: a 404 on a missing single-tuple item is now treated as "item not present" (consistent with Java and Python; previously this scenario already returned an empty result set via the query path because SELECT * WHERE c.id IN ('missing') returns 0 documents -- so end-to-end behavior is unchanged in practice).


Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

ananth7592 and others added 4 commits June 1, 2026 10:04
When ReadManyItems[Stream]Async resolves a request to a physical partition
that contains only one (id, partitionKey) tuple, the SDK now issues a point
read for that tuple instead of a parameterized query. Point reads on small
documents cost ~1 RU; the existing query path pays a fixed cover charge
(~2.82 RU minimum) on top of the per-item cost, so this materially reduces
RU and latency for sparse multi-partition lookups.

Partitions that still contain more than one requested tuple continue to use
the existing batched `SELECT * FROM c WHERE c.id IN (...)` / `OR`-of-`AND`
query path. The decision is made independently per PartitionKeyRange.

Mirrors the existing Java SDK (pointReadsForReadMany in
RxDocumentClientImpl, Azure/azure-sdk-for-java#31723) and Python SDK
(_execute_query_chunk_worker in _read_items_helper.py) behavior. 404/0
on the point-read branch is silently swallowed -- the RU charge still
aggregates, the missing item simply does not contribute to the result set
(matches azure-sdk-for-java PRs 34966 and 35513).

Tests added (Microsoft.Azure.Cosmos.EmulatorTests):
- ReadMany_AllSingleTuplePartitions_UsesPointReadsOnWire: asserts no
  queries are issued when every partition has a single tuple.
- ReadMany_MixedPartitionLayout_UsesBothPaths: asserts both branches
  run when a request mixes single- and multi-tuple partitions, and the
  merged result is correct.
- ReadMany_SingleTuplePartitionMissingItem_SilentlyOmitted: asserts that
  a 404 on a point-read branch is treated as `item not present` rather
  than surfaced as a failure.

Closes #4369

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
 IfMatchEtag / IfNoneMatchEtag is silently ignored by the query path but
 strictly honored by the new single-tuple point-read path, so the same
 ReadMany call could succeed or throw CosmosException(304/412) based
 purely on hash bucketing. Skip the fast path when either ETag header is
 set so dispatch stays invariant to partition geometry. Java sidesteps the
 issue at the API boundary by not exposing ETag at all. Adds three
 deterministic emulator tests covering both bypass directions and a no-
 options sanity guard.
Single-physical-partition ReadMany calls now take the point-read fast
path, which produces a shorter trace tree (no GetCollectionCache /
Routing-Component nodes; DiagnosticsHandler appears in their place).

Refreshes EndToEndTraceWriterBaselineTests.ReadManyAsync.xml so the
trace-baseline assertion matches the new fast-path output. No
production behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…line

The ReadManyAsync trace baseline test built its itemList with one
(id, partitionKey) tuple per distinct partition key. Whether each tuple
ended up alone in its physical-partition bucket depended on how those PK
strings hashed against the live container layout, so individual sub-reads
could non-deterministically take the point-read fast path or the multi-id
query path across runs - producing different trace trees and an unstable
baseline.

Pairs two ids with each partition key in the test so every bucket holds
at least two tuples. ReadManyQueryHelper.ReadManyTaskHelperAsync only
picks the fast path when entry.Value.Count == 1, so this routes every
sub-read through the query path and keeps the trace shape stable. Adds
an inline comment pointing at the helper for future readers.

No production behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ananth7592 ananth7592 force-pushed the users/amudumba/readmany-single-physical-partition branch from f359423 to 40b271e Compare June 1, 2026 17:04
ananth7592 and others added 2 commits June 2, 2026 18:21
…d failure path

Addresses PR #5905 review:

  Source change (ReadManyQueryHelper.cs, GeneratePointReadResponseForPartitionAsync):
  - The synthetic CosmosQueryResponseMessageHeaders constructed on the point-read
    branch was hardcoding SubStatusCode = Unknown for every failure. That lost the
    diagnostic signal on the stream overload (whose combined ResponseMessage carries
    the failed branch's Headers verbatim) and on the typed overload's thrown
    exception path when the underlying ReadItemStreamAsync handed back a bare
    ResponseMessage with no CosmosException attached.
  - Replaced the hardcoded Unknown with pointReadResponse.Headers?.SubStatusCode
    so the synthetic header mirrors the real wire value.
  - Added an inline comment noting that the 404-swallow guard immediately below
    reads from pointReadResponse.Headers directly (not the synthetic header) and
    therefore stays anchored to the wire SubStatusCode regardless of how the
    synthetic header is shaped.

  Tests (CosmosReadManyItemsTests.cs):
  - ReadMany_SingleTuplePartition_404WithRealSubStatus_SurfacesAsFailure: negative
    test that a 404 with a non-Unknown SubStatusCode (1002 / ReadSessionNotAvailable)
    is NOT swallowed by the missing-item guard and surfaces as a CosmosException
    with the right SubStatusCode.
  - ReadMany_SingleTuplePartition_NonSuccess_PreservesRealSubStatusInSyntheticHeader:
    stream-overload counterpart, verifies the synthetic-header mirroring fix
    directly via response.Headers.SubStatusCode after a 400 failure.
  - ReadMany_StreamOverload_FastPathEngages_ReturnsAllItems: behavioral coverage
    for the stream overload's fast-path output -- asserts ReadCount, QueryCount,
    aggregated RU, and that the combined body's Documents array contains one
    element per point-read.
  - ReadMany_MixedPartitionLayout_FastPathFailure_QueryPath200_SurfacesFastPathError:
    partial-failure mix -- 4 tuples on one PP succeed via the query path, 1 tuple
    on another PP fails via the fast path (handler injects 400 with a distinctive
    SubStatusCode 1007). Asserts both paths dispatch (ReadCount==1, QueryCount==1)
    and that the fast-path failure surfaces as the thrown CosmosException.
  - New FailingReadStatusCodeHandler helper shared by the three failure tests.

  Coverage: 21 pre-existing CosmosReadManyItemsTests + 4 new tests = 25/25 pass.
  Trace baseline test (EndToEndTraceWriterBaselineTests.ReadManyAsync) also passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

using (pointReadResponse)
{
string containerRid = await this.container.GetCachedRIDAsync(forceRefresh: false, trace, cancellationToken);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT - you do not need the Container RID from Container metadata cache - each document response has _rid - which for a document conatins the Container Rid in the Uri

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1d5e515. GeneratePointReadResponseForPartitionAsync now parses the response body once and derives the parent container RID from the document's _rid field via a small TryGetContainerRidFromDocument helper, mirroring the canonical pattern used by CollectionCache and NetworkAttachedDocumentContainer. The GetCachedRIDAsync round-trip is gone.

}
}

private static async Task<CosmosElement> ReadStreamAsCosmosElementAsync(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be internal and thoroughly unit tested with various stream types and edge cases (different positions in stream, allow vs. disallow getBuffer etc.)

The .Net Stream API is very powerful but full of subtle gotchas - so, testing the heck out of this method with unit tests will help codifying expectations/contracts.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 1d5e515.

  • Visibility flipped to internal static.
  • New Microsoft.Azure.Cosmos.Tests/ReadManyQueryHelperUnitTests.cs adds 31 invocations that pin the contract end-to-end:
    • Null / empty inputs: null stream short-circuits; MemoryStream with Length == 0 returns null; non-MemoryStream that yields zero bytes returns null.
    • MemoryStream variants: publicly-visible (default ctor + WriteTryGetBuffer fast path) and private-buffer (new MemoryStream(byte[])ToArray() fallback) — each sanity-asserted via TryGetBuffer first so the test fixture itself can't drift.
    • Position semantics, explicitly asymmetric: MemoryStream at Position == Length still returns full content (TryGetBuffer/ToArray ignore position); non-MemoryStream at non-zero position reads from current position to EOF (CopyToAsync contract). Both pinned as separate tests so a future "unify the branches" refactor can't silently regress either side.
    • Chunked-read reassembly: parametric across 1-byte, 8-byte, and single-shot chunk sizes via a custom ChunkedReadOnlyStream : Stream fixture.
    • Cancellation: pre-cancelled token throws TaskCanceledException on the non-MemoryStream branch (CopyToAsync honors the token) but does not throw on the MemoryStream branch (no async ops, token never consulted) — captured as two tests so any future "honor cancellation everywhere" change is forced to be deliberate.
    • JSON shape parametric: object / array / number / string / bool(true&false) / null — each asserted to surface as the right Cosmos* element subtype.

Teeth-checked: sabotaging the helper (skipping the Length == 0 short-circuit + truncating the buffer) trips 17 of the stream tests and the canonical valid-RID test as well — so the suite is sensitive to the contract, not just to surface presence.

Copy link
Copy Markdown
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM exceptr for the missing unit tests on the stream processing

…m-to-CosmosElement helper

Addresses Fabian's two PR #5905 review comments:

1. NIT on GetCachedRIDAsync: drop the extra Container metadata cache round-trip
   on the point-read fast path. Every document response already carries the
   parent container RID hierarchically encoded in its _rid field, so we parse
   the body once (which we have to do anyway to feed the synthetic
   QueryResponse) and extract the container RID via a small helper that
   mirrors the canonical pattern used by CollectionCache and
   NetworkAttachedDocumentContainer.

2. ReadStreamAsCosmosElementAsync visibility/testing: flip from private static
   to internal static and pin its behavior with a focused unit-test suite.
   The .NET Stream API has subtle gotchas around TryGetBuffer success vs
   fallback, current-position semantics, and cancellation propagation, all of
   which now have regression coverage.

New helpers on ReadManyQueryHelper:
- BuildSyntheticQueryResponseHeaders(pointReadResponse, containerRid) wraps
  the per-branch CosmosQueryResponseMessageHeaders construction so the 404,
  failure, and success branches share one consistent shape (and the success
  branch can pass the freshly-derived containerRid through).
- TryGetContainerRidFromDocument(element) returns the parent container RID
  on a well-formed document or null when the element is missing/malformed/
  not an object; null is acceptable on the synthetic header.

New test file (Microsoft.Azure.Cosmos.Tests/ReadManyQueryHelperUnitTests.cs,
31 invocations) pins:
- ReadStreamAsCosmosElementAsync: null/empty short-circuits, publicly-visible
  MemoryStream fast path, byte[]-ctor MemoryStream ToArray fallback, position
  semantics (ignored for MemoryStream, honored for non-MemoryStream),
  chunked-read reassembly, pre-cancelled-token behavior on both branches
  (throws on non-MemoryStream, no-ops on MemoryStream), and parametric
  coverage of object/array/number/string/bool/null JSON shapes.
- TryGetContainerRidFromDocument: null/non-object/missing-rid/non-string-rid/
  null-rid/malformed-rid/empty-rid all yield null; valid document RID yields
  the same string as ResourceId.Parse(...).DocumentCollectionId.ToString().

Verification:
- Build clean (0 warnings, 0 errors).
- All 31 new unit-test invocations pass.
- Teeth-check: sabotaging either helper trips 18/31 of the new tests, including
  the canonical valid-RID case and every JSON-shape parse case.
- Broader local sweep (ReadMany|CosmosBufferedStreamWrapperTests|Routing|
  PartitionKeyRangeCache|Query): 423/429 pass; the 1 failure is the pre-existing
  baseline TestTextDistributionPlanParsingFromStream mismatch unrelated to this
  change (confirmed via git stash on prior commit).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…mosElementAsync

Now that ReadStreamAsCosmosElementAsync is internal and reachable from other
call sites, the pre-existing pattern of wrapping the input in a using block
("stream as MemoryStream ?? new MemoryStream()") is a contract violation:
when the caller passes its own MemoryStream the helper would dispose it on
return. On the point-read fast path this was benign because the outer
"using (pointReadResponse)" already disposes ResponseMessage.Content, but
the hazard is real for any other future caller.

Split the two branches so disposal ownership is explicit:
- MemoryStream input -> read its buffer directly, do NOT dispose (caller owns).
- Non-MemoryStream input -> allocate an owned local MemoryStream, copy into
  it, and dispose only that local. Caller source stream is never disposed.

Shared buffer-extraction logic factored into a small ReadFromMemoryStreamBuffer
helper so both branches share the empty-stream short-circuit and the
TryGetBuffer / ToArray fallback.

Two new unit tests pin the disposal contract:
- _DoesNotDisposeCallerOwnedMemoryStream asserts CanRead and re-reads a byte
  after the helper returns; teeth-checked by restoring the old hazard in src
  and confirming the test fails (CanRead == false on the disposed MemoryStream).
- _DoesNotDisposeCallerOwnedNonMemoryStream pins the symmetric contract for
  non-MemoryStream input so a future using-regression on the source stream
  is caught immediately.

All 33 unit tests in ReadManyQueryHelperUnitTests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ReadMany: Dynamically choose between point read and query

2 participants