Describe the bug
Cross-partition queries in direct mode fail with ServiceUnavailable (503) on subsequent (>1) pages of a FeedIterator.
To Reproduce
Run a paged FeedIterator query in direct mode without setting ResponseContinuationTokenLimitInKb:
var query = new QueryDefinition("SELECT * FROM c WHERE c.family = @family")
    .WithParameter("@family", "simple-objects");

using var iterator = container.GetItemQueryIterator<Item>(
    query,
    requestOptions: new QueryRequestOptions
    {
        // ResponseContinuationTokenLimitInKb = 10 // <-- omit, or set > 10, to reproduce
    });

while (iterator.HasMoreResults)
{
    var page = await iterator.ReadNextAsync(); // throws on 2nd page
}
I was able to reproduce it only with a WHERE clause in the query: SELECT * FROM c retrieves all pages successfully, whereas SELECT * FROM c WHERE c.family = @family fails with a cryptic exception when fetching the second page. Reproduced across different machines, OS vendors and ISPs.
Only reproducible against a large container (millions of documents); smaller containers with the same schema and a subset of the data do not exhibit the issue.
Expected behavior
All pages are fetched without exceptions
Actual behavior
Microsoft.Azure.Cosmos.CosmosException: Response status code does not indicate success: ServiceUnavailable (503); Substatus: 20001; ActivityId: c4776d10-baa3-4e72-b460-9154c51d1fc1;
Reason: (The request failed because the client was unable to establish connections to 4 endpoints across 1 regions.
Please check for client resource starvation issues and verify connectivity between client and server.
More info: https://aka.ms/cosmosdb-tsg-service-unavailable ...);
---> GoneException: The requested resource is no longer available at the server.
---> TransportException: A client transport error occurred: The connection failed. ... error code: ConnectionBroken [0x0012] ... payload sent: True
---> TransportException: The remote system closed the connection. ... error code: ReceiveStreamClosed [0x0011]
Environment summary
SDK Version: Microsoft.Azure.Cosmos v3.35.4, v3.60.0
Ubuntu 24.04 (host), Windows 11
Additional context
Every direct (RNTBD) call to the partition fails identically; every gateway call (HTTPS:443) succeeds. payload sent: True on every failure.
Cosmos Diagnostics summary:
DirectCalls: { "(410, 20001)": 80 }
GatewayCalls: { "(200, 0)": 26 }
duration: 31189 ms
System Info: CPU 7-14%, isThreadStarving: False, availableThreads: 32763+, openTcpConnections: 3-4
ContactedReplicas: []
FailedReplicas: [all 4 replicas of the partition]