[@azure/cosmos] Reuse shared partition key range cache for cross-partition queries #39144
Open
amanrao23 wants to merge 1 commit into
Cross-partition queries re-fetched /pkranges on every query because SmartRoutingMapProvider built its own cache and was recreated per query. Use the shared ClientContext cache (hybrid queries previously fetched per component query). Make the cache failure-safe: dedupe concurrent fetches, evict on failure so transient errors don't poison later lookups, and keep the last known-good map until a forceRefresh succeeds. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR eliminates a redundant GET /pkranges metadata round-trip on every cross-partition query in @azure/cosmos. Previously SmartRoutingMapProvider constructed its own PartitionKeyRangeCache and was recreated per query, so every parallel/ORDER BY/hybrid query started with a cold cache. The provider now reuses the long-lived clientContext.partitionKeyRangeCache that reads, bulk, and change feed already share. The shared cache is also hardened to dedupe concurrent fetches and to avoid cache poisoning on transient failures.
Changes:
- SmartRoutingMapProvider reuses clientContext.partitionKeyRangeCache and threads a new forceRefresh parameter through getOverlappingRanges; split recovery in parallelQueryExecutionContextBase now passes forceRefresh = true instead of allocating a fresh provider.
- PartitionKeyRangeCache separates known-good maps from in-flight fetches: concurrent lookups (cold or forced) dedupe to one request, the map is published only on success, and a failed refresh keeps serving the last known-good map.
- Adds unit tests for dedupe/eviction/forced-refresh behavior and a functional test asserting /pkranges is fetched once across repeated cross-partition queries; updates MockedClientContext to expose the shared cache; adds a changelog entry.
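The dedupe and publish-on-success behavior listed above can be sketched roughly as follows. This is a minimal illustration and not the SDK's code: the class name FailureSafeCache, the Fetcher type, and the known/pending fields are hypothetical. It also assumes that a failed forced refresh rejects for its own caller while later non-forced lookups still receive the previous map; the PR summary does not spell out that detail.

```typescript
// Hypothetical sketch; names and signatures are illustrative, not the SDK's.
type Fetcher<T> = (collectionId: string) => Promise<T>;

class FailureSafeCache<T> {
  // Last successfully fetched value per collection (the "known-good" map).
  private readonly known = new Map<string, T>();
  // Fetches currently in flight, shared by all concurrent callers.
  private readonly pending = new Map<string, Promise<T>>();
  private readonly fetcher: Fetcher<T>;

  constructor(fetcher: Fetcher<T>) {
    this.fetcher = fetcher;
  }

  async get(collectionId: string, forceRefresh = false): Promise<T> {
    const cached = this.known.get(collectionId);
    if (!forceRefresh && cached !== undefined) {
      return cached; // warm hit: no request issued
    }

    let inFlight = this.pending.get(collectionId);
    if (inFlight === undefined) {
      inFlight = this.fetcher(collectionId).then(
        (value) => {
          // Publish only on success, then clear the in-flight entry.
          this.pending.delete(collectionId);
          this.known.set(collectionId, value);
          return value;
        },
        (err) => {
          // Evict the failed fetch so the next lookup retries;
          // any previously published value in `known` is left untouched.
          this.pending.delete(collectionId);
          throw err;
        },
      );
      this.pending.set(collectionId, inFlight);
    }
    return inFlight;
  }
}
```

Keeping in-flight promises in a separate map from published values is what lets a rejected fetch disappear without disturbing the last good entry.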
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/routing/partitionKeyRangeCache.ts | Adds pendingByCollectionId dedup map; publishes resolved maps only on success and clears pending in finally for failure-safety. |
| src/routing/smartRoutingMapProvider.ts | Switches to import type, reuses the shared client cache, and forwards a new forceRefresh flag. |
| src/queryExecutionContext/parallelQueryExecutionContextBase.ts | Split recovery reuses the existing provider with forceRefresh = true instead of constructing a new provider. |
| test/internal/unit/partitionKeyRangeCache.spec.ts | New unit tests: dedupe, cache hit, evict-on-failure, keep-prior-on-failed-refresh, concurrent forceRefresh dedupe. |
| test/public/functional/partitionKeyRangeCacheReuse.spec.ts | New functional test counting /pkranges requests to verify cache reuse across queries. |
| test/public/common/MockClientContext.ts | Mock now exposes a real partitionKeyRangeCache so provider/cache tests share one instance. |
| CHANGELOG.md | Documents the redundant-fetch fix and failure-safety improvement under Bugs Fixed. |
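The split-recovery row in the table (reuse the existing provider with forceRefresh = true) might look roughly like the sketch below. Everything here is a simplified assumption: the RangeProvider interface, the executeWithSplitRecovery helper, and the isSplitError predicate are hypothetical, and the real getOverlappingRanges signature is not shown in this PR summary.

```typescript
// Hypothetical, simplified provider shape used only for illustration.
interface RangeProvider {
  getOverlappingRanges(collectionId: string, forceRefresh?: boolean): Promise<string[]>;
}

async function executeWithSplitRecovery<T>(
  provider: RangeProvider,
  collectionId: string,
  isSplitError: (err: unknown) => boolean, // caller decides what counts as a split
  run: (ranges: string[]) => Promise<T>,
): Promise<T> {
  // First attempt uses whatever the shared cache already holds.
  const ranges = await provider.getOverlappingRanges(collectionId);
  try {
    return await run(ranges);
  } catch (err) {
    if (!isSplitError(err)) {
      throw err;
    }
    // Same provider instance; only the routing map is refreshed.
    const refreshed = await provider.getOverlappingRanges(collectionId, true);
    return run(refreshed);
  }
}
```

Because the provider is reused, other queries sharing the cache would also see the refreshed ranges once the forced fetch succeeds.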
Problem
Every cross-partition query issues a redundant GET /pkranges metadata call instead of reusing the cache.
CosmosClient keeps a long-lived shared PartitionKeyRangeCache on ClientContext (used by reads, bulk, and change feed), but SmartRoutingMapProvider, the helper that resolves overlapping ranges for parallel/ORDER BY queries, constructed its own cache, and a new provider is created per query. Each provider started cold and re-fetched. Hybrid queries are worst-hit: the global-statistics query plus each component query spun up its own cold cache, so one hybrid query triggered several /pkranges fetches.
Impact: an extra metadata round-trip and added latency on every cross-partition query, scaling with query volume.
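To make the cost concrete, here is a toy model of the two wiring patterns. It is only a sketch: RangeCache, FetchCounter, and both helper functions are hypothetical stand-ins, and the increment simulates one GET /pkranges round-trip rather than calling the service.

```typescript
// Hypothetical stand-ins; only the cache-sharing pattern is the point.
interface FetchCounter {
  count: number;
}

class RangeCache {
  private readonly ranges = new Map<string, string[]>();
  private readonly counter: FetchCounter;

  constructor(counter: FetchCounter) {
    this.counter = counter;
  }

  async get(collectionId: string): Promise<string[]> {
    let cached = this.ranges.get(collectionId);
    if (cached === undefined) {
      this.counter.count++; // simulates one GET /pkranges round-trip
      cached = ["range-0", "range-1"];
      this.ranges.set(collectionId, cached);
    }
    return cached;
  }
}

// Old pattern: every query builds its own cold cache.
async function runQueriesWithOwnCaches(queries: number): Promise<number> {
  const counter: FetchCounter = { count: 0 };
  for (let i = 0; i < queries; i++) {
    const perQueryCache = new RangeCache(counter);
    await perQueryCache.get("coll");
  }
  return counter.count;
}

// New pattern: every query resolves ranges through one long-lived cache.
async function runQueriesWithSharedCache(queries: number): Promise<number> {
  const counter: FetchCounter = { count: 0 };
  const sharedCache = new RangeCache(counter);
  for (let i = 0; i < queries; i++) {
    await sharedCache.get("coll");
  }
  return counter.count;
}
```

With three sequential queries, runQueriesWithOwnCaches(3) resolves to 3 simulated fetches and runQueriesWithSharedCache(3) resolves to 1.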
Fix
- SmartRoutingMapProvider now uses the shared clientContext.partitionKeyRangeCache. Parallel, ORDER BY, and hybrid queries all share one warm cache.
- Split recovery passes forceRefresh instead of allocating a fresh provider.
- Failure-safe cache (partitionKeyRangeCache.ts): concurrent fetches (cold or forceRefresh) dedupe to one request; the map is published only on success, so a transient failure no longer poisons later lookups (the next call retries) and a failed forceRefresh keeps serving the last known-good map.
Testing
- test/internal/unit/partitionKeyRangeCache.spec.ts (5 tests): dedupe, cache hit, evict-on-failure, keep-prior-on-failed-refresh, dedupe concurrent forceRefresh.
- test/public/functional/partitionKeyRangeCacheReuse.spec.ts: counts /pkranges requests (1 with the fix, 4 without).