[Internal] PPAF: Adds Hub Region Caching Per Partition Level#5648
[Internal] PPAF: Adds Hub Region Caching Per Partition Level#5648
Conversation
NaluTripician
left a comment
There was a problem hiding this comment.
One question, other than that LGTM
tvaron3
left a comment
There was a problem hiding this comment.
PR Deep Review — Per Partition Hub Region Caching
Overall: The caching architecture is clean and the feature intent is solid. However, there is a correctness issue with sessionTokenRetryCount not being incremented when bypassing ShouldRetryOnSessionNotAvailable (reinforces @NaluTripician's existing comment), and the hub cache has no eviction mechanism which could lead to unbounded growth.
Existing comments: 1 comment from @NaluTripician about sessionTokenRetryCount — overlaps with finding #1.
| # | Severity | Summary |
|---|---|---|
| 1 | 🔴 Blocking | sessionTokenRetryCount not incremented — potential unbounded retry |
| 2 | 🟡 | Unbounded cache growth — no eviction or TTL |
| 3 | 🟡 | Stale cache entries after partition splits |
| 4 | 🟡 | documentServiceRequest may be null |
| 5 | 🟡 | Concrete type check breaks retry handler abstraction |
| 6 | 🟢 | Cache lookup on every request (minor perf note) |
|
Syncedup offline about write-location == Hub for SM which is scope right now. |
| /// For single master accounts, the write region is the hub region, so this | ||
| /// stores the hub URI in the existing write-location cache. | ||
| /// </summary> | ||
| public abstract void CacheDiscoveredHubRegionForPartition(PartitionKeyRange partitionKeyRange, Uri hubRegion, string collectionRid); |
There was a problem hiding this comment.
Clarification: Non-PPAF accounts what's the expected behavior?
There was a problem hiding this comment.
A non-PPAF single-master account gets: 404/1002 × 2 → hub header set → region cycling via normal retry on 403/3 retries currently work → it gets next region info from AccountProperties → eventually gets 200 response →updates location cache
| return; | ||
| } | ||
| else | ||
|
|
There was a problem hiding this comment.
Isn't else needed? With-out it RouteToHub work will be overriden by index based flow
There was a problem hiding this comment.
Probably relying on below to reset index to ZERO. I think its better to not take dependency on that behavior.
There was a problem hiding this comment.
Updated the code logic, fixed this by adding an explicit return after RouteToLocation(hubUri) and setting this.locationEndpoint. Choosing over else because else because ResolveServiceEndpoint further down would still override the hub routing. The return exits the method entirely since routing is fully resolved at that point.
| // a previous request already discovered the hub region for this partition. | ||
| // If cached, route directly there (skipping the 403/3 discovery chain). | ||
| // Normal first-attempt requests never enter this block (addHubRegionProcessingOnlyHeader = false). | ||
| if (!this.canUseMultipleWriteLocations && this.addHubRegionProcessingOnlyHeader) |
There was a problem hiding this comment.
This can be purely based on addHubRegionProcessingOnlyHeader.
addHubRegionProcessingOnlyHeader decision can include MM in its decision.
There was a problem hiding this comment.
but we dont have support for HubRegion header in multi master
| { | ||
| request.RequestContext.RouteToLocation(this.globalEndpointManager.GetHubUri()); | ||
|
|
||
| this.locationEndpoint = request.RequestContext.LocationEndpointToRoute; |
There was a problem hiding this comment.
this.locationEndpoint is getting updated for MM also, what are the side-affects of it?
There was a problem hiding this comment.
This comment has been addressed already. The updated code always sets this.locationEndpoint by ResolveServiceEndpoint at the end of the method for all paths including multimaster. The hub caching early return is in a separate block guarded by addHubRegionProcessingOnlyHeader, which is only set for single-master, so multimaster never reaches it.
| /// </summary> | ||
| private static bool IsHubRegionRoutingActive(DocumentServiceRequest request) | ||
| { | ||
| return request?.Headers?[HttpConstants.HttpHeaders.ShouldProcessOnlyInHubRegion] != null; |
There was a problem hiding this comment.
How about check the value as well, so make it more bounded.
There was a problem hiding this comment.
Updated IsHubRegionRoutingActive to check the value equals "True" (case-insensitive) instead of just checking for header presence.
| request, | ||
| this.PartitionKeyRangeToLocationForWrite); | ||
| } | ||
| else if (GlobalPartitionEndpointManagerCore.IsHubRegionRoutingActive(request)) |
There was a problem hiding this comment.
How about club this if caluse with above one?
There was a problem hiding this comment.
combined the PPAF and hub region branches in TryAddPartitionLevelLocationOverride since they had identical code
| request, | ||
| this.PartitionKeyRangeToLocationForWrite); | ||
| } | ||
| else if (GlobalPartitionEndpointManagerCore.IsHubRegionRoutingActive(request)) |
There was a problem hiding this comment.
How different is it from above if block?
If they are same how about just inlne the if condtion with existing one?
There was a problem hiding this comment.
They are different in behavior as the PPAF block calls TryAddOrUpdatePartitionFailoverInfoAndMoveToNextLocation, which caches the failed location and picks the next region from the endpoint list as the failover target, then returns true so the retry routes to that cached location. The hub block does the opposite,it calls TryRemove to delete any stale cached hub entry and returns false. This is intentional because during hub discovery, a 403/3 tells us a region is NOT the hub, but it does not tell us which region IS the hub. Caching the next region from the endpoint list would be incorrect because that region is just the next one in ordering, not necessarily the hub.
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
kirankumarkolli
left a comment
There was a problem hiding this comment.
Can TryMarkEndpointUnavailableForPkRange be leveraged on getting 403.3?
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
|
|
||
| // Note: This can be triggered by the read requests as well. In that case, we will set the isReadRequest to | ||
| // false to ensure that we mark the endpoint unavailable for writes only. | ||
| return await this.ShouldRetryOnEndpointFailureAsync( |
There was a problem hiding this comment.
Note: Keeping 403.3 markdown logic common for both read and writes, may force refresh the locations (account properties) by calling await this.globalEndpointManager.RefreshLocationAsync(forceRefresh);
This can be a behavior change where, now the reads will have the ablity to do force refresh locations in the event of getting a 403.3 Considering this as an edge case, the impact will be minimal though.
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
Description
This PR introduces per-partition hub region caching for the Azure Cosmos DB .NET SDK v3. On single-master accounts encountering repeated
404/1002(ReadSessionNotAvailable) errors, the SDK now discovers the hub region via a403/3(WriteForbidden) discovery chain and caches the result. Subsequent requests to the same partition route directly to the cached hub, eliminating redundant discovery round-trips.Follow-up of: PR #5447
Branch:
users/aavasthy/hubregioncachingType: New feature (non-breaking)
Lines changed: +1,966 / −363
Problem
404/1002on a single-master account, the SDK gave up and returned the error to the callerSolution Summary
404/1002errors on a single-master account, set thex-ms-cosmos-hub-region-processing-onlyheader.403/3(WriteForbidden) — the SDK retries to the next region (the 403/3 discovery chain).200 OK, the SDK caches the hub region URI for that partition inPartitionKeyRangeToLocationForWrite.Existing vs. New Behavior
404/1002(single-master)404/1002404/1002403/3on read path with hub headercheckHubRegionOverrideInCacheGatewayStoreModelPKRange resolutionEnd-to-End Flow
Cold Cache — First-Time Hub Region Discovery
Warm Cache — Subsequent Request
Sequence Diagram — Cold Cache (First Discovery)
sequenceDiagram participant App as Application participant CRP as ClientRetryPolicy participant GW as GatewayStoreModel participant GPEM as GlobalPartitionEndpoint<br/>ManagerCore participant R1 as Preferred Region<br/>(East US) participant R2 as Write Region<br/>(West US) participant R3 as Non-Hub Region<br/>(North EU) participant Hub as Hub Region<br/>(Central US) App->>CRP: ReadItemAsync(P1) CRP->>GW: OnBeforeSendRequest() GW->>R1: GET /P1 (no hub header) R1-->>GW: 404/1002 (ReadSessionNotAvailable) GW-->>CRP: ShouldRetryOnSessionNotAvailable() Note over CRP: sessionTokenRetryCount = 1<br/>Single-master → retry to write region CRP->>GW: OnBeforeSendRequest() GW->>R2: GET /P1 (no hub header) R2-->>GW: 404/1002 (ReadSessionNotAvailable) GW-->>CRP: ShouldRetryOnSessionNotAvailable() Note over CRP: sessionTokenRetryCount > 1<br/>Set addHubRegionProcessingOnlyHeader = true<br/>Check cache → MISS CRP->>GW: OnBeforeSendRequest() [hub header set] GW->>GPEM: IsHubRegionRoutingActive() → true GPEM-->>GW: Resolve PKRange, TryAddPartitionLevelLocationOverride → no cache GW->>R3: GET /P1 + x-ms-cosmos-hub-region-processing-only R3-->>GW: 403/3 (WriteForbidden) GW-->>CRP: ShouldRetryInternal(403/3) CRP->>GPEM: TryMarkEndpointUnavailableForPartitionKeyRange() Note over GPEM: Cache advanced:<br/>P1 → next region CRP->>GW: OnBeforeSendRequest() [hub header set] GW->>GPEM: IsHubRegionRoutingActive() → true GPEM-->>GW: TryAddPartitionLevelLocationOverride → route to Hub GW->>Hub: GET /P1 + x-ms-cosmos-hub-region-processing-only Hub-->>GW: 200 OK GW-->>CRP: Success Note over GPEM: Cache populated:<br/>P1 → Hub URI CRP-->>App: 200 OKSequence Diagram — Warm Cache (Subsequent Request)
sequenceDiagram participant App as Application participant CRP as ClientRetryPolicy participant GW as GatewayStoreModel participant GPEM as GlobalPartitionEndpoint<br/>ManagerCore participant R1 as Preferred Region<br/>(East US) participant Hub as Hub Region<br/>(Central US) App->>CRP: ReadItemAsync(P1) CRP->>GW: OnBeforeSendRequest() GW->>R1: GET /P1 (no hub header) R1-->>GW: 404/1002 (ReadSessionNotAvailable) GW-->>CRP: ShouldRetryOnSessionNotAvailable() Note over CRP: sessionTokenRetryCount = 1<br/>Single-master → retry to write region CRP->>GW: OnBeforeSendRequest() GW->>R1: GET /P1 (routed to write region) R1-->>GW: 404/1002 (ReadSessionNotAvailable) GW-->>CRP: ShouldRetryOnSessionNotAvailable() Note over CRP: sessionTokenRetryCount > 1<br/>addHubRegionProcessingOnlyHeader = true<br/>TryAddPartitionLevelLocationOverride<br/>→ CACHE HIT! CRP->>GW: OnBeforeSendRequest() [hub header + cache override] GW->>GPEM: IsHubRegionRoutingActive() → true GPEM-->>GW: TryAddPartitionLevelLocationOverride → route to cached Hub GW->>Hub: GET /P1 + x-ms-cosmos-hub-region-processing-only Hub-->>GW: 200 OK GW-->>CRP: Success CRP-->>App: 200 OKComponent Architecture
flowchart TB subgraph "Retry Layer" CRP["ClientRetryPolicy"] CRP -->|"OnBeforeSendRequest()"| HDR["Set x-ms-cosmos-hub-region-<br/>processing-only header"] CRP -->|"ShouldRetryOnSessionNotAvailable()"| DEC{"sessionTokenRetryCount > 1<br/>+ single-master<br/>+ hub processing enabled?"} DEC -->|Yes| FLAG["addHubRegionProcessingOnlyHeader = true"] FLAG --> CACHE_CHECK{"Cache hit?<br/>TryAddPartitionLevel<br/>LocationOverride()"} CACHE_CHECK -->|Yes - Warm Path| DIRECT["Route to cached hub"] CACHE_CHECK -->|No - Cold Path| CYCLE["Region cycling<br/>(403/3 discovery)"] DEC -->|No| ORIG["Original retry behavior<br/>(RetryOnSessionNotAvailableRouteToWriteRegion)"] end subgraph "Gateway Layer" GW["GatewayStoreModel"] GW -->|"IsHubRegionRoutingActive()"| PKRANGE["Resolve PKRange"] PKRANGE --> OVERRIDE["TryAddPartitionLevel<br/>LocationOverride()"] end subgraph "Partition Cache Layer" GPEM["GlobalPartitionEndpoint<br/>ManagerCore"] GPEM --> WRITE_CACHE["PartitionKeyRangeToLocationForWrite<br/>(ConcurrentDictionary)"] GPEM -->|"TryMarkEndpoint<br/>Unavailable()"| ADVANCE["Advance cache to<br/>next region"] GPEM -->|"IsHubRegionRoutingActive()"| CHECK["Check hub header<br/>+ hub property"] GPEM -->|"IsRequestEligibleFor<br/>PartitionOrHubRegionFailover()"| GATE["Gate: PPAF OR circuit breaker<br/>OR checkHubRegionOverrideInCache"] end CRP --> GW GW --> GPEM DIRECT --> GW CYCLE --> GWKey Files Changed
ClientRetryPolicy.csTryAddPartitionLevelLocationOverride, feature flag gateGlobalPartitionEndpointManagerCore.csPartitionKeyRangeToLocationForWrite),IsHubRegionRoutingActive(),IsHubRegionHeaderPresentInRequest(),IsHubRegionPropertyPresentInRequest(), eligibility gate update,IsHubRegionProcessingEnabled()GlobalPartitionEndpointManager.csHubRegionOverridePresentInCacheconstant,IsHubRegionProcessingEnabled()abstract method,checkHubRegionOverrideInCacheparameterGlobalPartitionEndpointManagerNoOp.csIsHubRegionProcessingEnabled()GatewayStoreModel.csTryAddPartitionLevelLocationOverridecall extended forIsHubRegionRoutingActive(enables non-PPAF accounts)ConfigurationManager.csIsHubRegionProcessingEnabled()— reads environment variable once at initializationClientRetryPolicyTests.csLocationCacheTests.csCosmosItemIntegrationTests.csFeature Flag
The entire hub region processing feature is gated by:
#if !INTERNAL— internal builds retain the existing 2-retry behaviorConfigurationManager.IsHubRegionProcessingEnabled()— read once atGlobalPartitionEndpointManagerCoreinitialization from an environment variableReview Highlights from PR Discussion
ShouldRetryOnEndpointFailureAsyncmarks read endpoints unavailable — unintended side effectoverwriteEndpointDiscovery: trueto skip endpoint marking#if !INTERNALand#elseblocksRetryOnSessionNotAvailableRouteToWriteRegion()TryAddPartitionLevelLocationOverride"True"comparisonsessionTokenRetryCountnot incremented in hub bypass path>= 1, and after hub header the code path diverges (403/3 or 200), capped byfailoverRetryCountCosmosItemIntegrationTests.cs. PPAF drill will also serve as E2E validationClosing issues
To automatically close an issue: closes #IssueNumber