
LocationCache: Fixes read fallback to use WriteEndpoints[0] when PPAF enabled and all regions excluded#5823

Open
ananth7592 wants to merge 1 commit into main from users/ananth/5821


@ananth7592 ananth7592 commented Apr 30, 2026

Problem

When ExcludeRegions filters out every region in ApplicationPreferredRegions (ApplicationPreferredRegions == ExcludeRegions), LocationCache.GetApplicableEndpoints falls back to this.defaultEndpoint, a static, region-agnostic URI that is set once at init and never updated. After a write region (hub) switch, the GlobalAddressResolver's cached EndpointCache for this default endpoint holds a stale AddressResolver.location, which causes incorrect region tracking in diagnostics, per-partition routing, and retry logic.

Fix

When PPAF (IsPartitionLevelFailoverEnabled) is enabled, GetApplicableEndpoints now uses WriteEndpoints[0] (dynamic, tracks the current write region) as the read fallback instead of this.defaultEndpoint.

This aligns with:

  • UpdateLocationCache (L756-760) which already uses WriteEndpoints[0] for ReadEndpoints fallback
  • Java SDK: writeRegionalRoutingContexts.get(0)
  • Python SDK: get_write_regional_routing_contexts()[0]

PPAF Gating

The fix is gated behind a Func<bool> isPartitionLevelFailoverEnabled wired from ConnectionPolicy.EnablePartitionLevelFailover through GlobalEndpointManager, supporting dynamic enablement per PR #5310. When PPAF is disabled, the original behavior (defaultEndpoint fallback) is preserved.
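The gating described above can be sketched in isolation as follows. This is a minimal illustration, not the actual LocationCache code: EndpointSnapshot, FallbackSketch, and SelectFallback are hypothetical names, the example URIs are placeholders, and the empty-list guard is an added assumption that the description does not mention.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the location snapshot read by GetApplicableEndpoints.
public sealed class EndpointSnapshot
{
    public EndpointSnapshot(IReadOnlyList<Uri> writeEndpoints)
    {
        this.WriteEndpoints = writeEndpoints;
    }

    // Ordered write endpoints; index 0 is assumed to track the current write region.
    public IReadOnlyList<Uri> WriteEndpoints { get; }
}

public static class FallbackSketch
{
    // Picks the endpoint to use when ExcludeRegions removed every preferred region.
    public static Uri SelectFallback(
        bool isReadRequest,
        Func<bool> isPartitionLevelFailoverEnabled,
        EndpointSnapshot snapshot,
        Uri defaultEndpoint)
    {
        // The delegate is evaluated on every call, so flipping PPAF at runtime
        // changes the outcome without rebuilding the cache.
        bool useWriteEndpoint = isReadRequest
            && isPartitionLevelFailoverEnabled?.Invoke() == true
            && snapshot.WriteEndpoints.Count > 0; // guard is an assumption of this sketch

        return useWriteEndpoint ? snapshot.WriteEndpoints[0] : defaultEndpoint;
    }
}

public static class Demo
{
    public static void Main()
    {
        Uri defaultEndpoint = new Uri("https://contoso.example.com/");
        EndpointSnapshot snapshot = new EndpointSnapshot(
            new[] { new Uri("https://contoso-westus.example.com/") });

        bool ppafEnabled = false;
        Func<bool> gate = () => ppafEnabled;

        // PPAF off: the read falls back to the static default endpoint.
        Console.WriteLine(FallbackSketch.SelectFallback(true, gate, snapshot, defaultEndpoint));

        // PPAF toggled on: the same delegate now routes the read to WriteEndpoints[0].
        ppafEnabled = true;
        Console.WriteLine(FallbackSketch.SelectFallback(true, gate, snapshot, defaultEndpoint));
    }
}
```

Passing a delegate rather than a bool is what allows the dynamic toggle scenario: the flag is read at request time instead of being captured once at construction.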

Changes

  • LocationCache.cs: Added isPartitionLevelFailoverEnabled parameter; gated read fallback behind it
  • GlobalEndpointManager.cs: Wires ConnectionPolicy.EnablePartitionLevelFailover into LocationCache
  • LocationCacheTests.cs: 3 new tests covering PPAF on/off/dynamic toggle scenarios

Testing

All 94 LocationCacheTests pass.

Fixes #5821

@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).

Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs
…ind PPAF

When ExcludeRegions filters out all preferred read regions and PPAF
(Partition Level Failover) is enabled, GetApplicableEndpoints now falls back
to WriteEndpoints[0] (dynamic, tracks current write region) instead of
this.defaultEndpoint (static, region-agnostic URI set once at init).

The fix is gated behind isPartitionLevelFailoverEnabled (Func<bool>) wired
from ConnectionPolicy.EnablePartitionLevelFailover through GlobalEndpointManager,
supporting dynamic enablement per PR #5310.

When PPAF is disabled, original behavior (defaultEndpoint fallback) is preserved.

Fixes #5821

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ananth7592 ananth7592 force-pushed the users/ananth/5821 branch from 8b6cfce to 038e757 May 1, 2026 16:59
@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).

@kundadebdatta
Member

Shall we also update the verbiage in the RequestOptions.ExcludeRegions property to set the correct expectation here?

// never updated after init). This aligns with UpdateLocationCache which already uses
// WriteEndpoints[0] as the ReadEndpoints fallback, and matches Java/Python SDK behavior.
Uri fallbackEndpoint = (isReadRequest && this.isPartitionLevelFailoverEnabled?.Invoke() == true)
? databaseAccountLocationsInfoSnapshot.WriteEndpoints[0]

Should we always pick databaseAccountLocationsInfoSnapshot.WriteEndpoints[0], or should we also check for the PPAF cache override? If the override is present, then always pick that endpoint as the hub endpoint.


Development

Successfully merging this pull request may close these issues.

Reads routed via defaultEndpoint do not failover after write region switch when ExcludeRegions filters all preferred regions

3 participants