Summary
When ApplicationPreferredRegions == ExcludeRegions, every preferred region is filtered out and the SDK falls back to the region-agnostic defaultEndpoint for reads. After a write region switch, writes fail over correctly (they use AvailableWriteLocations[0] directly), but reads stay pinned to the old region's address resolution: the EndpointCache that GlobalAddressResolver holds for the default endpoint carries a location string frozen at client init time.
Root Cause
There are two layers to this bug:
Layer 1: LocationCache.GetApplicableEndpoints uses static defaultEndpoint as read fallback
LocationCache.cs#L372-L388
    public ReadOnlyCollection<Uri> GetApplicableEndpoints(DocumentServiceRequest request, bool isReadRequest)
    {
        // ...
        return GetApplicableEndpoints(
            isReadRequest ? this.locationInfo.AvailableReadEndpointByLocation : this.locationInfo.AvailableWriteEndpointByLocation,
            effectivePreferredLocations,
            this.defaultEndpoint, // ← BUG: Static, region-agnostic, never updated
            request.RequestContext.ExcludeRegions);
    }
The private static helper at L416-L449 uses this fallback when no endpoints survive filtering:
    if (applicableEndpoints.Count == 0)
    {
        applicableEndpoints.Add(fallbackEndpoint); // ← "myaccount.documents.azure.com"
    }
Key insight: The fix already exists in UpdateLocationCache! At L756-L760, ReadEndpoints correctly uses WriteEndpoints[0] as fallback:
    nextLocationInfo.ReadEndpoints = this.GetPreferredAvailableEndpoints(
        endpointsByLocation: nextLocationInfo.AvailableReadEndpointByLocation,
        orderedLocations: nextLocationInfo.AvailableReadLocations,
        expectedAvailableOperation: OperationType.Read,
        fallbackEndpoint: nextLocationInfo.WriteEndpoints[0]); // ← Dynamic, correct!
But GetApplicableEndpoints (the ExcludeRegions path) bypasses this and uses this.defaultEndpoint instead.
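The difference between the two fallbacks can be illustrated with a small, self-contained model. This is only a sketch, not the SDK's code: `FallbackSketch`, `Filter`, and the endpoint URIs are hypothetical names chosen for illustration, and it assumes that excluded regions are simply dropped from the candidate list and the fallback is appended when nothing remains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the filter-then-fallback step; not the SDK implementation.
public static class FallbackSketch
{
    // Keeps the endpoints of preferred regions that are not excluded;
    // returns the fallback alone when nothing survives the filter.
    public static List<Uri> Filter(
        IReadOnlyDictionary<string, Uri> endpointByRegion,
        IEnumerable<string> preferredRegions,
        IEnumerable<string> excludedRegions,
        Uri fallbackEndpoint)
    {
        HashSet<string> excluded = new HashSet<string>(excludedRegions, StringComparer.OrdinalIgnoreCase);
        List<Uri> applicable = preferredRegions
            .Where(region => !excluded.Contains(region) && endpointByRegion.ContainsKey(region))
            .Select(region => endpointByRegion[region])
            .ToList();

        if (applicable.Count == 0)
        {
            applicable.Add(fallbackEndpoint);
        }
        return applicable;
    }

    public static void Main()
    {
        // Assumed example URIs, for illustration only.
        Uri defaultEndpoint = new Uri("https://myaccount.documents.azure.com/");
        Uri westUs = new Uri("https://myaccount-westus.documents.azure.com/");
        var endpoints = new Dictionary<string, Uri>
        {
            ["East US"] = new Uri("https://myaccount-eastus.documents.azure.com/"),
            ["West US"] = westUs,
        };
        string[] preferred = { "East US" };
        string[] excluded = { "East US" };

        // State after the write region has moved to West US.
        Uri staticChoice = Filter(endpoints, preferred, excluded, defaultEndpoint)[0];
        Uri dynamicChoice = Filter(endpoints, preferred, excluded, westUs)[0];

        Console.WriteLine($"static fallback:  {staticChoice}");
        Console.WriteLine($"dynamic fallback: {dynamicChoice}");
    }
}
```

With the static fallback the caller receives the same region-agnostic URI before and after failover; with the dynamic fallback it receives whichever regional URI is currently first in the write list.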
Layer 2: GlobalAddressResolver.GetOrAddEndpoint caches stale AddressResolver.location
GlobalAddressResolver.cs#L327-L336 — Once an endpoint is cached at init, TryGetValue returns immediately without validating location:
    if (this.addressCacheByEndpoint.TryGetValue(endpoint, out EndpointCache existingCache))
    {
        return existingCache; // ← Never checks if location drifted
    }
AddressResolver.cs#L34 — location is private readonly string, frozen at creation.
AddressResolver.cs#L72 — Propagates stale value: request.RequestContext.RegionName = this.location
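The caching behavior described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the SDK's code: `EndpointCacheEntry`, `EndpointCacheSketch`, and `GetOrAdd` are hypothetical stand-ins for `EndpointCache`, `GlobalAddressResolver`, and `GetOrAddEndpoint`, the URIs are made up, and the current write region is passed in explicitly.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for a cache entry whose location is fixed at creation.
public sealed class EndpointCacheEntry
{
    public EndpointCacheEntry(string location) => this.Location = location;

    // Get-only: cannot change after construction, like a readonly field.
    public string Location { get; }
}

// Hypothetical stand-in for a resolver that caches one entry per endpoint URI.
public sealed class EndpointCacheSketch
{
    private readonly ConcurrentDictionary<Uri, EndpointCacheEntry> cacheByEndpoint =
        new ConcurrentDictionary<Uri, EndpointCacheEntry>();

    // Returns the cached entry if one exists; currentLocation is only
    // used the first time an endpoint is seen.
    public EndpointCacheEntry GetOrAdd(Uri endpoint, string currentLocation)
    {
        if (this.cacheByEndpoint.TryGetValue(endpoint, out EndpointCacheEntry existing))
        {
            return existing; // No check that currentLocation still matches.
        }
        return this.cacheByEndpoint.GetOrAdd(endpoint, _ => new EndpointCacheEntry(currentLocation));
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumed example URIs, for illustration only.
        Uri defaultEndpoint = new Uri("https://myaccount.documents.azure.com/");
        Uri westUsEndpoint = new Uri("https://myaccount-westus.documents.azure.com/");
        var resolver = new EndpointCacheSketch();

        // Client init: the default endpoint is cached while East US is the write region.
        resolver.GetOrAdd(defaultEndpoint, "East US");

        // After failover to West US, the same key still yields the init-time location,
        // while a regional URI gets its own, fresh entry.
        Console.WriteLine(resolver.GetOrAdd(defaultEndpoint, "West US").Location); // East US (stale)
        Console.WriteLine(resolver.GetOrAdd(westUsEndpoint, "West US").Location);  // West US
    }
}
```

In this model the stale value disappears as soon as lookups are keyed by a regional URI, which is why changing the fallback endpoint is enough to avoid the stale entry without touching the cache itself.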
Proposed Fix
Change GetApplicableEndpoints to use WriteEndpoints[0] as the read fallback instead of this.defaultEndpoint.
This aligns with:
- The existing pattern in UpdateLocationCache (L760), which already uses WriteEndpoints[0] for ReadEndpoints
- Java SDK: LocationCache.java#L266 (writeRegionalRoutingContexts.get(0))
- Python SDK: _location_cache.py#L241 (get_write_regional_routing_contexts()[0])
    public ReadOnlyCollection<Uri> GetApplicableEndpoints(DocumentServiceRequest request, bool isReadRequest)
    {
        if (request.RequestContext.ExcludeRegions == null || request.RequestContext.ExcludeRegions.Count == 0)
        {
            return isReadRequest ? this.ReadEndpoints : this.WriteEndpoints;
        }

        // Read everything from one snapshot so the endpoint map and the fallback stay consistent.
        DatabaseAccountLocationsInfo databaseAccountLocationsInfoSnapshot = this.locationInfo;
        ReadOnlyCollection<string> effectivePreferredLocations = databaseAccountLocationsInfoSnapshot.EffectivePreferredLocations;

        Uri fallbackEndpoint = isReadRequest
            ? databaseAccountLocationsInfoSnapshot.WriteEndpoints[0] // Dynamic: tracks current write region
            : this.defaultEndpoint;

        return GetApplicableEndpoints(
            isReadRequest
                ? databaseAccountLocationsInfoSnapshot.AvailableReadEndpointByLocation
                : databaseAccountLocationsInfoSnapshot.AvailableWriteEndpointByLocation,
            effectivePreferredLocations,
            fallbackEndpoint,
            request.RequestContext.ExcludeRegions);
    }
Why Read-Path Only
- Writes in single-master bypass ExcludeRegions entirely at L347-348; they use AvailableWriteLocations directly
- Reads hit GetApplicableEndpoints → all filtered → fallback to defaultEndpoint → stale AddressResolver.location
Impact
| What | Before Fix | After Fix |
|---|---|---|
| Read fallback endpoint | defaultEndpoint (region-agnostic, static) | WriteEndpoints[0] (region-specific, dynamic) |
| AddressResolver.location | Stale after hub switch | Correct (cache keyed by regional URI, recreated) |
| request.RequestContext.RegionName | Reports wrong region | Reports correct region |
| Diagnostics | Wrong region in traces | Correct region |
| Per-partition routing | May make wrong decisions | Correct decisions |
Cross-SDK Comparison
| SDK | Read Fallback When All Excluded | Correct? |
|---|---|---|
| .NET (current) | this.defaultEndpoint | ❌ Static, stale |
| .NET (proposed) | WriteEndpoints[0] | ✅ Dynamic |
| Java | writeRegionalRoutingContexts.get(0) | ✅ Dynamic |
| Python | get_write_regional_routing_contexts()[0] | ✅ Dynamic |
| Rust | self.default_endpoint (gateway-only, no AddressResolver) | ⚠️ azure-sdk-for-rust#4322 |
Reproduction Scenario
- Configure client with ApplicationPreferredRegions = ["East US"]
- Send reads with ExcludeRegions = ["East US"]
- Reads fall back to defaultEndpoint → AddressResolver.location = "East US" (init-time write region)
- Trigger write region failover: East US → West US
- WriteEndpoints[0] now points to West US regional endpoint (correctly updated)
- But GetApplicableEndpoints still returns defaultEndpoint → stale cache hit in GlobalAddressResolver
- request.RequestContext.RegionName incorrectly reports "East US"