Diagnostics: Fixes null contacted region name for multimaster hub fallback (410/21005)#5618
Diagnostics: Fixes null contacted region name for multimaster hub fallback (410/21005)#5618NaluTripician wants to merge 21 commits intomainfrom
Conversation
…egions-hub-fallback
There was a problem hiding this comment.
Pull request overview
Fixes diagnostics region-name resolution when gateway requests in multi-master accounts fall back to the account’s global/default endpoint after 410/21005 retries with excluded regions, preventing (null, <global-endpoint>) entries in GetContactedRegions().
Changes:
- Update
LocationCache.GetLocation(Uri)to map the default/global endpoint to a write location name for multi-master accounts as well. - Update
LocationCache.TryGetLocationForGatewayDiagnostics(Uri, out string)to return the hub/write region for default-endpoint hostnames in multi-master scenarios, and to returnfalsewhen no region can be resolved. - Add a regression test validating default-endpoint diagnostics mapping in multi-master mode.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs | Adjusts default-endpoint → region-name resolution used by gateway diagnostics. |
| Microsoft.Azure.Cosmos/tests/Microsoft.Azure.Cosmos.Tests/LocationCacheTests.cs | Adds coverage for multi-master default-endpoint diagnostics mapping; normalizes formatting in touched hunks. |
Comments suppressed due to low confidence (1)
Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs:180
GetLocationselects the “first available write location” viaAvailableWriteEndpointByLocation.First(). Enumeration order ofDictionary/ReadOnlyDictionaryis not guaranteed by contract, so this can produce non-deterministic region names across runtimes. Prefer using the ordered list already maintained inlocationInfo.AvailableWriteLocations(e.g., index 0) to pick the hub/first write location deterministically.
string location = this.locationInfo.AvailableWriteEndpointByLocation.FirstOrDefault(uri => uri.Value == endpoint).Key ?? this.locationInfo.AvailableReadEndpointByLocation.FirstOrDefault(uri => uri.Value == endpoint).Key;
if (location == null && endpoint == this.defaultEndpoint)
{
if (this.locationInfo.AvailableWriteEndpointByLocation.Any())
{
return this.locationInfo.AvailableWriteEndpointByLocation.First().Key;
}
…egions-hub-fallback
…egions-hub-fallback
|
@NaluTripician I've opened a new pull request, #5640, to work on those changes. Once the pull request is ready, I'll request review from you. |
…egions-hub-fallback
…egions-hub-fallback
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
NaluTripician
left a comment
There was a problem hiding this comment.
Review Summary
The core fix is correct — removing the guard that prevented multimaster accounts from resolving the default endpoint to a write region name, and adding a null-safety check on the return path. A couple of concerns below.
Whitespace noise: ~200 lines of trailing-whitespace-only changes obscure the ~15 lines of real logic. Consider a separate formatting commit or .editorconfig enforcement to keep semantic diffs clean.
Test coverage gap: The new test covers useMultipleWriteLocations: true but not the asymmetric case (account is multi-master, client didn't opt in). PR #5640 addresses this — recommend folding it in before merge.
…solution - Use enableMultipleWriteLocations (server-side setting) instead of CanUseMultipleWriteLocations() (client opt-in AND server) so diagnostics resolve the hub region even when the client has UseMultipleWriteLocations disabled. - Use AvailableWriteLocations[0] (ordered ReadOnlyCollection) instead of AvailableWriteEndpointByLocation.First().Key (unordered Dictionary) for deterministic region name selection. - Add test for asymmetric case: account multi-master, client opt-out. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
…ter diagnostics - Adds inline comment explaining why enableMultipleWriteLocations (account-level) is used instead of CanUseMultipleWriteLocations() (requires client opt-in) in TryGetLocationForGatewayDiagnostics, since diagnostics should resolve the hub region regardless of client multi-write configuration. - Adds test for unknown/unresolvable non-default endpoint (validates the return regionName != null fix returns false for unknown endpoints). - Adds test for pre-OnDatabaseAccountRead state (validates correct behavior when AvailableWriteLocations is empty and enableMultipleWriteLocations defaults to false). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
- Use .Any() instead of .Count > 0 for AvailableWriteLocations guard (per Kiran) - Collapse inner enableMultipleWriteLocations check into ternary (per Kiran) - Reset files to master baseline to eliminate all CRLF/whitespace-only diffs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
…egions-hub-fallback
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
- Collapse nested if in GetLocation into single condition - Revert defensive 'return regionName != null' to original 'return true' for non-default endpoint path (unrelated to fix) - Replace 'hub' terminology with 'first available write region' in comments and test variables (MM first write region is not necessarily the hub) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
…egions-hub-fallback
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
…egions-hub-fallback
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
…dpoints The fallthrough path returned 'true' unconditionally after calling GetLocation, even when GetLocation returned null for unrecognized endpoints. Changed to 'return regionName != null' to match the contract and the default-endpoint branch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
…egions-hub-fallback
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
…egions-hub-fallback
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
Addresses kushagraThapar review feedback on PR #5618: the existing summary on GetLocation was stale after removing the !CanUseMultipleWriteLocations() guard. Updated doc to: (1) cover the broader write+read regional endpoint case, (2) note that the default-endpoint fallback now applies to both single-master and multi-master, and (3) clarify that the returned write location is just the first in the list, not necessarily the hub region (per his line-179 note). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
…egions-hub-fallback
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
…ions-hub-fallback
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
kushagraThapar
left a comment
There was a problem hiding this comment.
LGTM, thanks @NaluTripician
Summary
Fixes issue #5095 where
GetContactedRegions()can return(null, <global-endpoint-uri>)for multimaster reads when requests fall back to the account default endpoint after410/21005retries withExcludeRegions.What this change does
LocationCache.GetLocation(Uri endpoint)to resolve the default/global endpoint to the first available write location name for multimaster as well as single-master.LocationCache.TryGetLocationForGatewayDiagnostics(Uri endpoint, out string regionName)so that for default-endpoint hostnames in multimaster accounts, diagnostics resolve to hub/write region instead of returningnull.LocationCacheTestsfor multimaster default-endpoint diagnostics mapping.Root cause
For multimaster accounts:
GetApplicableEndpoints(...)can fall back todefaultEndpointwhen preferred/applicable regional endpoints are exhausted or excluded.TryGetLocationForGatewayDiagnosticstreated endpoints with host matchingdefaultEndpointas non-regional and returnedregionName = null.GetLocation(defaultEndpoint)also returnednullin multimaster scenarios due to a guard condition.This caused diagnostics to record contacted regions with an empty/null region name despite correct endpoint routing.
Repro and verification
Repro test added
ValidateTryGetLocationForGatewayDiagnosticsOnDefaultEndpointForMultiMasterMicrosoft.Azure.Cosmos/tests/Microsoft.Azure.Cosmos.Tests/LocationCacheTests.csGetLocation(DefaultEndpoint)returns the hub/write region name.TryGetLocationForGatewayDiagnostics(DefaultEndpoint, out regionName)returnstrueand non-null hub region.new Uri(DefaultEndpoint, "random/path")).Red/green proof
dotnet test .\\Microsoft.Azure.Cosmos\\tests\\Microsoft.Azure.Cosmos.Tests\\Microsoft.Azure.Cosmos.Tests.csproj --filter "FullyQualifiedName~ValidateTryGetLocationForGatewayDiagnosticsOnDefaultEndpointForMultiMaster" --nologoExpected:<location1>. Actual:<(null)>.Files changed
Microsoft.Azure.Cosmos/src/Routing/LocationCache.csMicrosoft.Azure.Cosmos/tests/Microsoft.Azure.Cosmos.Tests/LocationCacheTests.csNotes
false+ null region name) in existing test coverage.