Diagnostics: Fixes null contacted region name for multimaster hub fallback (410/21005) #5618

Open
NaluTripician wants to merge 21 commits into main from users/nalutripician/issue-5095-contacted-regions-hub-fallback

Conversation

Contributor

@NaluTripician NaluTripician commented Feb 18, 2026

Summary

Fixes issue #5095 where GetContactedRegions() can return (null, <global-endpoint-uri>) for multimaster reads when requests fall back to the account default endpoint after 410/21005 retries with ExcludeRegions.

What this change does

  • Updates LocationCache.GetLocation(Uri endpoint) to resolve the default/global endpoint to the first available write location name for multimaster as well as single-master.
  • Updates LocationCache.TryGetLocationForGatewayDiagnostics(Uri endpoint, out string regionName) so that, for default-endpoint hostnames in multimaster accounts, diagnostics resolve to the hub/write region instead of returning null.
  • Adds regression coverage in LocationCacheTests for multimaster default-endpoint diagnostics mapping.

Root cause

For multimaster accounts:

  • GetApplicableEndpoints(...) can fall back to defaultEndpoint when preferred/applicable regional endpoints are exhausted or excluded.
  • Existing diagnostics resolution in TryGetLocationForGatewayDiagnostics treated endpoints with host matching defaultEndpoint as non-regional and returned regionName = null.
  • GetLocation(defaultEndpoint) also returned null in multimaster scenarios due to a guard condition.

This caused diagnostics to record contacted regions with an empty/null region name despite correct endpoint routing.
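
The effect of the guard described above can be sketched as follows. This is a minimal model, not the SDK's code: GetLocationSketch, its constructor, and the Before/After method names are hypothetical, the regional-endpoint lookup is omitted, and the guard is assumed to be the !CanUseMultipleWriteLocations() check mentioned later in this conversation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for LocationCache (assumption, not the SDK class).
public sealed class GetLocationSketch
{
    private readonly Uri defaultEndpoint;
    private readonly IReadOnlyList<string> availableWriteLocations;
    private readonly bool canUseMultipleWriteLocations;

    public GetLocationSketch(
        Uri defaultEndpoint,
        IReadOnlyList<string> availableWriteLocations,
        bool canUseMultipleWriteLocations)
    {
        this.defaultEndpoint = defaultEndpoint;
        this.availableWriteLocations = availableWriteLocations;
        this.canUseMultipleWriteLocations = canUseMultipleWriteLocations;
    }

    // Pre-fix shape: the guard skips the default-endpoint fallback for multimaster,
    // so the default endpoint resolves to null.
    public string GetLocationBeforeFix(Uri endpoint)
    {
        if (endpoint == this.defaultEndpoint
            && !this.canUseMultipleWriteLocations
            && this.availableWriteLocations.Count > 0)
        {
            return this.availableWriteLocations[0];
        }

        return null;
    }

    // Post-fix shape: same fallback with the guard removed, so single-master and
    // multimaster both resolve to the first available write location.
    public string GetLocationAfterFix(Uri endpoint)
    {
        if (endpoint == this.defaultEndpoint && this.availableWriteLocations.Count > 0)
        {
            return this.availableWriteLocations[0];
        }

        return null;
    }
}
```

With a multimaster configuration, the pre-fix method returns null for the default endpoint while the post-fix method returns the first write location, which corresponds to the red/green result reported under "Repro and verification".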

Repro and verification

Repro test added

  • ValidateTryGetLocationForGatewayDiagnosticsOnDefaultEndpointForMultiMaster
    • File: Microsoft.Azure.Cosmos/tests/Microsoft.Azure.Cosmos.Tests/LocationCacheTests.cs
    • Validates that in multimaster mode:
      • GetLocation(DefaultEndpoint) returns the hub/write region name.
      • TryGetLocationForGatewayDiagnostics(DefaultEndpoint, out regionName) returns true and a non-null hub region name.
      • Same behavior for default endpoint with path (new Uri(DefaultEndpoint, "random/path")).

Red/green proof

  • Without fix (temporarily reverted locally):
    • Command: dotnet test .\Microsoft.Azure.Cosmos\tests\Microsoft.Azure.Cosmos.Tests\Microsoft.Azure.Cosmos.Tests.csproj --filter "FullyQualifiedName~ValidateTryGetLocationForGatewayDiagnosticsOnDefaultEndpointForMultiMaster" --nologo
    • Result: Failed with Expected:<location1>. Actual:<(null)>.
  • With fix restored:
    • Same command
    • Result: Passed.

Files changed

  • Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs
  • Microsoft.Azure.Cosmos/tests/Microsoft.Azure.Cosmos.Tests/LocationCacheTests.cs

Notes

  • Single-master behavior for default endpoint diagnostics remains unchanged (false + null region name) in existing test coverage.
  • The change is limited to diagnostics region resolution and the corresponding regression tests.

@github-actions github-actions Bot left a comment

All good!

@NaluTripician NaluTripician changed the title from "Fix null contacted region name for multimaster hub fallback (410/21005)" to "Diagnostics: Fixes null contacted region name for multimaster hub fallback (410/21005)" Feb 18, 2026
@NaluTripician NaluTripician marked this pull request as draft February 18, 2026 02:24
Contributor

Copilot AI left a comment

Pull request overview

Fixes diagnostics region-name resolution when gateway requests in multi-master accounts fall back to the account’s global/default endpoint after 410/21005 retries with excluded regions, preventing (null, <global-endpoint>) entries in GetContactedRegions().

Changes:

  • Update LocationCache.GetLocation(Uri) to map the default/global endpoint to a write location name for multi-master accounts as well.
  • Update LocationCache.TryGetLocationForGatewayDiagnostics(Uri, out string) to return the hub/write region for default-endpoint hostnames in multi-master scenarios, and to return false when no region can be resolved.
  • Add a regression test validating default-endpoint diagnostics mapping in multi-master mode.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs | Adjusts default-endpoint → region-name resolution used by gateway diagnostics. |
| Microsoft.Azure.Cosmos/tests/Microsoft.Azure.Cosmos.Tests/LocationCacheTests.cs | Adds coverage for multi-master default-endpoint diagnostics mapping; normalizes formatting in touched hunks. |

Comments suppressed due to low confidence (1)

Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs:180

  • GetLocation selects the “first available write location” via AvailableWriteEndpointByLocation.First(). Enumeration order of Dictionary/ReadOnlyDictionary is not guaranteed by contract, so this can produce non-deterministic region names across runtimes. Prefer using the ordered list already maintained in locationInfo.AvailableWriteLocations (e.g., index 0) to pick the hub/first write location deterministically.
            string location = this.locationInfo.AvailableWriteEndpointByLocation.FirstOrDefault(uri => uri.Value == endpoint).Key ?? this.locationInfo.AvailableReadEndpointByLocation.FirstOrDefault(uri => uri.Value == endpoint).Key;

            if (location == null && endpoint == this.defaultEndpoint)
            {
                if (this.locationInfo.AvailableWriteEndpointByLocation.Any())
                {
                    return this.locationInfo.AvailableWriteEndpointByLocation.First().Key;
                }
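
The ordering concern in the suppressed comment can be illustrated with a small sketch. WriteRegionSelectionSketch and both method names are hypothetical; the collections only mirror the shapes named in the comment (a dictionary keyed by location and an ordered list of write locations).

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical helper contrasting the two selection strategies (assumption, not SDK code).
public static class WriteRegionSelectionSketch
{
    // Relies on dictionary enumeration order, which the Dictionary contract
    // does not guarantee, so the chosen key may differ across runtimes.
    public static string FirstFromDictionary(IReadOnlyDictionary<string, Uri> endpointByLocation)
    {
        return endpointByLocation.Any() ? endpointByLocation.First().Key : null;
    }

    // Relies on an ordered list, so index 0 is always the same entry.
    public static string FirstFromOrderedList(ReadOnlyCollection<string> availableWriteLocations)
    {
        return availableWriteLocations.Any() ? availableWriteLocations[0] : null;
    }
}
```

Only the list-based variant has a result that follows from the collection's contract, which is why the comment recommends it for picking the region name deterministically.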

Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs Outdated
Contributor

Copilot AI commented Feb 25, 2026

@NaluTripician I've opened a new pull request, #5640, to work on those changes. Once the pull request is ready, I'll request review from you.

@azure-pipelines
Azure Pipelines:
Successfully started running 1 pipeline(s).

Contributor Author

@NaluTripician NaluTripician left a comment

Review Summary

The core fix is correct — removing the guard that prevented multimaster accounts from resolving the default endpoint to a write region name, and adding a null-safety check on the return path. A couple of concerns below.

Whitespace noise: ~200 lines of trailing-whitespace-only changes obscure the ~15 lines of real logic. Consider a separate formatting commit or .editorconfig enforcement to keep semantic diffs clean.

Test coverage gap: The new test covers useMultipleWriteLocations: true but not the asymmetric case (account is multi-master, client didn't opt in). PR #5640 addresses this — recommend folding it in before merge.

Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs Outdated
Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs Outdated
Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs
…solution

- Use enableMultipleWriteLocations (server-side setting) instead of
  CanUseMultipleWriteLocations() (client opt-in AND server) so diagnostics
  resolve the hub region even when the client has UseMultipleWriteLocations
  disabled.
- Use AvailableWriteLocations[0] (ordered ReadOnlyCollection) instead of
  AvailableWriteEndpointByLocation.First().Key (unordered Dictionary) for
  deterministic region name selection.
- Add test for asymmetric case: account multi-master, client opt-out.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
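
The distinction drawn in this commit message can be sketched as below. GatewayDiagnosticsSketch, its constructor, and TryGetLocationForDefaultEndpoint are hypothetical names; only the two flags, CanUseMultipleWriteLocations(), and AvailableWriteLocations[0] come from the message, and the host comparison is an assumption about how "default-endpoint hostnames" are matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of the diagnostics lookup (assumption, not SDK code).
public sealed class GatewayDiagnosticsSketch
{
    private readonly Uri defaultEndpoint;
    private readonly bool useMultipleWriteLocations;    // client opt-in
    private readonly bool enableMultipleWriteLocations; // account-level (server-side) setting
    private readonly IReadOnlyList<string> availableWriteLocations;

    public GatewayDiagnosticsSketch(
        Uri defaultEndpoint,
        bool useMultipleWriteLocations,
        bool enableMultipleWriteLocations,
        IReadOnlyList<string> availableWriteLocations)
    {
        this.defaultEndpoint = defaultEndpoint;
        this.useMultipleWriteLocations = useMultipleWriteLocations;
        this.enableMultipleWriteLocations = enableMultipleWriteLocations;
        this.availableWriteLocations = availableWriteLocations;
    }

    // Client opt-in AND server setting, as described in the commit message.
    public bool CanUseMultipleWriteLocations()
    {
        return this.useMultipleWriteLocations && this.enableMultipleWriteLocations;
    }

    // Diagnostics consult only the account-level flag, so a client that opted out
    // of multi-write still gets a region name for the default endpoint.
    public bool TryGetLocationForDefaultEndpoint(Uri endpoint, out string regionName)
    {
        bool isDefaultHost = string.Equals(
            endpoint.Host, this.defaultEndpoint.Host, StringComparison.OrdinalIgnoreCase);

        regionName = isDefaultHost
                && this.enableMultipleWriteLocations
                && this.availableWriteLocations.Any()
            ? this.availableWriteLocations[0]
            : null;

        return regionName != null;
    }
}
```

In the asymmetric case (account multi-master, client opted out), CanUseMultipleWriteLocations() is false in this model while the diagnostics lookup still returns the first write location.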

…ter diagnostics

- Adds inline comment explaining why enableMultipleWriteLocations (account-level)
  is used instead of CanUseMultipleWriteLocations() (requires client opt-in) in
  TryGetLocationForGatewayDiagnostics, since diagnostics should resolve the hub
  region regardless of client multi-write configuration.
- Adds test for unknown/unresolvable non-default endpoint (validates the return
  regionName != null fix returns false for unknown endpoints).
- Adds test for pre-OnDatabaseAccountRead state (validates correct behavior when
  AvailableWriteLocations is empty and enableMultipleWriteLocations defaults to false).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@NaluTripician NaluTripician marked this pull request as ready for review April 9, 2026 22:29

- Use .Any() instead of .Count > 0 for AvailableWriteLocations guard (per Kiran)
- Collapse inner enableMultipleWriteLocations check into ternary (per Kiran)
- Reset files to master baseline to eliminate all CRLF/whitespace-only diffs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs
Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs

- Collapse nested if in GetLocation into single condition
- Revert defensive 'return regionName != null' to original 'return true' for non-default endpoint path (unrelated to fix)
- Replace 'hub' terminology with 'first available write region' in comments and test variables (MM first write region is not necessarily the hub)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…dpoints

The fallthrough path returned 'true' unconditionally after calling
GetLocation, even when GetLocation returned null for unrecognized
endpoints. Changed to 'return regionName != null' to match the
contract and the default-endpoint branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
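
The contract this commit restores can be shown with a minimal Try-pattern sketch. RegionLookupSketch and TryGetRegion are hypothetical names, and the dictionary stands in for whatever lookup GetLocation performs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper showing the Try-pattern contract (assumption, not SDK code).
public static class RegionLookupSketch
{
    // Returns true only when a region name was actually resolved, so callers
    // never see (true, null) for an unrecognized endpoint.
    public static bool TryGetRegion(
        IReadOnlyDictionary<Uri, string> regionByEndpoint,
        Uri endpoint,
        out string regionName)
    {
        // TryGetValue leaves regionName as null when the endpoint is unknown.
        regionByEndpoint.TryGetValue(endpoint, out regionName);
        return regionName != null;
    }
}
```

Returning true unconditionally would instead report success with a null region name, which is the behavior the commit message describes for the fallthrough path.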

Comment thread Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs Outdated
Addresses kushagraThapar review feedback on PR #5618: the existing summary on GetLocation was stale after removing the !CanUseMultipleWriteLocations() guard. Updated doc to: (1) cover the broader write+read regional endpoint case, (2) note that the default-endpoint fallback now applies to both single-master and multi-master, and (3) clarify that the returned write location is just the first in the list, not necessarily the hub region (per his line-179 note).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Member

@kushagraThapar kushagraThapar left a comment

LGTM, thanks @NaluTripician

5 participants