Skip to content

Revert Dual Endpoint Tracking #40451

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

tvaron3
Copy link
Member

@tvaron3 tvaron3 commented Apr 9, 2025

Background

The SDK used to track two endpoints for writes. One was the global endpoint, https://account.documents.azure.com, and the other was the regional endpointhttps://account-eastus.documents.azure.com. Gateway would return these randomly for the write region and the sdk would use that one to write and use the other endpoint as the backup. This was done for write availability in single write region accounts and for load balancing across compute federations.

Changes

The dual endpoint logic is being abstracted away using a service called Azure Traffic Manager https://learn.microsoft.com/en-us/azure/traffic-manager/traffic-manager-overview. Gateway will now only send the regional endpoint to the sdk and the global endpoint should only be used for metadata calls in the sdk. This pr removes all the dual endpoint logic from the sdk and tests.

Other Changes

The dual endpoints introduced a regression for multiple write region accounts in the sdk where we would failover even if the client was not setup with multiple write regions enabled. This is fixed in this pr. Simplified health check logic as no alternate endpoints to check anymore.

@azure-sdk
Copy link
Collaborator

API change check

API changes are not detected in this pull request.

@tvaron3 tvaron3 marked this pull request as ready for review April 10, 2025 17:34
@Copilot Copilot AI review requested due to automatic review settings April 10, 2025 17:34
@tvaron3 tvaron3 requested review from annatisch and a team as code owners April 10, 2025 17:34
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (3)

sdk/cosmos/azure-cosmos/tests/test_service_retry_policies_async.py:386

  • Verify that the expected number of request endpoints (4) now reflects the intended in-region retry behavior after the dual endpoint removal and that no necessary retries are inadvertently omitted.
assert len(connection_retry_policy.request_endpoints) == 4

sdk/cosmos/azure-cosmos/tests/test_fault_injection_transport.py:349

  • Confirm that enabling multiple write locations in this test fully exercises the new code path and does not impact tests expecting single write location behavior when not explicitly enabled.
use_multiple_write_locations=True,

sdk/cosmos/azure-cosmos/azure/cosmos/_location_cache.py:59

  • Ensure that downstream components correctly handle regional routing contexts that now contain only a primary endpoint, without relying on any alternate endpoint logic.
def get_regional_routing_contexts_by_loc(new_locations):

Copy link
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @tvaron3

for endpoint in endpoints:
if endpoint not in endpoints_attempted:
if success_count >= 4:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should still have a max health check count. Do you recall one internal customer using 100 regions. Let's still limit this to maybe 3 max.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes agree, but the endpoints now will be max 3 because there is no alternate endpoints. We grab first two write regions and the first two read regions. This should max get us three unique endpoints. I can add some unit tests for endpoints to health check to verify this behavior.

@@ -23,7 +23,7 @@ class TestServiceRetryPolicies(unittest.TestCase):
REGION1 = "West US"
REGION2 = "East US"
REGION3 = "West US 2"
REGIONAL_ENDPOINT = RegionalRoutingContext(host, host)
REGIONAL_ENDPOINT = RegionalRoutingContext(host)

@classmethod
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expand to test all operation types and with fault injection.

Copy link
Member

@simorenoh simorenoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a small comment but looks good otherwise - thanks Tomas!

Comment on lines +70 to +71
connection_policy = ConnectionPolicy()
connection_policy.UseMultipleWriteLocations = use_multiple_write_locations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two lines can be changed to just pass in the setting in the client initialization - multiple_write_locations=use_multiple_write_locations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

4 participants