
[Internal] ClientRetryPolicy: Adds Code to Apply Partition Level Override When a Requested Cancellation Token Expires #5063


@kundadebdatta (Member) commented Mar 11, 2025

Pull Request Template

Description

Background:

During one of the backend drills, we found that when the following quorum-loss conditions are met and the user provides a cancellation token, the SDK honors the token but does not apply a partition-level failover for the offending partition:

  • Quorum loss is injected by taking down a quorum of replicas (3 out of 4 replicas are down).
  • The primary replica is among the replicas that are down.
  • A cancellation token with a 5-second timeout is provided.

Observation:

  • The SDK does not apply the partition-level override, so subsequent write requests keep failing against the faulty region/partition.

Fix:

This PR fixes the above behavior by applying a partition-level override when a cancellation token expires. To avoid false positives, cancellation-token-based failovers are gated by consecutive-failure thresholds (separate for reads and writes), which can be overridden by setting the environment variables AZURE_COSMOS_PPCB_CONSECUTIVE_FAILURE_COUNT_FOR_READS and AZURE_COSMOS_PPCB_CONSECUTIVE_FAILURE_COUNT_FOR_WRITES.
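The PR description does not include the implementation. As a rough mental model only, the Python sketch below shows one way threshold-gated, cancellation-token-based failover could work: each token expiry counts as a failure for a (partition, operation) pair, a success resets that count, and once the count reaches the read or write threshold the partition is routed to the next region. Only the two environment variable names come from the PR; the class and method names, the fallback default values, and the "next region in the list" choice are hypothetical.

```python
import os
from collections import defaultdict

# Environment variable names are taken from the PR description.
READ_THRESHOLD_ENV = "AZURE_COSMOS_PPCB_CONSECUTIVE_FAILURE_COUNT_FOR_READS"
WRITE_THRESHOLD_ENV = "AZURE_COSMOS_PPCB_CONSECUTIVE_FAILURE_COUNT_FOR_WRITES"

# Hypothetical fallback values: the PR does not state the SDK's real defaults.
ASSUMED_DEFAULT_READ_THRESHOLD = 10
ASSUMED_DEFAULT_WRITE_THRESHOLD = 5


def threshold_from_env(name: str, default: int) -> int:
    """Return a positive integer read from the environment, else the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


class PartitionOverrideTracker:
    """Hypothetical model: fail a partition over after N consecutive token expiries."""

    def __init__(self, read_threshold: int, write_threshold: int) -> None:
        self._thresholds = {"read": read_threshold, "write": write_threshold}
        # (partition id, operation kind) -> consecutive cancellation-token expiries
        self._failures = defaultdict(int)
        # partition id -> region that requests for this partition are routed to
        self._overrides = {}

    def record_success(self, partition: str, op: str) -> None:
        # Any success breaks the streak, which is what makes the count "consecutive".
        self._failures[(partition, op)] = 0

    def record_cancellation_expired(
        self, partition: str, op: str, current_region: str, regions: list
    ) -> bool:
        """Count one expiry; return True if a partition-level override was applied."""
        key = (partition, op)
        self._failures[key] += 1
        if self._failures[key] < self._thresholds[op]:
            return False  # below threshold: honor the token, no failover yet
        # Threshold reached: route this partition to the next region in the list.
        next_index = (regions.index(current_region) + 1) % len(regions)
        self._overrides[partition] = regions[next_index]
        self._failures[key] = 0
        return True

    def region_for(self, partition: str, default_region: str) -> str:
        """Return the overridden region for a partition, if any."""
        return self._overrides.get(partition, default_region)


# Thresholds come from the environment when set, else the assumed fallbacks.
tracker = PartitionOverrideTracker(
    threshold_from_env(READ_THRESHOLD_ENV, ASSUMED_DEFAULT_READ_THRESHOLD),
    threshold_from_env(WRITE_THRESHOLD_ENV, ASSUMED_DEFAULT_WRITE_THRESHOLD),
)
```

Keeping the counters separate per operation kind mirrors the PR's split between read and write thresholds; requiring several consecutive expiries before failing over is what keeps a single slow request from triggering a spurious override.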

Type of change


  • Bug fix (non-breaking change which fixes an issue)

Closing issues

Closes #5060


@github-actions bot left a comment


All good!

@kundadebdatta changed the title from "Code changes to apply partition level override on ct expiry." to "[PPAF] - Adds Code to Apply Partition Level Override When a Requested Cancellation Token Expires" Mar 11, 2025
@kundadebdatta self-assigned this Mar 11, 2025
@kundadebdatta added the auto-merge and PerPartitionAutomaticFailover labels Mar 11, 2025
@kundadebdatta changed the title from "[PPAF] - Adds Code to Apply Partition Level Override When a Requested Cancellation Token Expires" to "[Internal] ClientRetryPolicy: Adds Code to Apply Partition Level Override When a Requested Cancellation Token Expires" Mar 11, 2025
@kundadebdatta marked this pull request as ready for review March 11, 2025 18:33
@microsoft-github-policy-service bot enabled auto-merge (squash) March 11, 2025 18:34
@ananth7592 previously approved these changes Mar 11, 2025
@kundadebdatta force-pushed the users/kundadebdatta/5060_ppaf_add_partition_level_override_on_ct_expiry branch from c50b59a to 30de5b2 March 19, 2025 20:51
@kundadebdatta converted this pull request to draft March 21, 2025 05:15
auto-merge was automatically disabled March 21, 2025 05:15

@kundadebdatta marked this pull request as ready for review March 22, 2025 04:13
@microsoft-github-policy-service bot enabled auto-merge (squash) March 22, 2025 04:14
@NaluTripician previously approved these changes Mar 24, 2025

@FabianMeiswinkel (Member) left a comment


LGTM

@microsoft-github-policy-service bot merged commit 2faaba4 into master Mar 26, 2025
26 checks passed
@microsoft-github-policy-service bot deleted the users/kundadebdatta/5060_ppaf_add_partition_level_override_on_ct_expiry branch March 26, 2025 19:45
Labels: auto-merge, PerPartitionAutomaticFailover
5 participants