
Early yield on 429 throttling on barrier requests #48914

Open
mbhaskar wants to merge 5 commits into Azure:main from mbhaskar:early-yield-on-429-throttling

Conversation

@mbhaskar (Member) commented Apr 23, 2026

Description

This PR introduces an early yield on 429s during barrier requests.
When receiving 429s with strong consistency, the quorum reader/writer code does not yield early enough, creating multiple stack traces and resulting in resource constraints on the client side.
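
For context, the read-path change reduces to a single check: if every collected replica result is a throttle, propagate the 429 immediately rather than continuing into further quorum/primary/barrier attempts. A minimal sketch of that pattern, assuming a simplified stand-in for StoreResult (the real logic lives in QuorumReader):

import java.util.List;
import reactor.core.publisher.Mono;

final class EarlyYieldSketch {
    // Simplified stand-in for the SDK's StoreResult.
    static final class StoreResultStub {
        final boolean isThrottledException; // true when this replica returned 429
        final Exception exception;

        StoreResultStub(boolean isThrottledException, Exception exception) {
            this.isThrottledException = isThrottledException;
            this.exception = exception;
        }
    }

    static <T> Mono<T> yieldEarlyIfAllThrottled(List<StoreResultStub> responses) {
        // When every contacted replica was throttled, surface the 429 right away
        // so ResourceThrottleRetryPolicy can apply its backoff.
        if (!responses.isEmpty() && responses.stream().allMatch(r -> r.isThrottledException)) {
            return Mono.error(responses.get(0).exception);
        }
        return Mono.empty(); // otherwise continue the normal quorum flow
    }
}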

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

mbhaskar and others added 2 commits April 20, 2026 15:28
Port of .NET PR #1667829: When receiving repeated 429 (Too Many Requests)
responses with strong consistency, QuorumReader and ConsistencyWriter now
handle throttling more efficiently.

QuorumReader (reads):
- waitForReadBarrierAsync: yield early when all replicas return 429 in both
  single-region and multi-region barrier loops
- ensureQuorumSelectedStoreResponse: yield early when all replicas throttled
  during initial quorum read
- All cases throw the 429 exception to let ResourceThrottleRetryPolicy
  handle retry with appropriate backoff

ConsistencyWriter (writes):
- waitForWriteBarrierAsync: track lastAttemptWasThrottled flag per iteration
- Do NOT yield early (preserves idempotency guarantees)
- When all retries exhausted due to consistent throttling, throw
  RequestTimeoutException (408) with substatus SERVER_WRITE_BARRIER_THROTTLED
  (21013) instead of returning barrier-not-met

Other changes:
- Added isThrottledException field to StoreResult
- Added SERVER_WRITE_BARRIER_THROTTLED (21013) substatus code
- Unit tests for all throttling scenarios

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ed replica yields early

Port of .NET test ValidatesReadMultipleReplicaAsyncExcludesGoneReplicas.
Validates that when replicas return a mix of 410 (Gone) and 429 (TooManyRequests):
- Gone replicas are excluded from results by StoreReader (isValid=false for GONE)
- The 429 replica with valid LSN headers is kept (isValid=true for non-GONE with lsn>=0)
- Since all remaining replicas are throttled, early yield triggers
- The 429 exception propagates to ResourceThrottleRetryPolicy

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 23, 2026 18:04
@mbhaskar mbhaskar requested review from a team and kirankumarkolli as code owners April 23, 2026 18:04
@mbhaskar mbhaskar changed the title Early yield on 429 throttling Early yield on 429 throttling on barrier requests Apr 23, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR updates Cosmos direct connectivity quorum/barrier logic to “yield early” when replica reads are uniformly throttled (HTTP 429), allowing the existing ResourceThrottleRetryPolicy to apply appropriate backoff instead of progressing into additional quorum/primary/barrier attempts.

Changes:

  • Add StoreResult.isThrottledException to cheaply detect 429 responses.
  • In QuorumReader, propagate 429 immediately when all collected replica results are throttled (including barrier paths).
  • In ConsistencyWriter, track throttling during write barriers and, when retries are exhausted and the last attempt was fully throttled, throw a RequestTimeoutException with a new substatus code (sketched below); add unit tests for the new behaviors.
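
The write-side behavior in the last bullet can be pictured with a short sketch. The field names writeBarrierRetryCount and lastAttemptWasThrottled match the diff excerpts quoted later on this page; the class scaffolding, the retry budget, and the stand-in exception are illustrative assumptions, not the SDK's actual code:

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class WriteBarrierSketch {
    static final int SERVER_WRITE_BARRIER_THROTTLED = 21013; // substatus added by this PR

    private final AtomicInteger writeBarrierRetryCount = new AtomicInteger(30); // illustrative budget
    private final AtomicBoolean lastAttemptWasThrottled = new AtomicBoolean(false);

    void onBarrierAttempt(boolean allReplicasThrottled) {
        // Writes do NOT yield early (preserves idempotency); each iteration only
        // records whether the last attempt was fully throttled.
        lastAttemptWasThrottled.set(allReplicasThrottled);
        writeBarrierRetryCount.decrementAndGet();
    }

    void onRetriesExhausted() {
        if (writeBarrierRetryCount.get() == 0 && lastAttemptWasThrottled.get()) {
            // Stand-in for the SDK's RequestTimeoutException(408) carrying
            // substatus 21013, replacing the generic barrier-not-met result.
            throw new IllegalStateException(
                "408/" + SERVER_WRITE_BARRIER_THROTTLED + ": write barrier throttled");
        }
        // ... otherwise fall through to the existing barrier-not-met handling
    }
}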

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Summary per file:

  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/StoreResult.java: Adds a computed flag to identify throttling (429) on replica results.
  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/QuorumReader.java: Early-yields on replica-wide throttling to let the throttle retry policy handle backoff.
  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/ConsistencyWriter.java: Tracks throttling during write barriers and surfaces a distinct timeout substatus when retries are exhausted.
  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/HttpConstants.java: Introduces a new substatus code for write-barrier throttling exhaustion.
  • sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/QuorumReaderTest.java: Adds unit tests covering 429 propagation and Gone+429 interactions.
  • sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/ConsistencyWriterTest.java: Adds unit tests for write-barrier behavior under sustained throttling and mixed outcomes.

return Mono.error(responseResult.get(0).getException());
}

int responseCount= (int) responseResult.stream().filter(response -> response.isValid).count();
Copilot AI commented Apr 23, 2026:

WhitespaceAround checkstyle: add spaces around the '=' operator (e.g., int responseCount = ...). As written, this will likely fail checkstyle and break the build.

Suggested change
- int responseCount= (int) responseResult.stream().filter(response -> response.isValid).count();
+ int responseCount = (int) responseResult.stream().filter(response -> response.isValid).count();

mbhaskar (Member, Author) replied:
Fixed in e80964d.

// For writes, we do NOT yield early - continue retries to preserve idempotency.
// If all retries are exhausted due to consistent throttling, we throw 408 (RequestTimeout).
if (responses != null && !responses.isEmpty()
&& responses.stream().allMatch(r -> r.isThrottledException)) {
@xinlian12 (Member) commented Apr 23, 2026:

🔴 Blocking · Correctness: Stale State

lastAttemptWasThrottled not reset when avoidQuorumSelection path bypasses throttle tracking

The isAvoidQuorumSelectionStoreResult branch (lines 548-562) returns early without updating lastAttemptWasThrottled. If a prior iteration set this flag to true (all replicas returned 429), and a subsequent iteration enters the avoidQuorum path (replica returns 410/LeaseNotFound), the flag remains stale.

When isBarrierMeetPossibleInPresenceOfAvoidQuorumSelectionException returns Mono.empty() (continue loop) and retries exhaust, the check at line 520 sees lastAttemptWasThrottled == true and throws RequestTimeoutException(408, 21013) instead of the correct behavior of returning falseGoneException.

Concrete scenario:

  1. Iteration N: all replicas return 429 → lastAttemptWasThrottled = true; retryCount decremented
  2. Iteration N+1: a replica returns 410/LeaseNotFound → avoidQuorum path fires, lastAttemptWasThrottled NOT reset
  3. isBarrierMeetPossibleInPresenceOfAvoidQuorumSelectionException → primary check returns insufficient LSN → Mono.empty() → loop continues
  4. Retries exhaust → incorrect 408/21013 instead of the normal barrier-not-met flow

Suggested fix: Reset the flag before the avoidQuorum early return:

if (isAvoidQuorumSelectionStoreResult) {
    writeBarrierRetryCount.decrementAndGet();
    lastAttemptWasThrottled.set(false);  // This attempt was not a throttle
    return this.isBarrierMeetPossibleInPresenceOfAvoidQuorumSelectionException(...);
}


mbhaskar (Member, Author) replied:

Good catch — fixed in e80964d. Added lastAttemptWasThrottled.set(false) before the avoidQuorumSelection early return to prevent the stale flag scenario you described.

}

if (writeBarrierRetryCount.get() == 0) {
if (lastAttemptWasThrottled.get()) {
@xinlian12 (Member) commented Apr 23, 2026:

🟡 Recommendation · Correctness: Behavioral Change

Write barrier throttle failure changes exception type from retryable to non-retryable

Before this PR, when write barrier retries were exhausted (for any reason), waitForWriteBarrierAsync returned false, and the caller at barrierForWriteRequests (lines 419-427) threw GoneException (substatus GLOBAL_STRONG_WRITE_BARRIER_NOT_MET). GoneException IS retried by GoneAndRetryWithRetryPolicy.

With this change, throttle-specific exhaustion throws RequestTimeoutException(408, 21013) directly, which is not retried by GoneAndRetryWithRetryPolicy (isNonRetryableException() returns true for RequestTimeoutException).

This is likely the intended behavior (avoid re-executing committed writes), but it is a significant behavioral change:

  • Customers under heavy throttling will now see terminal 408 errors instead of automatic retry
  • The .NET SDK (azure-cosmos-dotnet-v3#5155) takes the opposite approach — yielding early with 429 (which IS retried by the throttle policy)

Please confirm this is intentional. If so, consider documenting the new substatus code 21013 so customers can handle it in their retry logic. Also consider whether ClientRetryPolicy.shouldRetryOnRequestTimeout should avoid marking the endpoint as unavailable for this specific substatus (since it's a transient throttle issue, not an endpoint problem).
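
For illustration, a hedged sketch of how application code could special-case this substatus in its own error handling; the helper and constant names are hypothetical, and only the 408 status plus the 21013 substatus come from this PR:

import com.azure.cosmos.CosmosException;

final class WriteBarrierThrottleCheck {
    private static final int REQUEST_TIMEOUT = 408;
    private static final int SERVER_WRITE_BARRIER_THROTTLED = 21013;

    static boolean isWriteBarrierThrottled(Throwable t) {
        if (!(t instanceof CosmosException)) {
            return false;
        }
        CosmosException e = (CosmosException) t;
        // 408 with substatus 21013 means write-barrier retries were exhausted
        // while replicas kept returning 429; the write outcome is indeterminate,
        // so callers should not blindly re-execute non-idempotent writes.
        return e.getStatusCode() == REQUEST_TIMEOUT
            && e.getSubStatusCode() == SERVER_WRITE_BARRIER_THROTTLED;
    }
}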



mbhaskar (Member, Author) replied:

This is intentional — the .NET SDK PR (1667829) takes the same approach for writes: continue barrier retries and throw RequestTimeoutException(408, Server_WriteBarrierThrottled) when retries exhaust. The design rationale is to preserve idempotency guarantees for writes (avoiding re-execution of committed writes). The ClientRetryPolicy.shouldRetryOnRequestTimeout point is worth investigating separately — will track.

if (!responseResult.isEmpty() && responseResult.stream().allMatch(r -> r.isThrottledException)) {
logger.info("QuorumReader: ensureQuorumSelectedStoreResponse - All replicas returned 429 Too Many Requests. "
+ "Yielding early to ResourceThrottleRetryPolicy.");
return Mono.error(responseResult.get(0).getException());
@xinlian12 (Member) commented Apr 23, 2026:

🟢 Suggestion · Correctness: Exception Selection

Exception selection uses first response without considering retryAfterInMs

responseResult.get(0).getException() propagates the exception from the first (arbitrary) replica. Different replicas may return different Retry-After header values. ResourceThrottleRetryPolicy uses this value for backoff timing. Picking the first response could use a too-short backoff, causing the retry to be immediately throttled again.

Suggested improvement:

StoreResult maxRetryResult = responseResult.stream()
    .max(Comparator.comparingDouble(r -> r.retryAfterInMs != null ? r.retryAfterInMs : 0))
    .get();
return Mono.error(maxRetryResult.getException());

The same pattern applies at lines 708 and 803.



mbhaskar (Member, Author) replied:

Not acting on this for now — the .NET PR design notes explicitly state: "different replica may return different retry-after value, but it won't matter which one we use as load between replica should be relatively equal". The first response is acceptable.

@xinlian12 (Member) commented:

@sdkReviewAgent

@xinlian12 (Member) commented:

Review complete (49:16)

Posted 1 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

mbhaskar and others added 3 commits April 27, 2026 14:19
Fixes from xinlian12 (blocking):
- Fix lastAttemptWasThrottled stale state: reset flag before
  avoidQuorumSelection early return to prevent incorrect 408 when
  prior iteration was throttled but current iteration hits 410
- Fix readStrong_AllReplicasThrottled_Returns429 false positive: set LSN
  on exception so StoreReader marks isValid=true, ensuring the early yield
  path is actually exercised. Add transport invocation count assertions
  to verify primary read is NOT attempted.
- Add readStrong_BarrierRequestsThrottled_Returns429 test covering the
  waitForReadBarrierAsync barrier path (quorum succeeds, then barrier
  HEAD requests return 429)

Fixes from Copilot review:
- Fix checkstyle: add missing spaces around = operator (2 places)
- Fix log wording: 'All replicas' -> 'All contacted replicas' (more
  accurate since not all replicas may be contacted per attempt)
- Fix ConsistencyWriter log: 'consistent throttling' -> 'last attempt
  was throttled' (flag only tracks last attempt, not all attempts)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tling

Unit tests (4 new, 10 total throttling tests):
- writeBarrier_AvoidQuorumSelectionAfterThrottling_NoFalse408: validates
  lastAttemptWasThrottled reset on avoidQuorumSelection path (stale state fix)
- writeBarrier_NRegionCommit_AllReplicasThrottled_Returns408: N-region
  synchronous commit barrier throttling produces 408/21013
- readStrong_QuorumNotSelected_PrimaryThrottled_Returns429: primary 429
  propagates correctly through QuorumNotSelected → readPrimary path
- readStrong_BarrierPartialThrottle_StillSucceeds: barrier succeeds when
  one replica is throttled but other meets LSN (no false-negative yield)

Fault injection E2E tests (3 new, require strong consistency account):
- faultInjection_readBarrierThrottled_yieldsEarly: inject 429 on
  HEAD_COLLECTION + GCLSN interceptor → verify early yield on reads
- faultInjection_writeBarrierThrottled_returns408: inject 429 on
  HEAD_COLLECTION + GCLSN interceptor → verify 408 on writes
- faultInjection_readBarrierThrottled_thenRecovers: inject 429 with
  hitLimit(2) → verify read succeeds after throttle clears

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Read and write barrier requests are only triggered on multi-region strong
consistency accounts (numberOfReadRegions > 0). The emulator is single-region,
so the GCLSN interceptor never triggers barriers and the tests fail with
empty supplementalResponseStatisticsList.

Added accountLevelReadRegions.size() > 1 skip check to all three E2E fault
injection tests so they correctly skip on single-region environments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
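
For reference, a minimal sketch of this kind of skip guard, assuming TestNG (used by the Cosmos test suites) and a hypothetical accountLevelReadRegions list populated by the test setup:

import java.util.List;
import org.testng.SkipException;

final class MultiRegionGuard {
    static void skipIfSingleRegion(List<String> accountLevelReadRegions) {
        // Barrier requests only trigger on multi-region strong consistency
        // accounts, so single-region environments (e.g. the emulator) cannot
        // exercise the new code paths and the E2E tests must be skipped.
        if (accountLevelReadRegions == null || accountLevelReadRegions.size() <= 1) {
            throw new SkipException("Requires a multi-region strong consistency account");
        }
    }
}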
@mbhaskar (Member, Author) commented:

/azp run java - cosmos - tests

@azure-pipelines replied:

No pipelines are associated with this pull request.

@mbhaskar (Member, Author) commented:

/azp run java - cosmos - tests

@azure-pipelines replied:

Azure Pipelines successfully started running 1 pipeline(s).
