Early yield on 429 throttling on barrier requests#48914
Early yield on 429 throttling on barrier requests#48914mbhaskar wants to merge 5 commits intoAzure:mainfrom
Conversation
Port of .NET PR #1667829: When receiving repeated 429 (Too Many Requests) responses with strong consistency, QuorumReader and ConsistencyWriter now handle throttling more efficiently. QuorumReader (reads): - waitForReadBarrierAsync: yield early when all replicas return 429 in both single-region and multi-region barrier loops - ensureQuorumSelectedStoreResponse: yield early when all replicas throttled during initial quorum read - All cases throw the 429 exception to let ResourceThrottleRetryPolicy handle retry with appropriate backoff ConsistencyWriter (writes): - waitForWriteBarrierAsync: track lastAttemptWasThrottled flag per iteration - Do NOT yield early (preserves idempotency guarantees) - When all retries exhausted due to consistent throttling, throw RequestTimeoutException (408) with substatus SERVER_WRITE_BARRIER_THROTTLED (21013) instead of returning barrier-not-met Other changes: - Added isThrottledException field to StoreResult - Added SERVER_WRITE_BARRIER_THROTTLED (21013) substatus code - Unit tests for all throttling scenarios Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ed replica yields early Port of .NET test ValidatesReadMultipleReplicaAsyncExcludesGoneReplicas. Validates that when replicas return a mix of 410 (Gone) and 429 (TooManyRequests): - Gone replicas are excluded from results by StoreReader (isValid=false for GONE) - The 429 replica with valid LSN headers is kept (isValid=true for non-GONE with lsn>=0) - Since all remaining replicas are throttled, early yield triggers - The 429 exception propagates to ResourceThrottleRetryPolicy Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates Cosmos direct connectivity quorum/barrier logic to “yield early” when replica reads are uniformly throttled (HTTP 429), allowing the existing ResourceThrottleRetryPolicy to apply appropriate backoff instead of progressing into additional quorum/primary/barrier attempts.
Changes:
- Add
StoreResult.isThrottledExceptionto cheaply detect 429 responses. - In
QuorumReader, propagate 429 immediately when all collected replica results are throttled (including barrier paths). - In
ConsistencyWriter, track throttling during write barriers and, when retries are exhausted and the last attempt was fully throttled, throw aRequestTimeoutExceptionwith a new substatus code; add unit tests for the new behaviors.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/StoreResult.java | Adds a computed flag to identify throttling (429) on replica results. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/QuorumReader.java | Early-yields on replica-wide throttling to let throttle retry policy handle backoff. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/ConsistencyWriter.java | Tracks throttling during write barriers and surfaces a distinct timeout substatus when retries are exhausted. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/HttpConstants.java | Introduces a new substatus code for write-barrier throttling exhaustion. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/QuorumReaderTest.java | Adds unit tests covering 429 propagation and Gone+429 interactions. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/ConsistencyWriterTest.java | Adds unit tests for write-barrier behavior under sustained throttling and mixed outcomes. |
| return Mono.error(responseResult.get(0).getException()); | ||
| } | ||
|
|
||
| int responseCount= (int) responseResult.stream().filter(response -> response.isValid).count(); |
There was a problem hiding this comment.
WhitespaceAround checkstyle: add spaces around the '=' operator (e.g., int responseCount = ...). As written, this will likely fail checkstyle and break the build.
| int responseCount= (int) responseResult.stream().filter(response -> response.isValid).count(); | |
| int responseCount = (int) responseResult.stream().filter(response -> response.isValid).count(); |
| // For writes, we do NOT yield early - continue retries to preserve idempotency. | ||
| // If all retries are exhausted due to consistent throttling, we throw 408 (RequestTimeout). | ||
| if (responses != null && !responses.isEmpty() | ||
| && responses.stream().allMatch(r -> r.isThrottledException)) { |
There was a problem hiding this comment.
🔴 Blocking · Correctness: Stale State
lastAttemptWasThrottled not reset when avoidQuorumSelection path bypasses throttle tracking
The isAvoidQuorumSelectionStoreResult branch (lines 548—562) returns early without updating lastAttemptWasThrottled. If a prior iteration set this flag to true (all replicas returned 429), and a subsequent iteration enters the avoidQuorum path (replica returns 410/LeaseNotFound), the flag remains stale.
When isBarrierMeetPossibleInPresenceOfAvoidQuorumSelectionException returns Mono.empty() (continue loop) and retries exhaust, the check at line 520 sees lastAttemptWasThrottled == true and throws RequestTimeoutException(408, 21013) instead of the correct behavior of returning false → GoneException.
Concrete scenario:
- Iteration N: all replicas return 429 →
lastAttemptWasThrottled = true· retryCount decremented - Iteration N+1: a replica returns 410/LeaseNotFound → avoidQuorum path fires,
lastAttemptWasThrottledNOT reset isBarrierMeetPossibleInPresenceOfAvoidQuorumSelectionException→ primary check returns insufficient LSN →Mono.empty()→ loop continues- Retries exhaust → incorrect 408/21013 instead of the normal barrier-not-met flow
Suggested fix: Reset the flag before the avoidQuorum early return:
if (isAvoidQuorumSelectionStoreResult) {
writeBarrierRetryCount.decrementAndGet();
lastAttemptWasThrottled.set(false); // This attempt was not a throttle
return this.isBarrierMeetPossibleInPresenceOfAvoidQuorumSelectionException(...);
}There was a problem hiding this comment.
Good catch — fixed in e80964d. Added \lastAttemptWasThrottled.set(false)\ before the avoidQuorumSelection early return to prevent the stale flag scenario you described.
| } | ||
|
|
||
| if (writeBarrierRetryCount.get() == 0) { | ||
| if (lastAttemptWasThrottled.get()) { |
There was a problem hiding this comment.
🟡 Recommendation · Correctness: Behavioral Change
Write barrier throttle failure changes exception type from retryable to non-retryable
Before this PR, when write barrier retries exhausted (for any reason), waitForWriteBarrierAsync returned false, and the caller at barrierForWriteRequests (line 419—427) threw GoneException (substatus GLOBAL_STRONG_WRITE_BARRIER_NOT_MET). GoneException IS retried by GoneAndRetryWithRetryPolicy.
With this change, throttle-specific exhaustion throws RequestTimeoutException(408, 21013) directly, which is not retried by GoneAndRetryWithRetryPolicy (isNonRetryableException() returns true for RequestTimeoutException).
This is likely the intended behavior (avoid re-executing committed writes), but it is a significant behavioral change:
- Customers under heavy throttling will now see terminal 408 errors instead of automatic retry
- The .NET SDK (azure-cosmos-dotnet-v3#5155) takes the opposite approach — yielding early with 429 (which IS retried by the throttle policy)
Please confirm this is intentional. If so, consider documenting the new substatus code 21013 so customers can handle it in their retry logic. Also consider whether ClientRetryPolicy.shouldRetryOnRequestTimeout should avoid marking the endpoint as unavailable for this specific substatus (since it's a transient throttle issue, not an endpoint problem).
There was a problem hiding this comment.
This is intentional — the .NET SDK PR (1667829) takes the same approach for writes: continue barrier retries and throw \RequestTimeoutException(408, Server_WriteBarrierThrottled)\ when retries exhaust. The design rationale is to preserve idempotency guarantees for writes (avoiding re-execution of committed writes). The \ClientRetryPolicy.shouldRetryOnRequestTimeout\ point is worth investigating separately — will track.
| if (!responseResult.isEmpty() && responseResult.stream().allMatch(r -> r.isThrottledException)) { | ||
| logger.info("QuorumReader: ensureQuorumSelectedStoreResponse - All replicas returned 429 Too Many Requests. " | ||
| + "Yielding early to ResourceThrottleRetryPolicy."); | ||
| return Mono.error(responseResult.get(0).getException()); |
There was a problem hiding this comment.
🟢 Suggestion · Correctness: Exception Selection
Exception selection uses first response without considering retryAfterInMs
responseResult.get(0).getException() propagates the exception from the first (arbitrary) replica. Different replicas may return different Retry-After header values. ResourceThrottleRetryPolicy uses this value for backoff timing. Picking the first response could use a too-short backoff, causing the retry to be immediately throttled again.
Suggested improvement:
StoreResult maxRetryResult = responseResult.stream()
.max(Comparator.comparingDouble(r -> r.retryAfterInMs != null ? r.retryAfterInMs : 0))
.get();
return Mono.error(maxRetryResult.getException());The same pattern applies at lines 708 and 803.
There was a problem hiding this comment.
Not acting on this for now — the .NET PR design notes explicitly state: \different replica may return different retry-after value, but it won't matter which one we use as load between replica should be relatively equal. The first response is acceptable.
|
@sdkReviewAgent |
|
✅ Review complete (49:16) Posted 1 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
Fixes from xinlian12 (blocking): - Fix lastAttemptWasThrottled stale state: reset flag before avoidQuorumSelection early return to prevent incorrect 408 when prior iteration was throttled but current iteration hits 410 - Fix readStrong_AllReplicasThrottled_Returns429 false positive: set LSN on exception so StoreReader marks isValid=true, ensuring the early yield path is actually exercised. Add transport invocation count assertions to verify primary read is NOT attempted. - Add readStrong_BarrierRequestsThrottled_Returns429 test covering the waitForReadBarrierAsync barrier path (quorum succeeds, then barrier HEAD requests return 429) Fixes from Copilot review: - Fix checkstyle: add missing spaces around = operator (2 places) - Fix log wording: 'All replicas' -> 'All contacted replicas' (more accurate since not all replicas may be contacted per attempt) - Fix ConsistencyWriter log: 'consistent throttling' -> 'last attempt was throttled' (flag only tracks last attempt, not all attempts) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tling Unit tests (4 new, 10 total throttling tests): - writeBarrier_AvoidQuorumSelectionAfterThrottling_NoFalse408: validates lastAttemptWasThrottled reset on avoidQuorumSelection path (stale state fix) - writeBarrier_NRegionCommit_AllReplicasThrottled_Returns408: N-region synchronous commit barrier throttling produces 408/21013 - readStrong_QuorumNotSelected_PrimaryThrottled_Returns429: primary 429 propagates correctly through QuorumNotSelected → readPrimary path - readStrong_BarrierPartialThrottle_StillSucceeds: barrier succeeds when one replica is throttled but other meets LSN (no false-negative yield) Fault injection E2E tests (3 new, require strong consistency account): - faultInjection_readBarrierThrottled_yieldsEarly: inject 429 on HEAD_COLLECTION + GCLSN interceptor → verify early yield on reads - faultInjection_writeBarrierThrottled_returns408: inject 429 on HEAD_COLLECTION + GCLSN interceptor → verify 408 on writes - faultInjection_readBarrierThrottled_thenRecovers: inject 429 with hitLimit(2) → verify read succeeds after throttle clears Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Read and write barrier requests are only triggered on multi-region strong consistency accounts (numberOfReadRegions > 0). The emulator is single-region, so the GCLSN interceptor never triggers barriers and the tests fail with empty supplementalResponseStatisticsList. Added accountLevelReadRegions.size() > 1 skip check to all three E2E fault injection tests so they correctly skip on single-region environments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run java - cosmos - tests |
|
No pipelines are associated with this pull request. |
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Description
This PR introduces early yield on 429s during barrier requests.
When receiving 429s with strong consistency, quorum reader/ writer code does not yield early enough creating multiple stack traces resulting into resource constraints on the client side.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines