QuorumReader code not yielding on 429 #5155
fix:refactoring tests fix: removing secrets
Please follow the required format: "[Internal] Category: (Adds|Fixes|Refactors|Removes) Description"
Internal should be used for PRs that have no customer impact. This flag is used to help generate the changelog to know which PRs should be included. Examples:
Diagnostics: Adds GetElapsedClientLatency to CosmosDiagnostics
PartitionKey: Fixes null reference when using default(PartitionKey)
[v4] Client Encryption: Refactors code to external project
[Internal] Query: Adds code generator for CosmosNumbers for easy additions in the future.
| @@ -153,7 +153,6 @@ public async Task<ShouldRetryResult> ShouldRetryAsync( | |||
| CancellationToken cancellationToken) | |||
| { | |||
| this.retryContext = null; | |||
Will do. It's just a draft PR; I will not include it in the Direct package PR.
| // Check if all replicas returned 429 | ||
| // TO DO: check with Kiran/Fabian if 429 from one replica is going to be enough to yield, but that may turn out to be a false alarm considering calls are async, so the 429 state could be transient | ||
| if (responses.All(response => response.Target.StatusCode == StatusCodes.TooManyRequests)) |
I think All makes sense. I will remove the comment.
With exceptions, is the behavior all 429s from both replicas?
With one replica failure, it returns a successful response as expected. I updated the PR with additional test case coverage and included the diagnostics for it.
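To make the All versus Any question concrete, here is a minimal sketch that compares the two predicates on plain status codes. The class and method names are hypothetical and are not part of the SDK; only the `All(... == TooManyRequests)` shape comes from the diff above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: compares the two candidate yield conditions.
public static class ThrottleSemantics
{
    private const int TooManyRequests = 429;

    // Condition used in the PR: yield only when every replica response is a 429.
    public static bool AllThrottled(IReadOnlyList<int> statusCodes) =>
        statusCodes.All(code => code == TooManyRequests);

    // Alternative raised in the TO DO: yield as soon as any replica returns a 429.
    public static bool AnyThrottled(IReadOnlyList<int> statusCodes) =>
        statusCodes.Any(code => code == TooManyRequests);
}
```

With one throttled replica and one successful replica, `AllThrottled` is false and the read can still complete, which matches the behavior described in the reply above; `AnyThrottled` would yield in that same case.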
| readBarrierLsn: secondaryQuorumReadResult.SelectedLsn, | ||
| targetGlobalCommittedLSN: secondaryQuorumReadResult.GlobalCommittedSelectedLsn, | ||
| readMode: readMode)) | ||
| #pragma warning disable CS8632 // The annotation for nullable reference types should only be used in code within a '#nullable' annotations context. |
Can you please also check the QuorumWrite paths?
For 429 errors, and implement the same there?
| if (isThrottled && throttledResponse != null) | ||
| { | ||
| // Handle throttling by returning the throttled response | ||
| DefaultTrace.TraceWarning("WritePrivateAsync: Throttling occurred during write barrier. Returning throttled response."); |
Let's trace the status and sub-status codes along with it.
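One possible shape for that trace line, as a sketch only. It uses `System.Diagnostics.Trace` as a stand-in for the SDK's `DefaultTrace` so the snippet is self-contained; the `ThrottleTrace` class, its method names, and the message layout are assumptions, not the code in this PR.

```csharp
using System.Diagnostics;

// Hypothetical helper: formats and emits a throttling warning with both codes.
public static class ThrottleTrace
{
    // Builds the message separately so it can be inspected without a trace listener.
    public static string Format(string operation, int statusCode, int subStatusCode) =>
        $"{operation}: Throttling occurred during write barrier. " +
        $"StatusCode: {statusCode}, SubStatusCode: {subStatusCode}. " +
        "Returning throttled response.";

    // Emits the message as a warning through the standard .NET trace pipeline.
    public static void Warn(string operation, int statusCode, int subStatusCode) =>
        Trace.TraceWarning(Format(operation, statusCode, subStatusCode));
}
```

A call such as `ThrottleTrace.Warn("WritePrivateAsync", 429, subStatusCode)` would then carry both codes in a single line, where `subStatusCode` is whatever the throttled response reports.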
| if (responses != null) | ||
| { | ||
| return true; | ||
| // Check if all replicas returned 429 |
Will a helper method help avoid duplication?
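A shared helper could look roughly like the sketch below. The class name, method name, and selector parameter are hypothetical; the generic signature is an assumption made so that the read, barrier, and write paths could pass their own response types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper shared by the read, barrier, and write paths.
public static class ThrottlingHelper
{
    private const int TooManyRequests = 429;

    // Returns true only when there is at least one response and every response is a 429.
    // A null or empty sequence returns false, so "no responses" is never treated as throttled.
    public static bool AllThrottled<T>(IEnumerable<T> responses, Func<T, int> getStatusCode)
    {
        if (responses == null)
        {
            return false;
        }

        bool sawAny = false;
        foreach (T response in responses)
        {
            sawAny = true;
            if (getStatusCode(response) != TooManyRequests)
            {
                return false;
            }
        }

        return sawAny;
    }
}
```

The explicit `sawAny` flag is a design choice: `Enumerable.All` returns true for an empty sequence, so a bare `responses.All(...)` would report throttling when no replica responded at all.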
When receiving 429s on a single-region account with strong consistency, the quorum reader code does not yield on QuorumRead and Barrier calls.
Single429Failure-BarrierCalls.json
429WriteBarrierCallsWithoutFix.json
429WriteBarrierCallsWithFix.json
// original issue diagnostics - QuorumRead
429-OriginalIssueWithOutFix.json
// verifying existing behavior by turning off exceptionless for 429
429-OriginalIssueWithExceptionLessTurnedoOfFor429.json
// fix diagnostics with a large number of 429 failures - QuorumRead
429-WithFixAll429s.json
// fix diagnostics with fewer 429 failures resulting in eventual success - QuorumRead
429-Withfewer429sResultsInEventualSuccess.json
// original issue diagnostics - Barrier calls
429 BarrierCallsOriginalIssueWithDiagnostics.json
// Barrier calls fix with Diagnostics
429-BarrierCallsFixWithDianogstics.json
Unit tests for QuorumReader are pending; I will submit them along with the Direct code changes.
To automatically close an issue: closes #5035