fix(breaker): avoid markDrop feedback loop for breaker's own rejection #5551
bigrocs wants to merge 1 commit into
When googleBreaker rejects a request (ErrServiceUnavailable), calling markDrop() inflates bucket.Sum, which feeds into history.total, creating a feedback loop that prevents the breaker from ever recovering.

Skip markDrop() when the error is the breaker's own rejection, since it is not a real backend failure and should not be counted in statistics.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
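The arithmetic behind the loop can be sketched as follows. This is a minimal illustration, assuming the drop ratio follows the client-side throttling formula from the Google SRE book, max(0, (requests - K*accepts) / (requests + 1)). The function name `dropRatio` and the sample numbers are hypothetical and are not taken from the go-zero source or from this PR's diff.

```go
package main

import "math"

// dropRatio is a hypothetical helper that evaluates the Google SRE
// client-side throttling formula:
//
//	max(0, (requests - k*accepts) / (requests + 1))
//
// requests plays the role of history.total in the description above.
func dropRatio(requests, accepts int64, k float64) float64 {
	ratio := (float64(requests) - k*float64(accepts)) / float64(requests+1)
	return math.Max(0, ratio)
}
```

As an assumed example, take a window in which 100 requests reached the backend and 80 succeeded, while the breaker rejected another 1000 on its own. With K = 1.5, `dropRatio(100, 80, 1.5)` is 0, but `dropRatio(1100, 80, 1.5)` is roughly 0.89: counting the breaker's own rejections in the total keeps the ratio high even though the backend is mostly healthy.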
@kevwan Could you please merge this bug fix as soon as possible? It has caused significant issues in our high-concurrency payment scenarios.
Review

The fix correctly identifies and resolves a feedback loop in the Google SRE breaker algorithm.

Analysis: The original code called […]

The feedback loop (original behavior): […]

The fix: Skip […]

Code quality: […]

One consideration: Is there any scenario where […]

Suggestion (non-blocking): A unit test that demonstrates the recovery path when the breaker was previously stuck (i.e., without the fix the breaker would never recover) would strengthen confidence, but the logic is sound without it.

LGTM. This is a valid bug fix for a real production issue.