🐛 fix(repository): restore slow-path bucket read dropped by shard migration (backport to 0.10.x)#431
Merged
Merged
Conversation
…ration
batch_get_entity_and_buckets() classified BatchGetItem response items with
`sk.startswith(SK_BUCKET)` ("#BUCKET#"), but bucket state records use
SK="#STATE" since the per-shard migration (GHSA-76rv). The filter never
matched, so every bucket was silently dropped: the acquire() slow path
treated existing buckets as new, the conditional write failed, and acquire()
raised RateLimitExceeded with retry_after=0.0 instead of refilling.
This broke wait-then-acquire refill recovery: with speculative_writes=True
the refill fallback fails (masked in production by the aggregator Lambda);
with speculative_writes=False or --no-aggregator it is fully broken.
Latent since v0.7.0 (663b687, else->elif startswith(SK_BUCKET)); activated in
v0.10.1 (94a129a) when bucket SK changed to "#STATE" without updating this
response-classification filter.
Add regression tests the suite lacked: direct batch_get_entity_and_buckets
coverage (incl. the no-#META trigger) and a behavioral wait-then-acquire test
on both speculative and non-speculative paths. Also note a TODO to make
_now_ms() the single injectable clock seam.
Fixes #428
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 7e61fd3)
…very Exercises the real-DynamoDB code path the unit tests (moto) and the production aggregator both masked for the bug fixed in 7e61fd3: - integration (test_repository.py): batch_get_entity_and_buckets() returns existing buckets against real DynamoDB, incl. the no-#META trigger. The sibling batch_get_buckets() was already covered here; the method that broke was not — that gap let the regression ship. - e2e (test_localstack.py): exhaust -> wait -> acquire succeeds on the no-aggregator stack (client refill-recovery slow path). - benchmark (test_localstack.py): a recovery stress loop over multiple entities. All three run on no-aggregator (minimal) stacks on purpose, so the aggregator cannot refill out-of-band and mask the client path that broke. Verified RED before the fix / GREEN after, against a live LocalStack. Refs #428 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit 8123fea)
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## release/0.10.x #431 +/- ##
==================================================
+ Coverage 92.22% 92.26% +0.03%
==================================================
Files 36 36
Lines 7830 7830
==================================================
+ Hits 7221 7224 +3
+ Misses 609 606 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Backports the slow-path bucket-read fix from #429 (merged to
main) to therelease/0.10.xmaintenance branch for a v0.10.3 patch release.Bug:
Repository.batch_get_entity_and_buckets()— theacquire()slow-path / refill-recovery read — classifiedBatchGetItemresponse items using the stalesk.startswith(schema.SK_BUCKET)("#BUCKET#") prefix. Bucket state records useSK == "#STATE"(sk_state()) since the per-shard partition-key migration (GHSA-76rv-2r9v-c5m6), so the filter never matched and every bucket was silently dropped. The slow path then treated existing buckets as new, the conditional write failed, andacquire()raisedRateLimitExceededwithretry_after=0.0instead of refilling.Effect: wait-then-acquire refill recovery was broken.
speculative_writes=True, it breaks the refill fallback (masked in production only when the aggregator Lambda refills out-of-band).speculative_writes=Falseor--no-aggregator, it is fully broken.This bug shipped in v0.10.1 and is present in v0.10.2; this PR delivers the fix as v0.10.3.
Fix: one line —
elif sk.startswith(schema.SK_BUCKET):→elif sk == schema.sk_state():(async sourcerepository.py; regenerated intosync_repository.py).Cherry-picked cleanly from
maincommits7e61fd3(fix + unit tests) and8123fea(LocalStack tests), with-xprovenance recorded.Test plan
All verified locally on this branch against live LocalStack:
batch_get_entity_and_bucketsreturns existing buckets incl. no-#METAtrigger; wait-then-acquire recovery on speculative + non-speculative paths (async + generated sync) — 8 passed.ruff+mypyclean.References
main; referenced here as context only)This PR targets the
0.10.xmaintenance line.🤖 Generated with Claude Code