## Summary
All rate limit buckets for a single entity share the same DynamoDB partition key (`namespace/ENTITY#{id}`). A high-traffic entity can exceed DynamoDB's per-partition throughput limit (~1,000 WCU/sec), causing throttling that degrades service for that entity — and potentially for co-located entities in the same partition.
## Details
Each `acquire()` call performs a `TransactWriteItems` (or `UpdateItem` in speculative mode) against items sharing the same partition key. For cascade entities, this doubles to 2-4 writes per request (child + parent). At sustained rates above ~500 req/sec for a single entity, DynamoDB's adaptive capacity may not redistribute fast enough, causing `ProvisionedThroughputExceededException`.
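To make the arithmetic concrete, a back-of-envelope sketch (assuming ~1 KB items, so roughly one WCU per write; `max_request_rate` is illustrative, not a library function):

```python
# Approximate per-partition write ceiling in DynamoDB.
PARTITION_WCU_LIMIT = 1000

def max_request_rate(writes_per_request: int) -> float:
    """Highest sustainable req/sec for one entity before its shared
    partition throttles, assuming ~1 KB items (1 WCU per write)."""
    return PARTITION_WCU_LIMIT / writes_per_request

print(max_request_rate(2))  # non-cascade entity -> 500.0
print(max_request_rate(4))  # cascade entity (child + parent writes) -> 250.0
```

Note that transactional writes in DynamoDB actually consume twice the WCU of standard writes, so the real ceiling for the `TransactWriteItems` path is lower still.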
The library has no built-in mitigation:
- No partition key sharding/salting
- No write coalescing or batching
- No client-side admission control before hitting DynamoDB
- `RateLimiterUnavailable` is raised, but only after the caller has already been delayed
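As an illustration of the missing client-side admission control, a minimal local token-bucket gate that fails fast instead of letting DynamoDB throttle mid-flight (`LocalAdmission` is hypothetical, not part of the library):

```python
import threading
import time

class LocalAdmission:
    """Hypothetical client-side gate: cap the rate of DynamoDB writes a
    single process will attempt before calling acquire()."""

    def __init__(self, max_per_sec: float):
        self.capacity = max_per_sec
        self.tokens = max_per_sec
        self.refill_rate = max_per_sec  # tokens regained per second
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.refill_rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

gate = LocalAdmission(max_per_sec=500)
if gate.try_acquire():
    pass  # safe to attempt the DynamoDB-backed acquire()
```

A gate like this bounds how hard one process can hit the partition, but it cannot coordinate across processes; that is what the pre-sharding design below addresses.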
## Impact
- Availability: High-traffic entities experience elevated latency and rejected requests beyond what their rate limits specify
- Fairness: Other entities sharing the same DynamoDB partition may experience collateral throttling
- Multi-tenant risk: In a shared LLM proxy scenario, one tenant's burst traffic could degrade service for others
## Reproduction
- Create an entity with high rate limits (e.g., 100,000 rpm)
- Send sustained traffic at 1,000+ req/sec to a single entity
- Observe the DynamoDB `ThrottledRequests` CloudWatch metric increasing
- Observe `acquire()` latency spikes and `RateLimiterUnavailable` exceptions
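The sustained-traffic step can be driven with a small concurrent loop; `acquire` here is a stub standing in for the library's real call:

```python
import concurrent.futures

def acquire(entity_id: str) -> None:
    """Stub for the library's acquire(); the real call performs the
    DynamoDB writes described in Details and may raise
    RateLimiterUnavailable under throttling."""

def hammer(entity_id: str, total: int, concurrency: int = 64) -> int:
    """Fire `total` acquire() calls concurrently and count failures."""
    errors = 0
    with concurrent.futures.ThreadPoolExecutor(concurrency) as pool:
        futures = [pool.submit(acquire, entity_id) for _ in range(total)]
        for f in concurrent.futures.as_completed(futures):
            if f.exception() is not None:
                errors += 1
    return errors
```

In a real run, the error count climbing alongside the `ThrottledRequests` metric confirms the partition, not the configured limit, is the bottleneck.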
## Remediation Design: Pre-Shard Buckets
- Move buckets to `PK={ns}/BUCKET#{entity}#{resource}#{shard}`, `SK=#STATE` — one partition per (entity, resource, shard)
- Auto-inject a `wcu:1000` reserved limit on every bucket — tracks DynamoDB partition write pressure in-band (name may change during implementation)
- Shard doubling (1→2→4→8) triggered by the client on `wcu` exhaustion, or proactively by the aggregator
- Shard 0 (suffix `#0`) is the source of truth for `shard_count`; the aggregator propagates it to the other shards
- Original limits stored on the bucket; effective limits derived as `original / shard_count`. Infrastructure limits (`wcu`) are not divided
- Shard selection: random/round-robin. On application-limit exhaustion, retry on another shard (max 2 retries)
- Lazy shard creation on first access
- Bucket discovery via GSI3 (KEYS_ONLY) + BatchGetItem. GSI2 for resource aggregation unchanged
- Cascade: the parent is unaware of child sharding and is protected by its own `wcu` limit
- Aggregator: parse the new PK format, key state by `shard_id`, use effective limits for refill, and filter `wcu` from snapshots
- Clean break migration: schema version bump, old buckets ignored, new buckets created on first access
- The $0.625/M cost on the hot path is preserved
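The key scheme, limit division, and retry-on-another-shard selection from the bullets above can be sketched as follows (illustrative helpers under the stated design, not the library's API):

```python
import random

SHARD_LIMIT_RETRIES = 2  # "max 2 retries" from the design above

def bucket_pk(ns: str, entity: str, resource: str, shard: int) -> str:
    # One DynamoDB partition per (entity, resource, shard).
    return f"{ns}/BUCKET#{entity}#{resource}#{shard}"

def effective_limit(original: int, shard_count: int) -> int:
    # Application limits are divided across shards; the wcu
    # infrastructure limit is deliberately left undivided.
    return original // shard_count

def pick_shards(shard_count: int) -> list[int]:
    # Random first pick, then up to 2 retries on distinct other shards
    # when the chosen shard's application limit is exhausted.
    first = random.randrange(shard_count)
    others = [s for s in range(shard_count) if s != first]
    random.shuffle(others)
    return [first] + others[:SHARD_LIMIT_RETRIES]
```

Because the PK is a pure function of (entity, resource, shard), lazy shard creation on first access is naturally idempotent.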