- Channel/Interviewer: Tech Dummies - Narendra Lakshmana Gowda
- Duration: 00:35:55
- Original Video: https://www.youtube.com/watch?v=mhUQe4BKZXs
This document summarizes the key content of a system design mock interview. I highly recommend watching the full video if you can.
- Problem Prompt (One-liner): Design a rate limiter that protects APIs from abuse, supports paid vs. free quotas, and works in distributed deployments.
- Primary Scope: Core rate-limiting algorithms (token bucket, leaky bucket, fixed window counter, sliding logs, sliding-window counter) and their behavior under bursty traffic and multi-region setups.
- Out of Scope: Exact API contracts, production SLAs/metrics, vendor-specific configs.
- Non-Functional Priorities: Consistent enforcement, low latency overhead, memory efficiency, fair usage, resilience to DDoS/brute force, and cost control.
- Key Constraints & Numbers (as examples in video): per-user limits like 5 req/min or 10 req/min; developer trial quotas; concurrency limits (parallel sessions); geo/IP scoping.
- High-Level Architecture (Text):
- Clients → Load Balancer(s)
- Rate Limiter service (enforcement)
- Shared store for counters/logs (e.g., Redis/Cassandra)
- Application servers
- Optional: local in-memory cache + background sync to shared store
- Optional: sticky sessions or distributed locks to reduce races (with trade-offs)
- Top Trade-offs:
- Burst tolerance vs. smoothness (token bucket vs. leaky bucket)
- Precision vs. memory (sliding logs vs. sliding-window counter)
- Strict global accuracy vs. latency/availability (locks vs. relaxed limits)
- Simplicity vs. multi-region consistency
- Biggest Risks/Failure Modes:
- Race conditions across regions causing over-allow
- Hot keys (popular users/tenants) overloading storage
- Overly strict locks increasing tail latency
- Fixed windows causing boundary spikes (“double dip” at minute edges)
- Unbounded queues in leaky bucket causing memory pressure
- 5-Min Review Flashcards:
- Q: Why rate limit? A: Protect UX, security (brute force), and cost.
- Q: Token bucket in one line? A: Tokens refill over time; requests spend tokens to allow bursts up to bucket size.
- Q: Leaky bucket in one line? A: Queue + constant drain rate; smooths bursts, drops when queue full.
- Q: Fixed window pitfall? A: Boundary spikes allow ~2× near window edges.
- Q: Sliding logs advantage? A: Real-time window accuracy by tracking recent timestamps.
- Q: Sliding-window counter benefit? A: Similar accuracy with fewer entries via per-second counters.
- Q: Distributed consistency options? A: Sticky sessions, locks, or relax enforcement slightly.
- Q: Operational win from local cache? A: Fewer store round trips at the cost of eventual consistency.
- Product Pattern: rate-limit
- System Concerns: throttling, backpressure, high-availability, geo-replication, hot-key
- Infra/Tech (mentioned): redis, cassandra, load-balancer
- Original Prompt (paraphrase): Build rate limiting to curb bot spikes and monetize APIs by tiering quotas; support per-user, concurrency, IP/location, and server-level limits.
- Use Cases:
- Free developer tier (e.g., 10 req/min) vs. enterprise tier
- Brute-force prevention on login/promo/booking
- Cost containment for autoscaling/pay-as-you-go services
- Location/IP-based throttles during targeted campaigns
- Out of Scope: Detailed auth, billing integration, analytics dashboards.
- APIs: Not stated in video.
Requirements and Constraints
Given in Video
- Per-identity throttling: user-based, concurrent session caps, IP/location-based, and server-scope limits.
- A distributed setup with multiple regions and shared state introduces races; sticky sessions and locks are options with downsides.
- An optional local cache with async sync to the shared store improves performance but allows slight drift.
Assumptions (conservative)
- Sub-millisecond to low-millisecond overhead target for the limiter path.
- Quotas configurable per tenant/endpoint; enforcement returns 429 on reject.
- Observability events on allow/deny for audits.
Capacity estimates: not stated in video—skipping numerical estimation.
High-Level Architecture
- Ingress: Clients → regional Load Balancers (LBs).
- Enforcement: Regional Rate Limiter services apply algorithm-specific checks.
- State: Shared datastore for counters/logs (examples shown: Redis or Cassandra).
- App Tier: Requests that pass are forwarded to App Servers.
- Optional Consistency Helpers: Sticky sessions (per-user affinity) or distributed locks on shared keys.
  - Sticky sessions trade off load balance and fault tolerance.
  - Locks serialize updates but add latency under contention.
- Optional Performance Path: Each limiter keeps a local cache of recent keys; a background sync service reconciles with the shared store asynchronously (slight over-allow is tolerated).
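The enforcement path above can be sketched as a tiny middleware: check the limiter, forward on allow, shed load with 429 on deny. This is an illustration, not code from the video; `limiter` and `forward` are hypothetical parameters standing in for any limiter object and the downstream app call.

```python
from http import HTTPStatus


def handle(request_key: str, limiter, forward) -> int:
    """Rate-limiter enforcement path (sketch).

    `limiter` is anything with an allow(key) -> bool method;
    `forward` is the call into the app tier. Names are illustrative.
    """
    if limiter.allow(request_key):
        forward(request_key)
        return HTTPStatus.OK                  # 200: passed to App Servers
    return HTTPStatus.TOO_MANY_REQUESTS       # 429: rejected by the limiter
```

In a real deployment this check would sit behind the load balancer, with allow/deny events emitted for observability.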
Subsystem: Token Bucket
- Role: Allow bursts up to bucket size; tokens refill at a fixed rate; requests consume tokens.
- State per key: last refill time, available tokens.
- Behavior: If tokens ≥ 1 → allow & decrement; else reject.
- Distributed Concerns: Concurrent updates to the same key can race in shared storage.
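The refill-and-spend logic can be written in a few lines. A minimal single-process sketch (not from the video); the injectable `now` clock is only there to make the behavior testable:

```python
import time


class TokenBucket:
    """Token bucket: tokens refill at `rate` per second up to `capacity`;
    each request spends one token, so bursts up to `capacity` can pass."""

    def __init__(self, capacity: int, rate: float, now=time.monotonic):
        self.capacity = capacity        # max burst size
        self.rate = rate                # tokens added per second
        self.tokens = float(capacity)   # start full
        self.now = now                  # injectable clock (for tests)
        self.last_refill = now()

    def allow(self) -> bool:
        t = self.now()
        # Refill based on elapsed time since the last check, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last_refill) * self.rate)
        self.last_refill = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `capacity=3, rate=1.0`, three back-to-back requests pass (the burst), the fourth is rejected, and after two seconds two more tokens are available.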
Subsystem: Leaky Bucket
- Role: Smooth bursts via a FIFO queue drained at a constant rate; overflow is dropped.
- Tuning: Queue capacity (burst tolerance) and drain rate (throughput ceiling).
- Risk: Excessive queueing under spikes increases latency or memory usage.
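The queue-plus-constant-drain behavior can be sketched as follows (an illustration, not the video's code; `offer`/`drain` are hypothetical method names):

```python
import collections


class LeakyBucket:
    """Leaky bucket: requests queue up to `capacity`; a drain step releases
    them at a constant `leak_rate` per second, smoothing bursts."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.queue = collections.deque()

    def offer(self, request) -> bool:
        # Drop when the queue is full instead of growing without bound.
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain(self, elapsed: float) -> list:
        """Release up to leak_rate * elapsed queued requests downstream."""
        released = []
        for _ in range(int(self.leak_rate * elapsed)):
            if not self.queue:
                break
            released.append(self.queue.popleft())
        return released
```

The bounded queue is the guard against the memory-pressure risk noted above: once it is full, excess requests are shed rather than buffered.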
Subsystem: Fixed Window Counter
- Role: Count requests per fixed interval (e.g., per minute); allow until the threshold, then reject.
- Pitfall: Edge “double dip” lets near-window-boundary bursts exceed the target rate.
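A sketch of the counter and its boundary flaw (illustrative, with an injectable clock so the double-dip can be demonstrated deterministically):

```python
class FixedWindowCounter:
    """Fixed window: count requests per fixed interval, resetting at each
    boundary. Simple and cheap, but a burst at the end of one window plus a
    burst at the start of the next can pass ~2x the limit ("double dip")."""

    def __init__(self, limit: int, window: float, now):
        self.limit = limit
        self.window = window
        self.now = now              # injectable clock (for tests)
        self.current_window = None
        self.count = 0

    def allow(self) -> bool:
        bucket = int(self.now() // self.window)
        if bucket != self.current_window:
            self.current_window = bucket   # new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of 5 req/min, five requests at t=59s and five more at t=60.5s all pass: ten requests in about 1.5 seconds, double the intended rate.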
Subsystem: Sliding Logs
- Role: Track a timestamp for each request; at decision time, count events within the last T.
- Pros: Accurate real-time windows.
- Cons: Memory heavy (stores many timestamps).
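The timestamp-log approach can be sketched like this (illustrative; a deque keeps eviction of expired entries O(1) per request):

```python
import collections


class SlidingLog:
    """Sliding log: keep one timestamp per accepted request; at decision time,
    evict entries older than `window` and compare the remainder to `limit`.
    Accurate, but memory grows with the request rate."""

    def __init__(self, limit: int, window: float, now):
        self.limit = limit
        self.window = window
        self.now = now                      # injectable clock (for tests)
        self.log = collections.deque()      # timestamps of accepted requests

    def allow(self) -> bool:
        t = self.now()
        while self.log and self.log[0] <= t - self.window:
            self.log.popleft()              # expire old entries
        if len(self.log) < self.limit:
            self.log.append(t)
            return True
        return False
```

Unlike the fixed window, the count always covers exactly the last T seconds, so there is no boundary spike; the cost is one stored timestamp per accepted request.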
Subsystem: Sliding-Window Counter
- Role: Reduce memory by aggregating counts into small buckets (e.g., per second) within a rolling window.
- Behavior: Sum the counts of buckets within the last T; reject if ≥ limit.
- Benefit: Near sliding-log accuracy with far fewer entries.
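A sketch using per-second sub-buckets (illustrative; at most `window_secs` entries are kept per key, regardless of request rate):

```python
import collections


class SlidingWindowCounter:
    """Sliding-window counter: aggregate requests into 1-second sub-buckets
    and sum the buckets inside the rolling window, trading a little precision
    for far less memory than a per-request log."""

    def __init__(self, limit: int, window_secs: int, now):
        self.limit = limit
        self.window = window_secs
        self.now = now                  # injectable clock (for tests)
        self.buckets = {}               # second -> request count

    def allow(self) -> bool:
        sec = int(self.now())
        # Evict sub-buckets that have fallen out of the rolling window.
        for s in list(self.buckets):
            if s <= sec - self.window:
                del self.buckets[s]
        if sum(self.buckets.values()) < self.limit:
            self.buckets[sec] = self.buckets.get(sec, 0) + 1
            return True
        return False
```

For a 60-second window this stores at most 60 counters per key instead of one timestamp per request, which is why it is often the default choice.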
Subsystem: Distributed Enforcement
- Problem: The same user hitting different regions simultaneously → racing updates in the shared store → over-allow.
- Options:
  - Sticky sessions (per-user affinity to a region/app). Trade-off: imbalance/fault tolerance.
  - Distributed locks per key when reading/updating. Trade-off: extra latency, potential contention.
  - Relax limits slightly; accept small error margins for performance.
  - Local cache + async reconciliation to the shared DB.
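The race and the lock-based fix can be shown with an in-process analogue (a toy, not a distributed implementation: threads stand in for regions, and a per-key `threading.Lock` stands in for a distributed lock or an atomic store operation such as Redis `INCR`):

```python
import collections
import threading


class SharedCounterStore:
    """Toy shared store: a per-key lock makes read-modify-write atomic, so two
    "regions" checking the same key cannot both pass when only one slot
    remains. A real deployment would use an atomic store op or a distributed
    lock, paying extra latency under contention."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counts = {}
        self.locks = collections.defaultdict(threading.Lock)

    def try_consume(self, key: str) -> bool:
        with self.locks[key]:               # serialize updates per key
            n = self.counts.get(key, 0)
            if n >= self.limit:
                return False
            self.counts[key] = n + 1        # check-then-increment, atomically
            return True
```

Without the lock, two concurrent read-modify-write cycles can both observe `n < limit` and both increment, which is exactly the over-allow race described above.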
| Topic | Option A | Option B | Video’s Leaning | Rationale (from video) |
|---|---|---|---|---|
| Burst Handling | Token Bucket | Leaky Bucket | Both presented | Token allows short bursts; leaky smooths strictly. |
| Windowing | Fixed Window | Sliding (logs/counter) | Sliding | Avoid boundary spikes; better fairness. |
| Consistency | Locks | Relaxed limits | Mixed | Locks add latency; small drift may be acceptable. |
| Affinity | Sticky Sessions | Any-region | Mixed | Affinity reduces races but hurts balance/failover. |
Reliability and Performance
- Consistency: Global strictness is hard in multi-region; eventual reconciliation, affinity, or locks are discussed.
- Latency Budget: Locks and cross-region calls add latency; local caches reduce datastore trips.
- Backpressure: Leaky bucket provides steady processing; fixed window can cause sudden load spikes.
- Degradation: When buckets empty or queues fill, shed load (429) to protect core services.
- Abuse Prevention: Throttle brute-force attempts (login, promo, booking).
- PII/Encryption: Not stated in video.
- Metrics/Logs/Tracing: Not stated in video. (Would track allow/deny counts, latency, hot keys in practice—assumption.)
- Pick algorithm by desired burstiness vs. smoothness characteristics.
- Fixed window is simple but unfair at edges; prefer sliding approaches for accuracy.
- Sliding-window counter is a strong default: accurate enough and memory efficient.
- Distributed enforcement is non-trivial; accept small drift or pay with locks/affinity.
- Local caches + async sync lower latency but require careful error budgets.
- IP/location and concurrency limits complement per-user quotas.
- Token Bucket: Refillable tokens; spend to pass.
- Leaky Bucket: FIFO queue drained at a constant rate.
- Fixed Window: Counter in fixed intervals (minute/hour).
- Sliding Logs: Per-request timestamps; real-time window count.
- Sliding-Window Counter: Per-bucket aggregates within a rolling window.
- Sticky Sessions: LB keeps a user bound to a region/instance.
- Hot Key: Disproportionately active identity causing contention.
- Locks and sticky sessions are presented as options.
[Personal note: Prefer algorithms that minimize cross-region coordination (e.g., approximate counters or per-edge enforcement) to reduce lock contention in 2025.]
- Fixed window counter is shown for simplicity.
[Personal note: Likely outdated; consider sliding-window counter to avoid boundary spikes — verify for your stack.]
- Local cache + async sync proposed to reduce latency.
[Personal note: Add guardrails (per-key error budgets and periodic reconciliation) to bound drift in modern deployments.]
- Source Video: https://www.youtube.com/watch?v=mhUQe4BKZXs
- Channel: Tech Dummies - Narendra Lakshmana Gowda
- Note: This document is a summary of the linked mock interview.
I'm Ali Sol, a PHP Developer. Learn more:
- Website: alisol.ir
- LinkedIn: linkedin.com/in/alisolphp