System Design Mock Interview: Design Uber | OLA | Amazon System Design Interview

Channel/Interviewer: Tech Dummies - Narendra Lakshmana Gowda
Duration: 00:36:51
Original Video: https://www.youtube.com/watch?v=umWABit-wbk

This document summarizes the key content of a system design mock interview. I highly recommend watching the full video if you can.

One-Page Executive Summary (2–3 min skim)

Problem Prompt (One-liner):
Design a ridesharing platform (e.g., Uber/Ola/Lyft/Grab) that matches riders to nearby drivers in real time, at global scale.

Primary Scope:
Real-time dispatch (supply–demand matching), location ingestion, ETA calculation, geo-indexing, messaging between clients and backend, data storage/analytics, logging, and disaster recovery. Out of scope: pricing and surge strategy, payment internals, and full auth design.

Non-Functional Priorities:
Always-on availability, low-latency mobile experience, horizontal scalability across regions, resiliency to data center failures.

High-Level Architecture (Text):

Mobile apps (rider/driver) send frequent GPS updates.
Edge: Web Application Firewall → Load Balancer (L3/L4/L7).
Ingestion via REST → buffer/stream to Kafka.
Real-time state fan-out to: NoSQL storage, dispatch optimization service, and other consumers.
Dispatch ring (Node.js services): consistent hashing assigns cell ownership to nodes, gossip spreads that ownership map between nodes, and nodes call each other over RPC.
Mapping/ETA service (uses map provider + S2-like cell library).
WebSockets for bidirectional, event-driven updates to riders/drivers.
Analytics: batch on HDFS/Hadoop; real-time via Spark/Storm; centralized logging via ELK.
Backup data center that reconstructs trip state from driver apps’ state digests on primary DC failure.
Note: For real-time stream processing, Apache Flink and Kafka Streams are common 2025 choices: Flink offers unified batch/stream semantics, and Kafka Streams runs as a library inside the application, so there is no separate processing cluster to operate.

Top Trade-offs:

Freshest driver locations vs. network/battery cost (GPS push cadence).
Global consistency vs. availability/latency (favor eventual where safe).
In-house ETA modeling vs. map-provider ETA APIs.
Fine-grained sharding (S2 cells) vs. cross-shard coordination overhead.
Simpler monolith vs. scalable microservices (operational complexity).

Biggest Risks/Failure Modes:

Hotspots during peak demand (stadiums/airports).
Stale driver positions leading to bad matches/ETAs.
Message loss or consumer lag in streams.
Data center outage mid-trip.
Fraud: payment misuse, incentive abuse, account compromise.

5-Min Review Flashcards:

Why WebSockets here? To push server-initiated updates (offers, accept/decline, status) with low latency.
What’s the unit of geo-partitioning? S2-like cells (unique IDs per cell) for indexing and routing.
How do dispatch nodes scale? Consistent hashing (ring) + gossip to learn responsibilities; RPC among nodes.
Core ingestion path? App → WAF/LB → REST → Kafka; consumers update NoSQL + dispatch state.
Who computes ETA? Mapping/ETA service using road network and traffic; not simple Euclidean distance.
DC failure playbook? Fail over to backup DC and rebuild live trip state from driver apps’ “state digest.”
Logging stack? Elasticsearch + Logstash + Kibana (ELK).
Note: Consider OpenTelemetry + vendor-neutral backends to reduce lock-in in 2025.

Interview Tags (for later filtering)

Domain/Industry: ridesharing, delivery, maps, analytics
Product Pattern: realtime-chat, pub-sub, notification, queue, job-scheduler
System Concerns: high-availability, low-latency, eventual-consistency, geo-replication, hot-key, backpressure, throttling, autoscaling
Infra/Tech (mentioned): microservices, rest, websocket, kafka, mysql, cassandra, mongodb, hdfs, elasticsearch, spark, storm
Note: Apache Storm is largely legacy; prefer Flink/Kafka Streams unless your org already runs Storm.

Problem Understanding

Original Prompt: Build a ride-hailing platform whose dispatch system functions as a real-time marketplace to match riders to drivers, show nearby cars, and estimate arrival times.

Use Cases:

Rider requests a trip (vehicle class, solo/pool).
Show nearby cars and ETAs.
Driver location updates every few seconds.
Offer/accept/timeout flow; assign the first accepting driver.
Handle high-demand regions; add/remove cities and servers seamlessly.

Out of Scope (explicit/implicit): full pricing/surge details, payments internals, full auth/PII design.

APIs: Not stated in video.

Requirements & Constraints

Given in Video — Functional:

Ingest high-rate GPS pings from drivers.
Maintain latest driver state for dispatch.
Compute ETAs using road network (turn costs, traffic).
Run a bidirectional message channel to apps (offers, status).
Scale to many regions/cities; easy server add/remove (gossip).
Provide analytics and fraud detection.

Given in Video — Non-Functional:

Highly available (no planned downtime visible to users).
Low latency for offers/accept and map updates.
Horizontal scalability across data centers/regions.
Robust disaster recovery.

Assumptions (conservative):

Rider request latency budget in low seconds end-to-end.
Driver GPS update cadence ~4s (as stated).
Eventual consistency acceptable for map/nearby lists; stronger consistency for trip state.

Back-of-the-Envelope Estimation

Not stated in video—skipping numerical estimation.

High-Level Architecture

Clients: Rider and driver mobile apps send/receive via WebSockets and call REST endpoints for location posting.
Edge: Web Application Firewall (blocks bots/regions) → L3/L4/L7 Load Balancers route traffic.
Ingestion: REST → buffer to Kafka topics (location stream). Copies to NoSQL, dispatch state, and other consumers.
Dispatch Ring: Node.js services; consistent hashing assigns cell ranges; gossip protocol shares responsibilities; RPC calls between nodes when a rider’s search radius spans multiple owners.
Note: Uber historically used Ringpop/TChannel; in 2025 many teams standardize on gRPC over HTTP/2 for service-to-service RPC.
Geo/ETA: Mapping service uses S2-like cell coverage to find candidate drivers, then road-network ETA (turn costs, traffic), not straight-line distance.
Data: Uber's “Schemaless” datastore, a NoSQL layer built on top of MySQL, plus alternatives cited (Cassandra/MongoDB/Bigtable). The store needs high read/write availability and online operations (index builds, backups) without downtime.
Note: For new large-scale OLTP in 2025, consider cloud-native, horizontally scalable SQL (e.g., Spanner/CockroachDB) when strong consistency across regions is needed.
Analytics: Batch (HDFS/Hadoop; periodic DB dumps), plus real-time (Spark/Storm) for trends and model inputs.
Observability: Centralized logs → Elasticsearch/Logstash/Kibana dashboards.
DR: The backup DC holds no trip data up front; on failover, driver apps upload a “state digest” so the backup DC can reconstruct live trips.
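
The dispatch ring's use of consistent hashing can be sketched as follows. This is a minimal illustration, not Uber's Ringpop implementation: the `HashRing` class, the node names, the `cell-N` keys, and the choice of 64 virtual nodes per server are all assumptions made for the example. It shows the property the design relies on: when a server joins, only the cells that the new server claims change owner.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes
        self._positions = []   # sorted ring positions
        self._owner = {}       # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def owner_of(self, cell_id: str) -> str:
        """Return the node at the first ring position clockwise from the cell."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, _hash(cell_id))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing()
for name in ("dispatch-a", "dispatch-b", "dispatch-c"):  # hypothetical nodes
    ring.add_node(name)

cells = [f"cell-{i}" for i in range(1000)]               # hypothetical cell IDs
before = {c: ring.owner_of(c) for c in cells}
ring.add_node("dispatch-d")
after = {c: ring.owner_of(c) for c in cells}
moved = [c for c in cells if before[c] != after[c]]
print(f"{len(moved)} of {len(cells)} cells moved, all to dispatch-d")
```

In a real ring, gossip is what lets every node converge on the same membership list, so that each node computes the same `owner_of` answer locally.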

Deep Dives by Subsystem

Supply (Driver) Service

Role: Accept GPS updates (~every 4s), maintain latest driver state, push to Kafka and dispatch.
Scaling: Horizontally behind LB; partitions by driverId or cellId.
Consistency: Eventual for map views; stronger for assigned-trip state.
Failure: Retry with idempotent writes; drop stale pings.
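
One way to make retried writes idempotent and to drop stale pings is to keep only the newest ping per driver, ordered by a per-driver sequence number. The sketch below assumes the driver app stamps each ping with such a number; the `Ping` and `DriverStateStore` names and the `seq` field are hypothetical and not taken from the video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    driver_id: str
    seq: int      # per-driver sequence number set by the app (assumed)
    lat: float
    lon: float


class DriverStateStore:
    """Keeps only the newest ping per driver."""

    def __init__(self):
        self._latest = {}  # driver_id -> Ping

    def apply(self, ping: Ping) -> bool:
        """Apply a ping; return True only if it changed the stored state."""
        current = self._latest.get(ping.driver_id)
        if current is not None and ping.seq <= current.seq:
            return False  # duplicate (retry) or stale (out of order): ignore
        self._latest[ping.driver_id] = ping
        return True

    def get(self, driver_id: str):
        return self._latest.get(driver_id)


store = DriverStateStore()
results = [
    store.apply(Ping("d1", 1, 12.9716, 77.5946)),
    store.apply(Ping("d1", 3, 12.9730, 77.5960)),
    store.apply(Ping("d1", 3, 12.9730, 77.5960)),  # retried delivery: no effect
    store.apply(Ping("d1", 2, 12.9723, 77.5953)),  # arrived late: dropped
]
print(results, store.get("d1").seq)
```

Because applying the same ping twice leaves the store unchanged, an upstream retry after a timeout is safe.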

Demand (Rider) Service

Role: Receives ride requests (vehicle class, pooling), forwards to dispatch with rider’s cellId.
Contracts: Send search radius; receive candidate drivers sorted by ETA.
Failure: Timeouts; re-query with expanded radius.
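
The "re-query with expanded radius" fallback can be sketched as a small loop. The function name, the starting radius, the growth factor, and the cap are illustrative assumptions; the `search` callable stands in for the call to dispatch.

```python
from typing import Callable, List, Sequence, Tuple


def find_candidates(
    search: Callable[[float], Sequence[str]],
    start_radius_km: float = 1.0,   # assumed starting radius
    growth: float = 2.0,            # assumed growth factor per retry
    max_radius_km: float = 8.0,     # assumed cap so the search terminates
    min_candidates: int = 1,
) -> Tuple[float, List[str]]:
    """Widen the search radius until enough drivers are found or the cap is hit."""
    radius = start_radius_km
    while True:
        found = list(search(radius))
        if len(found) >= min_candidates or radius >= max_radius_km:
            return radius, found
        radius = min(radius * growth, max_radius_km)


# Fake dispatch lookup: driver -> distance from the rider in km.
distances = {"d1": 3.5, "d2": 6.0}


def fake_search(radius_km: float) -> List[str]:
    return [d for d, km in distances.items() if km <= radius_km]


print(find_candidates(fake_search))            # → (4.0, ['d1'])
print(find_candidates(lambda radius_km: []))   # → (8.0, [])
```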

Dispatch Optimization (Node.js Ring)

Role: For a given rider cell, compute nearby drivers by drawing a circle, enumerating S2 cells, fetching candidates, and ranking by road ETA. Sends offers to top N via WebSockets; first accept wins.
Scaling/Partitioning: Consistent hashing over cell IDs; gossip to learn ownership; RPC to contact peer owners for overlapping search areas.
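
The "draw a circle, enumerate cells, fetch candidates" step can be illustrated with a toy index. This sketch does not use the real S2 library: it substitutes a flat latitude/longitude grid with a fixed cell size (`CELL_DEG`, an assumption), covers the circle's bounding box rather than the circle itself, and pre-filters by straight-line distance. Ranking by road ETA, as the design calls for, would happen after this step. All names here are hypothetical.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01          # toy cell size in degrees (assumption, not an S2 level)
KM_PER_DEG_LAT = 111.32  # approximate km per degree of latitude


def cell_of(lat, lon):
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def covering_cells(lat, lon, radius_km):
    """Cells overlapping the circle's bounding box (a conservative cover)."""
    dlat = radius_km / KM_PER_DEG_LAT
    dlon = radius_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    lo_r, lo_c = cell_of(lat - dlat, lon - dlon)
    hi_r, hi_c = cell_of(lat + dlat, lon + dlon)
    return [(r, c) for r in range(lo_r, hi_r + 1) for c in range(lo_c, hi_c + 1)]


def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class CellIndex:
    def __init__(self):
        self._cells = defaultdict(dict)  # cell -> {driver_id: (lat, lon)}
        self._where = {}                 # driver_id -> current cell

    def update(self, driver_id, lat, lon):
        old = self._where.get(driver_id)
        if old is not None:
            self._cells[old].pop(driver_id, None)
        cell = cell_of(lat, lon)
        self._cells[cell][driver_id] = (lat, lon)
        self._where[driver_id] = cell

    def nearby(self, lat, lon, radius_km):
        """Driver IDs within radius_km, nearest first (straight-line pre-filter)."""
        hits = []
        for cell in covering_cells(lat, lon, radius_km):
            for driver_id, (d_lat, d_lon) in self._cells.get(cell, {}).items():
                dist = haversine_km(lat, lon, d_lat, d_lon)
                if dist <= radius_km:
                    hits.append((dist, driver_id))
        return [driver_id for _, driver_id in sorted(hits)]


index = CellIndex()
index.update("d1", 12.9716, 77.5946)
index.update("d2", 12.9800, 77.6000)
index.update("d3", 13.1000, 77.7000)
first = index.nearby(12.9716, 77.5946, 2.0)
index.update("d2", 13.1000, 77.7000)   # d2 drives away
second = index.nearby(12.9716, 77.5946, 2.0)
print(first, second)
```

In the sharded design, each cell returned by `covering_cells` would be looked up on whichever dispatch node owns it, which is where the cross-node RPC comes in.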

WebSockets Gateway

Role: Persistent, bidirectional messaging with riders/drivers for offers, accepts, cancellations, trip updates.
Why WS: Event-driven, low-overhead, server-initiated pushes.
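
The offer/accept/timeout flow that runs over these connections can be simulated without a network. In the sketch below, each driver is a coroutine that stands in for a round trip over that driver's WebSocket; the `offer_trip` and `driver` helpers, the delays, and the timeout value are assumptions made for the example. The first driver to accept wins, and offers still outstanding are cancelled.

```python
import asyncio


async def offer_trip(trip_id, drivers, timeout_s=1.0):
    """Offer a trip to several drivers at once; the first to accept wins.

    `drivers` maps driver_id -> coroutine function returning True (accept)
    or False (decline). Returns the winning driver_id, or None on timeout.
    """
    async def ask(driver_id, respond):
        return driver_id, await respond(trip_id)

    pending = {asyncio.create_task(ask(d, r)) for d, r in drivers.items()}
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    winner = None
    try:
        while pending and winner is None:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
            if not done:
                break  # nobody answered before the deadline
            for task in done:
                driver_id, accepted = task.result()
                if accepted and winner is None:
                    winner = driver_id
    finally:
        for task in pending:
            task.cancel()  # withdraw offers that are still outstanding
    return winner


def driver(delay_s, accepts):
    """Build a fake driver that answers after delay_s seconds."""
    async def respond(trip_id):
        await asyncio.sleep(delay_s)
        return accepts
    return respond


drivers = {
    "d1": driver(0.05, False),  # declines quickly
    "d2": driver(0.10, True),   # first to accept
    "d3": driver(0.30, True),   # would accept, but too late
}
winner = asyncio.run(offer_trip("trip-42", drivers))
print(winner)  # → d2
```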

Geo/Mapping & ETA

Role: S2-like cell modeling; coverage queries for a given radius; ETA via road graph (turn costs, traffic).
Note: Sometimes a finishing-soon trip near the rider beats an idle but farther driver.
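
To see why road-network ETA can rank drivers differently from straight-line distance, consider a tiny directed road graph solved with Dijkstra's algorithm. The graph, node names, and travel times are invented for illustration, and edge weights are assumed to already include traffic; modeling turn costs would need a richer graph than this sketch uses.

```python
import heapq


def eta_seconds(graph, source, target):
    """Shortest travel time from source to target using Dijkstra's algorithm.

    graph: {node: [(neighbor, travel_seconds), ...]} with directed edges.
    Returns None when the target cannot be reached.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # outdated heap entry
        for neighbor, seconds in graph.get(node, ()):
            new_cost = cost + seconds
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return None


# Hypothetical map: driver A is close to rider R as the crow flies but sits
# across a river and must detour over a bridge (X -> Y). Driver B is farther
# away but has a direct road.
roads = {
    "A": [("X", 120.0)],
    "X": [("Y", 180.0)],
    "Y": [("R", 90.0)],
    "B": [("R", 240.0)],
}

etas = {d: eta_seconds(roads, d, "R") for d in ("A", "B")}
ranked = sorted(etas, key=etas.get)
print(etas, ranked)  # → {'A': 390.0, 'B': 240.0} ['B', 'A']
```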

Data Platform & Analytics

Batch: Periodic dumps to HDFS; query with Hadoop-ecosystem tools (e.g., Hive).
Streaming: Spark/Storm for real-time trends and detection.
Use Cases: ETA improvement, map enhancement, traffic modeling.
Note: Modern lakehouse stacks (e.g., Apache Iceberg/Delta Lake) can simplify historical + streaming unification in 2025.

Logging & Observability

Stack: ELK—forward logs to a cluster; dashboards show errors and health.

Fraud Detection (ML)

Threats: Payment fraud (stolen cards), incentive abuse (fake GPS/bookings to hit quotas), compromised accounts (phishing).
Signals: Historical trip patterns, altitude/speed traces, behavioral anomalies.
Actions: Warn or ban on repeated abuse.
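
As a simple example of a speed-trace signal, the sketch below flags pings whose implied speed from the previous ping is physically implausible, which is one symptom of spoofed GPS. It is a rule-based stand-in, not the ML approach the video refers to; the function name and the 200 km/h threshold are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def implausible_jumps(trace, max_kmh=200.0):
    """Return indexes of pings that imply a speed above max_kmh.

    trace: list of (unix_seconds, lat, lon) tuples in arrival order.
    """
    flagged = []
    for i in range(1, len(trace)):
        t0, lat0, lon0 = trace[i - 1]
        t1, lat1, lon1 = trace[i]
        hours = (t1 - t0) / 3600.0
        if hours <= 0:
            flagged.append(i)  # non-increasing timestamps are suspicious too
            continue
        if haversine_km(lat0, lon0, lat1, lon1) / hours > max_kmh:
            flagged.append(i)
    return flagged


# Pings every 4 s: about 40 km/h of normal driving, then a 5 km teleport.
trace = [
    (0, 12.9716, 77.5946),
    (4, 12.9720, 77.5946),
    (8, 12.9724, 77.5946),
    (12, 13.0200, 77.5946),
    (16, 13.0204, 77.5946),
]
print(implausible_jumps(trace))  # → [3]
```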

Trade-offs & Alternatives

| Topic | Option A | Option B | Leaning | Rationale |
| --- | --- | --- | --- | --- |
| Messaging to apps | WebSockets | Long-polling / SSE over HTTP/2 | WebSockets | Bidirectional, lower latency for pushes. |
| Geo index | S2 cells | Raw lat/lon grids | S2 cells | Unique IDs, good coverage queries, routing/sharding. |
| ETA | Provider APIs | In-house model | Provider APIs (lean) | Road graph and traffic modeling are complex; the provider maintains them. |
| Storage | MySQL-based “schemaless” | Cassandra/Mongo/Bigtable | Mixed | Must stay highly available and horizontally scalable. |
| Stream processing | Spark/Storm | Flink/Kafka Streams | Depends | Spark/Storm are the options mentioned in the video; the choice varies by organization. Note: Favor Flink/Kafka Streams in 2025 for mature stateful streaming. |

Reliability, Availability, and Performance

Replication/Consistency: Emphasis on availability; eventual consistency acceptable for nearby lists; stricter for trip state.
Backpressure: Kafka absorbs bursty GPS updates; downstream consumers process at pace.
Degradation: Expand search radius or relax matching in sparse supply.
DR: Backup DC boots with no data; driver apps upload “state digest” so trips continue.
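
The video does not spell out what a state digest contains, so the following is only a guess at how the rebuild could work. The `StateDigest` fields, the `version` counter, and `rebuild_trips` are all hypothetical. The idea shown is that the backup DC starts empty, collects digests as driver apps reconnect, and keeps the most recent one per trip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateDigest:
    driver_id: str
    trip_id: str
    rider_id: str
    status: str    # e.g. "en_route" or "on_trip" (assumed values)
    version: int   # assumed to be bumped on every trip state change


def rebuild_trips(digests):
    """Rebuild {trip_id: digest}, keeping the highest version seen per trip."""
    trips = {}
    for digest in digests:
        current = trips.get(digest.trip_id)
        if current is None or digest.version > current.version:
            trips[digest.trip_id] = digest
    return trips


# Digests arrive in arbitrary order as apps reconnect to the backup DC.
uploads = [
    StateDigest("d1", "trip-1", "r1", "on_trip", 5),
    StateDigest("d2", "trip-2", "r2", "en_route", 2),
    StateDigest("d1", "trip-1", "r1", "en_route", 3),  # older copy, ignored
]
trips = rebuild_trips(uploads)
print({t: (d.status, d.version) for t, d in trips.items()})
```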

Security & Privacy

WAF blocks known-bad IPs/bots and unsupported regions.
Account compromise detection via behavioral anomalies.
Payment fraud monitoring.
Note: Target TLS 1.3 and strong mobile certificate pinning in 2025.

Observability

Centralized log ingestion; dashboards for error rates and system health (Kibana).

Follow-up Questions (to ask interviewer)

How do we tune GPS upload cadence for latency vs. battery?
What are the acceptable ETA error and match success-rate targets?
Which failure modes are most common in production incidents?
What’s the plan for stadium/airport special handling (preferred access points)?

Key Takeaways

Treat dispatch as a real-time marketplace optimized for low latency.
Geo-partition by S2-like cells; use consistent hashing to split compute.
WebSockets for push-heavy workflows between servers and mobile apps.
Use road-network ETAs; straight-line distance misleads.
Design for DC failure: rebuild live state from clients when needed.
Centralize logs and invest in analytics to improve ETA and catch fraud.

Glossary

S2 Cells: Hierarchical geospatial index dividing the globe into addressable cells.
Gossip Protocol: Peer-to-peer membership/health dissemination so nodes learn each other’s responsibilities.
Consistent Hashing: Partitioning scheme that minimizes remapping on node add/remove.
WebSockets: Full-duplex TCP-based protocol for real-time app messaging.
ELK: Elasticsearch + Logstash + Kibana logging/visualization stack.
State Digest: Compact snapshot on driver app used to reconstruct live trip state during DC failover.

Study Plan (Optional)

Revisit consistent hashing & gossip basics.
Implement a toy S2-cell radius search and candidate ranking.
Build a small WebSocket demo (offer → accept → assign).
Run a Kafka pipeline ingesting mock GPS and fanning out to two consumers.

Attribution

Source Video: https://www.youtube.com/watch?v=umWABit-wbk
Channel: Tech Dummies - Narendra Lakshmana Gowda
Note: This document is a summary of the linked mock interview.

About the summarizer

I'm Ali Sol, a PHP Developer. Learn more:

Website: https://alisol.ir
LinkedIn: https://www.linkedin.com/in/alisolphp
