Skip to content

Latest commit

 

History

History
207 lines (144 loc) · 8.16 KB

File metadata and controls

207 lines (144 loc) · 8.16 KB

Choosing the Right Database

Introduction

Designing scalable systems often requires choosing between different types of databases — each optimized for a specific workload or access pattern. The right choice depends on factors like:

  • Read/Write patterns

  • Consistency vs Latency trade-offs

  • Transactional vs Analytical workloads

  • Data relationships & structure

  • Scalability & fault tolerance needs

Choose the right database

1. OLTP vs OLAP Systems

OLTP (Online Transaction Processing)

  • Focus: Real-time transactional operations (insert, update, delete).
  • Prioritizes speed, consistency, and concurrency.
  • Stores current operational data.
  • Typically uses Relational Databases (SQL).

Characteristics

Property OLTP
Data Type Operational / Real-time
Query Type Short, frequent transactions
Operations INSERT, UPDATE, DELETE
Consistency Strong (ACID)
Data Volume Small per query, large overall
Example Systems Banking, E-commerce, Payment Systems
Database Examples PostgreSQL, MySQL, Oracle, SQL Server

OLAP (Online Analytical Processing)

  • Focus: Analytical queries — aggregation, reporting, and trend analysis.
  • Prioritizes read performance and data summarization.
  • Works with historical and aggregated data.
  • Commonly uses Columnar Databases or Data Warehouses.

Characteristics

Property OLAP
Data Type Historical / Analytical
Query Type Complex, long-running queries
Operations SELECT (Aggregations, Joins)
Consistency Eventually consistent is acceptable
Data Volume Huge (terabytes to petabytes)
Example Systems Business Intelligence, Analytics Dashboards
Database Examples Snowflake, BigQuery, Redshift, ClickHouse

OLTP vs OLAP

Aspect OLTP OLAP
Purpose Manage day-to-day transactions Analyze large datasets
Query Type Simple, frequent, real-time Complex, aggregated, historical
Data Structure Highly normalized Denormalized for performance
Consistency Strong ACID Eventual consistency acceptable
Schema Design ER model Star / Snowflake schema
Performance Metric Transactions/sec Query throughput / Scan speed
Examples MySQL, PostgreSQL BigQuery, Snowflake, Redshift

2. Consistency vs Latency Trade-Offs

In distributed systems, you can’t have everything — as per the CAP Theorem:

A distributed database can provide at most two of these three: Consistency, Availability, and Partition Tolerance.

This means we must balance trade-offs based on use case.


Consistency-Focused Systems (CP)

  • Prioritize data accuracy and integrity over availability or latency.
  • When a node fails or network partitions occur, reads/writes may block until consistency is ensured.

Use When:

  • Financial transactions
  • Inventory management
  • Banking systems

Examples:

  • PostgreSQL, MySQL (with synchronous replication)
  • MongoDB (strong consistency mode)
  • HBase

Availability-Focused Systems (AP)

  • Prioritize availability and partition tolerance, allowing temporary inconsistencies.
  • Data eventually converges (eventual consistency).

Use When:

  • Real-time analytics
  • Social media feeds
  • Recommendation systems
  • Caching layers

Examples:

  • Cassandra
  • DynamoDB
  • Couchbase
  • Redis

Latency-Focused (Eventual Consistency)

  • Reduces response time by not waiting for full consistency.
  • Writes are asynchronous, and replicas sync later.
  • Ideal for user-facing, high-throughput systems where slight staleness is tolerable.

Examples:

  • Amazon DynamoDB
  • Apache Cassandra
  • Riak

Choosing Consistency vs Latency

Requirement Preferred Type Database Examples
Banking / Payments Strong Consistency PostgreSQL, MySQL, Oracle
Real-time Analytics / Dashboards Eventual Consistency Cassandra, DynamoDB
Social Media Feed Tunable Consistency MongoDB, Cosmos DB
Caching Layer Eventual Consistency Redis, Memcached
Critical Logging High Availability Elasticsearch, Kafka

3. Database Choice by Use Case

Use Case Recommended Database Type Example Databases
Banking / Finance Relational (OLTP, Strong Consistency) PostgreSQL, Oracle
Analytics / Reporting Columnar (OLAP) BigQuery, Snowflake, ClickHouse
Social Networks Graph or Document (High Read Volume) Neo4j, MongoDB
E-commerce Product Catalog Document Store MongoDB, Couchbase
High-Speed Caching In-memory Key-Value Store Redis, Memcached
IoT / Time-Series Data Wide-Column or Time-Series DB InfluxDB, Cassandra
Search and Logging Full-Text / Log Indexing Elasticsearch, OpenSearch
Streaming / Events Log-Based Distributed Store Kafka, Pulsar

4. Real-World Examples

Company Use Case Database Choice
Amazon Product catalog + user cart DynamoDB (availability over consistency)
Netflix Stream data & user activity Cassandra (AP system)
Uber Trip & driver data MySQL + Redis + Cassandra (hybrid)
Airbnb Listings & user data MySQL (OLTP) + Elasticsearch (search)
Facebook Social graph TAO (custom graph store) + MySQL (OLTP)

5. Polyglot Persistence

  • Modern systems often combine multiple database types — using each for what it does best.
  • This approach is called Polyglot Persistence.

Example:

  • Use PostgreSQL for transactions.
  • Use Redis for caching hot data.
  • Use Elasticsearch for full-text search.
  • Use BigQuery / Snowflake for analytics.

In Summary

System Type Priority Ideal Database Choice
OLTP (Real-Time Transactions) Consistency, low latency SQL Databases (PostgreSQL, MySQL)
OLAP (Analytics & BI) Throughput, historical data Columnar Databases (BigQuery, Redshift)
Distributed / Global Systems Availability, partition tolerance NoSQL (Cassandra, DynamoDB)
Caching / Fast Access Low latency, eventual consistency Redis, Memcached
Hybrid Systems Balance of consistency and scale MongoDB, Cosmos DB