Designing scalable systems often requires choosing between different types of databases — each optimized for a specific workload or access pattern. The right choice depends on factors like:
-
Read/Write patterns
-
Consistency vs Latency trade-offs
-
Transactional vs Analytical workloads
-
Data relationships & structure
-
Scalability & fault tolerance needs
- Focus: Real-time transactional operations (insert, update, delete).
- Prioritizes speed, consistency, and concurrency.
- Stores current operational data.
- Typically uses Relational Databases (SQL).
| Property | OLTP |
|---|---|
| Data Type | Operational / Real-time |
| Query Type | Short, frequent transactions |
| Operations | INSERT, UPDATE, DELETE |
| Consistency | Strong (ACID) |
| Data Volume | Small per query, large overall |
| Example Systems | Banking, E-commerce, Payment Systems |
| Database Examples | PostgreSQL, MySQL, Oracle, SQL Server |
- Focus: Analytical queries — aggregation, reporting, and trend analysis.
- Prioritizes read performance and data summarization.
- Works with historical and aggregated data.
- Commonly uses Columnar Databases or Data Warehouses.
| Property | OLAP |
|---|---|
| Data Type | Historical / Analytical |
| Query Type | Complex, long-running queries |
| Operations | SELECT (Aggregations, Joins) |
| Consistency | Eventually consistent is acceptable |
| Data Volume | Huge (terabytes to petabytes) |
| Example Systems | Business Intelligence, Analytics Dashboards |
| Database Examples | Snowflake, BigQuery, Redshift, ClickHouse |
| Aspect | OLTP | OLAP |
|---|---|---|
| Purpose | Manage day-to-day transactions | Analyze large datasets |
| Query Type | Simple, frequent, real-time | Complex, aggregated, historical |
| Data Structure | Highly normalized | Denormalized for performance |
| Consistency | Strong ACID | Eventual consistency acceptable |
| Schema Design | ER model | Star / Snowflake schema |
| Performance Metric | Transactions/sec | Query throughput / Scan speed |
| Examples | MySQL, PostgreSQL | BigQuery, Snowflake, Redshift |
In distributed systems, you can’t have everything — as per the CAP Theorem:
A distributed database can provide at most two of these three: Consistency, Availability, and Partition Tolerance.
This means we must balance trade-offs based on use case.
- Prioritize data accuracy and integrity over availability or latency.
- When a node fails or network partitions occur, reads/writes may block until consistency is ensured.
Use When:
- Financial transactions
- Inventory management
- Banking systems
Examples:
- PostgreSQL, MySQL (with synchronous replication)
- MongoDB (strong consistency mode)
- HBase
- Prioritize availability and partition tolerance, allowing temporary inconsistencies.
- Data eventually converges (eventual consistency).
Use When:
- Real-time analytics
- Social media feeds
- Recommendation systems
- Caching layers
Examples:
- Cassandra
- DynamoDB
- Couchbase
- Redis
- Reduces response time by not waiting for full consistency.
- Writes are asynchronous, and replicas sync later.
- Ideal for user-facing, high-throughput systems where slight staleness is tolerable.
Examples:
- Amazon DynamoDB
- Apache Cassandra
- Riak
| Requirement | Preferred Type | Database Examples |
|---|---|---|
| Banking / Payments | Strong Consistency | PostgreSQL, MySQL, Oracle |
| Real-time Analytics / Dashboards | Eventual Consistency | Cassandra, DynamoDB |
| Social Media Feed | Tunable Consistency | MongoDB, Cosmos DB |
| Caching Layer | Eventual Consistency | Redis, Memcached |
| Critical Logging | High Availability | Elasticsearch, Kafka |
| Use Case | Recommended Database Type | Example Databases |
|---|---|---|
| Banking / Finance | Relational (OLTP, Strong Consistency) | PostgreSQL, Oracle |
| Analytics / Reporting | Columnar (OLAP) | BigQuery, Snowflake, ClickHouse |
| Social Networks | Graph or Document (High Read Volume) | Neo4j, MongoDB |
| E-commerce Product Catalog | Document Store | MongoDB, Couchbase |
| High-Speed Caching | In-memory Key-Value Store | Redis, Memcached |
| IoT / Time-Series Data | Wide-Column or Time-Series DB | InfluxDB, Cassandra |
| Search and Logging | Full-Text / Log Indexing | Elasticsearch, OpenSearch |
| Streaming / Events | Log-Based Distributed Store | Kafka, Pulsar |
| Company | Use Case | Database Choice |
|---|---|---|
| Amazon | Product catalog + user cart | DynamoDB (availability over consistency) |
| Netflix | Stream data & user activity | Cassandra (AP system) |
| Uber | Trip & driver data | MySQL + Redis + Cassandra (hybrid) |
| Airbnb | Listings & user data | MySQL (OLTP) + Elasticsearch (search) |
| Social graph | TAO (custom graph store) + MySQL (OLTP) |
- Modern systems often combine multiple database types — using each for what it does best.
- This approach is called Polyglot Persistence.
Example:
- Use PostgreSQL for transactions.
- Use Redis for caching hot data.
- Use Elasticsearch for full-text search.
- Use BigQuery / Snowflake for analytics.
| System Type | Priority | Ideal Database Choice |
|---|---|---|
| OLTP (Real-Time Transactions) | Consistency, low latency | SQL Databases (PostgreSQL, MySQL) |
| OLAP (Analytics & BI) | Throughput, historical data | Columnar Databases (BigQuery, Redshift) |
| Distributed / Global Systems | Availability, partition tolerance | NoSQL (Cassandra, DynamoDB) |
| Caching / Fast Access | Low latency, eventual consistency | Redis, Memcached |
| Hybrid Systems | Balance of consistency and scale | MongoDB, Cosmos DB |
