Skip to content

Latest commit

 

History

History
51 lines (40 loc) · 2.36 KB

File metadata and controls

51 lines (40 loc) · 2.36 KB

What is Database Sharding?

Database Sharding is a technique for distributing data across multiple servers to improve performance and scalability by dividing a large database into smaller, more manageable pieces (called "shards"). Each shard holds a subset of the data and operates independently.


Key Points

  1. Sharding - The Problem:

    • As databases grow, handling large volumes of data on a single server becomes inefficient.
    • Sharding addresses this by splitting data into smaller parts, or "shards," which are distributed across multiple servers.
  2. Horizontal Partitioning:

    • Sharding is a form of horizontal partitioning, where rows of a database are split across different servers.
    • Each server stores a subset of the data, based on a shard key (e.g., user ID, geographic region).

    Visual Example:

    Original Database: 
    --------------------------------------
    | ID | Name   | Age  | City    |
    |----|--------|------|---------|
    | 1  | Alice  | 30   | NY      |
    | 2  | Bob    | 25   | LA      |
    | 3  | Carol  | 29   | SF      |
    | 4  | Dave   | 35   | NY      |
    | 5  | Eve    | 27   | LA      |
    
    Sharded Database:
    Server 1: (shard 1)  | ID=1, ID=4 (NY)
    Server 2: (shard 2)  | ID=2, ID=5 (LA)
    Server 3: (shard 3)  | ID=3      (SF)
    
  3. Considerations:

    • Shard Key: Choosing an appropriate shard key is critical for data distribution and load balancing.
    • Scalability: As data grows, new shards can be added to handle more load.
    • Load Balancing: Even distribution of data across servers helps avoid overloading any single server.
  4. Potential Drawbacks:

    • Complexity: Managing multiple servers and ensuring consistency across shards can be challenging.
    • Cross-Shard Queries: Queries that involve data from multiple shards are more complex and slower.
    • Re-sharding: Re-distributing data across new shards can be difficult as the system grows.

Summary

  • Sharding splits large databases into smaller, more manageable parts, improving scalability and performance.
  • It involves horizontal partitioning, where data is distributed across multiple servers.
  • Choosing the right shard key is crucial for effective distribution.
  • Drawbacks include complexity, managing cross-shard queries, and the difficulty of re-sharding.