Database Partitioning

Definition

  • Database partitioning is the process of dividing a large database table into smaller, more manageable pieces called partitions.
  • Each partition can be stored separately, but the partitions are still treated logically as a single table.
  • Partitioning can improve query performance, simplify maintenance, and help a database scale.

Why Partitioning is Needed

  • Manage Large Tables: Break huge tables into smaller segments for easier handling.
  • Improve Query Performance: Queries can scan only relevant partitions instead of the full table.
  • Maintenance: Individual partitions are easier to back up, restore, or archive than one huge table.
  • Load Distribution: Reduces I/O contention, because queries can access only the partitions they need.

Types of Partitioning

1. Horizontal Partitioning (Row-Based)

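  • Mechanism: Splits table rows into different partitions based on a partition key. Range, list, and hash partitioning (below) are common rules for deciding which partition a given row belongs to.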

  • Example: Sales records from 2019 in Partition 1, 2020 in Partition 2.

  • Pros:

    • Efficient for queries that filter on the partition key, since only the matching partitions are read.
    • Partitions can be processed in parallel.
  • Cons:

    • Queries that span several partitions must combine (aggregate) results from each one, which adds complexity.
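
A minimal in-memory sketch of horizontal partitioning, assuming each row is a Python dict and that `year` is the partition key (the data and function names are hypothetical). A query on one year reads a single partition, while a query that ignores the key has to visit every partition and merge the results.

```python
from collections import defaultdict

def partition_rows(rows, key):
    """Group rows (dicts) into partitions keyed by the value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)

def query_one_partition(partitions, key_value):
    """Filter on the partition key: read only that partition."""
    return partitions.get(key_value, [])

def query_all_partitions(partitions, predicate):
    """Cross-partition query: visit every partition and merge the matches."""
    return [row for part in partitions.values() for row in part if predicate(row)]

# Hypothetical sales data; 'year' is the partition key.
sales = [
    {"id": 1, "year": 2019, "amount": 100},
    {"id": 2, "year": 2020, "amount": 250},
    {"id": 3, "year": 2019, "amount": 75},
]
parts = partition_rows(sales, "year")
print([r["id"] for r in query_one_partition(parts, 2019)])           # [1, 3]
print(len(query_all_partitions(parts, lambda r: r["amount"] > 80)))  # 2
```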

2. Vertical Partitioning (Column-Based)

  • Mechanism: Splits a table by columns, storing frequently accessed columns separately from infrequently used ones.

  • Example: User authentication info in Partition 1, user profile info in Partition 2.

  • Pros:

    • Reduces I/O for queries needing only certain columns.
  • Cons:

    • Queries that require all columns must join partitions.
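
A small sketch of vertical partitioning, assuming hypothetical column names for the authentication and profile groups. Each column group becomes its own mapping, and every partition keeps the primary key (`user_id`) so full rows can be rebuilt with a join.

```python
# Hypothetical user records; field names are illustrative only.
users = [
    {"user_id": 1, "username": "ana", "password_hash": "h1", "bio": "DBA", "avatar_url": "a.png"},
    {"user_id": 2, "username": "raj", "password_hash": "h2", "bio": "Dev", "avatar_url": "b.png"},
]

AUTH_COLUMNS = ("username", "password_hash")   # frequently accessed
PROFILE_COLUMNS = ("bio", "avatar_url")        # less frequently accessed

def split_vertically(rows, key, column_groups):
    """Return one {key_value: {column: value}} mapping per column group.
    The key is kept in every partition so rows can be rejoined later."""
    return [
        {row[key]: {col: row[col] for col in cols} for row in rows}
        for cols in column_groups
    ]

def rejoin(key, *partitions):
    """Rebuild full rows by joining the partitions on the shared key."""
    result = []
    for key_value in partitions[0]:
        merged = {key: key_value}
        for part in partitions:
            merged.update(part[key_value])
        result.append(merged)
    return result

auth, profile = split_vertically(users, "user_id", [AUTH_COLUMNS, PROFILE_COLUMNS])
print(auth[1])                                    # {'username': 'ana', 'password_hash': 'h1'}
print(rejoin("user_id", auth, profile) == users)  # True
```

A login check only needs `auth`, so it never touches the profile columns; fetching the complete row is the case that pays for the join.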

3. Range Partitioning

  • Mechanism: Assigns rows to partitions based on ranges of a column value.

  • Example: Orders with order_date Jan–Mar in Partition 1, Apr–Jun in Partition 2.

  • Pros:

    • Efficient for range queries.
  • Cons:

    • Uneven distribution if data is skewed.
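
A sketch of range routing with Python's `bisect` module, assuming a hypothetical quarterly layout for 2024 in which each partition's upper bound is exclusive. The same lookup also shows why range queries are cheap here: a date range maps to a contiguous run of partitions, so the rest can be skipped (partition pruning).

```python
import bisect
from datetime import date

# Hypothetical quarterly layout for 2024. Each partition covers
# [previous bound, upper bound): upper bounds are exclusive in this sketch.
LOWER_BOUND = date(2024, 1, 1)
UPPER_BOUNDS = [date(2024, 4, 1), date(2024, 7, 1), date(2024, 10, 1), date(2025, 1, 1)]
PARTITION_NAMES = ["orders_q1", "orders_q2", "orders_q3", "orders_q4"]

def range_partition_for(order_date):
    """Return the name of the partition whose range contains order_date."""
    if order_date < LOWER_BOUND:
        raise ValueError(f"no partition covers {order_date}")
    idx = bisect.bisect_right(UPPER_BOUNDS, order_date)
    if idx == len(UPPER_BOUNDS):
        raise ValueError(f"no partition covers {order_date}")
    return PARTITION_NAMES[idx]

def partitions_for_range(start, end):
    """Partition pruning: return only the partitions a query on [start, end] must read."""
    first = PARTITION_NAMES.index(range_partition_for(start))
    last = PARTITION_NAMES.index(range_partition_for(end))
    return PARTITION_NAMES[first:last + 1]

print(range_partition_for(date(2024, 4, 1)))                        # orders_q2
print(partitions_for_range(date(2024, 2, 15), date(2024, 5, 10)))  # ['orders_q1', 'orders_q2']
```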

4. List Partitioning

  • Mechanism: Assigns rows to partitions based on specific values of a column.

  • Example: Users from USA in Partition 1, India in Partition 2.

  • Pros:

    • Works well for categorical data.
  • Cons:

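    • Each new category value needs a partition definition that lists it (a schema change), unless the database supports a default partition for unlisted values.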
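
A sketch of list routing, assuming hypothetical partition names built from the USA/India example above. Each partition declares the values it accepts (a list may hold more than one value); a value that no list mentions is rejected unless a default partition is supplied.

```python
# Hypothetical list layout: each partition lists the column values it accepts.
LIST_PARTITIONS = {
    "users_usa": ["USA"],
    "users_india": ["India"],
}
# Invert once so each lookup is a single dictionary access.
VALUE_TO_PARTITION = {value: name for name, values in LIST_PARTITIONS.items() for value in values}

def list_partition_for(country, default=None):
    """Return the partition whose value list contains `country`.
    Unlisted values go to `default` if one is given; otherwise they are rejected,
    which is why a new category means changing the partition definitions."""
    partition = VALUE_TO_PARTITION.get(country, default)
    if partition is None:
        raise KeyError(f"no partition lists the value {country!r}")
    return partition

print(list_partition_for("India"))                          # users_india
print(list_partition_for("Brazil", default="users_other"))  # users_other
```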

5. Hash Partitioning

  • Mechanism: Applies a hash function to a column value to choose the partition.

  • Example: partition = hash(user_id) % 4, giving 4 partitions numbered 0 to 3.

  • Pros:

    • Roughly uniform distribution of rows, provided the key has many distinct values.
  • Cons:

    • Hard to perform range queries efficiently.
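
A sketch of `hash(user_id) % 4`, assuming MD5 from `hashlib` as the hash function. It is used here only because it gives the same result in every process; Python's built-in `hash()` is randomized per process for strings, and for small integers it is just the number itself. Real databases use their own internal hash functions.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4

def hash_partition_for(user_id, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition: stable hash of the key, then modulo N."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Spread 10,000 sequential ids and count rows per partition.
counts = Counter(hash_partition_for(uid) for uid in range(10_000))
print(sorted(counts))  # [0, 1, 2, 3]
```

The counts come out close to 2,500 each, but neighbouring ids such as 100 and 101 usually land in different partitions, so a query like `user_id BETWEEN 100 AND 200` still has to check all four.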

6. Composite / Hybrid Partitioning

  • Mechanism: Combines multiple partitioning strategies.

  • Example: Range + Hash partitioning: range by year, hash by user ID within each year.

  • Pros:

    • Offers flexibility and better load distribution.
  • Cons:

    • More complex to manage and maintain.
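
A sketch of the range + hash example above, assuming hypothetical names and 4 hash buckets per year, with MD5 standing in for the hash function. The first level picks the year (range), the second picks a bucket from `user_id` (hash), so a query filtered on year reads only that year's sub-partitions.

```python
import hashlib
from datetime import date

HASH_BUCKETS = 4  # hypothetical number of hash sub-partitions per year

def stable_hash(value):
    """Deterministic hash (MD5 chosen only for stability across processes)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def composite_partition_for(order_date, user_id, buckets=HASH_BUCKETS):
    """Range level: one partition per year. Hash level: user_id bucket within that year."""
    return (order_date.year, stable_hash(user_id) % buckets)

def partitions_for_year(year, buckets=HASH_BUCKETS):
    """A query filtered on year touches only that year's sub-partitions."""
    return [(year, b) for b in range(buckets)]

print(composite_partition_for(date(2020, 5, 17), 42)[0])  # 2020
print(partitions_for_year(2020))  # [(2020, 0), (2020, 1), (2020, 2), (2020, 3)]
```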

Benefits of Partitioning

  • Improved query performance.
  • Better maintenance (backup, restore, archiving).
  • Enhanced scalability.
  • Reduces I/O contention for large tables.
  • Facilitates parallel processing.
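
A scatter/gather sketch of parallel processing over partitions, assuming hypothetical partitions of order amounts. Each partition is aggregated independently, then the partial results are combined. The thread pool only illustrates the shape of the work: in CPython, threads will not speed up CPU-bound summing like this, and a database engine schedules its own per-partition workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: partition name -> order amounts stored in it.
partitions = {
    "orders_2019": [100, 75, 30],
    "orders_2020": [250, 40],
    "orders_2021": [10, 20, 30, 40],
}

def partial_sum(amounts):
    """Work done independently on a single partition."""
    return sum(amounts)

def parallel_total(parts):
    """Scatter: aggregate each partition concurrently. Gather: combine the partials."""
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        partials = list(pool.map(partial_sum, parts.values()))
    return sum(partials)

print(parallel_total(partitions))  # 595
```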

Challenges

  • Cross-Partition Queries: Aggregations across partitions can be slower.
  • Complexity: Adds overhead in database design and management.
  • Partition Key Selection: Poor choice can lead to uneven load.
  • Application Awareness: Some applications may need to know about partitions for optimal performance.
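
One way to check a candidate partition key before committing to it is to measure how unevenly it spreads rows. A minimal sketch, assuming hypothetical sample data and a simple skew metric (largest partition size divided by the average size, where 1.0 means perfectly even):

```python
from collections import Counter

def partition_skew(keys, partition_for):
    """Largest partition size divided by the average partition size."""
    counts = Counter(partition_for(k) for k in keys)
    average = sum(counts.values()) / len(counts)
    return max(counts.values()) / average

# Hypothetical sample: most orders come from a single country.
countries = ["USA"] * 80 + ["India"] * 15 + ["Canada"] * 5
by_country = partition_skew(countries, lambda c: c)               # list-style key
by_order_id = partition_skew(range(100), lambda oid: oid % 4)     # modulo of a sequential id
print(round(by_country, 2), by_order_id)  # 2.4 1.0
```

Partitioning by country here would leave one "hot" partition holding 80% of the rows, while spreading sequential order ids over 4 partitions keeps them even.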

Summary Table

| Partitioning Type | Data Split | Pros | Cons |
|---|---|---|---|
| Horizontal (Row) | Rows | Efficient row queries, scalable | Cross-partition queries more complex |
| Vertical (Column) | Columns | Reduces I/O for column-specific queries | Joins required for full-row queries |
| Range | Value ranges | Efficient for range queries | Uneven distribution possible |
| List | Specific values | Good for categorical data | Schema changes needed for new categories |
| Hash | Hash function | Uniform distribution | Poor for range queries |
| Composite / Hybrid | Combination | Flexible, balanced | Complex management |