- Database Partitioning is the process of dividing a large database table into smaller, more manageable pieces called partitions.
- Each partition can be stored separately but is still logically treated as a single table.
- Partitioning can improve query performance, simplify maintenance, and help large tables scale.
- Manage Large Tables: Break huge tables into smaller segments for easier handling.
- Improve Query Performance: Queries can scan only relevant partitions instead of the full table.
- Maintenance: Individual partitions are easier to back up, restore, or archive.
- Load Distribution: Reduce I/O contention by accessing only specific partitions.
- Mechanism (horizontal partitioning): Splits table rows into different partitions based on a partition key.
- Example: Sales records from 2019 in Partition 1, 2020 in Partition 2.
- Pros:
  - Efficient for queries that access subsets of rows.
  - Supports parallel processing.
- Cons:
  - Queries that span partitions must combine results from each partition, which adds some complexity.
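The row-based split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `sales` rows, the `year` partition key, and the `partition_rows` helper are hypothetical and stand in for what a database engine does internally.

```python
from collections import defaultdict

# Hypothetical sales rows; "year" is the assumed partition key.
sales = [
    {"id": 1, "year": 2019, "amount": 120.0},
    {"id": 2, "year": 2020, "amount": 75.5},
    {"id": 3, "year": 2019, "amount": 30.0},
    {"id": 4, "year": 2020, "amount": 210.0},
]


def partition_rows(rows, key):
    """Group whole rows into partitions keyed by the value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


partitions = partition_rows(sales, "year")

# A query for a single year only touches one partition.
total_2019 = sum(row["amount"] for row in partitions[2019])

# A query across all years has to combine per-partition partial results.
total_all = sum(
    sum(row["amount"] for row in part) for part in partitions.values()
)
```

The single-year total reads one partition, while the overall total has to visit every partition and merge the partial sums, which is the extra work the "Cons" item refers to.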
- Mechanism (vertical partitioning): Splits a table by columns, storing frequently accessed columns separately from infrequently used ones.
- Example: User authentication info in Partition 1, user profile info in Partition 2.
- Pros:
  - Reduces I/O for queries needing only certain columns.
- Cons:
  - Queries that require all columns must join partitions.
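A minimal Python sketch of the column-based split follows. The column names (`email`, `password_hash`, `bio`, `avatar_url`) and the helpers `split_columns` and `full_row` are assumptions made for illustration; they are not taken from any particular database.

```python
# Hypothetical user rows mixing authentication and profile columns.
users = [
    {"user_id": 1, "email": "a@example.com", "password_hash": "h1",
     "bio": "Likes SQL", "avatar_url": "/img/a.png"},
    {"user_id": 2, "email": "b@example.com", "password_hash": "h2",
     "bio": "Likes graphs", "avatar_url": "/img/b.png"},
]

AUTH_COLUMNS = ("email", "password_hash")   # frequently accessed (assumed)
PROFILE_COLUMNS = ("bio", "avatar_url")     # infrequently accessed (assumed)


def split_columns(rows, key, columns):
    """Build a partition mapping key -> only the requested columns."""
    return {row[key]: {col: row[col] for col in columns} for row in rows}


auth_partition = split_columns(users, "user_id", AUTH_COLUMNS)
profile_partition = split_columns(users, "user_id", PROFILE_COLUMNS)


def full_row(user_id):
    """Rebuild a complete row by joining both partitions on user_id."""
    return {
        "user_id": user_id,
        **auth_partition[user_id],
        **profile_partition[user_id],
    }
```

A login check only needs `auth_partition`, whereas `full_row` has to read both partitions and merge them on the shared key, which mirrors the join cost listed under "Cons".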
- Mechanism (range partitioning): Assigns rows to partitions based on ranges of a column value.
- Example: Orders with `order_date` in Jan–Mar go to Partition 1, Apr–Jun to Partition 2.
- Pros:
  - Efficient for range queries.
- Cons:
  - Partitions can become uneven in size if the data is skewed toward certain ranges.
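One way to picture range assignment is a sorted list of boundaries and a binary search. The sketch below assumes quarterly partitions for the year 2024; the `BOUNDS` list and both function names are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Assumed exclusive upper bounds of the first three quarterly partitions.
# Anything on or after the last bound falls into partition 4; dates before
# 2024 would also land in partition 1 in this simplified sketch.
BOUNDS = [date(2024, 4, 1), date(2024, 7, 1), date(2024, 10, 1)]


def range_partition(order_date):
    """Return the 1-based partition whose range contains order_date."""
    return bisect_right(BOUNDS, order_date) + 1


def partitions_for_range(start, end):
    """Return the partitions a query over [start, end] has to scan."""
    return list(range(range_partition(start), range_partition(end) + 1))
```

Because consecutive dates map to the same or neighbouring partitions, a query such as `partitions_for_range(date(2024, 2, 1), date(2024, 5, 31))` only needs partitions 1 and 2 and can skip the rest.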
- Mechanism (list partitioning): Assigns rows to partitions based on specific values of a column.
- Example: Users from USA in Partition 1, India in Partition 2.
- Pros:
  - Works well for categorical data.
- Cons:
  - Adding new categories requires schema changes.
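List assignment amounts to an explicit lookup table. In this sketch the mapping `PARTITION_BY_COUNTRY`, the `list_partition` helper, and the extra country added at the end are all hypothetical.

```python
# Assumed mapping from a categorical value to its partition number.
PARTITION_BY_COUNTRY = {"USA": 1, "India": 2}


def list_partition(country):
    """Look up the partition for a category; unknown values are rejected."""
    try:
        return PARTITION_BY_COUNTRY[country]
    except KeyError:
        raise ValueError(f"no partition defined for {country!r}") from None


# Supporting a new category means changing the mapping itself, which is
# the in-memory analogue of the schema change mentioned above.
PARTITION_BY_COUNTRY["Brazil"] = 3
```

Any value that is not in the mapping has nowhere to go until the mapping is extended, which is why new categories require a definition change rather than just new data.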
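- Mechanism (hash partitioning): Applies a hash function on a column value to assign a partition.
- Example: `hash(user_id) % 4` → 4 partitions.
- Pros:
  - Roughly uniform distribution of rows, given a good hash function.
- Cons:
  - Hard to perform range queries efficiently, because adjacent key values land in different partitions.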
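The `hash(user_id) % 4` idea can be sketched as follows. SHA-256 is an arbitrary choice of stable hash here, not something the notes prescribe; Python's built-in `hash()` is avoided because it is salted per process for strings. The names `hash_partition` and `NUM_PARTITIONS` are assumptions.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # matches the "% 4" in the example above


def hash_partition(user_id, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition number using a stable hash of its text form."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Count how 10,000 consecutive hypothetical user IDs spread over partitions.
counts = Counter(hash_partition(uid) for uid in range(10_000))
```

The counts come out close to even across the four partitions, but consecutive IDs are scattered, so a query for a range of `user_id` values would have to consult every partition.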
- Mechanism (composite / hybrid partitioning): Combines multiple partitioning strategies.
- Example: Range + Hash partitioning: range by year, hash by user ID within each year.
- Pros:
  - Offers flexibility and better load distribution.
- Cons:
  - More complex to manage and maintain.
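The range-plus-hash example can be sketched as a two-level key. As before, this is an illustration under assumptions: `HASH_BUCKETS`, the use of SHA-256, and the `composite_partition` helper are hypothetical.

```python
import hashlib

HASH_BUCKETS = 4  # assumed number of hash sub-partitions within each year


def composite_partition(year, user_id, buckets=HASH_BUCKETS):
    """Return (year, bucket): range level first, hash level second."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % buckets
    return (year, bucket)


# The same hypothetical user lands in the same bucket within every year.
p2019 = composite_partition(2019, 42)
p2020 = composite_partition(2020, 42)
```

A query filtered by year can skip all other years, and within a year the hash level spreads users across buckets. The price is that there are now years × buckets partitions to create and maintain.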
- Advantages:
  - Improved query performance.
  - Easier maintenance (backup, restore, archiving).
  - Enhanced scalability.
  - Reduces I/O contention for large tables.
  - Facilitates parallel processing.
- Challenges:
  - Cross-Partition Queries: Aggregations across partitions can be slower.
  - Complexity: Adds overhead in database design and management.
  - Partition Key Selection: A poor choice can lead to uneven load across partitions.
  - Application Awareness: Some applications may need to know about partitions for optimal performance.
| Partitioning Type | Data Split | Pros | Cons |
|---|---|---|---|
| Horizontal (Row) | Rows | Efficient row queries, scalable | Cross-partition queries complex |
| Vertical (Column) | Columns | Reduces I/O for column-specific queries | Joins required for full table queries |
| Range | Value ranges | Efficient for range queries | Uneven distribution possible |
| List | Specific values | Good for categorical data | Schema changes needed for new categories |
| Hash | Hash function | Uniform distribution | Poor for range queries |
| Composite / Hybrid | Combination | Flexible, balanced | Complex management |