
Partitioning the Notification Table

Mack Halliday edited this page Dec 10, 2024 · 1 revision

Partitioning Technical Document

This document was created as part of Ticket #97.

1. Introduction

This technical document outlines the implementation of partitioning for the ENP Notification table. It provides an overview of partitioning, a proposed approach for partitioning the Notification table, and key considerations for the ongoing maintenance and management of the partitioned structure.

2. Current Challenges

Currently, in our Notification-API application, the Notification table stores a significant and growing volume of data, making queries slower and increasing the complexity of managing the data. With BLANK number of rows, this table is becoming a bottleneck, especially for queries involving time-based filters.

We'd like to avoid this bottleneck when building our ENP application by implementing partitioning on the Notification table.

3. What is Partitioning?

Partitioning is a database optimization technique that divides a table into smaller, more manageable pieces (partitions) while preserving the logic of the table.

Types of Partitioning:

  • Range Partitioning: Data is divided into ranges based on a column's value. (This is the proposed method for our use case).
  • List Partitioning: Data is divided into partitions based on specific column values (category, type, color, etc.).
  • Hash Partitioning: Data is distributed into partitions based on a hash function, which spreads rows evenly across partitions.

For our scenario, range partitioning by year using the created_at column is the most appropriate choice to support time-based queries efficiently.
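As a rough illustration of what "range partitioning by year" means for a single row, the sketch below maps a created_at value to the yearly partition it would fall into. It assumes timestamps are stored in UTC and that each partition covers a half-open range (start of year inclusive, start of next year exclusive). The function names are hypothetical; the notifications_YYYY naming follows the partition names used later in this document.

```python
from datetime import datetime, timezone


def partition_bounds(year: int) -> tuple[datetime, datetime]:
    """Return the half-open [start, end) range covered by one yearly partition."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return start, end


def partition_name_for(created_at: datetime) -> str:
    """Name of the yearly partition a row with this created_at would land in.

    Assumes created_at is timezone-aware; it is normalized to UTC first.
    """
    return f"notifications_{created_at.astimezone(timezone.utc).year}"


# The last second of 2024 and the first instant of 2025 land in different partitions.
print(partition_name_for(datetime(2024, 12, 31, 23, 59, 59, tzinfo=timezone.utc)))  # notifications_2024
print(partition_name_for(datetime(2025, 1, 1, tzinfo=timezone.utc)))  # notifications_2025
```

Because the upper bound of one year equals the lower bound of the next, every timestamp belongs to exactly one partition.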

4. Proposed Solution

We propose partitioning the Notification table by year using the created_at column as the partition key. Each partition will hold data for a specific year (2024, 2025, etc.). An event listener on the Notification table will attach the partition tables to the base Notification table as they are generated.

This solution will:

  • Reduce the volume of data scanned for time-based queries.
  • Simplify data management by organizing data into logical yearly partitions.
  • Eliminate the need for the NotificationHistory table (used in our Notification-API application).

5. Implementation Steps

Note: SQLAlchemy does not support partitioning directly. This will need to be implemented through...

  1. Alter the table to use PARTITION BY RANGE on the created_at column.
  2. Create yearly partitions (notifications_2024, notifications_2025, etc.).
  3. Add event listeners to attach newly created partitions to the base Notification table.
  4. Update application code to ensure proper handling of partitioned data. Modify queries to include created_at filters to benefit from partition pruning.
  5. Develop and apply migration scripts to implement partitioning.
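The steps above could be sketched as a small script that builds the DDL a migration would run. This is a minimal sketch, not the chosen implementation: it assumes PostgreSQL declarative partitioning syntax, a parent table named notifications, and a hypothetical two-column schema (only created_at comes from this document). In PostgreSQL the primary key of a partitioned table must include the partition key, which is why created_at appears in it here.

```python
PARENT_TABLE = "notifications"  # assumed parent table name


def parent_table_ddl() -> str:
    """DDL for the partitioned parent table (hypothetical columns)."""
    return (
        f"CREATE TABLE {PARENT_TABLE} (\n"
        "    id UUID NOT NULL,\n"
        "    created_at TIMESTAMP NOT NULL,\n"
        "    PRIMARY KEY (id, created_at)\n"
        ") PARTITION BY RANGE (created_at);"
    )


def yearly_partition_ddl(year: int) -> str:
    """DDL for one yearly partition; the upper bound is exclusive."""
    return (
        f"CREATE TABLE {PARENT_TABLE}_{year} PARTITION OF {PARENT_TABLE}\n"
        f"    FOR VALUES FROM ('{year}-01-01') TO ('{year + 1}-01-01');"
    )


def migration_statements(years) -> list[str]:
    """Parent table first, then one statement per yearly partition."""
    return [parent_table_ddl()] + [yearly_partition_ddl(y) for y in years]


for stmt in migration_statements([2024, 2025]):
    print(stmt)
```

The generated statements could be executed from a migration script or from an event listener; which mechanism to use is left open here, as it is in the note above.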

6. Additional Considerations

  • Query Adjustments: Modify queries to include created_at filters to benefit from partition pruning.
  • Partition Management: Automate the creation of new partitions for future years. Do we need a ticket for a script?
  • Indexing: Optimize indexing strategy for the parent and child tables.
  • Rollback Plan: Define a strategy to revert changes if issues arise during migration.

7. Risks and Limitations

  • Increased complexity in database maintenance and management.
  • Queries spanning multiple partitions could introduce overhead.

8. Conclusion

Partitioning the Notification table is a critical step to ensure our system scales efficiently while maintaining performance. By organizing data into yearly partitions, we address current challenges and position the system for sustainable growth. Proper planning, testing, and monitoring will be essential to a successful implementation.


Additional Considerations for Implementing Partitioning

1. Query Planning

  • Update queries to ensure they include date filters to benefit from partition pruning.
  • Test queries to confirm expected behavior across partitions.
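To make the effect of date filters concrete, the sketch below models which yearly partitions a query would have to scan for a given created_at range. It is a simplified model of partition pruning, not the database planner itself, and the function name is hypothetical. It assumes naive timestamps and a half-open filter of the form created_at >= start AND created_at < end.

```python
from datetime import datetime
from typing import Optional


def partitions_scanned(existing_years, start: Optional[datetime] = None,
                       end: Optional[datetime] = None) -> list[str]:
    """Yearly partitions that overlap the filter [start, end).

    A bound of None means the query has no filter on that side.
    """
    names = []
    for year in sorted(existing_years):
        part_start = datetime(year, 1, 1)
        part_end = datetime(year + 1, 1, 1)
        if start is not None and part_end <= start:
            continue  # partition ends before the filter begins
        if end is not None and part_start >= end:
            continue  # partition begins after the filter ends
        names.append(f"notifications_{year}")
    return names


years = [2023, 2024, 2025]
# A one-month filter touches a single partition.
print(partitions_scanned(years, datetime(2025, 3, 1), datetime(2025, 4, 1)))
# No created_at filter: every partition is scanned.
print(partitions_scanned(years))
```

A filter that straddles a year boundary, such as mid-December to mid-January, touches two partitions, which is the multi-partition case worth covering in query tests.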

2. Partition Management

  • Automate the creation of new partitions annually via scheduled tasks or cron jobs. Script?
  • Monitor partition sizes and enforce any retention policies.
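One possible shape for the automation script is sketched below: given the partition names that already exist, it works out which years are missing and emits the statements to create them. This is an assumption-laden sketch rather than a decided design. It assumes PostgreSQL syntax, the notifications_YYYY naming convention, and a hypothetical years_ahead look-ahead parameter. How the list of existing tables is obtained (for example, from the database catalog) is out of scope here.

```python
import re
from datetime import date

# Matches partition names such as notifications_2024 (assumed naming convention).
PARTITION_NAME = re.compile(r"^notifications_(\d{4})$")


def missing_partition_years(existing_tables, today: date, years_ahead: int = 1) -> list[int]:
    """Years from the current year through years_ahead that have no partition yet."""
    have = set()
    for name in existing_tables:
        match = PARTITION_NAME.match(name)
        if match:
            have.add(int(match.group(1)))
    wanted = range(today.year, today.year + years_ahead + 1)
    return [year for year in wanted if year not in have]


def create_statements(existing_tables, today: date, years_ahead: int = 1) -> list[str]:
    """One CREATE TABLE statement per missing yearly partition."""
    return [
        f"CREATE TABLE IF NOT EXISTS notifications_{year} PARTITION OF notifications "
        f"FOR VALUES FROM ('{year}-01-01') TO ('{year + 1}-01-01');"
        for year in missing_partition_years(existing_tables, today, years_ahead)
    ]


existing = ["notifications_2024", "notifications_2025"]
for stmt in create_statements(existing, date(2025, 11, 1)):
    print(stmt)
```

Creating next year's partition ahead of time means a scheduled job that runs late, or fails once, does not leave rows without a partition to land in on January 1. Running the script twice is harmless because it only emits statements for years that are missing.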

3. Indexes

  • Evaluate indexing needs for both the parent and child tables.
  • Avoid excessive indexing to prevent write performance issues.

4. Backup and Restore

  • Confirm that backup and restore processes are compatible with partitioned tables.