Dynamodb cross partition and shard pagination #161
Merged
DynamoDB Cross-Partition and Shard Pagination
Overview
This PR implements pagination for the aggregator service so that large datasets spread across multiple DynamoDB shards and day-based partitions can be read one page at a time. The mechanism is stateless: the continuation token carries everything needed to resume, results keep their temporal ordering, and concurrent access across the distributed data is supported.
Key Components
1. Multi-Shard Pagination Architecture
The pagination system is built around several key components that work together to provide seamless data retrieval:
PaginatedAggregatedReports: New response structure that includes both results and continuation tokens
MergeIterator: Merge algorithm that combines results from multiple shards while maintaining temporal ordering using a min-heap
DayIterator: Handles day-based temporal partitioning, querying each day's data separately for optimal DynamoDB access patterns
FeedIterator: Manages iteration over finalized feed data within each day partition
DDBQueryIterator: Low-level DynamoDB query iterator that handles pagination within individual DynamoDB queries
AggregatedReportPaginationToken: Stateless pagination tokens that track progress across multiple shards and partitions
2. How Pagination Works
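The min-heap merge described for MergeIterator can be illustrated with a short Python sketch. This is not the PR's code: `merge_by_time` and the `(timestamp, shard, payload)` record shape are assumptions made for illustration, and each shard's input is assumed to be sorted by timestamp already.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, int, str]  # (timestamp, shard, payload); hypothetical shape


def merge_by_time(shards: List[Iterable[Record]]) -> Iterator[Record]:
    """Yield records from all shards in ascending timestamp order.

    Each input must already be sorted by timestamp.
    """
    iterators = [iter(s) for s in shards]
    heap = []
    # Seed the heap with the first record of every non-empty shard.
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            # idx breaks timestamp ties, so records are never compared directly.
            heapq.heappush(heap, (first[0], idx, first))
    while heap:
        _, idx, record = heapq.heappop(heap)
        yield record
        # Refill from the shard that just produced the smallest record.
        nxt = next(iterators[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt))
```

The heap never holds more than one entry per shard, so memory use depends on the number of shards rather than on the number of records.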
Request Flow
GetMessagesSince without a token
Continuation Flow
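Seen from the caller's side, the token-driven flow might look like the loop below. `get_messages_since` is a hypothetical in-memory stand-in for the service's GetMessagesSince call; its signature, the integer-offset token, and the use of `None` to signal the last page are all assumptions, not the PR's actual API.

```python
from typing import List, Optional, Tuple

MESSAGES = [f"msg-{i}" for i in range(7)]  # stand-in for stored data


def get_messages_since(
    token: Optional[str], page_size: int = 3
) -> Tuple[List[str], Optional[str]]:
    """Return one page plus a continuation token (None when exhausted)."""
    start = int(token) if token else 0
    page = MESSAGES[start:start + page_size]
    end = start + len(page)
    next_token = str(end) if end < len(MESSAGES) else None
    return page, next_token


def fetch_all(page_size: int = 3) -> List[str]:
    """The first call carries no token; later calls echo the returned token."""
    results: List[str] = []
    token: Optional[str] = None
    while True:
        page, token = get_messages_since(token, page_size)
        results.extend(page)
        if token is None:
            return results
```

Because the server side in this sketch derives everything from the token, any instance could serve any page of the sequence.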
3. Cross-Partition and Shard Coordination
Temporal Partitioning
DayIterator
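Day-based partitioning of the kind attributed to DayIterator can be sketched by enumerating one partition key per UTC day in the requested range and querying each in turn. The `day_partitions` helper and the `YYYY-MM-DD` key format are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def day_partitions(since: datetime, until: datetime) -> Iterator[str]:
    """Yield one partition key per UTC day from since to until, inclusive.

    Both arguments are expected to be timezone-aware datetimes.
    """
    day = since.astimezone(timezone.utc).date()
    last = until.astimezone(timezone.utc).date()
    while day <= last:
        yield day.isoformat()
        day += timedelta(days=1)
```

Walking the days in ascending order means the partitions themselves are visited in temporal order, which is what lets a per-day merge preserve overall ordering.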
Shard Distribution
CalculateShardFromMessageID() ensures consistent shard assignment
4. Pagination Token Structure
The pagination token encapsulates the complete state needed to resume:
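The token's exact fields are not reproduced here, so the following Python sketch is only a guess at the general shape: a token that records the day partition being read and a resume key per shard, serialized as URL-safe base64 JSON so that the server keeps no session state. The class name and both field names are hypothetical.

```python
import base64
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class PaginationToken:
    day: str  # hypothetical: day partition currently being read
    # hypothetical: shard id -> last key returned from that shard
    shard_keys: Dict[str, str] = field(default_factory=dict)

    def encode(self) -> str:
        """Serialize to an opaque, URL-safe string."""
        raw = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return base64.urlsafe_b64encode(raw).decode("ascii")

    @classmethod
    def decode(cls, text: str) -> "PaginationToken":
        """Rebuild the token from the string produced by encode()."""
        data = json.loads(base64.urlsafe_b64decode(text.encode("ascii")))
        return cls(day=data["day"], shard_keys=data["shard_keys"])
```

A production token would likely also need versioning and tamper protection; neither is shown here.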
Testing Coverage
The PR includes comprehensive test coverage:
1. Pagination Scenarios
2. Multi-Shard Testing
3. Temporal Verification
Configuration Enhancements
The PR adds new configuration options:
PageSize: Configurable page size for pagination
ShardCount: Number of DynamoDB shards for horizontal scaling
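To show how a shard count could drive consistent shard assignment, here is a minimal Python sketch. `PaginationConfig`, its default values, and `shard_for_message` are hypothetical; the latter merely stands in for the idea behind CalculateShardFromMessageID(), and the choice of SHA-256 is an assumption, not the PR's algorithm.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PaginationConfig:
    page_size: int = 100  # assumed default, not taken from the PR
    shard_count: int = 4  # assumed default, not taken from the PR


def shard_for_message(message_id: str, cfg: PaginationConfig) -> int:
    """Map a message ID to a shard in [0, shard_count) deterministically."""
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cfg.shard_count
```

A cryptographic digest is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would break consistency across service instances. Note that changing `shard_count` remaps most IDs under this scheme.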