@carte7000 carte7000 commented Oct 7, 2025

DynamoDB Cross-Partition and Shard Pagination

Overview

This PR adds pagination to the aggregator service so that large result sets spread across multiple DynamoDB shards and day-based partitions can be read one page at a time. Pagination is stateless: all resume state travels in the continuation token, so concurrent clients can page through the data independently, and results are returned in temporal order across shards and partitions.

Key Components

1. Multi-Shard Pagination Architecture

The pagination system is built from the following components:

  • PaginatedAggregatedReports: New response structure that includes both results and continuation tokens
  • MergeIterator: Merge algorithm that combines results from multiple shards while maintaining temporal ordering using a min-heap
  • DayIterator: Handles day-based temporal partitioning, querying each day's data separately for optimal DynamoDB access patterns
  • FeedIterator: Manages iteration over finalized feed data within each day partition
  • DDBQueryIterator: Low-level DynamoDB query iterator that handles pagination within individual DynamoDB queries
  • AggregatedReportPaginationToken: Stateless pagination tokens that track progress across multiple shards and partitions
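
To make the MergeIterator idea concrete, here is a minimal sketch of a k-way merge over per-shard streams using a min-heap from Go's container/heap. The Report, shardStream, and mergeByTime names are hypothetical and are not taken from the PR; the sketch assumes each shard already returns its items sorted by timestamp.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Report is a hypothetical stand-in for an aggregated report record.
type Report struct {
	Timestamp int64 // assumed sort key, e.g. Unix seconds
	Shard     int
	ID        string
}

// shardStream is a hypothetical per-shard source, already sorted by Timestamp.
type shardStream struct {
	items []Report
	pos   int
}

// reportHeap orders streams by the timestamp of their current head item.
type reportHeap []*shardStream

func (h reportHeap) Len() int { return len(h) }
func (h reportHeap) Less(i, j int) bool {
	return h[i].items[h[i].pos].Timestamp < h[j].items[h[j].pos].Timestamp
}
func (h reportHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *reportHeap) Push(x any)   { *h = append(*h, x.(*shardStream)) }
func (h *reportHeap) Pop() any {
	old := *h
	n := len(old)
	s := old[n-1]
	*h = old[:n-1]
	return s
}

// mergeByTime merges sorted shard streams into one time-ordered slice.
func mergeByTime(shards [][]Report) []Report {
	h := &reportHeap{}
	for _, items := range shards {
		if len(items) > 0 { // empty shards never enter the heap
			*h = append(*h, &shardStream{items: items})
		}
	}
	heap.Init(h)

	var out []Report
	for h.Len() > 0 {
		s := (*h)[0] // stream whose head has the smallest timestamp
		out = append(out, s.items[s.pos])
		s.pos++
		if s.pos < len(s.items) {
			heap.Fix(h, 0) // head changed, restore heap order
		} else {
			heap.Pop(h) // stream exhausted, drop it
		}
	}
	return out
}

func main() {
	merged := mergeByTime([][]Report{
		{{Timestamp: 1, Shard: 0, ID: "a"}, {Timestamp: 4, Shard: 0, ID: "d"}},
		{{Timestamp: 2, Shard: 1, ID: "b"}, {Timestamp: 3, Shard: 1, ID: "c"}},
	})
	for _, r := range merged {
		fmt.Println(r.Timestamp, r.Shard, r.ID)
	}
}
```

Each step costs O(log k) for k shards, which is why a heap is a natural fit when the number of shards grows.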

2. How Pagination Works

Request Flow

  1. Initial Request: Client calls GetMessagesSince without a token
  2. Multi-Shard Query: System queries all configured shards in parallel using an iterator for each shard
  3. Merge Results: Per-shard results are merged in chronological order
  4. Page Assembly: Results are collected until page size is reached
  5. Token Generation: Current position across all shards is serialized into a continuation token
  6. Response: Page of results plus next token (if more data exists)
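
The page assembly and token generation steps above can be sketched roughly as follows. The mergedStream and page types are hypothetical, and the token here is just a stringified offset for illustration; the PR's actual token tracks per-shard positions.

```go
package main

import (
	"fmt"
	"strconv"
)

// mergedStream is a hypothetical stand-in for the merged, time-ordered iterator.
type mergedStream struct {
	items []string
	pos   int
}

// Next returns the next item, or ok=false once the stream is exhausted.
func (s *mergedStream) Next() (item string, ok bool) {
	if s.pos >= len(s.items) {
		return "", false
	}
	item = s.items[s.pos]
	s.pos++
	return item, true
}

// page holds one page of results plus a continuation token.
type page struct {
	Items     []string
	NextToken string // empty when no more data exists
}

// assemblePage collects items until pageSize is reached or the stream ends,
// then emits a token only if more data remains.
func assemblePage(s *mergedStream, pageSize int) page {
	var p page
	for len(p.Items) < pageSize {
		item, ok := s.Next()
		if !ok {
			break
		}
		p.Items = append(p.Items, item)
	}
	if s.pos < len(s.items) {
		p.NextToken = strconv.Itoa(s.pos) // illustrative token: a plain offset
	}
	return p
}

func main() {
	s := &mergedStream{items: []string{"m1", "m2", "m3"}}
	first := assemblePage(s, 2)
	fmt.Println(first.Items, "token:", first.NextToken)
	last := assemblePage(s, 2)
	fmt.Println(last.Items, "has token:", last.NextToken != "")
}
```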

Continuation Flow

  1. Token Parsing: The next request includes the token, which is deserialized to restore each shard's position
  2. Resume Query: Each shard iterator resumes from its stored position
  3. Continued Merge: Merge process continues from where it left off
  4. Repeat: Process continues until all data is consumed
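
From the client's side, the continuation flow amounts to a loop that feeds each returned token into the next request until no token comes back. The sketch below uses a hypothetical fetchPage function in place of the real endpoint, with an offset string as the token; both are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// fetchPage is a hypothetical stand-in for the paginated endpoint. It is
// stateless: everything needed to resume is carried in the token.
func fetchPage(data []int, token string, pageSize int) (items []int, next string, err error) {
	if pageSize <= 0 {
		return nil, "", fmt.Errorf("page size must be positive, got %d", pageSize)
	}
	start := 0
	if token != "" {
		start, err = strconv.Atoi(token)
		if err != nil || start < 0 || start > len(data) {
			return nil, "", fmt.Errorf("invalid pagination token %q", token)
		}
	}
	end := start + pageSize
	if end > len(data) {
		end = len(data)
	}
	if end < len(data) {
		next = strconv.Itoa(end) // more data remains, so hand back a token
	}
	return data[start:end], next, nil
}

// fetchAll drains every page and reports how many requests were made.
func fetchAll(data []int, pageSize int) (all []int, pages int, err error) {
	token := ""
	for {
		items, next, err := fetchPage(data, token, pageSize)
		if err != nil {
			return nil, pages, err
		}
		all = append(all, items...)
		pages++
		if next == "" {
			return all, pages, nil
		}
		token = next
	}
}

func main() {
	all, pages, err := fetchAll([]int{1, 2, 3, 4, 5, 6, 7}, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(all, "in", pages, "pages")
}
```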

3. Cross-Partition and Shard Coordination

Temporal Partitioning

  • Data is partitioned by day using DayIterator
  • Each day's data is queried separately to optimize DynamoDB access patterns
  • Results maintain chronological ordering across day boundaries
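
As an illustration of day-based partitioning, the sketch below enumerates one partition key per UTC day between two instants, oldest first, so that each day can be queried in turn. The dayKeys and mustParse helpers and the "2006-01-02" key layout are assumptions; the PR does not state the key format or time zone used by DayIterator.

```go
package main

import (
	"fmt"
	"time"
)

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
// It exists only to keep the example short.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// dayKeys returns one key per UTC day that overlaps [since, until],
// oldest first. The key layout is an assumption for illustration.
func dayKeys(since, until time.Time) []string {
	since = since.UTC()
	// Start at midnight UTC of the day containing `since`.
	day := time.Date(since.Year(), since.Month(), since.Day(), 0, 0, 0, 0, time.UTC)
	var keys []string
	for !day.After(until) {
		keys = append(keys, day.Format("2006-01-02"))
		day = day.AddDate(0, 0, 1)
	}
	return keys
}

func main() {
	keys := dayKeys(mustParse("2025-10-06T23:30:00Z"), mustParse("2025-10-08T00:15:00Z"))
	for _, k := range keys {
		fmt.Println(k)
	}
}
```

Walking the days in ascending order is what lets results stay chronological across day boundaries: every item in an earlier day sorts before every item in a later one.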

Shard Distribution

  • Messages are distributed across shards using deterministic hashing based on message ID
  • Each shard maintains its own pagination state in the token
  • CalculateShardFromMessageID() ensures consistent shard assignment
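
Deterministic shard assignment can be sketched as a stable hash of the message ID reduced modulo the shard count. The shardForMessageID function below is hypothetical and uses FNV-1a purely as an example; the PR does not say which hash CalculateShardFromMessageID() uses or what its signature is.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardForMessageID maps a message ID to a shard index in [0, shardCount).
// FNV-1a is an assumption for illustration, not the PR's actual hash.
func shardForMessageID(messageID string, shardCount int) (int, error) {
	if shardCount <= 0 {
		return 0, fmt.Errorf("shard count must be positive, got %d", shardCount)
	}
	h := fnv.New32a()
	h.Write([]byte(messageID)) // hash.Hash writes never return an error
	return int(h.Sum32() % uint32(shardCount)), nil
}

func main() {
	for _, id := range []string{"msg-1", "msg-2", "msg-3"} {
		shard, err := shardForMessageID(id, 3)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s -> shard %d\n", id, shard)
	}
}
```

One consequence of any modulo-based scheme like this is that changing the shard count reassigns most IDs, so readers and writers need to agree on the same count.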

4. Pagination Token Structure

The pagination token encapsulates the complete state needed to resume:

  • Per-shard cursors: Track position within each shard
  • Day boundaries: Current day being processed for each shard
  • Exhaustion flags: Which shards have no more data
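
One plausible shape for such a token is a small struct serialized to JSON and then base64url-encoded so it can travel as an opaque string. The field names, JSON layout, and encoding below are all assumptions; the PR does not document the wire format of AggregatedReportPaginationToken.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// shardCursor is a hypothetical per-shard resume position.
type shardCursor struct {
	Day       string `json:"day"`                // day partition being read
	LastKey   string `json:"last_key,omitempty"` // cursor within that day
	Exhausted bool   `json:"exhausted"`          // true when the shard has no more data
}

// paginationToken is a hypothetical token holding one cursor per shard.
type paginationToken struct {
	Shards map[int]shardCursor `json:"shards"`
}

// encodeToken serializes the token to an opaque, URL-safe string.
func encodeToken(t paginationToken) (string, error) {
	raw, err := json.Marshal(t)
	if err != nil {
		return "", fmt.Errorf("marshal token: %w", err)
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}

// decodeToken reverses encodeToken and rejects malformed input.
func decodeToken(s string) (paginationToken, error) {
	var t paginationToken
	raw, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return t, fmt.Errorf("decode token: %w", err)
	}
	if err := json.Unmarshal(raw, &t); err != nil {
		return t, fmt.Errorf("parse token: %w", err)
	}
	return t, nil
}

func main() {
	tok := paginationToken{Shards: map[int]shardCursor{
		0: {Day: "2025-10-07", LastKey: "msg-42"},
		1: {Day: "2025-10-08", Exhausted: true},
	}}
	s, err := encodeToken(tok)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	back, err := decodeToken(s)
	fmt.Println(s, back.Shards[0].LastKey, err)
}
```

Because the token is supplied by the client on every request, a real implementation would want to validate it on decode rather than trust its contents.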

Testing Coverage

The PR adds tests covering the following areas:

1. Pagination Scenarios

  • Various Page Sizes: Tests with page sizes that divide the total number of records evenly and page sizes that do not
  • Edge Cases: Empty results, single page, exact divisions
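
The expected page shapes in these scenarios are simple arithmetic: full pages followed by at most one shorter page. The hypothetical pageSizes helper below is only a sketch of how a test might compute that expectation; it is not taken from the PR's test suite, and it assumes that an empty dataset yields no non-empty pages.

```go
package main

import "fmt"

// pageSizes returns the expected number of items on each non-empty page
// when `total` items are read with the given page size.
func pageSizes(total, pageSize int) []int {
	var sizes []int
	if pageSize <= 0 {
		return sizes // invalid page size: nothing to expect
	}
	for remaining := total; remaining > 0; remaining -= pageSize {
		if remaining < pageSize {
			sizes = append(sizes, remaining) // final, shorter page
		} else {
			sizes = append(sizes, pageSize)
		}
	}
	return sizes
}

func main() {
	fmt.Println(pageSizes(10, 3)) // non-multiple: last page is short
	fmt.Println(pageSizes(9, 3))  // exact division: all pages are full
	fmt.Println(pageSizes(2, 5))  // single page
	fmt.Println(pageSizes(0, 5))  // empty result
}
```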

2. Multi-Shard Testing

  • Shard Distribution: Verifies messages are properly distributed across shards
  • Cross-Shard Ordering: Confirms temporal ordering is maintained across shards
  • Shard Scaling: Tests with 2, 3, and 5 shard configurations

3. Temporal Verification

  • Global Ordering: All messages maintain chronological order across pages
  • Per-Shard Ordering: Within-shard ordering is preserved
  • Day Boundaries: Proper handling of messages across day transitions
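
A global-ordering check of this kind can be expressed as a single pass over all pages that also compares the last item of one page with the first item of the next. The chronologicalAcrossPages helper is hypothetical and treats equal timestamps as in order, which is an assumption about how ties should be handled.

```go
package main

import "fmt"

// chronologicalAcrossPages reports whether timestamps never decrease,
// both within each page and across page boundaries.
func chronologicalAcrossPages(pages [][]int64) bool {
	var prev int64
	seen := false // false until the first timestamp has been read
	for _, p := range pages {
		for _, ts := range p {
			if seen && ts < prev {
				return false
			}
			prev, seen = ts, true
		}
	}
	return true
}

func main() {
	ordered := [][]int64{{100, 200}, {200, 300}, {}, {400}}
	broken := [][]int64{{100, 300}, {200}}
	fmt.Println(chronologicalAcrossPages(ordered), chronologicalAcrossPages(broken))
}
```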

Configuration Enhancements

The PR adds new configuration options:

  • PageSize: Configurable page size for pagination
  • ShardCount: Number of DynamoDB shards for horizontal scaling
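
For illustration, these two options could be grouped and validated as shown below. Only the PageSize and ShardCount names come from the PR; the paginationConfig struct, the field types, and the validation rules are assumptions.

```go
package main

import "fmt"

// paginationConfig is a hypothetical grouping of the two options.
type paginationConfig struct {
	PageSize   int // maximum number of results per page
	ShardCount int // number of DynamoDB shards
}

// validate rejects values that would make pagination or sharding ill-defined.
func (c paginationConfig) validate() error {
	if c.PageSize <= 0 {
		return fmt.Errorf("PageSize must be positive, got %d", c.PageSize)
	}
	if c.ShardCount <= 0 {
		return fmt.Errorf("ShardCount must be positive, got %d", c.ShardCount)
	}
	return nil
}

func main() {
	cfg := paginationConfig{PageSize: 100, ShardCount: 3}
	fmt.Println("valid:", cfg.validate() == nil)
	fmt.Println(paginationConfig{PageSize: 0, ShardCount: 3}.validate())
}
```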

@carte7000 carte7000 marked this pull request as ready for review October 8, 2025 12:51
@carte7000 carte7000 requested review from a team and skudasov as code owners October 8, 2025 12:51
@carte7000 carte7000 enabled auto-merge (squash) October 8, 2025 12:52

github-actions bot commented Oct 8, 2025

Metric               simon/aggregator/dynamodb-pagination  main
aggregator Coverage  47.1%                                 43.0%

@carte7000 carte7000 merged commit c04bae0 into main Oct 8, 2025
13 checks passed
@carte7000 carte7000 deleted the simon/aggregator/dynamodb-pagination branch October 8, 2025 12:57
asoliman92 pushed a commit that referenced this pull request Oct 9, 2025
* Dynamodb cross partition and shard pagination