
source monday: use dict page cursor for items backfill#3068

Merged
JustinASmith merged 3 commits into main from js/source-monday-use-dict-page-cursor on Jul 21, 2025

@JustinASmith (Contributor) commented Jul 17, 2025

Description:

See #3047 (comment) for more context.

This pull request introduces enhancements to cursor handling, backfill logic, and JSON merge patch functionality in the Estuary CDK and the Source Monday integration. The changes aim to improve efficiency, scalability, and fault tolerance, particularly when handling paginated data and structured cursors for the items stream. This PR also simplifies items.py by removing the BoardItemIterator class, which only wrapped the item-fetching functions. Below is a categorized summary of the most important changes:

Cursor Handling Enhancements

  • Added a new CURSOR_MARKER constant and utility functions (is_cursor_dict, make_cursor_dict, pop_cursor_marker) to support dict-based cursors for structured and granular progress tracking. (estuary-cdk/estuary_cdk/capture/common.py)
  • Updated the PageCursor type to include dict for structured cursors and added support for JSON merge patches to enable efficient incremental updates. (estuary-cdk/estuary_cdk/capture/common.py)
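To make the dict-cursor idea concrete, the helpers named above could work roughly like the sketch below. This is not the CDK's code: the marker key and value and the exact signatures are assumptions. The sketch only shows one plausible approach, stamping a reserved key into the dict so backfill logic can tell a structured page cursor apart from any other dict-shaped value.

```python
from typing import Any

# Hypothetical marker key; the real CURSOR_MARKER in estuary_cdk may differ.
CURSOR_MARKER = "__cursor_marker__"


def make_cursor_dict(entries: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict carrying the marker plus the given cursor entries."""
    return {CURSOR_MARKER: True, **entries}


def is_cursor_dict(value: Any) -> bool:
    """True only for dicts that carry the marker set to True."""
    return isinstance(value, dict) and value.get(CURSOR_MARKER) is True


def pop_cursor_marker(cursor: dict[str, Any]) -> dict[str, Any]:
    """Remove the marker in place (assumed semantics) and return the same dict."""
    cursor.pop(CURSOR_MARKER, None)
    return cursor
```

For example, `is_cursor_dict(make_cursor_dict({"board1": "pending"}))` is true, while a plain `{"board1": "pending"}` is not recognized as a cursor.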

Backfill Logic Improvements

JSON Merge Patch Implementation

  • Added a new json_merge_patch utility function to apply RFC 7396-compliant JSON merge patches, enabling efficient in-place updates to dictionaries. (estuary-cdk/estuary_cdk/utils.py)
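RFC 7396 defines the merge patch algorithm precisely, so an in-place version is short. The sketch below follows the RFC's pseudocode. It is not the code in estuary-cdk/estuary_cdk/utils.py, whose signature and return behavior may differ.

```python
from typing import Any


def json_merge_patch(target: Any, patch: Any) -> Any:
    """Apply an RFC 7396 merge patch, mutating dict targets in place.

    Always use the return value: a non-dict target or a non-object patch
    produces a new value rather than mutating anything.
    """
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return patch
    if not isinstance(target, dict):
        # An object patch applied to a non-object starts from an empty object.
        target = {}
    for key, value in patch.items():
        if value is None:
            # null means "delete this member"; absent keys are ignored.
            target.pop(key, None)
        else:
            # Recurse so nested objects are merged rather than replaced.
            target[key] = json_merge_patch(target.get(key), value)
    return target
```

Applying `{"board1": None, "meta": {"y": 2}}` to `{"board1": "a", "board2": "b", "meta": {"x": 1}}` leaves `{"board2": "b", "meta": {"x": 1, "y": 2}}`. The null-means-delete rule is what allows a checkpoint like `{"board1": None}` to drop a finished board from the state.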

Source Monday Integration Updates

  • Replaced BoardItemIterator with get_items_from_boards for improved cursor handling and streamlined item fetching. (source-monday/source_monday/api.py)
  • Updated the fetch_items_page function to use dict-based cursors for granular progress tracking and efficient recovery. (source-monday/source_monday/api.py)
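The board-level flow can be sketched synchronously as follows. This is an illustration under stated assumptions, not the connector's code. The names process_board_batch and run_backfill are hypothetical, fetch_items stands in for the GraphQL calls, and the per-board cursor values are placeholders, since what the connector actually stores per board is not described here. The real implementation is async and driven by the CDK.

```python
from typing import Any, Callable

Item = dict[str, Any]


def process_board_batch(
    cursor: dict[str, Any],
    fetch_items: Callable[[list[str]], list[Item]],
    batch_size: int,
) -> tuple[list[Item], dict[str, None]]:
    """Fetch items for the next boards still in the cursor (hypothetical helper).

    Returns the items plus a merge patch setting each processed board to None,
    which deletes that board from the stored cursor when applied.
    """
    board_ids = list(cursor)[:batch_size]
    items = fetch_items(board_ids)
    patch: dict[str, None] = {board_id: None for board_id in board_ids}
    return items, patch


def run_backfill(
    initial_cursor: dict[str, Any],
    fetch_items: Callable[[list[str]], list[Item]],
    batch_size: int,
) -> tuple[list[Item], list[dict[str, Any]]]:
    """Process batches until no boards remain, recording each checkpoint."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    state = dict(initial_cursor)
    checkpoints: list[dict[str, Any]] = [dict(state)]  # full cursor goes first
    items: list[Item] = []
    while state:
        batch_items, patch = process_board_batch(state, fetch_items, batch_size)
        items.extend(batch_items)
        for board_id in patch:  # a None value in a merge patch deletes the key
            del state[board_id]
        checkpoints.append(patch)  # later checkpoints are small patches only
    return items, checkpoints
```

With three boards and a batch size of 2, the recorded checkpoints are the full cursor followed by `{"b1": None, "b2": None}` and `{"b3": None}`. This mirrors the checkpoint shape described in the testing notes.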

GraphQL Query and Cursor Processing Refinements

  • Increased the fetch_boards_minimal query limit from 500 to 10,000 for better scalability. (source-monday/source_monday/graphql/boards.py)
  • Simplified the CursorCollector class by removing unused attributes and improving cursor processing for nested structures like next_items_page. (source-monday/source_monday/graphql/items.py)

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

I tested this on a local stack.

  • Verified the backfill for the items stream works as expected.
    • The initial page cursor was populated and checkpointed.
    • The fetch_items_page invocations fetched items for all boards handled in each invocation and emitted a patch checkpoint for those boards in the form {"board1": None, "board2": None}, which removes them from the connector state without bloating the recovery log.
  • I tested that restarting the capture works and that the connector resumed on the correct page.
  • I used gazctl to read the Gazette recovery log and verified at that lower level that the recovery log contained only the initial full checkpointed page cursor dictionary, followed by incremental patches setting board IDs to null (i.e., removing them from the state via merge patching).
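The log shape verified with gazctl (one full cursor followed only by deletions) is what makes restarts cheap: replaying the log yields exactly the boards that still need their items. Below is a minimal sketch of that replay, assuming a flat cursor keyed by board ID. replay_checkpoints is a hypothetical name, and this does not show how the runtime actually processes the recovery log.

```python
from typing import Any


def replay_checkpoints(log: list[dict[str, Any]]) -> dict[str, Any]:
    """Rebuild the cursor from a full checkpoint followed by merge patches.

    Only top-level keys are handled, which suffices for a flat
    board-ID -> value cursor; a None value deletes that board.
    """
    state: dict[str, Any] = dict(log[0])  # copy so the log is not mutated
    for patch in log[1:]:
        for key, value in patch.items():
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state
```

If the capture stops after the first patch of a three-board backfill, replaying `[{"board1": "pending", "board2": "pending", "board3": "pending"}, {"board1": None, "board2": None}]` leaves only `board3`, so the restarted capture resumes with that board.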


@JustinASmith JustinASmith force-pushed the js/source-monday-use-dict-page-cursor branch from 0a0e93d to 8089097 on July 18, 2025 16:39
@JustinASmith JustinASmith changed the title DRAFT: source monday: use dict page cursor for items backfill source monday: use dict page cursor for items backfill Jul 18, 2025
@JustinASmith JustinASmith requested a review from Copilot July 18, 2025 17:03

This comment was marked as outdated.

@JustinASmith JustinASmith marked this pull request as ready for review July 18, 2025 17:17
@JustinASmith JustinASmith requested a review from Alex-Bair July 18, 2025 17:17
@JustinASmith (Contributor, Author)

@Alex-Bair I have this PR in separate commits for easier review. I will likely consolidate them into three commits: one for the CDK changes, one for the items.py refactor in source-monday, and a final commit for the source-monday updates that use a dict-based page cursor in the items stream.

I will also address some of the items Copilot Review pointed out once I do a final force push.

@Alex-Bair (Member) left a comment

Looks really good, nice job! I had some minor nits, mostly aimed at answering questions I know I'll have down the road once I lose context on the reasons for these changes.

I haven't approved yet because I'd like to take the CDK changes for a spin & make sure I'm not missing any edge cases. I'll try to do a little testing either this weekend or on Monday so you're not waiting on me for too long.

Comment thread estuary-cdk/estuary_cdk/capture/common.py
Comment thread estuary-cdk/estuary_cdk/capture/common.py
Comment thread estuary-cdk/estuary_cdk/capture/common.py
Comment thread source-monday/source_monday/graphql/items.py Outdated
Comment thread source-monday/source_monday/api.py Outdated
@JustinASmith (Contributor, Author)

Thanks @Alex-Bair! I'll get to reviewing your comments in more detail soon. Good points though. Having more comments and docstrings will help future us!

@Alex-Bair (Member) left a comment

LGTM!

Comment thread estuary-cdk/estuary_cdk/capture/common.py Outdated
Implements in-place JSON merge patching for cursor state updates, allowing efficient incremental tracking and checkpointing. Streamlines backfill logic to properly handle structured cursors and emit precise state patches for improved runtime efficiency. Addresses need for flexible cursor management and efficient state updates in paged result processing.
Simplifies the logic for fetching items from boards by removing the iterator class, consolidating the process into standalone async functions, and clarifying error handling for invalid inputs.

Fixes a bug in cursor collection to support both board-level and next-page cursors, enhancing reliability and maintainability of pagination.

Removes unused logic related to tracking the oldest updated board, focusing on streamlined item retrieval.
@JustinASmith JustinASmith force-pushed the js/source-monday-use-dict-page-cursor branch 2 times, most recently from 9ceead6 to 7729065 on July 21, 2025 16:49
This addresses a fundamental limitation of the prior implementation by using a dictionary to track the boards that still need their items backfilled. It leverages the CDK so that the initial page cursor for all boards is yielded first, and only patches for processed boards are yielded on subsequent invocations of `fetch_items_page`.

This improves the reliability, efficiency and consistency in backfilling items, while working around the API limitations that Monday.com presents.
@JustinASmith JustinASmith force-pushed the js/source-monday-use-dict-page-cursor branch from 7729065 to 8155d52 on July 21, 2025 17:46
@JustinASmith JustinASmith requested a review from Copilot July 21, 2025 17:47
Copilot AI left a comment

Pull Request Overview

This pull request introduces dict-based page cursors for the Monday.com connector's items backfill functionality, replacing the previous timestamp-based approach. The changes improve granular progress tracking and fault tolerance by enabling JSON merge patches for efficient incremental state updates.

  • Implements structured cursor support in the Estuary CDK with JSON merge patch functionality for efficient state updates
  • Refactors Monday.com items backfill to use board-level tracking instead of timestamp-based pagination
  • Simplifies the items fetching logic by removing the BoardItemIterator wrapper class

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Summary per file:

  • estuary-cdk/estuary_cdk/utils.py: Adds a JSON merge patch utility function for RFC 7396-compliant dictionary updates
  • estuary-cdk/estuary_cdk/capture/common.py: Introduces dict-based cursor support with marker constants and merge patch handling in backfill logic
  • source-monday/source_monday/models.py: Adds an ItemsBackfillCursor dataclass for type-safe cursor operations and removes the unused items_count field
  • source-monday/source_monday/api.py: Refactors fetch_items_page to use dict-based cursors and board-level tracking instead of timestamp pagination
  • source-monday/source_monday/graphql/items.py: Removes the BoardItemIterator class and simplifies to direct function calls with improved cursor collection
  • source-monday/source_monday/graphql/boards.py: Increases the query limit from 500 to 10,000 and removes items_count from the GraphQL query
  • source-monday/source_monday/graphql/__init__.py: Updates exports to replace BoardItemIterator with the get_items_from_boards function
Comments suppressed due to low confidence (1)

Comment thread source-monday/source_monday/graphql/items.py
Comment thread source-monday/source_monday/graphql/items.py
@JustinASmith JustinASmith merged commit e02fb06 into main Jul 21, 2025
95 of 104 checks passed

Labels

None yet

Projects

None yet


3 participants