checkpointed confluence #4473

Merged
merged 5 commits into from
Apr 15, 2025
evan-onyx
Contributor

@evan-onyx evan-onyx commented Apr 7, 2025

Description

Addresses https://linear.app/danswer/issue/DAN-1703/checkpointed-confluence-connector

Adds checkpointing to the Confluence connector. A new checkpoint is created at each point where the previous version of the connector would have finished a batch.
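
To illustrate the "one checkpoint per batch" idea, here is a minimal sketch in plain Python. It is not the code from this PR: `SketchCheckpoint`, `run_one_batch`, and `drain` are hypothetical names, and the "pages" are just strings. Each run yields one batch of documents and returns a checkpoint recording where the next run should resume.

```python
from dataclasses import dataclass
from typing import Generator, List, Tuple


@dataclass(frozen=True)
class SketchCheckpoint:
    """Hypothetical checkpoint: where to resume, and whether work remains."""
    next_offset: int
    has_more: bool


def run_one_batch(
    pages: List[str], checkpoint: SketchCheckpoint, batch_size: int
) -> Generator[str, None, SketchCheckpoint]:
    """Yield one batch of pages, then return the checkpoint for the next run."""
    start = checkpoint.next_offset
    end = min(start + batch_size, len(pages))
    for page in pages[start:end]:
        yield page
    # The batch boundary is exactly where the new checkpoint is produced.
    return SketchCheckpoint(next_offset=end, has_more=end < len(pages))


def drain(gen: Generator[str, None, SketchCheckpoint]) -> Tuple[List[str], SketchCheckpoint]:
    """Collect everything a run yields, plus the checkpoint it returns."""
    docs: List[str] = []
    while True:
        try:
            docs.append(next(gen))
        except StopIteration as stop:
            return docs, stop.value
```

If a run is interrupted, only the batch in progress has to be repeated, because the last returned checkpoint marks the end of the previous batch.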

How Has This Been Tested?

Tested in the UI and with unit tests.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

@evan-onyx evan-onyx requested a review from a team as a code owner April 7, 2025 23:16
vercel bot commented Apr 7, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
internal-search ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 14, 2025 11:20pm

Contributor

@greptile-apps greptile-apps bot left a comment

PR Summary

This PR integrates a new checkpointing mechanism for multiple connectors, replacing CheckpointConnector with the new CheckpointedConnector interface and standardizing checkpoint management across the codebase.

  • Updated backend/onyx/connectors/factory.py to support CheckpointedConnector for POLL input types.
  • Modified backend/onyx/connectors/connector_runner.py to enforce time_range and wrap outputs with CheckpointOutputWrapper.
  • Refactored backend/onyx/background/indexing/checkpointing_utils.py for updated checkpoint validation and loading.
  • Adjusted key connectors (Confluence, GitHub, GoogleDrive, etc.) to increment checkpoints per batch with enhanced error propagation.
  • Test files now use utilities like load_all_docs_from_checkpoint_connector for thorough validation.
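
As a rough picture of how a test helper might drive a checkpointed connector to completion, the sketch below re-invokes a toy connector until its checkpoint reports that nothing is left. Everything here is an assumption made for illustration: `ToyCheckpoint`, `ToyCheckpointedConnector`, `run_once`, and `load_all_docs_sketch` are hypothetical names, not the interfaces or utilities in this PR.

```python
from dataclasses import dataclass
from typing import Generator, List, Tuple


@dataclass(frozen=True)
class ToyCheckpoint:
    """Hypothetical checkpoint: index of the next batch, plus a has_more flag."""
    next_batch: int = 0
    has_more: bool = True


class ToyCheckpointedConnector:
    """Stand-in for a checkpointed connector; not the real interface."""

    def __init__(self, batches: List[List[str]]) -> None:
        self._batches = batches

    def run_once(
        self, start: float, end: float, checkpoint: ToyCheckpoint
    ) -> Generator[str, None, ToyCheckpoint]:
        # start/end stand in for the time range; this toy ignores them.
        if checkpoint.next_batch < len(self._batches):
            yield from self._batches[checkpoint.next_batch]
        nxt = checkpoint.next_batch + 1
        return ToyCheckpoint(next_batch=nxt, has_more=nxt < len(self._batches))


def load_all_docs_sketch(
    connector: ToyCheckpointedConnector, start: float, end: float
) -> Tuple[List[str], int]:
    """Re-run the connector until has_more is False; return docs and run count."""
    checkpoint = ToyCheckpoint()
    docs: List[str] = []
    runs = 0
    while checkpoint.has_more:
        gen = connector.run_once(start, end, checkpoint)
        while True:
            try:
                docs.append(next(gen))
            except StopIteration as stop:
                checkpoint = stop.value  # checkpoint returned by this run
                break
        runs += 1
    return docs, runs
```

Returning the run count alongside the documents lets a test assert both that every document was loaded and that the work was split into the expected number of checkpointed runs.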

15 file(s) reviewed, no comment(s)

if not self.continue_on_failure:
    if _should_propagate_error(e):
        raise
    # TODO: should we remove continue_on_failure entirely now that we have checkpointing?
Contributor

yes, I think so!
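
For context on the pattern under discussion, here is a self-contained sketch of "propagate some errors, record the rest". It is an assumption-laden illustration rather than the connector's code: `FailureRecord`, `should_propagate`, and `convert_all` are hypothetical names, and the rule that only `PermissionError` aborts the run is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Union


@dataclass
class FailureRecord:
    """Hypothetical per-item failure marker emitted instead of a document."""
    item_id: str
    message: str


def should_propagate(error: Exception) -> bool:
    # Assumption for this sketch: only permission problems abort the run.
    return isinstance(error, PermissionError)


def convert_all(
    item_ids: Iterable[str],
    convert: Callable[[str], str],
    continue_on_failure: bool = False,
) -> Iterator[Union[str, FailureRecord]]:
    """Yield converted items; record per-item failures unless they must abort."""
    for item_id in item_ids:
        try:
            yield convert(item_id)
        except Exception as e:
            if not continue_on_failure:
                if should_propagate(e):
                    raise
            # Otherwise keep going and report the failure for this item only.
            yield FailureRecord(item_id, str(e))
```

In this sketch the flag only decides whether a propagating error aborts the run; ordinary failures are recorded either way, which is one reason such a flag can become redundant once runs are resumable.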

@evan-onyx evan-onyx added this pull request to the merge queue Apr 14, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 14, 2025
@evan-onyx evan-onyx force-pushed the confluence-checkpointed-connector branch from 11a806c to 5b5c35b Compare April 14, 2025 23:17
@evan-onyx evan-onyx enabled auto-merge April 14, 2025 23:28
@evan-onyx evan-onyx added this pull request to the merge queue Apr 14, 2025
Merged via the queue into main with commit ae9f8c3 Apr 15, 2025
10 of 11 checks passed
@evan-onyx evan-onyx deleted the confluence-checkpointed-connector branch April 15, 2025 00:43
aronszanto pushed a commit to aronszanto/onyx that referenced this pull request Apr 26, 2025
* checkpointed confluence

* confluence checkpointing tested

* fixed integration tests

* attempt to fix connector test flakiness

* fix rebase
2 participants