Skip to content

Conversation

@evan-onyx
Copy link
Contributor

Description

Should fix drive id not set in checkpoint; previously when a user failed to index a folder (which is fairly common due to permissions issues) the folder id was not set in the checkpoint. The necessary information for the checkpoint is yielded on error too,

How Has This Been Tested?

Tests tbd

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

@evan-onyx evan-onyx requested a review from a team as a code owner April 5, 2025 04:09
@vercel
Copy link

vercel bot commented Apr 5, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
internal-search ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 5, 2025 4:09am

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

The changes ensure that the checkpoint is updated with completion and folder ID information even in error scenarios, helping maintain consistent state across indexing attempts.

  • Updated logic in backend/onyx/connectors/google_drive/connector.py to unconditionally yield checkpoint completion info on errors.
  • Ensures folder IDs are recorded even when permissions issues cause file indexing failures.
  • Addresses potential issues where file.drive_file may miss MODIFIED_TIME without overwriting valid state.
  • Enhances resilience of error handling in the Google Drive connector checkpointing mechanism.

1 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines +840 to +846
checkpoint.completion_map[file.user_email].update(
stage=file.completion_stage,
completed_until=datetime.fromisoformat(
file.drive_file[GoogleFields.MODIFIED_TIME.value]
).timestamp(),
completed_until_parent_id=file.parent_id,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Ensure file.drive_file always contains MODIFIED_TIME even when file.error is set. Otherwise may raise runtime errors.

Suggested change
checkpoint.completion_map[file.user_email].update(
stage=file.completion_stage,
completed_until=datetime.fromisoformat(
file.drive_file[GoogleFields.MODIFIED_TIME.value]
).timestamp(),
completed_until_parent_id=file.parent_id,
)
if file.error is None:
checkpoint.completion_map[file.user_email].update(
stage=file.completion_stage,
completed_until=datetime.fromisoformat(
file.drive_file[GoogleFields.MODIFIED_TIME.value]
).timestamp(),
completed_until_parent_id=file.parent_id,
)

@Weves Weves merged commit 989dab5 into main Apr 7, 2025
8 of 11 checks passed
@Weves Weves deleted the id-not-set-in-checkpoint branch April 7, 2025 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants