This repository was archived by the owner on Oct 11, 2024. It is now read-only.

Lazily fetch Exchange item data when possible #4300

Merged
merged 10 commits into from
Sep 20, 2023
Merged

Conversation

ashmrtn
Contributor

@ashmrtn ashmrtn commented Sep 19, 2023

Implement lazy data fetch for Exchange items. Use a new collection type to clearly denote when items can be lazily fetched vs. requiring eager fetch.

This PR changes how the read bytes stat is updated: lazily fetched items will not update the read bytes stat. This stat doesn't appear to be used anywhere at the moment.

For items that are deleted between the time enumeration takes place and the time their data needs to be fetched, Corso will:

  • return an empty reader for the item
  • not add the item to backup details
  • delete the (empty) item from kopia on the next backup

Manually tested deleting an item between enumeration and data fetch.
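The deleted-in-flight behavior above can be sketched in Go. The sentinel error name and function shapes here are illustrative stand-ins, not corso's actual API:

```go
package main

import (
	"bytes"
	"errors"
	"io"
)

// errItemDeletedInFlight is a hypothetical sentinel marking items that were
// deleted between enumeration and data fetch (corso's real name may differ).
var errItemDeletedInFlight = errors.New("item deleted between enumeration and fetch")

// fetchItemData stands in for the Graph API call that retrieves item content.
type fetchItemData func(itemID string) ([]byte, error)

// lazyItemReader defers the Graph fetch until the data is actually needed.
// If the item vanished in flight, it yields an empty reader and reports that
// the item should not be added to backup details; the empty kopia entry is
// then deleted on the next backup.
func lazyItemReader(itemID string, fetch fetchItemData) (r io.Reader, addToDetails bool, err error) {
	data, err := fetch(itemID)
	if errors.Is(err, errItemDeletedInFlight) {
		// Deleted in flight: empty reader, not added to backup details.
		return bytes.NewReader(nil), false, nil
	}
	if err != nil {
		return nil, false, err
	}
	return bytes.NewReader(data), true, nil
}
```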


Does this PR need a docs update or release note?

  • ✅ Yes, it's included
  • 🕐 Yes, but in a later PR
  • ⛔ No

Type of change

  • 🌻 Feature
  • 🐛 Bugfix
  • 🗺️ Documentation
  • 🤖 Supportability/Tests
  • 💻 CI/Deployment
  • 🧹 Tech Debt/Cleanup

Issue(s)

Test Plan

  • 💪 Manual
  • ⚡ Unit test
  • 💚 E2E

@ashmrtn ashmrtn self-assigned this Sep 19, 2023
@ashmrtn ashmrtn temporarily deployed to Testing September 19, 2023 22:59 — with GitHub Actions Inactive
@aviator-app
Contributor

aviator-app bot commented Sep 19, 2023

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged using Aviator.


See the real-time status of this PR on the Aviator webapp.

@ashmrtn ashmrtn temporarily deployed to Testing September 19, 2023 23:03 — with GitHub Actions Inactive
@ashmrtn
Contributor Author

ashmrtn commented Sep 19, 2023

Added #4299 as a follow-up, but I'm not sure when it will be addressed.

@ashmrtn ashmrtn temporarily deployed to Testing September 19, 2023 23:04 — with GitHub Actions Inactive
Member

@meain meain left a comment


LGTM

//
// TODO(ashmrtn): If we switch to immutable IDs then we'll need to handle this
// sort of operation in the pager since this would become order-dependent
// unless Graph started consolidating the changes into a single delta result.
Member


It is possible for a deletion to happen midway through our fetching pages from Graph, so we can never be certain that one item won't have multiple entries. I'm not sure how this plays into immutable IDs, but I just wanted to mention it.

Contributor Author


I think we can catch duplicate IDs if changes happen mid-paging. The problem is that if we have immutable IDs, the handling of the duplicates becomes order-dependent.

For background: if an item is changed/moved/deleted while paging through results, then either 1. the item will appear again in the result set, or 2. the item will appear in the delta result set of the next backup.

The code below handles the first case by removing IDs from the added set if they also appear in the removed set. If we have multiple adds for the same item, the map automatically consolidates them into a single item fetch.

This would need to change if we switched to immutable IDs, because in that case it would be possible to see a series of results like (add item A, remove item A, add item A). The "correct" outcome in that case would be to fetch data for item A. Without immutable IDs, the delta results look like (add item A, remove item A, add item A'), so we avoid the ordering issue altogether.

@@ -12,6 +12,8 @@ import (
"time"

"github.com/alcionai/clues"
"github.com/spatialcurrent/go-lazy/pkg/lazy"
Member


I know we did not add this package in this PR, but this is a repo with 2 stars that was last updated 2 years ago. Should we use something else, or maybe just implement a simple version of this within corso?

Contributor Author


Heh, I wondered about that too when we added it to the OneDrive code previously. I guess since we haven't had issues with it so far, we may as well keep using it? We can always switch if we do find a problem.

To play devil's advocate a bit, it's possible that there haven't been updates to the repo simply because it doesn't do much from a logical standpoint and was made well enough that no issues have been reported.
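For reference, a simple in-tree version along the lines meain suggests could look like the sketch below. This is a minimal illustration of the deferred-read pattern, not go-lazy's actual implementation:

```go
package main

import (
	"io"
	"strings"
	"sync"
)

// lazyReadCloser defers running the getter until the first Read, so no
// upstream (e.g. Graph) call happens unless the data is actually consumed.
type lazyReadCloser struct {
	once   sync.Once
	getter func() (io.ReadCloser, error)
	rc     io.ReadCloser
	err    error
}

func newLazyReadCloser(getter func() (io.ReadCloser, error)) io.ReadCloser {
	return &lazyReadCloser{getter: getter}
}

func (l *lazyReadCloser) Read(p []byte) (int, error) {
	l.once.Do(func() { l.rc, l.err = l.getter() })
	if l.err != nil {
		return 0, l.err
	}
	return l.rc.Read(p)
}

// Close only closes the underlying reader if it was ever materialized;
// closing an untouched lazy reader should not trigger a fetch.
func (l *lazyReadCloser) Close() error {
	if l.rc != nil {
		return l.rc.Close()
	}
	return nil
}

// demoLazyRead exercises the type: it reports whether the getter ran before
// the first Read and returns the bytes read afterward.
func demoLazyRead() (ranEarly bool, got string, err error) {
	called := false
	rc := newLazyReadCloser(func() (io.ReadCloser, error) {
		called = true
		return io.NopCloser(strings.NewReader("lazy")), nil
	})
	ranEarly = called
	buf := make([]byte, 8)
	n, rerr := rc.Read(buf)
	if rerr != nil && rerr != io.EOF {
		return ranEarly, "", rerr
	}
	return ranEarly, string(buf[:n]), rc.Close()
}
```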

Change the type of added to have mod times. Also handle deduping added and removed items in the collection initializer instead of the logic that asks for the collection to be created.
Use a new collection and item implementation to lazily pull item data from Exchange if kopia requires it. This should only be used for added items.
Return either a prefetchCollection or a lazyFetchCollection depending on whether the mod times are valid.
Allow returning data and errors a bit more smoothly.
If an item is deleted in flight (returns a sentinel error when getting info), don't add it to backup details, since we don't have the data to restore it.
@ashmrtn ashmrtn force-pushed the 2023-exch-lazy-fetch branch from 02a56a2 to 56bda10 Compare September 20, 2023 19:35
@ashmrtn ashmrtn temporarily deployed to Testing September 20, 2023 19:35 — with GitHub Actions Inactive
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

No Coverage information
0.0% Duplication

@ashmrtn ashmrtn temporarily deployed to Testing September 20, 2023 19:36 — with GitHub Actions Inactive
@aviator-app aviator-app bot merged commit b212c37 into main Sep 20, 2023
@aviator-app aviator-app bot deleted the 2023-exch-lazy-fetch branch September 20, 2023 20:12
Development

Successfully merging this pull request may close these issues.

Delay fetching Exchange item data from Graph API until it's clear kopia will be uploading the item
2 participants