Lazily fetch Exchange item data when possible #4300
Conversation
Current Aviator status
This PR was merged using Aviator.
Added #4299 as a follow-up, but not sure when it will be addressed.
LGTM
//
// TODO(ashmrtn): If we switch to immutable IDs then we'll need to handle this
// sort of operation in the pager since this would become order-dependent
// unless Graph started consolidating the changes into a single delta result.
It is possible for a deletion to happen midway through us fetching pages from Graph, so we can never be certain that one item won't have multiple entries. I'm not sure how this plays into immutable IDs, but just wanted to mention it.
I think we can catch duplicate IDs if changes happen mid-paging. The problem is that with immutable IDs the handling of the duplicates becomes order-dependent.

For background, if an item is changed/moved/deleted while paging through results then either 1. the item will appear again in the result set, or 2. the item will appear in the next delta result set in the next backup.

The code below handles the first case by removing IDs from the added set if they also appear in the removed set. If we have multiple adds for the same item, the map automatically consolidates them into a single item fetch.

This would need to change if we switched to immutable IDs, because in that case it would be possible to see a series of results like (add item A, remove item A, add item A). The "correct" outcome there would be to fetch data for item A. Without immutable IDs the delta results look like (add item A, remove item A, add item A'), so we avoid the ordering issue altogether.
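To make that concrete, here is a minimal sketch of the consolidation described above. The deltaEntry type and function name are illustrative stand-ins, not corso's actual code:

```go
package exchange

// deltaEntry is an illustrative stand-in for one entry in a Graph
// delta result page.
type deltaEntry struct {
	ID      string
	Removed bool
}

// consolidate collapses paged delta results into a single set of item
// IDs to fetch. Repeated adds for the same ID collapse into one map
// entry, and IDs present in both sets are dropped from added.
func consolidate(pages [][]deltaEntry) map[string]struct{} {
	added := map[string]struct{}{}
	removed := map[string]struct{}{}

	for _, page := range pages {
		for _, entry := range page {
			if entry.Removed {
				removed[entry.ID] = struct{}{}
			} else {
				added[entry.ID] = struct{}{}
			}
		}
	}

	// An item deleted mid-paging appears in both sets; don't fetch it.
	for id := range removed {
		delete(added, id)
	}

	return added
}
```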
@@ -12,6 +12,8 @@ import (
 	"time"

+	"github.com/alcionai/clues"
+	"github.com/spatialcurrent/go-lazy/pkg/lazy"
I know we did not add this package in this PR, but this is a repo with 2 stars that was last updated 2 years ago. Should we use something else, or maybe just implement a simple version of this within corso?
heh, I wondered about that too when we added it to the OneDrive code previously. I guess since we haven't had issues with it so far we may as well keep using it? We can always switch if we do find some problem.

To play devil's advocate a bit, it's possible that there haven't been updates to the repo simply because it doesn't do much from a logical standpoint and was made well enough that no issues have been reported.
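For reference, an in-repo replacement wouldn't need much code. A rough sketch of what a simple lazy reader could look like (this is not go-lazy's actual API, just the idea):

```go
package lazy

import "io"

// Reader defers building the underlying io.Reader until the first Read
// call, so no data is fetched unless something actually consumes the
// stream.
type Reader struct {
	init func() (io.Reader, error)
	r    io.Reader
	err  error
}

func NewReader(init func() (io.Reader, error)) *Reader {
	return &Reader{init: init}
}

func (l *Reader) Read(p []byte) (int, error) {
	if l.r == nil && l.err == nil {
		// First read: construct the underlying reader now.
		l.r, l.err = l.init()
	}

	if l.err != nil {
		return 0, l.err
	}

	return l.r.Read(p)
}
```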
Change the type of added to have mod times. Also handle deduping added and removed items in the collection initializer instead of in the logic that asks for the collection to be created.
Use a new collection and item implementation to lazily pull item data from Exchange if kopia requires it. This should only be used for added items.
Return either a prefetchCollection or a lazyFetchCollection depending on whether the mod times are valid.
Allow returning data and errors a bit more smoothly.
If an item is deleted in flight (returns a sentinel error when getting info), don't add it to backup details since we don't have the data to restore it.
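A minimal sketch of the prefetch-vs-lazy selection these commits describe. Aside from the prefetchCollection/lazyFetchCollection names mentioned above, the types, constructors, and validity check here are assumptions, not corso's actual code:

```go
package exchange

import "time"

// backupCollection stands in for corso's collection interface; the
// real type has more methods. Everything below except the
// prefetch/lazy split is an assumption for illustration.
type backupCollection interface{}

func newPrefetchCollection(added map[string]time.Time, removed []string) backupCollection  { return nil }
func newLazyFetchCollection(added map[string]time.Time, removed []string) backupCollection { return nil }

// pickCollection returns a lazyFetchCollection only when every added
// item has a usable mod time; without mod times kopia can't tell
// whether its cached copy is current, so fall back to eagerly
// prefetching item data.
func pickCollection(added map[string]time.Time, removed []string) backupCollection {
	for _, modTime := range added {
		if modTime.IsZero() {
			return newPrefetchCollection(added, removed)
		}
	}

	return newLazyFetchCollection(added, removed)
}
```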
Kudos, SonarCloud Quality Gate passed!
Implement lazy data fetch for Exchange items. Use a new collection type to clearly denote when items can be lazily fetched vs. requiring eager fetch.

This PR changes how the read bytes stat is updated. Lazily fetched items will not update the read bytes stat. This stat doesn't appear to be used anywhere at the moment.
For items that are deleted between the time enumeration takes place and the time their data needs to be fetched, corso will (see the sketch below):
- skip adding the item to backup details, since we don't have the data to restore it
- pick up the deletion in the next backup's delta results
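A sketch of that deleted-in-flight handling, assuming the sentinel error described in the commit messages above; all names here are illustrative, not corso's actual identifiers:

```go
package exchange

import (
	"context"
	"errors"
)

// errItemDeletedInFlight is an illustrative sentinel returned when an
// item vanishes between enumeration and data fetch.
var errItemDeletedInFlight = errors.New("item deleted in flight")

type itemInfo struct{ /* restore metadata, elided */ }

// recordDetails adds an item's info to backup details unless the item
// was deleted in flight, in which case it is silently skipped; the
// deletion will show up in the next backup's delta results.
func recordDetails(
	ctx context.Context,
	getInfo func(context.Context) (itemInfo, error),
	add func(itemInfo),
) error {
	info, err := getInfo(ctx)
	if errors.Is(err, errItemDeletedInFlight) {
		return nil
	}

	if err != nil {
		return err
	}

	add(info)

	return nil
}
```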
Does this PR need a docs update or release note?
Type of change
Issue(s)
Test Plan
- Manually tested deleting an item between enumeration and data fetch