This repository was archived by the owner on Oct 11, 2024. It is now read-only.

Lazily fetch Exchange item data when possible #4300

Merged
merged 10 commits into from
Sep 20, 2023
Merged

Conversation

ashmrtn
Contributor

@ashmrtn ashmrtn commented Sep 19, 2023

Implement lazy data fetch for Exchange items. Use a new collection type to clearly denote when items can be lazily fetched vs. requiring eager fetch.

This PR changes how the read bytes stat is updated: lazily fetched items will not update the read bytes stat. This stat doesn't appear to be used anywhere at the moment.

For items that are deleted between the time enumeration takes place and the time their data needs to be fetched, Corso will:

  • return an empty reader for the item
  • not add the item to backup details
  • delete the (empty) item from kopia on the next backup

Manually tested deleting an item between enumeration and data fetch.
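The deleted-in-flight behavior above can be sketched in Go. The sentinel error name and function shapes here are illustrative stand-ins, not corso's actual API:

```go
package main

import (
	"bytes"
	"errors"
	"io"
)

// errItemDeletedInFlight is a hypothetical sentinel marking items that were
// deleted between enumeration and data fetch (corso's real name may differ).
var errItemDeletedInFlight = errors.New("item deleted between enumeration and fetch")

// fetchItemData stands in for the Graph API call that retrieves item content.
type fetchItemData func(itemID string) ([]byte, error)

// lazyItemReader defers the Graph fetch until the data is actually needed.
// If the item vanished in flight, it yields an empty reader and reports that
// the item should not be added to backup details; the empty kopia entry is
// then deleted on the next backup.
func lazyItemReader(itemID string, fetch fetchItemData) (r io.Reader, addToDetails bool, err error) {
	data, err := fetch(itemID)
	if errors.Is(err, errItemDeletedInFlight) {
		// Deleted in flight: empty reader, not added to backup details.
		return bytes.NewReader(nil), false, nil
	}
	if err != nil {
		return nil, false, err
	}
	return bytes.NewReader(data), true, nil
}
```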


Does this PR need a docs update or release note?

  • ✅ Yes, it's included
  • 🕐 Yes, but in a later PR
  • ⛔ No

Type of change

  • 🌻 Feature
  • 🐛 Bugfix
  • 🗺️ Documentation
  • 🤖 Supportability/Tests
  • 💻 CI/Deployment
  • 🧹 Tech Debt/Cleanup

Issue(s)

Test Plan

  • 💪 Manual
  • ⚡ Unit test
  • 💚 E2E

@ashmrtn ashmrtn self-assigned this Sep 19, 2023
@ashmrtn ashmrtn temporarily deployed to Testing September 19, 2023 22:59 — with GitHub Actions Inactive
@aviator-app
Contributor

aviator-app bot commented Sep 19, 2023

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged using Aviator.


See the real-time status of this PR on the Aviator webapp.

@ashmrtn ashmrtn temporarily deployed to Testing September 19, 2023 23:03 — with GitHub Actions Inactive
@ashmrtn
Contributor Author

ashmrtn commented Sep 19, 2023

Added #4299 as a follow-up, but I'm not sure when it will be addressed.

@ashmrtn ashmrtn temporarily deployed to Testing September 19, 2023 23:04 — with GitHub Actions Inactive
Member

@meain meain left a comment


LGTM

//
// TODO(ashmrtn): If we switch to immutable IDs then we'll need to handle this
// sort of operation in the pager since this would become order-dependent
// unless Graph started consolidating the changes into a single delta result.
Member


It is possible for a deletion to happen midway through our fetching pages from Graph, so we can never be certain that one item won't have multiple entries. I'm not sure how this plays into immutable IDs, but I just wanted to mention it.

Contributor Author


I think we can catch duplicate IDs if changes happen mid-paging. The problem is that if we have immutable IDs, the handling of the duplicates becomes order-dependent.

For background: if an item is changed/moved/deleted while paging through results, then either 1. the item will appear again in the result set, or 2. the item will appear in the delta result set of the next backup.

The code below handles the first case by removing IDs from the added set if they also appear in the removed set. If we have multiple adds for the same item, the map automatically consolidates them into a single item fetch.

This would need to change if we switched to immutable IDs, because in that case it would be possible to see a series of results like (add item A, remove item A, add item A). The "correct" outcome in that case would be to fetch data for item A. Without immutable IDs, the delta results look like (add item A, remove item A, add item A'), so we avoid the ordering issue altogether.

@@ -12,6 +12,8 @@ import (
"time"

"github.com/alcionai/clues"
"github.com/spatialcurrent/go-lazy/pkg/lazy"
Member


I know we did not add this package in this PR, but this is a repo with 2 stars that was last updated 2 years ago. Should we use something else, or maybe just implement a simple version of this within corso?

Contributor Author


Heh, I wondered about that too when we added it to the OneDrive code previously. I guess since we haven't had issues with it so far, we may as well keep using it? We can always switch if we do find a problem.

To play devil's advocate a bit, it's possible that there haven't been updates to the repo simply because it doesn't do much from a logical standpoint and was made well enough that no issues have been reported.
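For reference, a simple in-tree version along the lines meain suggests could look like the sketch below. This is a minimal illustration of the deferred-read pattern, not go-lazy's actual implementation:

```go
package main

import (
	"io"
	"strings"
	"sync"
)

// lazyReadCloser defers running the getter until the first Read, so no
// upstream (e.g. Graph) call happens unless the data is actually consumed.
type lazyReadCloser struct {
	once   sync.Once
	getter func() (io.ReadCloser, error)
	rc     io.ReadCloser
	err    error
}

func newLazyReadCloser(getter func() (io.ReadCloser, error)) io.ReadCloser {
	return &lazyReadCloser{getter: getter}
}

func (l *lazyReadCloser) Read(p []byte) (int, error) {
	l.once.Do(func() { l.rc, l.err = l.getter() })
	if l.err != nil {
		return 0, l.err
	}
	return l.rc.Read(p)
}

// Close only closes the underlying reader if it was ever materialized;
// closing an untouched lazy reader should not trigger a fetch.
func (l *lazyReadCloser) Close() error {
	if l.rc != nil {
		return l.rc.Close()
	}
	return nil
}

// demoLazyRead exercises the type: it reports whether the getter ran before
// the first Read and returns the bytes read afterward.
func demoLazyRead() (ranEarly bool, got string, err error) {
	called := false
	rc := newLazyReadCloser(func() (io.ReadCloser, error) {
		called = true
		return io.NopCloser(strings.NewReader("lazy")), nil
	})
	ranEarly = called
	buf := make([]byte, 8)
	n, rerr := rc.Read(buf)
	if rerr != nil && rerr != io.EOF {
		return ranEarly, "", rerr
	}
	return ranEarly, string(buf[:n]), rc.Close()
}
```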

Change the type of added to have mod times. Also handle deduping added and removed items in the collection initializer instead of the logic that asks for the collection to be created.
Use a new collection and item implementation to lazily pull item data from Exchange if kopia requires it. This should only be used for added items.
Return either a prefetchCollection or a lazyFetchCollection depending on whether the mod times are valid.
Allow returning data and errors a bit more smoothly.
If an item is deleted in flight (returns a sentinel error when getting info), don't add it to backup details, since we don't have the data to restore it.
@ashmrtn ashmrtn force-pushed the 2023-exch-lazy-fetch branch from 02a56a2 to 56bda10 Compare September 20, 2023 19:35
@ashmrtn ashmrtn temporarily deployed to Testing September 20, 2023 19:35 — with GitHub Actions Inactive
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

No Coverage information
0.0% Duplication

@ashmrtn ashmrtn temporarily deployed to Testing September 20, 2023 19:36 — with GitHub Actions Inactive
@aviator-app aviator-app bot merged commit b212c37 into main Sep 20, 2023
@aviator-app aviator-app bot deleted the 2023-exch-lazy-fetch branch September 20, 2023 20:12
Development

Successfully merging this pull request may close these issues.

Delay fetching Exchange item data from Graph API until it's clear kopia will be uploading the item
2 participants