-
Couldn't load subscription status.
- Fork 537
feat: allow for lazy loading files in operations #3872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
909687a to
86aed3a
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3872 +/- ##
==========================================
- Coverage 73.80% 73.70% -0.11%
==========================================
Files 151 151
Lines 39227 39228 +1
Branches 39227 39228 +1
==========================================
- Hits 28951 28912 -39
- Misses 8981 9007 +26
- Partials 1295 1309 +14 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
86aed3a to
17d6c86
Compare
| pub(crate) async fn resolve_snapshot( | ||
| log_store: &dyn LogStore, | ||
| maybe_snapshot: Option<EagerSnapshot>, | ||
| require_files: bool, | ||
| ) -> DeltaResult<EagerSnapshot> { | ||
| if let Some(snapshot) = maybe_snapshot { | ||
| if require_files { | ||
| snapshot.with_files(log_store).await | ||
| } else { | ||
| Ok(snapshot) | ||
| } | ||
| } else { | ||
| let mut config = DeltaTableConfig::default(); | ||
| config.require_files = require_files; | ||
| EagerSnapshot::try_new(log_store, config, None).await | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function along with the new methods on EagerSnapshot are the main pieces of new logic.
doing this again b/c force push.
bc0f96a to
921092e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these builders are having a constructor change, which I don't think is necessarily a problem but should they all have pub constructors? This change forces an semver change due to breaking public APIs. 😢
If we're breaking, why not just take them away?
921092e to
967821f
Compare
|
@rtyler - I really like the idea of hiding the constructors. rigth now we need them in the python bindings. Happy to look into this in a follow-up? |
967821f to
f18d658
Compare
|
This is DEFINITELY something that shouldn't be part of this PR but I was looking through this code the other day and was perhaps wondering if it's possible to get the best of both worlds? Store a |
|
@rtyler - turns out there surprisingly is a way ... using our intended APIs 😆. So most of the builder constructors are private now, exceptions being @abhiaagarwal - we absolutely need to look into some caching strategies and the way we are moving is to do as much work on streams as possible. The main blocker in this is that most operations do their planning assuming the full state is available. So even if they consume the stream based apis on Snapshots for operations planning, they usually just collect the stream. My best guess right now is that we do some caching in the engine/object-store level (e.g. caching json bytes or deletion vector data) and treat our current |
022a14d to
9e7f49f
Compare
Signed-off-by: Robert Pack <[email protected]> feat: towards lazy loading table state Signed-off-by: Robert Pack <[email protected]>
0f7c2fd to
e5262b8
Compare
Signed-off-by: Robert Pack <[email protected]>
e5262b8 to
6ebd4e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement!
Thus far we had a bunch of `unwraps` in our `DeltaOps` that would panic for a missing `EagerSnapshot`. In this PR we handle missing snapshot by resolving the options inside an operations future. This also made it quite simple to re-load files in case we need them but `require_files=false`. This effectively allows lazy laoding the files while we still default to eagerly loading the data. The PR does look quite large, but the patterns are very repetitive. * make eager snapshot optional on builders * resolve snapshot at beginning of operation future To safeguard against trying to execute operations without files loaded, we pushed the respective check on the delta config into the methods accessing the files. Signed-off-by: Robert Pack <[email protected]>
Description
Thus far we had a bunch of
unwrapsin ourDeltaOpsthat would panic for a missingEagerSnapshot. In this PR we handle missing snapshot by resolving the options inside an operations future. This also made it quite simple to re-load files in case we need them butrequire_files=false. This effectively allows lazy laoding the files while we still default to eagerly loading the data.The PR does look quite large, but the patterns are very repetitive.
To safeguard against trying to execute operations without files loaded, we pushed the respective check on the delta config into the methods accessing the files.