Skip to content

[RFC] feat!: kernel-based log replay #3137

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

roeap
Copy link
Collaborator

@roeap roeap commented Jan 16, 2025

Description

This PR aims to provide new implementations for the current Snapshot (now called LazySnapshot) and EagerSnapshot back by the delta-kernel-rs library.

This PR focusses on the implementation of the new snapshots, but avoids updating all usage and removing the old ones. I plan to provide some stacked PRs that actually use these in operations etc., hoping that this way reviews and feedback can be a bit more streamlined.

To reduce churn in the codebase, after the switch has been made, we introduce a trait Snapshot which is implemented by the new snapshots and should also be implemented for DeltaTableState. We can now establish a more uniform API across the Snapshot variants since Kernel's execution model allows us to avoid async in all APIs.

One of the most significant conceptual changes is how eager the EagerSnapshot is. The parquet reading in both delta-rs and delta-kernel-rs has evolved much since the EagerSnapshot was first written and handles pushdown of columns and predicates much more effectively. TO mitigate the cost of repeated reads of commit data, we introduce a simple caching layer in form of an ObjectStore implementation that caches commit reads in memory. This is right now a simple brute force approach to allow for migration, but hopefully will be extended in the future to also avoid json parsing and caching parquet metadata reads.

Any feedback on the direction this is taking is greatly appreciated.

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Jan 16, 2025
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@ion-elgreco
Copy link
Collaborator

@roeap exciting stuff :D

This will close then this issue: #2776

Will your PR also address this: #3062?

@roeap
Copy link
Collaborator Author

roeap commented Jan 16, 2025

Will your PR also address this: #3062?

Not in its current form, but updating Snapshot and with that the log segment needs to definitely go in here...

@roeap roeap force-pushed the feat/kernel-data branch 4 times, most recently from f8049db to e7c7766 Compare January 19, 2025 00:11
Copy link

codecov bot commented Jan 19, 2025

Codecov Report

Attention: Patch coverage is 48.76033% with 434 lines in your changes missing coverage. Please review.

Project coverage is 71.41%. Comparing base (2269a67) to head (0eac792).

Files with missing lines Patch % Lines
crates/core/src/kernel/snapshot_next/iterators.rs 33.93% 104 Missing and 5 partials ⚠️
crates/test/src/acceptance/data.rs 0.00% 87 Missing ⚠️
crates/core/src/kernel/snapshot_next/mod.rs 43.62% 58 Missing and 26 partials ⚠️
crates/core/src/kernel/snapshot_next/lazy.rs 70.64% 31 Missing and 28 partials ⚠️
crates/core/src/kernel/snapshot_next/eager.rs 53.21% 37 Missing and 14 partials ⚠️
crates/core/src/kernel/snapshot_next/cache.rs 64.15% 36 Missing and 2 partials ⚠️
crates/test/src/acceptance/meta.rs 81.48% 0 Missing and 5 partials ⚠️
crates/core/src/protocol/checkpoints.rs 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3137      +/-   ##
==========================================
- Coverage   71.81%   71.41%   -0.41%     
==========================================
  Files         145      152       +7     
  Lines       45972    46811     +839     
  Branches    45972    46811     +839     
==========================================
+ Hits        33016    33429     +413     
- Misses      10859    11216     +357     
- Partials     2097     2166      +69     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ion-elgreco
Copy link
Collaborator

@roeap I assume with the introduction of the CommitCacheObjectStore, you would want to have the instantiate two object_stores on the log store, one for commits, and one for reading/writing parquet.

With regards to the object store for reading/writing parquet, the folks at "seafowl" built an interesting caching layer for reading parquets https://github.com/splitgraph/seafowl/blob/main/src/object_store/cache.rs, I asked whether they could publish that as a crate, I think it could be really valuable for read operations during some operations that require scans

@roeap
Copy link
Collaborator Author

roeap commented Jan 22, 2025

Well .. this very naive caching implementation is mainly meant for now to not double down on some of the "regrets" from our pasts selves when it comes to the Snapshot implementation.

By now the parquet read is very selective in delta-rs and delta-kernel-rs with column selection and row group filtering... as such the assumption is, that we do not need to cache data from checkpoints and focus on caching all these expensive json commit reads.

This simplifies the data we keep in memory significantly - essentially just reconciled add action data. While not incurring too much of a penalty for repeated json (commit) reads.

But this is mostly just a stop-gap for adopting kernel "the right way", or at least not in an obviously wrong way 😆.

As you rightfully mention, there is much more that can be done. IIRC, datafusion also at least has the wiring to inject caching of parquet footers, which should make scanning snapshots for actions other then adds also much more efficient.

Without having spend too much time thinking about it, I think the abstraction you mentioned is much nicer - i.e. we are aware of what type of file we are reading. For us this would in a kernel world mean we would hoist some caching up to a higher level, the json and parquet handler traits in Engine. This way we could also avoid additional parsing.

One could argue that this is more or less what we are doing now, keeping all arrow state in memory, but I would say that we can build something much more efficient - and shareable across snapshots - at the engine layer. Also do things like local file cache etc ..

One thing I discussed with @rtyler is to move the caching object store to a dedicated PR, as we can get that merged much quicker then this one - which may yet take some time :). Also, we can think about if we can (and should) iterate on our configuration system a bit. The tombstones config for instance has no effect for a while now.

@ion-elgreco
Copy link
Collaborator

@roeap on your last note, I think that could be useful indeed to already provide the benefit of it. I haven't looked to depth in to that code, but I assume you can limit the cache size?

@roeap
Copy link
Collaborator Author

roeap commented Jan 22, 2025

Indeed you can - right now its a hard-coded count, bit in a separate PR this should be configurable. The crate also allows in a simple way to use other weights - e.g. limit by size, as well as choose eviction policies. Some of which we should allow users to configure, but hopefully we can just have great defaults based on what we know about delta tables 😆.

Copy link
Collaborator

@ion-elgreco ion-elgreco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall!

Self {
inner,
check: Arc::new(cache_json),
cache: Arc::new(Cache::new(100)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a Weight capacity here as well with a configurable env var to limit the Bytes held in memory

files: Option<RecordBatch>,
}

impl Snapshot for EagerSnapshot {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that is missing is the DeltaTableConfig, I added this some time ago to the old snapshot because we some times need to be aware in the operation how the table got loaded.

    /// Get the table config which is loaded with of the snapshot
    pub fn load_config(&self) -> &DeltaTableConfig {
        self.snapshot.load_config()
    }

self.snapshot.table_root()
}

fn version(&self) -> Version {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more version_timestamp as well?

fn next(&mut self) -> Option<Self::Item> {
if self.index < self.paths.len() {
let path = self.paths.value(self.index).to_string();
let add = AddVisitor::visit_add(self.index, path, self.getters.as_slice())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always guaranteed to find the next add action?

))
}

pub fn stats(&self) -> Option<&str> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does a logicalFileView have stats?


fn extract_column<'a>(
mut parent: &'a dyn ProvidesColumnByName,
col: &[impl AsRef<str>],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name threw me off, I thought it was multiple columns, but it's a single column_path

res.and_then(|(data, predicate)| {
let batch: RecordBatch =
ArrowEngineData::try_from_engine_data(data)?.into();
Ok(filter_record_batch(&batch, &BooleanArray::from(predicate))?)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why even filter when the predicate was None?

start_version: Option<Version>,
limit: Option<usize>,
) -> DeltaResult<Box<dyn Iterator<Item = (Version, CommitInfo)>>> {
// let start_version = start_version.into();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old line?

Comment on lines +146 to +152
let end_version = start_version.unwrap_or_else(|| self.version());
let start_version = limit
.and_then(|limit| {
if limit == 0 {
Some(end_version)
} else {
Some(end_version.saturating_sub(limit as u64 - 1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is highly confusing xd, the end version becomes the start versions when passed, and then the start_versions becomes the end version again when there is no limit :S

store: Arc<dyn ObjectStore>,
version: impl Into<Option<Version>>,
) -> DeltaResult<Self> {
// TODO: how to deal with the dedicated IO runtime? Would this already be covered by the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently do that all the way at the beginning in logstore_with

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roeap What's the status of integrating this, any ETA?

roeap added 3 commits April 11, 2025 19:02
Signed-off-by: Robert Pack <[email protected]>
Signed-off-by: Robert Pack <[email protected]>
@roeap roeap force-pushed the feat/kernel-data branch from a7ce51d to 7004eb7 Compare April 11, 2025 17:02
@github-actions github-actions bot removed the binding/python Issues for the Python package label Apr 11, 2025
@github-actions github-actions bot added the binding/python Issues for the Python package label Apr 11, 2025
@github-actions github-actions bot removed the binding/python Issues for the Python package label Apr 12, 2025
rtyler added a commit to rtyler/delta-rs that referenced this pull request May 4, 2025
This commit was modified from delta-io#3137 to enable an independent merge to
bring some of these structural changes needed for delta-kernel-rs
integration in piecemeal

Signed-off-by: Robert Pack <[email protected]>
Signed-off-by: R. Tyler Croy <[email protected]>
rtyler added a commit to rtyler/delta-rs that referenced this pull request May 5, 2025
This commit was modified from delta-io#3137 to enable an independent merge to
bring some of these structural changes needed for delta-kernel-rs
integration in piecemeal

Signed-off-by: Robert Pack <[email protected]>
Signed-off-by: R. Tyler Croy <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request May 5, 2025
This commit was modified from #3137 to enable an independent merge to
bring some of these structural changes needed for delta-kernel-rs
integration in piecemeal

Signed-off-by: Robert Pack <[email protected]>
Signed-off-by: R. Tyler Croy <[email protected]>
@zeevm
Copy link
Contributor

zeevm commented May 6, 2025

@roeap @ion-elgreco @rtyler anyone knows what's the status of this PR?

IIUC moving to delta-kernel is foundational for delta-rs moving forward, closing many gaps in protocol compliance like column mapping, deletion vectors, V2 checkpoints with sidecars etc. yet the only PR that seems to be working towards it has been open since January and doesn't seem to be concluding.

Can anyone share broad roadmap plans for delta-rs here?

Thanks.

@ion-elgreco
Copy link
Collaborator

@roeap @ion-elgreco @rtyler anyone knows what's the status of this PR?

IIUC moving to delta-kernel is foundational for delta-rs moving forward, closing many gaps in protocol compliance like column mapping, deletion vectors, V2 checkpoints with sidecars etc. yet the only PR that seems to be working towards it has been open since January and doesn't seem to be concluding.

Can anyone share broad roadmap plans for delta-rs here?

Thanks.

There is no official roadmap.

@zeevm
Copy link
Contributor

zeevm commented May 6, 2025

@roeap @ion-elgreco @rtyler anyone knows what's the status of this PR?
IIUC moving to delta-kernel is foundational for delta-rs moving forward, closing many gaps in protocol compliance like column mapping, deletion vectors, V2 checkpoints with sidecars etc. yet the only PR that seems to be working towards it has been open since January and doesn't seem to be concluding.
Can anyone share broad roadmap plans for delta-rs here?
Thanks.

There is no official roadmap.

Thanks.
Is there an ETA for this PR? Anyway to help with this effort?

@ion-elgreco
Copy link
Collaborator

@roeap @ion-elgreco @rtyler anyone knows what's the status of this PR?
IIUC moving to delta-kernel is foundational for delta-rs moving forward, closing many gaps in protocol compliance like column mapping, deletion vectors, V2 checkpoints with sidecars etc. yet the only PR that seems to be working towards it has been open since January and doesn't seem to be concluding.
Can anyone share broad roadmap plans for delta-rs here?
Thanks.

There is no official roadmap.

You could take this PR and try to hook the new log replay to delta-rs

@zeevm
Copy link
Contributor

zeevm commented May 6, 2025

@roeap @ion-elgreco @rtyler anyone knows what's the status of this PR?
IIUC moving to delta-kernel is foundational for delta-rs moving forward, closing many gaps in protocol compliance like column mapping, deletion vectors, V2 checkpoints with sidecars etc. yet the only PR that seems to be working towards it has been open since January and doesn't seem to be concluding.
Can anyone share broad roadmap plans for delta-rs here?
Thanks.

There is no official roadmap.

You could take this PR and try to hook the new log replay to delta-rs

I don't want to bifurcate the effort so just so I understand the current status, is this PR currently abandoned? Should I fork it and continue work on it or start fresh? what's the best game plan here?

@ion-elgreco
Copy link
Collaborator

@roeap @ion-elgreco @rtyler anyone knows what's the status of this PR?
IIUC moving to delta-kernel is foundational for delta-rs moving forward, closing many gaps in protocol compliance like column mapping, deletion vectors, V2 checkpoints with sidecars etc. yet the only PR that seems to be working towards it has been open since January and doesn't seem to be concluding.
Can anyone share broad roadmap plans for delta-rs here?
Thanks.

There is no official roadmap.

You could take this PR and try to hook the new log replay to delta-rs

I don't want to bifurcate the effort so just so I understand the current status, is this PR currently abandoned? Should I fork it and continue work on it or start fresh? what's the best game plan here?

It's not abandoned afaik, but @roeap has done some prep work in latest prs to get things more compatible, see https://github.com/delta-io/delta-rs/pulls?q=is%3Apr+author%3Aroeap+is%3Aclosed, also @rtyler moved some of the dat testing in this pr already into main.

I would just continue on this work and see if you can come up with something, @roeap however can give the best answer here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants