feat: support Shallow Clones for Filesystems #3938
Signed-off-by: Corwin Joy <[email protected]>
ACTION NEEDED: delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
This is a basic implementation of the shallow clone feature for delta-rs. While writing it, I ran into a couple of limitations that I would like feedback on.
```rust
    Ok(())
}

#[cfg(all(test, feature = "datafusion"))]
```
I ended up requiring `datafusion` for this test so that I could verify that the data in the clone matches the data in the original table at the same version. I'm not sure whether this is a problem.
```rust
log_store
    .write_commit_entry(commit_version, commit_bytes.clone(), operation_id)
    .await?;
```
I do the file adds as a second commit. I think this matches how tables are usually created in delta-rs: version 0 holds the metadata, and version 1 adds the files.
This is mostly due to URL handling in DataFusion, which has been an open issue for a long time.
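The two-commit layout described in the comment above can be sketched without the delta-rs API. This is a minimal, std-only illustration under stated assumptions. `shallow_clone_commits` is a hypothetical helper, not a delta-rs function. The schema and table id are placeholders. The sketch also assumes the clone's add actions point back at the source table's data files through absolute paths, which the Delta protocol permits. Commit 0 carries only `protocol` and `metaData`. Commit 1 carries one `add` action per source file, so no Parquet data is copied.

```rust
/// Hypothetical helper (not part of delta-rs): builds the newline-delimited JSON
/// bodies for commit 0 (protocol + metadata) and commit 1 (add actions) of a
/// shallow clone whose data files stay in the source table.
fn shallow_clone_commits(source_root: &str, files: &[(&str, u64)]) -> (String, String) {
    // Version 0: protocol and metadata only. The schema is an empty placeholder;
    // a real clone would copy the source table's metaData action.
    let v0 = format!(
        "{}\n{}\n",
        r#"{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}"#,
        r#"{"metaData":{"id":"hypothetical-clone-id","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[]}","partitionColumns":[],"configuration":{}}}"#
    );

    // Version 1: one add action per source data file. Each path is absolute and
    // points back into the source table (assumes file names need no URI escaping).
    let root = source_root.trim_end_matches('/');
    let mut v1 = String::new();
    for (file, size) in files {
        v1.push_str(&format!(
            r#"{{"add":{{"path":"{}/{}","size":{},"modificationTime":0,"dataChange":true,"partitionValues":{{}}}}}}"#,
            root, file, size
        ));
        v1.push('\n');
    }
    (v0, v1)
}

fn main() {
    let (v0, v1) = shallow_clone_commits(
        "s3://bucket/source_table/",
        &[("part-00000.parquet", 1024), ("part-00001.parquet", 2048)],
    );
    // Print what would be written as the clone's first two log entries.
    print!("00000000000000000000.json:\n{}", v0);
    print!("00000000000000000001.json:\n{}", v1);
}
```

Keeping the metadata and the file adds in separate commits means a reader of the clone sees an empty table at version 0 and the full snapshot at version 1. That mirrors the create-then-write pattern the comment describes.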
Description
Add support for creating and managing shallow clones (a Delta Lake feature since 2.3) in delta-rs, with Python bindings.
Related Issue(s)
Closes issue #2456
Documentation
Delta Lake Clone
https://delta.io/blog/delta-lake-clone/
Use Case
Shallow clones are very valuable for testing new features in ephemeral environments against production data, without copying large amounts of data or disrupting production systems. Being able to create an isolated test environment with a one-liner is especially useful when users have read-only access to a table: they can still cheaply create their own writable branch of the data for testing new features.