@corwinjoy
Contributor

Description

Add support for creating and managing shallow clones (a Delta Lake feature since 2.3) in delta-rs, with Python bindings.

Related Issue(s)

Closes issue #2456

Documentation

Delta Lake Clone
https://delta.io/blog/delta-lake-clone/

Use Case
Shallow clones are valuable for testing new features in ephemeral environments against production data, without huge memory usage or disruption to production systems. A one-liner that creates an isolated test environment is especially useful when users have only read-only access to a table: they can cheaply create their own writable branch of the data for testing new features.

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Nov 17, 2025
@github-actions

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@corwinjoy corwinjoy changed the title Support Shallow Clones for Filesystems feat: support Shallow Clones for Filesystems Nov 17, 2025
@corwinjoy
Contributor Author

This is a basic implementation of the shallow clone feature for delta-rs. While coding this, I ran into a couple of limitations that I could use feedback on.

  1. It seems that delta-rs does not fully support absolute file paths yet. When I first tried this, I had the code add files with absolute paths. However, the table prepended the table directory to those paths, so this did not work. For now, I create symbolic links to get a usable feature, with the goal of replacing them eventually.

  2. I also tried this with deletion vectors. I have a test case for this using a table in the test directory with simple deletion vectors. However, this fails with "Error: Transaction { source: UnsupportedReaderFeatures([DeletionVectors]) }", so perhaps this feature is not yet supported? Or do I need to add something for this case?
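
For context on the error in item 2: under the Delta protocol, a table's protocol action can list reader features, and a client is expected to refuse a table that lists a feature it does not implement. The sketch below illustrates that kind of check in plain Python. It is not delta-rs code. The `SUPPORTED_READER_FEATURES` set, the exception class, and the function name are hypothetical and only mirror the shape of the reported error.

```python
import json

# Hypothetical: the reader features this toy client claims to implement.
SUPPORTED_READER_FEATURES = {"columnMapping", "timestampNtz"}


class UnsupportedReaderFeatures(Exception):
    """Raised when a table requires reader features the client lacks."""

    def __init__(self, features):
        self.features = sorted(features)
        super().__init__(f"unsupported reader features: {self.features}")


def check_reader_features(protocol_line, supported=SUPPORTED_READER_FEATURES):
    """Validate one JSON log line that contains a `protocol` action."""
    protocol = json.loads(protocol_line)["protocol"]
    # `readerFeatures` is only present on tables that use table features.
    required = set(protocol.get("readerFeatures", []))
    missing = required - set(supported)
    if missing:
        raise UnsupportedReaderFeatures(missing)
```

Under this reading, a source table that lists `deletionVectors` would be rejected by any client whose supported set omits it, regardless of how the clone itself is built.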

@corwinjoy
Contributor Author

Summary via Copilot

Pull Request Overview

This PR adds a shallow_clone method that creates a clone of a Delta table referencing the same data files as the source table, without copying any data. The implementation creates symlinks in the cloned table that point to the source table's data files.

Key changes:

  • Adds shallow_clone method to the Python DeltaTable API accepting a target URI
  • Implements CloneBuilder in Rust core operations with symlink-based file sharing
  • Adds test coverage in both Python and Rust for the cloning functionality

Changed Files

| File | Description |
| --- | --- |
| python/tests/test_shallow_clone.py | Adds Python integration test for shallow cloning functionality |
| python/src/lib.rs | Adds Python binding for shallow_clone method on RawDeltaTable |
| python/deltalake/table.py | Adds public Python API method for shallow cloning |
| python/deltalake/_internal.pyi | Adds type stub for shallow_clone method |
| crates/core/src/operations/mod.rs | Integrates CloneBuilder into DeltaOps API |
| crates/core/src/operations/clone.rs | Core implementation of shallow clone operation with symlinks |
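
The symlink-based file sharing summarized above can be sketched with the Python standard library. This is an illustration of the general technique only, not the Rust code in clone.rs. The function name `symlink_data_files` and its parameters are hypothetical, and the caller is assumed to already know the data file paths relative to the source table root.

```python
import os
from pathlib import Path


def symlink_data_files(source_root: Path, target_root: Path,
                       relative_paths: list[str]) -> list[Path]:
    """Create symlinks under target_root that point at files in source_root."""
    links = []
    for rel in relative_paths:
        # Use an absolute link target so the link does not depend on the
        # current working directory.
        src = (source_root / rel).resolve()
        if not src.is_file():
            raise FileNotFoundError(src)
        dst = target_root / rel
        # Partitioned tables keep files in subdirectories such as year=2024/.
        dst.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(src, dst)
        links.append(dst)
    return links
```

Because each link keeps the same relative path as its source file, the clone's log can refer to files by table-relative paths, which sidesteps the absolute-path limitation raised earlier in the thread.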

@corwinjoy
Contributor Author

@rtyler @adamreeve

Ok(())
}

#[cfg(all(test, feature = "datafusion"))]
Contributor Author

I ended up requiring datafusion for the test here so that I could verify the data in the clone matched the data in the original at the same version. Not sure if this is a problem.


log_store
.write_commit_entry(commit_version, commit_bytes.clone(), operation_id)
.await?;
Contributor Author

I write the file adds as a second commit. I think this is in line with how tables are usually created in delta-rs: version 0 holds the metadata, and version 1 adds the files.
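
A minimal sketch of that two-commit layout, assuming a local filesystem and plain JSON commit files. It stands in for the `write_commit_entry` call shown above and is not the delta-rs implementation. The helper names `write_commit` and `write_clone_log` are hypothetical, and the action dictionaries are supplied by the caller.

```python
import json
from pathlib import Path


def commit_file_name(version: int) -> str:
    # Delta commit files are named with a zero-padded 20-digit version.
    return f"{version:020d}.json"


def write_commit(table_root: Path, version: int, actions: list[dict]) -> Path:
    """Write one commit as newline-delimited JSON actions."""
    log_dir = table_root / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / commit_file_name(version)
    # Mode "x" fails if the file already exists, so an existing commit
    # is never overwritten.
    with open(path, "x", encoding="utf-8") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return path


def write_clone_log(table_root: Path, protocol: dict, metadata: dict,
                    adds: list[dict]) -> tuple[Path, Path]:
    """Version 0 carries protocol and metadata; version 1 carries the adds."""
    v0 = write_commit(table_root, 0, [{"protocol": protocol},
                                      {"metaData": metadata}])
    v1 = write_commit(table_root, 1, [{"add": add} for add in adds])
    return v0, v1
```

Splitting the log this way means the clone is a valid, empty table at version 0 and only gains data files at version 1.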

@hntd187
Collaborator

hntd187 commented Nov 18, 2025

>   1. It seems that delta-rs does not fully support absolute file paths yet. When I first tried this, I had the code add files with absolute paths. But, the tables wanted to prepend the table directory to the file paths, so this did not work. As a result, for now, I create symbolic links to obtain a usable feature. The goal would be to replace this eventually.

This is mostly due to URL handling in datafusion. It's been something to resolve for a long time.
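
The symptom described in the quoted item can be illustrated with a small sketch. It is not the delta-rs or datafusion code path; both function names are hypothetical. The first function always prepends the table root, which reproduces the reported behavior. The second follows the Delta protocol's description of an add action's path as either relative to the table root or an absolute path.

```python
from urllib.parse import urlparse


def naive_resolve(table_root: str, add_path: str) -> str:
    """Always treat add_path as relative to the table root."""
    return table_root.rstrip("/") + "/" + add_path.lstrip("/")


def protocol_resolve(table_root: str, add_path: str) -> str:
    """Leave absolute paths alone; join only table-relative ones."""
    # Assumption: a URI scheme or a leading slash marks an absolute path.
    if urlparse(add_path).scheme or add_path.startswith("/"):
        return add_path
    return table_root.rstrip("/") + "/" + add_path
```

With a clone at `file:///tables/clone` and an add path of `file:///tables/source/part-0.parquet`, the naive version yields a path inside the clone directory that does not exist, while the second version returns the source file's location unchanged.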

@rtyler rtyler marked this pull request as draft November 18, 2025 18:21