Replies: 1 comment
-
I'm not part of the DuckDB team, but I think this feature would be really cool too. I have never used LakeFS (and I wonder whether running DuckLake on top of it is a good idea), because I'm wary of users freely dumping data into storage bound to my prod environment. I would like a way for users to create local branches (local data lakes), edit schemas, and add data without affecting remote storage. It's already possible to copy DuckLake's metadata database into a local DuckDB and override the default storage location to point to a local folder, so that newly added data stays local while remote data can still be queried. I gave it a try with S3:
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER config,
KEY_ID '...',
SECRET '...',
ENDPOINT 's3.eu-central-003.backblazeb2.com'
);
ATTACH 'ducklake:postgres:dbname=...' AS s3_lakehouse (DATA_PATH 's3://joffreybvn/ducklake');
USE s3_lakehouse;
DO $$
BEGIN
-- Prepend the remote data_path to all existing Parquet file paths
UPDATE ducklake_data_file
SET
path = 's3://joffreybvn/ducklake/' || path,
path_is_relative = false;
-- Use local storage for newly created Parquet files
UPDATE ducklake_metadata
SET value = 'data/'
WHERE key = 'data_path';
END;
$$;
ATTACH 'ducklake:postgres:dbname=...' AS s3_local;
USE s3_local;
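To make the effect of the two UPDATE statements easier to see, here is a minimal sketch that replays them against stand-in tables in an in-memory SQLite database. It is not DuckLake itself: the tables contain only the columns touched above, the file name `main/events/file-0001.parquet` and the helper `detach_from_remote` are made up for illustration, and the `WHERE path_is_relative = 1` guard is my addition (the snippet above rewrites every row), so that running it twice does not prefix a path twice.

```python
import sqlite3

REMOTE_PREFIX = "s3://joffreybvn/ducklake/"  # DATA_PATH used in the ATTACH above
LOCAL_PATH = "data/"                         # new default location for local writes

con = sqlite3.connect(":memory:")
# Stand-in tables: only the columns the UPDATEs touch (assumption, not the full schema).
con.executescript("""
CREATE TABLE ducklake_data_file (path TEXT, path_is_relative INTEGER);
CREATE TABLE ducklake_metadata (key TEXT, value TEXT);
-- Hypothetical existing file, stored relative to the remote data_path.
INSERT INTO ducklake_data_file VALUES ('main/events/file-0001.parquet', 1);
INSERT INTO ducklake_metadata VALUES ('data_path', 's3://joffreybvn/ducklake/');
""")


def detach_from_remote(con, remote_prefix, local_path):
    """Pin existing files to absolute remote paths, then point new writes locally."""
    with con:  # both updates commit together, or neither does
        con.execute(
            "UPDATE ducklake_data_file "
            "SET path = ? || path, path_is_relative = 0 "
            "WHERE path_is_relative = 1",  # guard: skip paths that are already absolute
            (remote_prefix,),
        )
        con.execute(
            "UPDATE ducklake_metadata SET value = ? WHERE key = 'data_path'",
            (local_path,),
        )


detach_from_remote(con, REMOTE_PREFIX, LOCAL_PATH)
detach_from_remote(con, REMOTE_PREFIX, LOCAL_PATH)  # second run changes nothing

files = con.execute("SELECT path, path_is_relative FROM ducklake_data_file").fetchall()
data_path = con.execute(
    "SELECT value FROM ducklake_metadata WHERE key = 'data_path'"
).fetchone()[0]
print(files, data_path)
```

After this, existing rows keep pointing at S3 through absolute paths, while anything written later resolves against the local `data/` folder.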
-
Hi DuckLake team,
I'm really enjoying the SQL lakehouse experience with DuckDB and DuckLake! One feature I find extremely powerful in the Iceberg + Nessie ecosystem is the ability to do "zero-copy clones" and Git-like branching/merging of datasets: essentially, instant branching and versioning of tables and entire data lakes without duplicating data.
Is the DuckLake team considering support for:
Zero-copy clone of tables or datasets (like what Nessie provides for Iceberg)?
Git-style branching and merging of data/catalog state, so we can experiment, develop, and collaborate on data with the same workflows as code?
If this is on the roadmap, I’d love to hear more. If not, I’d be interested in the team’s thoughts on the idea and any technical challenges you foresee.
Thanks for all your work on DuckLake and DuckDB!