This is the DuckDB extension for Delta, built using the Delta Kernel. The extension offers read support and limited write (blind insert) support for Delta tables, both local and remote.
The supported platforms are:
- `linux_amd64`, `linux_amd64_gcc4` and `linux_arm64`
- `osx_amd64` and `osx_arm64`
- `windows_amd64`

Support for the other DuckDB platforms is a work in progress.
Note: This extension requires DuckDB v0.10.3 or higher.
This extension is distributed as a binary extension. To use it, simply call one of its functions from DuckDB and the extension will be autoloaded:

```sql
FROM delta_scan('s3://some/delta/table');
```

To scan a local table, use the full path prefixed with `file://`:

```sql
FROM delta_scan('file:///some/path/on/local/machine');
```

Note that DuckDB Secrets are supported for cloud authentication.
For example, with S3:

```sql
CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);
FROM delta_scan('s3://some/delta/table/with/auth');
```

With Azure:

```sql
CREATE SECRET (
    TYPE AZURE,
    PROVIDER CREDENTIAL_CHAIN,
    CHAIN 'cli',
    ACCOUNT_NAME 'mystorageaccount'
);
FROM delta_scan('abfss://some/delta/table/with/auth');
```

For Google Cloud Storage (GCS), you need to create HMAC keys and declare a secret, as described in https://duckdb.org/docs/guides/network_cloud_storage/gcs_import.html:
```sql
CREATE SECRET (
    TYPE GCS,
    KEY_ID 'xxxx',
    SECRET 'yyy'
);
```

Because this extension reuses most of DuckDB's regular Parquet scanning logic, it supports many features and optimizations:
- multithreaded scans and Parquet metadata reading
- data skipping/filter pushdown
  - skipping row groups within a file (based on Parquet metadata)
  - skipping complete files (based on Delta partition information)
- projection pushdown
- blind inserts
- scanning tables with deletion vectors
- all primitive types
- structs
- the VARIANT type
- cloud storage (AWS, Azure, GCP) with secrets
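To make the data-skipping items above more concrete, here is a small conceptual sketch in Python of how a scan can prune work in three stages: whole files are skipped by partition value, row groups are skipped when their min/max statistics cannot overlap the filter range, and rows marked in a deletion vector are dropped. It is an illustration only, not the extension's implementation (which reuses DuckDB's Parquet reader together with the Delta Kernel). The `RowGroup`, `DataFile` and `plan_scan` names and the simplified metadata layout are hypothetical, and a plain Python set stands in for the deletion vector's bitmap.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Conceptual sketch only: these hypothetical types stand in for Delta log
# entries and Parquet footer statistics; they are NOT the extension's API.


@dataclass
class RowGroup:
    num_rows: int
    min_value: int  # smallest value of the filtered column in this row group
    max_value: int  # largest value of the filtered column in this row group


@dataclass
class DataFile:
    path: str
    partition: Dict[str, str]  # partition column -> value, e.g. {"year": "2024"}
    row_groups: List[RowGroup]  # in file order
    # Deletion vector modeled as a plain set of file-relative row indexes.
    deleted_rows: Set[int] = field(default_factory=set)


def plan_scan(files, partition_col, partition_value, lo, hi):
    """Return {path: [row indexes to read]} for the filter
    partition_col = partition_value AND lo <= column <= hi."""
    plan = {}
    for f in files:
        # 1. File skipping: the partition value alone rules the file out.
        if f.partition.get(partition_col) != partition_value:
            continue
        rows = []
        start = 0  # file-relative index of the first row in this row group
        for rg in f.row_groups:
            # 2. Row-group skipping: keep only if [min, max] overlaps [lo, hi].
            if rg.max_value >= lo and rg.min_value <= hi:
                # 3. Deletion vector: drop rows marked as deleted.
                rows.extend(
                    i for i in range(start, start + rg.num_rows)
                    if i not in f.deleted_rows
                )
            start += rg.num_rows
        if rows:
            plan[f.path] = rows
    return plan


files = [
    DataFile("year=2023/part-0.parquet", {"year": "2023"}, [RowGroup(100, 0, 50)]),
    DataFile(
        "year=2024/part-0.parquet",
        {"year": "2024"},
        [RowGroup(3, 0, 9), RowGroup(3, 10, 19)],
        deleted_rows={4},
    ),
]
print(plan_scan(files, "year", "2024", 12, 15))
# {'year=2024/part-0.parquet': [3, 5]}
```

Statistics only prove that a row group *might* contain matching rows, so the rows that survive pruning still have to be checked against the filter itself; skipping just avoids reading data that cannot match.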
For generic build instructions, see the Extension Template.
There are various tests available for the delta extension:
- Delta Acceptance Test (DAT) based tests in `/test/sql/dat`
- delta-kernel-rs based tests in `/test/sql/delta_kernel_rs`
- Generated data based tests in `tests/sql/generated` (generated using delta-rs, PySpark, and DuckDB)
To run the first two sets of tests:

```shell
make test_debug
```

or, in release mode:

```shell
make test
```

To also run the tests on generated data:

```shell
make generate-data
GENERATED_DATA_AVAILABLE=1 make test
```

To update the Delta Kernel version, simply update the GIT_TAG definition found in `./CMakeLists.txt` and (re-)run `make clean <debug|release>`. The FFI header is included directly from the cargo build, so any breakage caused by the update should show up immediately.