ducklake-sdk — Native SDKs for DuckLake
Read and write DuckLake tables from Rust and Python - no DuckDB required.
DuckLake is an integrated data lake and catalog format that stores metadata in a SQL catalog database and writes data as Parquet files. This repository provides standalone Rust and Python SDKs that talk to DuckLakes directly, with no dependency on DuckDB or its DuckLake extension.
All language SDKs are built on the same Rust core, which bundles the implementation of the DuckLake specification.
Python (ducklake-sdk)
Rust (ducklake)
Warning
This is not an official SDK released by the DuckDB Foundation.
Python
pip install ducklake-sdk # core
pip install "ducklake-sdk[polars]" # Polars integration
pip install "ducklake-sdk[arrow]" # Arrow + DuckDB integration
Rust
cargo add ducklake

import ducklake as dl
import polars as pl
# Create a new DuckLake backed by SQLite metadata and local Parquet storage
ducklake = dl.create("sqlite:///metadata.sqlite", data_path="data_files/")
# Define a table.
table = ducklake.create_table(
"events",
schema={"id": dl.Int64(), "message": dl.Varchar()},
)
# Write data using Polars
lf = pl.LazyFrame({"id": [1, 2, 3], "message": ["hello", "ducklake", "sdk"]})
table.sink_polars(lf)
# Read it back as a Polars LazyFrame
df = table.scan_polars().collect()

For the full API, see the Python documentation or the Rust API docs.
The Rust core — and therefore every SDK built on top of it — supports:
- Metadata operations — schemas, tables, schema evolution, partitioning, constraints, and table/column tags
- Transactions with conflict resolution
- Data inlining for small writes
- Metadata configuration
- Time travel queries
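
To make the time travel item above more concrete, here is a minimal sketch of the underlying idea, assuming each data file row in the catalog records the snapshot range in which it is visible. It does not use the SDK: the `data_file` table, its columns, and the `files_at` helper are simplified, hypothetical stand-ins for illustration, not the real catalog schema or the SDK's API.

```python
import sqlite3

# Toy catalog, NOT the real DuckLake schema: one row per data file,
# tagged with the snapshot that added it and the snapshot that removed it.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE data_file (path TEXT, begin_snapshot INTEGER, end_snapshot INTEGER)"
)
con.executemany(
    "INSERT INTO data_file VALUES (?, ?, ?)",
    [
        ("a.parquet", 1, None),  # added in snapshot 1, still live
        ("b.parquet", 2, 4),     # added in snapshot 2, removed in snapshot 4
        ("c.parquet", 4, None),  # added in snapshot 4, still live
    ],
)


def files_at(snapshot_id: int) -> list[str]:
    """Return the files visible at a snapshot (hypothetical helper).

    A file is visible if it was added at or before the snapshot and
    has not been removed yet (end_snapshot is treated as exclusive).
    """
    rows = con.execute(
        "SELECT path FROM data_file "
        "WHERE begin_snapshot <= ? "
        "AND (end_snapshot IS NULL OR end_snapshot > ?) "
        "ORDER BY path",
        (snapshot_id, snapshot_id),
    )
    return [path for (path,) in rows]


print(files_at(3))  # state of the table as of snapshot 3
print(files_at(4))  # state of the table as of snapshot 4
```

Because files are never rewritten in place in this model, reading an older snapshot only means selecting a different set of files from the catalog.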
The Python SDK additionally provides:
- Reading and writing data through Polars
- Reading, writing, and deleting data through DuckDB
- Maintenance operations — compaction, snapshot expiration, and more — via DuckDB
Catalog Databases
| Database | Status |
|---|---|
| SQLite | ✅ |
| Postgres | ✅ |
| MySQL | 🟧 (no data inlining*) |
*Data inlining for MySQL is not defined in the DuckLake specification.
Storage Backends
| Backend | Status |
|---|---|
| Local / NFS | ✅ |
| AWS S3-compatible | ✅ |
| Google Cloud Storage | ❌ |
| Azure Blob Storage | ❌ |
DuckLake Specification Versions
| Version | Status |
|---|---|
| 1.0 | ✅ (actively supported) |
| 0.4 | ⬆️ (requires migration) |
| 0.3 | ⬆️ (requires migration) |
| 0.2 | ⬆️ (requires migration) |
| 0.1 | ⬆️ (requires migration) |
See the DuckLake release calendar for upcoming versions.
Note
This project is in alpha. It will move to beta once the full specification is implemented, and to stable once all relevant limitations have been addressed. Expect occasional breaking changes until then.
- GEOMETRY and VARIANT data types
- Mapping columns by name (Parquet files must currently carry field IDs)
- Views, macros, sort info, and encrypted files
Rust SDK (may impact efficiency):
- Tables partitioned with a non-identity transform do not benefit from file pruning yet.
- Filters are not pushed down into the metadata query. Statistics are still loaded eagerly and used by readers to prune files, but the metadata query may transmit more data than necessary.
- Not tested on Windows.
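
The statistics-based file pruning mentioned above can be sketched roughly as follows. This is an illustration of the general min/max technique only, under the assumption that each file carries a minimum and maximum value for the filtered column. `FileStats` and `prune` are hypothetical names, not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics for a single integer column."""
    path: str
    min_id: int
    max_id: int


def prune(files: list[FileStats], lo: int, hi: int) -> list[str]:
    """Keep only files whose [min_id, max_id] range overlaps [lo, hi].

    Files outside the range cannot contain matching rows, so a reader
    can skip them without opening them.
    """
    return [f.path for f in files if f.max_id >= lo and f.min_id <= hi]


files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]

# Filter: id BETWEEN 150 AND 220 -> part-0 can be skipped entirely.
print(prune(files, 150, 220))
```

In this sketch the statistics for every file are loaded before filtering, which mirrors the limitation described above: pruning still works, but more metadata is fetched than a pushed-down filter would need.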
Python SDK:
- Maintenance operations (compaction, snapshot expiration, ...) are dispatched to DuckDB rather than implemented natively.
- Performance of Polars reads and writes can be optimized further:
  - Writes currently require reading the file footer after the file has already been written (see also pola-rs/polars#27226)
  - Reads currently suffer from suboptimal footer reads (see also pola-rs/polars#27227)
Contributions, bug reports, and feature requests are very welcome. See the contribution guidelines to get started.
Licensed under the MIT License.