Commit 56c5269

Enable static DuckDB extensions via Cargo in checkouts (#732)
This PR adds an experimental `bundled-cmake` build path for duckdb-rs checkouts. It keeps the existing `bundled` feature unchanged and adds a higher-priority bundled backend that builds DuckDB through upstream CMake when `bundled-cmake` is enabled.

Only in-tree static extensions such as `icu` are supported for now, but there is a path toward out-of-tree extension support. In fact, I think using CMake is the only realistic path toward out-of-tree extensions. Features like `sqlite_scanner` are already part of DuckDB's checked-in extension configs, and reusing that upstream mechanism is much more maintainable than inventing a parallel extension build/link system in Rust. *Let DuckDB build DuckDB.*

*Update: For out-of-tree extensions, see #734.*

Although this PR only targets checkout builds, the CMake backend is structured so that a future crates.io-friendly variant could reuse the same backend logic after downloading the DuckDB sources (`duckdb.tar.gz` does not contain the full source tree).

Fixes #461
2 parents 6a00611 + 0a1d578 commit 56c5269

10 files changed

Lines changed: 802 additions & 230 deletions

.github/workflows/rust.yaml

Lines changed: 34 additions & 2 deletions
@@ -23,7 +23,7 @@ jobs:
   test:
     # - Linux uses DUCKDB_DOWNLOAD_LIB (non-bundled)
     # - Windows uses pre-downloaded archives via DUCKDB_LIB_DIR
-    # - bundled feature tested by --all-features
+    # - bundled feature covered by the Ubuntu clippy feature set below
     name: Test ${{ matrix.name }}
     strategy:
       fail-fast: true
@@ -64,7 +64,8 @@ jobs:
 
       - name: run cargo clippy
         if: matrix.os == 'ubuntu-latest'
-        run: cargo clippy --all-targets --all-features --locked -- -D warnings
+        # `bundled-cmake` is checkout-only and covered in the dedicated CMake job below.
+        run: cargo clippy --all-targets --features "buildtime_bindgen extensions-full loadable-extension modern-full vscalar vscalar-arrow vtab-full" --locked -- -D warnings
 
       - name: Dry-run release of crates
         if: matrix.os == 'ubuntu-latest'
@@ -104,6 +105,37 @@ jobs:
           DUCKDB_INCLUDE_DIR: ${{ github.workspace }}/libduckdb
           LD_LIBRARY_PATH: ${{ github.workspace }}/libduckdb
 
+  cmake:
+    name: Bundled CMake ${{ matrix.name }}
+    strategy:
+      fail-fast: true
+      matrix:
+        include:
+          - { name: Linux, os: ubuntu-latest }
+          - { name: macOS, os: macos-latest }
+          - { name: Windows, os: windows-latest }
+    runs-on: ${{ matrix.os }}
+    env:
+      SCCACHE_GHA_ENABLED: "true"
+      RUSTC_WRAPPER: sccache
+      CMAKE_C_COMPILER_LAUNCHER: sccache
+      CMAKE_CXX_COMPILER_LAUNCHER: sccache
+    steps:
+      - uses: actions/checkout@v5
+        with:
+          submodules: recursive
+      - uses: actions-rust-lang/setup-rust-toolchain@v1
+      - uses: mozilla-actions/sccache-action@v0.0.9
+      - name: Verify static linking
+        shell: bash
+        run: |
+          output=$(cargo run -p duckdb --example repl --no-default-features --features "bundled-cmake,icu" -- -c "from duckdb_extensions() where extension_name = 'icu';")
+          echo "$output"
+          grep -qF "STATICALLY_LINKED" <<<"$output"
+          grep -qF "(BUILT-IN)" <<<"$output"
+      - name: ICU functional test
+        run: cargo test -p duckdb --no-default-features --features "bundled-cmake,icu" --lib extension::test::test_extension_icu -- --exact
+
   msrv:
     name: MSRV Check
     runs-on: ubuntu-latest
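
The `Verify static linking` step passes only when both fixed strings appear in the REPL output. A minimal Rust sketch of the same check follows. The helper name `is_statically_linked` is hypothetical, and the column layout of the `duckdb_extensions()` output is not shown in this commit, so the check is a plain substring match just like `grep -qF`.

```rust
/// Hypothetical helper mirroring the two `grep -qF` calls in the workflow:
/// both fixed strings must appear somewhere in the captured output.
fn is_statically_linked(output: &str) -> bool {
    // Marker strings are taken verbatim from the workflow step.
    const MARKERS: [&str; 2] = ["STATICALLY_LINKED", "(BUILT-IN)"];
    MARKERS.iter().all(|marker| output.contains(marker))
}
```

Requiring both markers guards against an `icu` that is merely installed or autoloaded at runtime, which would defeat the purpose of the CMake build.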

Cargo.lock

Lines changed: 20 additions & 0 deletions

README.md

Lines changed: 42 additions & 7 deletions
@@ -98,10 +98,22 @@ The `duckdb` crate provides a number of Cargo features that can be enabled to ad
 - `vscalar` - Create custom scalar functions that operate on individual values or rows.
 - `vscalar-arrow` - Arrow-optimized scalar functions for vectorized operations.
 
+### File formats
+
+- `json` - Enables reading and writing JSON files. Implies `bundled`.
+- `parquet` - Enables reading and writing Parquet files. Implies `bundled`.
+
+### Bundled DuckDB extensions
+
+These extensions are only available through the CMake build backend and imply `bundled-cmake`.
+
+- `autocomplete` - DuckDB's autocomplete extension.
+- `icu` - DuckDB's ICU extension for locale-aware operations.
+- `tpch` - DuckDB's TPC-H benchmark extension.
+- `tpcds` - DuckDB's TPC-DS benchmark extension.
+
 ### Data integration
 
-- `json` - Enables reading and writing JSON files. Requires `bundled`.
-- `parquet` - Enables reading and writing Parquet files. Requires `bundled`.
 - `appender-arrow` - Efficient bulk insertion of Arrow data into DuckDB tables.
 - `polars` - Integration with Polars DataFrames.
 
@@ -114,6 +126,7 @@ The `duckdb` crate provides a number of Cargo features that can be enabled to ad
 ### Build configuration
 
 - `bundled` - Uses a bundled version of DuckDB's source code and compiles it during build. This is the simplest way to get started and avoids needing DuckDB system libraries.
+- `bundled-cmake` - *Experimental*. Builds DuckDB via its upstream CMake build system instead of `cc`. Requires a duckdb-rs checkout (not available from crates.io). See [step 2](#notes-on-building-duckdb-and-libduckdb-sys) below for details.
 - `buildtime_bindgen` - Use bindgen at build time to generate fresh bindings instead of using pre-generated ones.
 - `loadable-extension` - _Experimental_ support for creating loadable DuckDB extensions. Includes procedural macros for extension development.
 
@@ -178,7 +191,27 @@ You can adjust this behavior in a number of ways:
    duckdb = { version = "1.10501.0", features = ["bundled"] }
    ```
 
-2. When linking against a DuckDB library already on the system (so _not_ using any of the `bundled` features), you can set the `DUCKDB_LIB_DIR` environment variable to point to a directory containing the library. You can also set the `DUCKDB_INCLUDE_DIR` variable to point to the directory containing `duckdb.h`.
+2. If you use the `bundled-cmake` feature, `libduckdb-sys` will build DuckDB from the local checkout in `crates/libduckdb-sys/duckdb-sources` using upstream CMake. This keeps plain `bundled` unchanged while allowing CMake-only extensions such as `icu`.
+
+   Example:
+
+   ```toml
+   [dependencies]
+   duckdb = { git = "https://github.com/duckdb/duckdb-rs", branch = "main", features = ["bundled-cmake", "icu"] }
+   ```
+
+   Notes:
+
+   - `bundled-cmake` is *experimental* and requires a git/workspace checkout. It is not available from crates.io because the full `duckdb-sources` tree is not packaged there.
+   - `bundled-cmake` implies `bundled` (for conditional-compilation gates) but replaces the `cc` build backend with CMake. Enabling any CMake-only extension feature (e.g. `icu`) automatically activates `bundled-cmake`.
+   - `bundled-cmake` always links DuckDB's default static extensions (`core_functions` and `parquet`), so it also implies the `parquet` Cargo feature.
+   - Extension autoload/autoinstall are forced on to match the existing `bundled` backend, even though upstream CMake defaults are off.
+   - When `ninja` is on `PATH`, the Ninja generator is preferred automatically. Set `CMAKE_GENERATOR` to override.
+   - Builds DuckDB in `Release` mode by default, even in Rust debug builds, to avoid DuckDB's much slower debug/sanitizer profile. Set `DUCKDB_CMAKE_BUILD_TYPE` or `CMAKE_BUILD_TYPE` to override. `DUCKDB_CMAKE_BUILD_TYPE` takes precedence.
+   - `DUCKDB_EXTENSION_CONFIGS` is not yet supported; the build fails fast rather than producing a broken binary.
+   - Use `cargo build -vv -F bundled-cmake` to surface CMake configure/build logs.
+
+3. When linking against a DuckDB library already on the system (so _not_ using any of the `bundled` features), you can set the `DUCKDB_LIB_DIR` environment variable to point to a directory containing the library. You can also set the `DUCKDB_INCLUDE_DIR` variable to point to the directory containing `duckdb.h`.
 
    Linux example:
 
@@ -206,29 +239,31 @@ You can adjust this behavior in a number of ways:
    cargo build --examples
    ```
 
-3. Setting `DUCKDB_DOWNLOAD_LIB=1` makes the build script download pre-built DuckDB binaries from GitHub Releases. This always links against the dynamic library in the archive (setting `DUCKDB_STATIC` has no effect), and it effectively automates the manual steps above. The archives are cached in `target/duckdb-download/<target>/<version>` and that directory is automatically added to the linker search path. The downloaded version always matches the DuckDB version encoded in the `libduckdb-sys` crate version.
+4. Setting `DUCKDB_DOWNLOAD_LIB=1` makes the build script download pre-built DuckDB binaries from GitHub Releases. This always links against the dynamic library in the archive (setting `DUCKDB_STATIC` has no effect), and it effectively automates the manual steps above. The archives are cached in `target/duckdb-download/<target>/<version>` and that directory is automatically added to the linker search path. The downloaded version always matches the DuckDB version encoded in the `libduckdb-sys` crate version.
 
    ```shell
    DUCKDB_DOWNLOAD_LIB=1 cargo test
    ```
 
-4. Installing the DuckDB development packages will usually be all that is required, but
+5. Installing the DuckDB development packages will usually be all that is required, but
    the build helpers for [pkg-config](https://github.com/alexcrichton/pkg-config-rs)
    and [vcpkg](https://github.com/mcgoo/vcpkg-rs) have some additional configuration
    options. The default when using vcpkg is to dynamically link,
    which must be enabled by setting `VCPKGRS_DYNAMIC=1` environment variable before build.
 
 When none of the options above are used, the build script falls back to this discovery path and will emit the appropriate `cargo:rustc-link-lib` directives if DuckDB is found on your system.
 
-### ICU extension and the bundled feature
+### ICU extension and the bundled features
 
 When using the `bundled` feature, the ICU extension is not included due to crates.io's 10MB package size limit. This means some date/time operations (like `now() - interval '1 day'` or `ts::date` casts) will fail. You can load ICU at runtime:
 
 ```rust,ignore
 conn.execute_batch("INSTALL icu; LOAD icu;")?;
 ```
 
-Alternatively, link against libduckdb without the `bundled` feature (see build instructions above). The ICU extension will be built-in and pre-loaded in that case.
+Alternatively, link against a system libduckdb that was compiled with ICU (see build instructions above).
+
+If you are working from a duckdb-rs checkout, you can also use `bundled-cmake,icu` to compile ICU in through DuckDB's CMake build.
 
 ### Binding generation
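
The README notes on build type and generator describe two small precedence rules. The sketch below restates them as pure functions so the ordering is easy to see. It is not the build script from this commit (which is not shown here); the function names and the `lookup` closure, which stands in for something like `std::env::var(name).ok()`, are assumptions made for illustration.

```rust
/// Build type precedence as described in the README notes:
/// `DUCKDB_CMAKE_BUILD_TYPE` wins over `CMAKE_BUILD_TYPE`, and the
/// fallback is `Release`. `lookup` is a hypothetical stand-in for an
/// environment lookup so the rule can be exercised without real env vars.
fn resolve_build_type(lookup: impl Fn(&str) -> Option<String>) -> String {
    lookup("DUCKDB_CMAKE_BUILD_TYPE")
        .or_else(|| lookup("CMAKE_BUILD_TYPE"))
        .unwrap_or_else(|| "Release".to_string())
}

/// Generator preference as described in the README notes: an explicit
/// `CMAKE_GENERATOR` overrides everything, otherwise Ninja is preferred
/// when it is on `PATH`. `None` means "leave the choice to CMake".
fn resolve_generator(explicit: Option<&str>, ninja_on_path: bool) -> Option<String> {
    match explicit {
        Some(name) => Some(name.to_string()),
        None if ninja_on_path => Some("Ninja".to_string()),
        None => None,
    }
}
```

Keeping the rules free of direct environment access is a design choice of this sketch; it makes the precedence testable without mutating process state.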

crates/duckdb/Cargo.toml

Lines changed: 6 additions & 0 deletions
@@ -19,8 +19,14 @@ name = "duckdb"
 [features]
 default = []
 bundled = ["libduckdb-sys/bundled"]
+# Warning: experimental feature
+bundled-cmake = ["bundled", "libduckdb-sys/bundled-cmake", "parquet"]
 json = ["libduckdb-sys/json", "bundled"]
 parquet = ["libduckdb-sys/parquet", "bundled"]
+autocomplete = ["libduckdb-sys/autocomplete", "bundled-cmake"]
+icu = ["libduckdb-sys/icu", "bundled-cmake"]
+tpcds = ["libduckdb-sys/tpcds", "bundled-cmake"]
+tpch = ["libduckdb-sys/tpch", "bundled-cmake"]
 vscalar = ["vtab-arrow"]
 vscalar-arrow = []
 vtab = []
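
To see what enabling a single extension feature pulls in, the feature table above can be walked transitively. The sketch below copies the same-crate edges from this hunk (cross-crate entries such as `libduckdb-sys/icu` are left out) and computes the closure. The helper names `duckdb_features` and `enabled_by` are hypothetical; Cargo performs this resolution itself.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Feature table copied from the `[features]` hunk above, keeping only
/// edges that point at features of the same crate.
fn duckdb_features() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("bundled", vec![]),
        ("bundled-cmake", vec!["bundled", "parquet"]),
        ("json", vec!["bundled"]),
        ("parquet", vec!["bundled"]),
        ("autocomplete", vec!["bundled-cmake"]),
        ("icu", vec!["bundled-cmake"]),
        ("tpcds", vec!["bundled-cmake"]),
        ("tpch", vec!["bundled-cmake"]),
    ])
}

/// Hypothetical helper: every feature transitively enabled by `root`,
/// including `root` itself. Unknown names simply have no outgoing edges.
fn enabled_by(root: &'static str) -> BTreeSet<&'static str> {
    let table = duckdb_features();
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(feature) = stack.pop() {
        // `insert` returns false for already-visited features.
        if seen.insert(feature) {
            if let Some(deps) = table.get(feature) {
                stack.extend(deps.iter().copied());
            }
        }
    }
    seen
}
```

Calling `enabled_by("icu")` yields `icu`, `bundled-cmake`, `bundled`, and `parquet`, which matches the README statement that any CMake-only extension feature activates `bundled-cmake` and, through it, `parquet`.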

crates/duckdb/src/extension.rs

Lines changed: 22 additions & 2 deletions
@@ -2,7 +2,8 @@
 mod test {
     use crate::{Connection, Result};
 
-    // https://duckdb.org/docs/extensions/json
+    // https://duckdb.org/docs/current/data/json/overview
+    #[cfg(feature = "json")]
     #[test]
     fn test_extension_json() -> Result<()> {
         let db = Connection::open_in_memory()?;
@@ -17,7 +18,8 @@ mod test {
         Ok(())
     }
 
-    // https://duckdb.org/docs/data/parquet/overview.html
+    // https://duckdb.org/docs/current/data/parquet/overview
+    #[cfg(feature = "parquet")]
     #[test]
     fn test_extension_parquet() -> Result<()> {
         let db = Connection::open_in_memory()?;
@@ -32,6 +34,24 @@ mod test {
         Ok(())
     }
 
+    // https://duckdb.org/docs/current/core_extensions/icu
+    #[cfg(feature = "icu")]
+    #[test]
+    fn test_extension_icu() -> Result<()> {
+        let db = Connection::open_in_memory()?;
+        assert_eq!(
+            1i64,
+            db.query_row::<i64, _, _>(
+                "SELECT count(*) FROM icu_calendar_names() WHERE name = 'gregorian';",
+                [],
+                |r| r.get(0)
+            )?
+        );
+        assert!(db.query_row::<bool, _, _>("SELECT length(icu_sort_key('Ş', 'ro')) > 0;", [], |r| r.get(0))?);
+        Ok(())
+    }
+
+    #[cfg(feature = "parquet")]
     #[test]
     fn test_extension_remote_parquet() -> Result<()> {
         let db = Connection::open_in_memory()?;

crates/duckdb/src/lib.rs

Lines changed: 8 additions & 1 deletion
@@ -120,7 +120,14 @@ mod row;
 mod statement;
 mod transaction;
 
-#[cfg(feature = "extensions-full")]
+#[cfg(any(
+    feature = "autocomplete",
+    feature = "icu",
+    feature = "json",
+    feature = "parquet",
+    feature = "tpcds",
+    feature = "tpch"
+))]
 mod extension;
 
 pub mod profiling;
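
The widened gate matters for the CI job: with the old `extensions-full` gate, a build using only `--features "bundled-cmake,icu"` would not compile `mod extension` at all, so `test_extension_icu` would not exist to run. The sketch below shows the same predicate as a runtime-visible boolean via the standard `cfg!` macro. The function name is hypothetical, and in a standalone build with none of these features declared it evaluates to `false`.

```rust
/// Mirrors the `#[cfg(any(...))]` gate on `mod extension`: true when at
/// least one of the listed Cargo features is enabled for this build.
/// The `allow` silences the "unexpected cfg" lint, since this standalone
/// sketch declares no Cargo features of its own.
#[allow(unexpected_cfgs)]
fn extension_module_compiled() -> bool {
    cfg!(any(
        feature = "autocomplete",
        feature = "icu",
        feature = "json",
        feature = "parquet",
        feature = "tpcds",
        feature = "tpch"
    ))
}
```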

crates/libduckdb-sys/Cargo.toml

Lines changed: 8 additions & 0 deletions
@@ -17,9 +17,15 @@ rust-version = { workspace = true }
 [features]
 default = ["vcpkg", "pkg-config"]
 bundled = ["cc"]
+# Warning: experimental feature
+bundled-cmake = ["bundled", "dep:cmake", "dep:which", "parquet"]
 buildtime_bindgen = ["bindgen", "pkg-config", "vcpkg"]
 json = ["bundled"]
 parquet = ["bundled"]
+autocomplete = ["bundled-cmake"]
+icu = ["bundled-cmake"]
+tpcds = ["bundled-cmake"]
+tpch = ["bundled-cmake"]
 extensions-full = ["json", "parquet"]
 winduckdb = []
 # Warning: experimental feature
@@ -28,6 +34,7 @@ loadable-extension = ["prettyplease", "quote", "syn"]
 [build-dependencies]
 bindgen = { workspace = true, features = ["runtime"], optional = true }
 cc = { workspace = true, features = ["parallel"], optional = true }
+cmake = { version = "0.1", optional = true }
 flate2 = { workspace = true }
 pkg-config = { workspace = true, optional = true }
 prettyplease = { workspace = true, optional = true }
@@ -41,6 +48,7 @@ serde_json = { workspace = true }
 syn = { workspace = true, optional = true }
 tar = { workspace = true }
 vcpkg = { workspace = true, optional = true }
+which = { version = "8", optional = true }
 zip = { version = "6", default-features = false, features = ["deflate"] }
 
 [dev-dependencies]
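
Because `bundled-cmake` lists `bundled` as a dependency, both features are always active together, so a build script has to test for the CMake backend first. That is what "higher-priority bundled backend" in the description amounts to. The following sketch illustrates the ordering only; the `Backend` enum and `select_backend` function are hypothetical and are not taken from the build script in this commit.

```rust
/// Hypothetical set of build strategies for `libduckdb-sys`.
#[derive(Debug, PartialEq)]
enum Backend {
    /// Link a library found on the system, downloaded, or discovered
    /// through pkg-config/vcpkg.
    External,
    /// Existing `bundled` path: compile the packaged sources with `cc`.
    Cc,
    /// New `bundled-cmake` path: drive DuckDB's upstream CMake build.
    CMake,
}

/// Hypothetical selection logic. `bundled_cmake` implies `bundled`, so
/// the CMake check must come first or the `cc` path would shadow it.
fn select_backend(bundled: bool, bundled_cmake: bool) -> Backend {
    if bundled_cmake {
        Backend::CMake
    } else if bundled {
        Backend::Cc
    } else {
        Backend::External
    }
}
```

In a real build script the two booleans would likely come from `cfg!(feature = "bundled")` and `cfg!(feature = "bundled-cmake")`, which is an assumption about the implementation rather than something this page shows.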
