docs: add DuckLake catalog connector recipe #359
Merged
+268
−0
448f0a8 docs: add DuckLake catalog connector recipe (lukekim)
f032e47 docs: update note about DuckLake connector availability in Spice v2.0… (lukekim)
6be8204 Merge branch 'trunk' into lukim/ducklake-recipe (lukekim)
8efa1d1 docs: fix DuckLake recipe dbgen compatibility and add version require… (lukekim)
7499635 Merge branch 'trunk' into lukim/ducklake-recipe (lukekim)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,251 @@ | ||
| # DuckLake Catalog Connector | ||
|
|
||
| > **Note:** The DuckLake connector is available in Spice v2.0 or later. | ||
|
|
||
| The DuckLake Catalog Connector enables Spice to automatically discover and query all schemas and tables in a [DuckLake](https://ducklake.select/) catalog. DuckLake is an open lakehouse format that stores metadata in a SQL database (a local DuckDB file in this recipe) and data in Parquet files. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - [DuckDB CLI](https://duckdb.org/docs/installation/) installed (to create a DuckLake catalog). | ||
| - Spice v2.0 or later installed (see the [Getting Started](https://docs.spiceai.org/getting-started) documentation). | ||
|
|
||
| ## Step 1. Create a new directory and initialize a Spicepod | ||
|
|
||
| ```bash | ||
| mkdir ducklake-catalog-recipe | ||
| cd ducklake-catalog-recipe | ||
| spice init | ||
| ``` | ||
|
|
||
| ## Step 2. Create a DuckLake catalog with sample data | ||
|
|
||
| Open the DuckDB CLI: | ||
|
|
||
| ```bash | ||
| duckdb | ||
| ``` | ||
|
|
||
| Install and load the DuckLake and TPC-H extensions, then create a catalog and populate it: | ||
|
|
||
| ```sql | ||
| INSTALL ducklake; | ||
| LOAD ducklake; | ||
| INSTALL tpch; | ||
| LOAD tpch; | ||
|
|
||
| -- Create a DuckLake catalog with local metadata storage | ||
| ATTACH 'ducklake:metadata.ducklake' AS my_lakehouse; | ||
|
|
||
| -- Generate TPC-H data (scale factor 0.01 for a quick demo) | ||
| CALL dbgen(sf = 0.01, catalog = 'my_lakehouse'); | ||
| ``` | ||
|
|
||
| Verify the tables were created: | ||
|
|
||
| ```sql | ||
| SHOW ALL TABLES; | ||
| ``` | ||
|
|
||
| ```text | ||
| ┌──────────────┬─────────┬──────────┬────────────────────┬─────────────────────────┬───────────┐ | ||
| │   database   │ schema  │   name   │    column_names    │      column_types       │ temporary │ | ||
| │   varchar    │ varchar │ varchar  │     varchar[]      │        varchar[]        │  boolean  │ | ||
| ├──────────────┼─────────┼──────────┼────────────────────┼─────────────────────────┼───────────┤ | ||
| │ my_lakehouse │ main    │ customer │ [c_custkey, ...]   │ [INTEGER, VARCHAR, ...] │ false     │ | ||
| │ my_lakehouse │ main    │ lineitem │ [l_orderkey, ...]  │ [INTEGER, INTEGER, ...] │ false     │ | ||
| │ my_lakehouse │ main    │ nation   │ [n_nationkey, ...] │ [INTEGER, VARCHAR, ...] │ false     │ | ||
| │ my_lakehouse │ main    │ orders   │ [o_orderkey, ...]  │ [INTEGER, INTEGER, ...] │ false     │ | ||
| │ my_lakehouse │ main    │ part     │ [p_partkey, ...]   │ [INTEGER, VARCHAR, ...] │ false     │ | ||
| │ my_lakehouse │ main    │ partsupp │ [ps_partkey, ...]  │ [INTEGER, INTEGER, ...] │ false     │ | ||
| │ my_lakehouse │ main    │ region   │ [r_regionkey, ...] │ [INTEGER, VARCHAR, ...] │ false     │ | ||
| │ my_lakehouse │ main    │ supplier │ [s_suppkey, ...]   │ [INTEGER, VARCHAR, ...] │ false     │ | ||
| └──────────────┴─────────┴──────────┴────────────────────┴─────────────────────────┴───────────┘ | ||
| ``` | ||
|
|
||
| Exit DuckDB: | ||
|
|
||
| ```sql | ||
| .exit | ||
| ``` | ||
|
|
||
| ## Step 3. Configure the DuckLake Catalog Connector in your Spicepod | ||
|
|
||
| Edit `spicepod.yaml` to add the DuckLake catalog: | ||
|
|
||
| ```yaml | ||
| version: v1 | ||
| kind: Spicepod | ||
| name: ducklake-catalog-recipe | ||
|
|
||
| catalogs: | ||
|   - from: ducklake:metadata.ducklake | ||
|     name: my_lakehouse | ||
| ``` | ||
|
|
||
| ## Step 4. Start the Spice runtime | ||
|
|
||
| ```bash | ||
| spice run | ||
| ``` | ||
|
|
||
| Observe that Spice discovers all schemas and tables: | ||
|
|
||
|
|
||
| ```text | ||
| 2026-03-02T10:00:00.000000Z INFO runtime::init::catalog: Registering catalog 'my_lakehouse' for ducklake | ||
| 2026-03-02T10:00:00.500000Z INFO runtime::init::catalog: Registered catalog 'my_lakehouse' with 1 schema and 8 tables | ||
| ``` | ||
|
|
||
| ## Step 5. Query the DuckLake catalog | ||
|
|
||
| In a new terminal, start the Spice SQL REPL: | ||
|
|
||
| ```bash | ||
| spice sql | ||
| ``` | ||
|
|
||
| List all discovered tables: | ||
|
|
||
| ```sql | ||
| SHOW TABLES; | ||
| ``` | ||
|
|
||
| ```text | ||
| +---------------+--------------+--------------+------------+ | ||
| | table_catalog | table_schema | table_name   | table_type | | ||
| +---------------+--------------+--------------+------------+ | ||
| | my_lakehouse  | main         | customer     | BASE TABLE | | ||
| | my_lakehouse  | main         | lineitem     | BASE TABLE | | ||
| | my_lakehouse  | main         | nation       | BASE TABLE | | ||
| | my_lakehouse  | main         | orders       | BASE TABLE | | ||
| | my_lakehouse  | main         | part         | BASE TABLE | | ||
| | my_lakehouse  | main         | partsupp     | BASE TABLE | | ||
| | my_lakehouse  | main         | region       | BASE TABLE | | ||
| | my_lakehouse  | main         | supplier     | BASE TABLE | | ||
| | spice         | runtime      | task_history | BASE TABLE | | ||
| | spice         | runtime      | metrics      | BASE TABLE | | ||
| +---------------+--------------+--------------+------------+ | ||
| ``` | ||
|
|
||
| Query the customer table: | ||
|
|
||
| ```sql | ||
| SELECT c_custkey, c_name, c_mktsegment, c_acctbal | ||
| FROM my_lakehouse.main.customer | ||
| LIMIT 5; | ||
| ``` | ||
|
|
||
| ```text | ||
| +-----------+--------------------+--------------+-----------+ | ||
| | c_custkey | c_name             | c_mktsegment | c_acctbal | | ||
| +-----------+--------------------+--------------+-----------+ | ||
| | 1         | Customer#000000001 | BUILDING     | 711.56    | | ||
| | 2         | Customer#000000002 | AUTOMOBILE   | 121.65    | | ||
| | 3         | Customer#000000003 | AUTOMOBILE   | 7498.12   | | ||
| | 4         | Customer#000000004 | MACHINERY    | 2866.83   | | ||
| | 5         | Customer#000000005 | HOUSEHOLD    | 794.47    | | ||
| +-----------+--------------------+--------------+-----------+ | ||
| ``` | ||
|
|
||
| Run a cross-table query: | ||
|
|
||
| ```sql | ||
| SELECT n.n_name AS nation, COUNT(*) AS num_customers, ROUND(AVG(c.c_acctbal), 2) AS avg_balance | ||
| FROM my_lakehouse.main.customer c | ||
| JOIN my_lakehouse.main.nation n ON c.c_nationkey = n.n_nationkey | ||
| GROUP BY n.n_name | ||
| ORDER BY num_customers DESC | ||
| LIMIT 5; | ||
| ``` | ||
|
|
||
| ## Step 6. Enable read-write access (optional) | ||
|
|
||
| To enable write operations, update the catalog configuration with `access: read_write`: | ||
|
|
||
| ```yaml | ||
| version: v1 | ||
| kind: Spicepod | ||
| name: ducklake-catalog-recipe | ||
|
|
||
| catalogs: | ||
|   - from: ducklake:metadata.ducklake | ||
|     name: my_lakehouse | ||
|     access: read_write | ||
| ``` | ||
|
|
||
| Restart Spice and insert data: | ||
|
|
||
| ```bash | ||
| spice run | ||
| ``` | ||
|
|
||
| ```bash | ||
| spice sql | ||
| ``` | ||
|
|
||
| ```sql | ||
| INSERT INTO my_lakehouse.main.region (r_regionkey, r_name, r_comment) | ||
| VALUES (5, 'ANTARCTICA', 'A cold and remote region'); | ||
| ``` | ||
|
|
||
| ```text | ||
| +-------+ | ||
| | count | | ||
| +-------+ | ||
| | 1 | | ||
| +-------+ | ||
| ``` | ||
|
|
||
| Verify the insert: | ||
|
|
||
| ```sql | ||
| SELECT * FROM my_lakehouse.main.region ORDER BY r_regionkey; | ||
| ``` | ||
|
|
||
| ## Using the DuckLake Data Connector | ||
|
|
||
| Instead of the catalog connector (which auto-discovers all tables), you can connect to specific tables using the DuckLake data connector: | ||
|
|
||
| ```yaml | ||
| version: v1 | ||
| kind: Spicepod | ||
| name: ducklake-data-connector-recipe | ||
|
|
||
| datasets: | ||
|   - from: ducklake:customer | ||
|     name: customer | ||
|     params: | ||
|       connection_string: metadata.ducklake | ||
|   - from: ducklake:orders | ||
|     name: orders | ||
|     params: | ||
|       connection_string: metadata.ducklake | ||
| ``` | ||
|
|
||
| This is useful when you only need specific tables or want to configure each dataset independently (e.g., with different acceleration settings). | ||
|
|
||
| ## Using with Cloud Storage (S3) | ||
|
|
||
| DuckLake supports storing metadata and data on cloud storage. To use S3: | ||
|
|
||
| 1. Ensure AWS credentials are available via environment variables, `~/.aws/credentials`, or an IAM instance profile. | ||
|
|
||
| 2. Create a DuckLake catalog on S3 (via DuckDB CLI): | ||
|
|
||
| ```sql | ||
| ATTACH 'ducklake:s3://my-bucket/lakehouse/metadata.ducklake' AS cloud_lakehouse; | ||
| ``` | ||
|
|
||
| 3. Configure the Spice catalog: | ||
|
|
||
| ```yaml | ||
| catalogs: | ||
|   - from: ducklake:s3://my-bucket/lakehouse/metadata.ducklake | ||
|     name: cloud_lakehouse | ||
| ``` | ||
|
|
||
| ## Learn more | ||
|
|
||
| - [DuckLake website](https://ducklake.select/) | ||
| - [DuckLake Catalog Connector documentation](https://spiceai.org/docs/components/catalogs/ducklake) | ||
| - [DuckLake Data Connector documentation](https://spiceai.org/docs/components/data-connectors/ducklake) | ||
| - [`spice sql` CLI reference](https://docs.spiceai.org/cli/reference/sql) | ||
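
Step 2 of the recipe is interactive. For repeatable runs, the same statements could be saved to a file and replayed. The sketch below is one way to do that, not part of the PR. The file name `setup_ducklake.sql` and the helper `require_cmd` are hypothetical, and it assumes the DuckDB CLI reads SQL from standard input. It only writes the file and reports whether `duckdb` is on the `PATH`; it does not create the catalog itself.

```shell
# Hypothetical helper: succeed only if the given command is on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Save the Step 2 statements (copied from the recipe) so they can be replayed.
# setup_ducklake.sql is an illustrative file name, not one used by the PR.
cat > setup_ducklake.sql <<'SQL'
INSTALL ducklake;
LOAD ducklake;
INSTALL tpch;
LOAD tpch;
ATTACH 'ducklake:metadata.ducklake' AS my_lakehouse;
CALL dbgen(sf = 0.01, catalog = 'my_lakehouse');
SQL

# Report the next action instead of running it, so the sketch has no side
# effects beyond writing the SQL file.
if require_cmd duckdb; then
  echo "duckdb found; run: duckdb < setup_ducklake.sql"
else
  echo "duckdb not found; install it before replaying setup_ducklake.sql"
fi
```

Keeping the statements in a file also makes it easier to change the scale factor or catalog name in one place before regenerating the sample data.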
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| version: v1 | ||
| kind: Spicepod | ||
| name: ducklake-catalog-recipe | ||
|
|
||
| catalogs: | ||
|   - from: ducklake:metadata.ducklake | ||
|     name: my_lakehouse |
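
The recipe shows three variants of the catalog entry: local read-only, local with `access: read_write`, and S3. As a rough sketch, a small shell function could generate `spicepod.yaml` for any of them. The function name `write_spicepod` and its argument order are hypothetical. The keys (`from`, `name`, `access`) are taken from the recipe, and the two-space YAML indentation is an assumption. The function overwrites any `spicepod.yaml` in the current directory.

```shell
# Hypothetical helper: write a minimal Spicepod with one DuckLake catalog.
#   $1 = metadata location (local path or s3:// URL)
#   $2 = catalog name
#   $3 = optional access mode, e.g. read_write
write_spicepod() {
  metadata="$1"
  catalog="$2"
  access="${3:-}"
  {
    printf 'version: v1\n'
    printf 'kind: Spicepod\n'
    printf 'name: ducklake-catalog-recipe\n'
    printf '\n'
    printf 'catalogs:\n'
    printf '  - from: ducklake:%s\n' "$metadata"
    printf '    name: %s\n' "$catalog"
    # Only emit the access key when a mode was requested.
    if [ -n "$access" ]; then
      printf '    access: %s\n' "$access"
    fi
  } > spicepod.yaml
}

# Reproduce the read-only configuration from Step 3.
write_spicepod metadata.ducklake my_lakehouse
```

Calling `write_spicepod metadata.ducklake my_lakehouse read_write` would produce the Step 6 variant instead.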