Commit 43db729

docs: add DuckLake catalog connector recipe (#359)
* docs: add DuckLake catalog connector recipe
* docs: update note about DuckLake connector availability in Spice v2.0 or later
* docs: fix DuckLake recipe dbgen compatibility and add version requirements

  dbgen does not support generating data directly into DuckLake catalogs. Generate TPC-H data in-memory first, then copy tables into DuckLake. Add DuckDB v1.3.0+ and Spice v2.0+ version requirements to prerequisites.
1 parent feb2385 commit 43db729

2 files changed

Lines changed: 268 additions & 0 deletions

File tree

catalogs/ducklake/README.md

Lines changed: 261 additions & 0 deletions
@@ -0,0 +1,261 @@
# DuckLake Catalog Connector

> **Note:** The DuckLake connector is available in Spice v2.0 or later.

The DuckLake Catalog Connector enables Spice to automatically discover and query all schemas and tables in a [DuckLake](https://ducklake.select/) catalog. DuckLake is an open lakehouse format that stores metadata in a SQL database (a local DuckDB file in this recipe) and data in Parquet files.

## Prerequisites

- [DuckDB CLI](https://duckdb.org/docs/installation/) v1.3.0 or later installed (to create a DuckLake catalog).
- Spice v2.0 or later installed (see the [Getting Started](https://docs.spiceai.org/getting-started) documentation).

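
The version requirements above can be checked from a shell. The following is a minimal sketch and not part of the original recipe: `version_ge` and `normalize_version` are hypothetical helper names, the comparison relies on `sort -V` (assumed to be available, as in GNU coreutils), and the `installed` value is a placeholder for whatever version your CLI reports.

```bash
# Return success when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort); assumed to be available on this system.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Strip a leading "v" so "v1.3.0" and "1.3.0" compare the same way.
normalize_version() {
  printf '%s\n' "${1#v}"
}

required="1.3.0"
# Hypothetical value; substitute the version string your DuckDB CLI reports.
installed="$(normalize_version "v1.3.2")"

if version_ge "$installed" "$required"; then
  echo "DuckDB $installed satisfies >= $required"
else
  echo "DuckDB $installed is older than $required" >&2
fi
```

In practice `installed` would come from the CLI's own version output (for DuckDB, typically `duckdb --version`), with the leading `v` stripped as shown.
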
## Step 1. Create a new directory and initialize a Spicepod

```bash
mkdir ducklake-catalog-recipe
cd ducklake-catalog-recipe
spice init
```

## Step 2. Create a DuckLake catalog with sample data

From the `ducklake-catalog-recipe` directory, open DuckDB:

```bash
duckdb
```

Install and load the DuckLake and TPC-H extensions, then create a catalog and populate it with TPC-H sample data:

```sql
INSTALL ducklake;
LOAD ducklake;
INSTALL tpch;
LOAD tpch;

-- Generate TPC-H data in-memory (scale factor 0.01 for a quick demo)
CALL dbgen(sf = 0.01);

-- Create a DuckLake catalog with local metadata storage
ATTACH 'ducklake:metadata.ducklake' AS my_lakehouse;

-- Copy tables into DuckLake
CREATE TABLE my_lakehouse.main.customer AS SELECT * FROM customer;
CREATE TABLE my_lakehouse.main.lineitem AS SELECT * FROM lineitem;
CREATE TABLE my_lakehouse.main.nation AS SELECT * FROM nation;
CREATE TABLE my_lakehouse.main.orders AS SELECT * FROM orders;
CREATE TABLE my_lakehouse.main.part AS SELECT * FROM part;
CREATE TABLE my_lakehouse.main.partsupp AS SELECT * FROM partsupp;
CREATE TABLE my_lakehouse.main.region AS SELECT * FROM region;
CREATE TABLE my_lakehouse.main.supplier AS SELECT * FROM supplier;
```

Verify the tables were created:

```sql
SHOW ALL TABLES;
```

The output should include the following rows for `my_lakehouse` (the in-memory tables created by `dbgen` are listed as well):

```text
┌──────────────┬─────────┬──────────┬────────────────────┬─────────────────────────┬───────────┐
│ database     │ schema  │ name     │ column_names       │ column_types            │ temporary │
│ varchar      │ varchar │ varchar  │ varchar[]          │ varchar[]               │ boolean   │
├──────────────┼─────────┼──────────┼────────────────────┼─────────────────────────┼───────────┤
│ my_lakehouse │ main    │ customer │ [c_custkey, ...]   │ [INTEGER, VARCHAR, ...] │ false     │
│ my_lakehouse │ main    │ lineitem │ [l_orderkey, ...]  │ [INTEGER, INTEGER, ...] │ false     │
│ my_lakehouse │ main    │ nation   │ [n_nationkey, ...] │ [INTEGER, VARCHAR, ...] │ false     │
│ my_lakehouse │ main    │ orders   │ [o_orderkey, ...]  │ [INTEGER, INTEGER, ...] │ false     │
│ my_lakehouse │ main    │ part     │ [p_partkey, ...]   │ [INTEGER, VARCHAR, ...] │ false     │
│ my_lakehouse │ main    │ partsupp │ [ps_partkey, ...]  │ [INTEGER, INTEGER, ...] │ false     │
│ my_lakehouse │ main    │ region   │ [r_regionkey, ...] │ [INTEGER, VARCHAR, ...] │ false     │
│ my_lakehouse │ main    │ supplier │ [s_suppkey, ...]   │ [INTEGER, VARCHAR, ...] │ false     │
└──────────────┴─────────┴──────────┴────────────────────┴─────────────────────────┴───────────┘
```

Exit DuckDB:

```sql
.exit
```

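
The interactive session in this step can also be captured as a script, which helps when the catalog needs to be rebuilt. The sketch below writes the same statements to a file. The file name `create_ducklake.sql` is an assumption, and the loop only generates the eight `CREATE TABLE ... AS SELECT` statements already shown in this step.

```bash
# Write the Step 2 statements to a SQL script (hypothetical file name).
sql_file="create_ducklake.sql"
tables="customer lineitem nation orders part partsupp region supplier"

{
  echo "INSTALL ducklake; LOAD ducklake;"
  echo "INSTALL tpch; LOAD tpch;"
  echo "-- Generate TPC-H data in-memory, then copy it into DuckLake"
  echo "CALL dbgen(sf = 0.01);"
  echo "ATTACH 'ducklake:metadata.ducklake' AS my_lakehouse;"
  # One CREATE TABLE ... AS SELECT per TPC-H table
  for t in $tables; do
    echo "CREATE TABLE my_lakehouse.main.$t AS SELECT * FROM $t;"
  done
} > "$sql_file"

echo "wrote $sql_file"
```

Assuming the DuckDB CLI reads SQL from standard input, the script could then be run from the recipe directory with `duckdb < create_ducklake.sql`.
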
## Step 3. Configure the DuckLake Catalog Connector in your Spicepod

Edit `spicepod.yaml` to add the DuckLake catalog:

```yaml
version: v1
kind: Spicepod
name: ducklake-catalog-recipe

catalogs:
  - from: ducklake:metadata.ducklake
    name: my_lakehouse
```

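
Because the catalog is referenced by the relative path `metadata.ducklake`, a quick pre-flight check can catch a missing file before the runtime starts. This is a minimal sketch that assumes the path in `from:` is resolved against the directory where `spice run` is started. `check_catalog_file` is a hypothetical helper, not a Spice command.

```bash
# Succeed only if the DuckLake metadata file referenced by the Spicepod exists.
check_catalog_file() {
  local path="$1"
  if [ -f "$path" ]; then
    echo "found catalog metadata: $path"
  else
    echo "missing catalog metadata: $path (run Step 2 from this directory)" >&2
    return 1
  fi
}

# Check the file name used in this recipe; print a hint if it is absent.
check_catalog_file "metadata.ducklake" || echo "create the catalog before running 'spice run'" >&2
```
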
## Step 4. Start the Spice runtime

```bash
spice run
```

Observe that Spice discovers all schemas and tables:

```text
2026-03-02T10:00:00.000000Z INFO runtime::init::catalog: Registering catalog 'my_lakehouse' for ducklake
2026-03-02T10:00:00.500000Z INFO runtime::init::catalog: Registered catalog 'my_lakehouse' with 1 schema and 8 tables
```

## Step 5. Query the DuckLake catalog

In a new terminal, start the Spice SQL REPL:

```bash
spice sql
```

List all discovered tables:

```sql
SHOW TABLES;
```

```text
+---------------+--------------+--------------+------------+
| table_catalog | table_schema | table_name   | table_type |
+---------------+--------------+--------------+------------+
| my_lakehouse  | main         | customer     | BASE TABLE |
| my_lakehouse  | main         | lineitem     | BASE TABLE |
| my_lakehouse  | main         | nation       | BASE TABLE |
| my_lakehouse  | main         | orders       | BASE TABLE |
| my_lakehouse  | main         | part         | BASE TABLE |
| my_lakehouse  | main         | partsupp     | BASE TABLE |
| my_lakehouse  | main         | region       | BASE TABLE |
| my_lakehouse  | main         | supplier     | BASE TABLE |
| spice         | runtime      | task_history | BASE TABLE |
| spice         | runtime      | metrics      | BASE TABLE |
+---------------+--------------+--------------+------------+
```

Query the customer table:

```sql
SELECT c_custkey, c_name, c_mktsegment, c_acctbal
FROM my_lakehouse.main.customer
LIMIT 5;
```

```text
+-----------+--------------------+--------------+-----------+
| c_custkey | c_name             | c_mktsegment | c_acctbal |
+-----------+--------------------+--------------+-----------+
| 1         | Customer#000000001 | BUILDING     | 711.56    |
| 2         | Customer#000000002 | AUTOMOBILE   | 121.65    |
| 3         | Customer#000000003 | AUTOMOBILE   | 7498.12   |
| 4         | Customer#000000004 | MACHINERY    | 2866.83   |
| 5         | Customer#000000005 | HOUSEHOLD    | 794.47    |
+-----------+--------------------+--------------+-----------+
```

Run a cross-table query that joins `customer` and `nation`:

```sql
SELECT
  n.n_name AS nation,
  COUNT(*) AS num_customers,
  ROUND(AVG(c.c_acctbal), 2) AS avg_balance
FROM my_lakehouse.main.customer c
JOIN my_lakehouse.main.nation n ON c.c_nationkey = n.n_nationkey
GROUP BY n.n_name
ORDER BY num_customers DESC
LIMIT 5;
```

## Step 6. Enable read-write access (optional)

To enable write operations, update the catalog configuration with `access: read_write`:

```yaml
version: v1
kind: Spicepod
name: ducklake-catalog-recipe

catalogs:
  - from: ducklake:metadata.ducklake
    name: my_lakehouse
    access: read_write
```

Restart Spice:

```bash
spice run
```

In a new terminal, start the SQL REPL and insert a row:

```bash
spice sql
```

```sql
INSERT INTO my_lakehouse.main.region (r_regionkey, r_name, r_comment)
VALUES (5, 'ANTARCTICA', 'A cold and remote region');
```

```text
+-------+
| count |
+-------+
| 1     |
+-------+
```

Verify the insert:

```sql
SELECT * FROM my_lakehouse.main.region ORDER BY r_regionkey;
```

## Using the DuckLake Data Connector

Instead of the catalog connector (which auto-discovers all tables), you can connect to specific tables using the DuckLake data connector:

```yaml
version: v1
kind: Spicepod
name: ducklake-data-connector-recipe

datasets:
  - from: ducklake:customer
    name: customer
    params:
      connection_string: metadata.ducklake
  - from: ducklake:orders
    name: orders
    params:
      connection_string: metadata.ducklake
```

This is useful when you only need specific tables or want to configure each dataset independently (e.g., with different acceleration settings).

## Using with Cloud Storage (S3)

DuckLake supports storing metadata and data on cloud storage. To use S3:

1. Ensure AWS credentials are available via environment variables, `~/.aws/credentials`, or an IAM instance profile.

2. Create a DuckLake catalog on S3 (via DuckDB CLI):

   ```sql
   ATTACH 'ducklake:s3://my-bucket/lakehouse/metadata.ducklake' AS cloud_lakehouse;
   ```

3. Configure the Spice catalog:

   ```yaml
   catalogs:
     - from: ducklake:s3://my-bucket/lakehouse/metadata.ducklake
       name: cloud_lakehouse
   ```

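
For the environment-variable option in item 1, the standard AWS variable names can be exported before starting DuckDB or Spice. This is only a sketch: the values are placeholders, the region is a hypothetical example, and the exact way each tool picks up the credentials is not described in this recipe.

```bash
# Placeholder credentials; replace with real values or rely on ~/.aws/credentials.
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_REGION="us-east-1"   # hypothetical region

# Warn about any variable that is still unset or empty.
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
  [ -n "$(printenv "$var")" ] || echo "warning: $var is not set" >&2
done
```
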
## Learn more

- [DuckLake website](https://ducklake.select/)
- [DuckLake Catalog Connector documentation](https://spiceai.org/docs/components/catalogs/ducklake)
- [DuckLake Data Connector documentation](https://spiceai.org/docs/components/data-connectors/ducklake)
- [`spice sql` CLI reference](https://docs.spiceai.org/cli/reference/sql)

catalogs/ducklake/spicepod.yaml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
version: v1
kind: Spicepod
name: ducklake-catalog-recipe

catalogs:
  - from: ducklake:metadata.ducklake
    name: my_lakehouse
