# DuckLake Catalog Connector

> **Note:** The DuckLake connector is available in Spice v2.0 or later.

The DuckLake Catalog Connector enables Spice to automatically discover and query all schemas and tables in a [DuckLake](https://ducklake.select/) catalog. DuckLake is an open lakehouse format that stores catalog metadata in a SQL database (such as DuckDB, SQLite, or PostgreSQL) and table data in Parquet files.

## Prerequisites

- [DuckDB CLI](https://duckdb.org/docs/installation/) v1.3.0 or later, used to create the DuckLake catalog.
- Spice v2.0 or later (see the [Getting Started](https://docs.spiceai.org/getting-started) documentation).
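
To confirm the DuckDB prerequisite from a script, the following minimal sketch compares an installed version against 1.3.0. The helper names `version_ge` and `parse_duckdb_version` are hypothetical. The parser assumes that `duckdb --version` prints the version as its first whitespace-separated field (for example `v1.3.0 71c5c07cdd`), and the comparison relies on `sort -V` as provided by GNU coreutils.

```bash
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on `sort -V` (version-aware sort), available in GNU coreutils.
version_ge() {
  # If $2 sorts first (or the two are equal), then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: extract "1.3.0" from `duckdb --version` output such as
# "v1.3.0 71c5c07cdd" (assumed format: the version is the first field).
parse_duckdb_version() {
  printf '%s\n' "$1" | awk '{ print $1 }' | sed 's/^v//'
}
```

For example, `version_ge "$(parse_duckdb_version "$(duckdb --version)")" 1.3.0 || echo "DuckDB 1.3.0 or later is required" >&2`.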

## Step 1. Create a new directory and initialize a Spicepod

```bash
mkdir ducklake-catalog-recipe
cd ducklake-catalog-recipe
spice init
```

## Step 2. Create a DuckLake catalog with sample data

From the `ducklake-catalog-recipe` directory, open DuckDB so that `metadata.ducklake` is created next to `spicepod.yaml`, matching the relative path used in Step 3:

```bash
duckdb
```

Install and load the DuckLake and TPC-H extensions, then create a catalog and populate it:

```sql
INSTALL ducklake;
LOAD ducklake;
INSTALL tpch;
LOAD tpch;

-- Generate TPC-H data in-memory (scale factor 0.01 for a quick demo)
CALL dbgen(sf = 0.01);

-- Create a DuckLake catalog with local metadata storage
ATTACH 'ducklake:metadata.ducklake' AS my_lakehouse;

-- Copy tables into DuckLake
CREATE TABLE my_lakehouse.main.customer AS SELECT * FROM customer;
CREATE TABLE my_lakehouse.main.lineitem AS SELECT * FROM lineitem;
CREATE TABLE my_lakehouse.main.nation AS SELECT * FROM nation;
CREATE TABLE my_lakehouse.main.orders AS SELECT * FROM orders;
CREATE TABLE my_lakehouse.main.part AS SELECT * FROM part;
CREATE TABLE my_lakehouse.main.partsupp AS SELECT * FROM partsupp;
CREATE TABLE my_lakehouse.main.region AS SELECT * FROM region;
CREATE TABLE my_lakehouse.main.supplier AS SELECT * FROM supplier;
```
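
If you prefer to script this step instead of typing it into an interactive session, you can generate the same statements and pipe them into the DuckDB CLI. This is a sketch. `ducklake_setup_sql` is a hypothetical helper that reproduces the statements above and emits one `CREATE TABLE ... AS SELECT` per TPC-H table.

```bash
# Hypothetical helper: print the Step 2 setup statements so they can be piped
# into the DuckDB CLI instead of typed interactively.
# $1 = DuckLake metadata path, $2 = name to attach the catalog as
ducklake_setup_sql() {
  metadata_path=$1
  catalog=$2
  cat <<EOF
INSTALL ducklake;
LOAD ducklake;
INSTALL tpch;
LOAD tpch;
CALL dbgen(sf = 0.01);
ATTACH 'ducklake:${metadata_path}' AS ${catalog};
EOF
  # One CREATE TABLE ... AS SELECT per TPC-H table, copied from memory into DuckLake.
  for table in customer lineitem nation orders part partsupp region supplier; do
    printf 'CREATE TABLE %s.main.%s AS SELECT * FROM %s;\n' "$catalog" "$table" "$table"
  done
}
```

Run it once from a fresh `ducklake-catalog-recipe` directory, for example `ducklake_setup_sql metadata.ducklake my_lakehouse | duckdb`. On a second run the `CREATE TABLE` statements error because the tables already exist.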

Verify that the tables were created. `SHOW ALL TABLES` lists the tables in every attached database, so the output below is abridged to the `my_lakehouse` rows:

```sql
SHOW ALL TABLES;
```

```text
┌──────────────┬─────────┬──────────┬────────────────────┬─────────────────────────┬───────────┐
│ database     │ schema  │ name     │ column_names       │ column_types            │ temporary │
│ varchar      │ varchar │ varchar  │ varchar[]          │ varchar[]               │ boolean   │
├──────────────┼─────────┼──────────┼────────────────────┼─────────────────────────┼───────────┤
│ my_lakehouse │ main    │ customer │ [c_custkey, ...]   │ [INTEGER, VARCHAR, ...] │ false     │
│ my_lakehouse │ main    │ lineitem │ [l_orderkey, ...]  │ [INTEGER, INTEGER, ...] │ false     │
│ my_lakehouse │ main    │ nation   │ [n_nationkey, ...] │ [INTEGER, VARCHAR, ...] │ false     │
│ my_lakehouse │ main    │ orders   │ [o_orderkey, ...]  │ [INTEGER, INTEGER, ...] │ false     │
│ my_lakehouse │ main    │ part     │ [p_partkey, ...]   │ [INTEGER, VARCHAR, ...] │ false     │
│ my_lakehouse │ main    │ partsupp │ [ps_partkey, ...]  │ [INTEGER, INTEGER, ...] │ false     │
│ my_lakehouse │ main    │ region   │ [r_regionkey, ...] │ [INTEGER, VARCHAR, ...] │ false     │
│ my_lakehouse │ main    │ supplier │ [s_suppkey, ...]   │ [INTEGER, VARCHAR, ...] │ false     │
└──────────────┴─────────┴──────────┴────────────────────┴─────────────────────────┴───────────┘
```

Exit DuckDB so that it releases its lock on `metadata.ducklake` before Spice opens the catalog:

```sql
.exit
```

## Step 3. Configure the DuckLake Catalog Connector in your Spicepod

Edit `spicepod.yaml` to add the DuckLake catalog:

```yaml
version: v1
kind: Spicepod
name: ducklake-catalog-recipe

catalogs:
  - from: ducklake:metadata.ducklake
    name: my_lakehouse
```
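
This Spicepod is small enough to generate from a script. The following is a minimal sketch that assumes a hypothetical `ducklake_spicepod` helper. It emits the YAML above and accepts an optional fourth argument for the `access` setting described in Step 6.

```bash
# Hypothetical helper: print a Spicepod that registers one DuckLake catalog.
# $1 = Spicepod name, $2 = catalog name, $3 = DuckLake metadata path,
# $4 = optional access mode (for example read_write); omitted when empty.
ducklake_spicepod() {
  pod_name=$1
  catalog_name=$2
  metadata_path=$3
  access=${4:-}
  cat <<EOF
version: v1
kind: Spicepod
name: ${pod_name}

catalogs:
  - from: ducklake:${metadata_path}
    name: ${catalog_name}
EOF
  # Only add the access line when a mode was requested.
  if [ -n "$access" ]; then
    printf '    access: %s\n' "$access"
  fi
}
```

Running `ducklake_spicepod ducklake-catalog-recipe my_lakehouse metadata.ducklake > spicepod.yaml` overwrites the file created by `spice init`.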

## Step 4. Start the Spice runtime

```bash
spice run
```

Spice discovers every schema and table in the catalog and logs output similar to the following:

```text
2026-03-02T10:00:00.000000Z INFO runtime::init::catalog: Registering catalog 'my_lakehouse' for ducklake
2026-03-02T10:00:00.500000Z INFO runtime::init::catalog: Registered catalog 'my_lakehouse' with 1 schema and 8 tables
```
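
When you script this recipe, you may want to wait until the catalog is registered before you run queries. The sketch below polls a log file for the `Registered catalog` line. The helper name `wait_for_catalog` is hypothetical. The sketch also assumes the log format shown above and that `spice run` output is redirected to a file.

```bash
# Hypothetical helper: wait until a Spice log file contains the
# "Registered catalog '<name>'" line, polling once per second.
# Assumes the log format shown above and that `spice run` output goes to a file.
# $1 = log file, $2 = catalog name, $3 = timeout in seconds (default 30)
wait_for_catalog() {
  log_file=$1
  catalog=$2
  limit=${3:-30}
  elapsed=0
  while [ "$elapsed" -lt "$limit" ]; do
    # -F: fixed-string match; stderr hidden in case the log file does not exist yet.
    if grep -qF "Registered catalog '${catalog}'" "$log_file" 2>/dev/null; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "catalog '${catalog}' was not registered within ${limit}s" >&2
  return 1
}
```

For example, start the runtime with `spice run > spice.log 2>&1 &` and then call `wait_for_catalog spice.log my_lakehouse 60` before issuing queries.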

## Step 5. Query the DuckLake catalog

In a new terminal, start the Spice SQL REPL:

```bash
spice sql
```

List all discovered tables:

```sql
SHOW TABLES;
```

```text
+---------------+--------------+--------------+------------+
| table_catalog | table_schema | table_name   | table_type |
+---------------+--------------+--------------+------------+
| my_lakehouse  | main         | customer     | BASE TABLE |
| my_lakehouse  | main         | lineitem     | BASE TABLE |
| my_lakehouse  | main         | nation       | BASE TABLE |
| my_lakehouse  | main         | orders       | BASE TABLE |
| my_lakehouse  | main         | part         | BASE TABLE |
| my_lakehouse  | main         | partsupp     | BASE TABLE |
| my_lakehouse  | main         | region       | BASE TABLE |
| my_lakehouse  | main         | supplier     | BASE TABLE |
| spice         | runtime      | task_history | BASE TABLE |
| spice         | runtime      | metrics      | BASE TABLE |
+---------------+--------------+--------------+------------+
```

Query the customer table:

```sql
SELECT c_custkey, c_name, c_mktsegment, c_acctbal
FROM my_lakehouse.main.customer
LIMIT 5;
```

```text
+-----------+--------------------+--------------+-----------+
| c_custkey | c_name             | c_mktsegment | c_acctbal |
+-----------+--------------------+--------------+-----------+
| 1         | Customer#000000001 | BUILDING     | 711.56    |
| 2         | Customer#000000002 | AUTOMOBILE   | 121.65    |
| 3         | Customer#000000003 | AUTOMOBILE   | 7498.12   |
| 4         | Customer#000000004 | MACHINERY    | 2866.83   |
| 5         | Customer#000000005 | HOUSEHOLD    | 794.47    |
+-----------+--------------------+--------------+-----------+
```

Run a cross-table query:

```sql
SELECT n.n_name AS nation, COUNT(*) AS num_customers, ROUND(AVG(c.c_acctbal), 2) AS avg_balance
FROM my_lakehouse.main.customer c
JOIN my_lakehouse.main.nation n ON c.c_nationkey = n.n_nationkey
GROUP BY n.n_name
ORDER BY num_customers DESC
LIMIT 5;
```
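
You can also send these queries over HTTP instead of through the REPL. The following is a minimal sketch that assumes the runtime's default HTTP endpoint at `http://localhost:8090` and its `/v1/sql` route, which takes the SQL text as the request body. The names `spice_sql` and `SPICE_HTTP_URL` are hypothetical.

```bash
# Hypothetical helper: POST one SQL statement to the Spice runtime over HTTP.
# Assumes the default HTTP address (http://localhost:8090) and the /v1/sql
# endpoint; set SPICE_HTTP_URL to point at a different runtime address.
spice_sql() {
  curl -sS -X POST "${SPICE_HTTP_URL:-http://localhost:8090}/v1/sql" \
    --data "$1"
}
```

For example, `spice_sql "SELECT n_name FROM my_lakehouse.main.nation LIMIT 3"` prints whatever response body the runtime returns.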

## Step 6. Enable read-write access (optional)

To enable write operations, set `access: read_write` on the catalog:

```yaml
version: v1
kind: Spicepod
name: ducklake-catalog-recipe

catalogs:
  - from: ducklake:metadata.ducklake
    name: my_lakehouse
    access: read_write
```

Stop the running runtime (press `Ctrl+C` in its terminal), then start it again so the new configuration takes effect:

```bash
spice run
```

In a new terminal, open the SQL REPL and insert a row:

```bash
spice sql
```

```sql
INSERT INTO my_lakehouse.main.region (r_regionkey, r_name, r_comment)
VALUES (5, 'ANTARCTICA', 'A cold and remote region');
```

The statement returns the number of rows inserted:

```text
+-------+
| count |
+-------+
| 1     |
+-------+
```

Verify the insert:

```sql
SELECT * FROM my_lakehouse.main.region ORDER BY r_regionkey;
```

## Using the DuckLake Data Connector

Instead of the catalog connector, which auto-discovers all tables, you can connect to specific tables with the DuckLake data connector:

```yaml
version: v1
kind: Spicepod
name: ducklake-data-connector-recipe

datasets:
  - from: ducklake:customer
    name: customer
    params:
      connection_string: metadata.ducklake
  - from: ducklake:orders
    name: orders
    params:
      connection_string: metadata.ducklake
```

This is useful when you need only specific tables or want to configure each dataset independently (for example, with different acceleration settings).
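
When many tables are involved, you can generate the per-table entries instead of writing them by hand. The following is a minimal sketch with a hypothetical `ducklake_datasets` helper that prints the same structure as the YAML above, with one entry per table name.

```bash
# Hypothetical helper: print a `datasets:` section with one DuckLake data
# connector entry per table, mirroring the hand-written YAML above.
# $1 = DuckLake metadata path; remaining arguments = table names
ducklake_datasets() {
  connection_string=$1
  shift
  echo "datasets:"
  for table in "$@"; do
    printf '  - from: ducklake:%s\n' "$table"
    printf '    name: %s\n' "$table"
    printf '    params:\n'
    printf '      connection_string: %s\n' "$connection_string"
  done
}
```

Running `ducklake_datasets metadata.ducklake customer orders` reproduces the two entries above. Append the output to a Spicepod that does not already define a `datasets:` key.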

## Using with Cloud Storage (S3)

DuckLake supports storing metadata and data on cloud storage. To use S3:

1. Ensure AWS credentials are available via environment variables, `~/.aws/credentials`, or an IAM instance profile.

2. Create a DuckLake catalog on S3 with the DuckDB CLI. Create an S3 secret that uses the AWS credential chain so that DuckDB picks up those credentials:

```sql
CREATE SECRET (TYPE s3, PROVIDER credential_chain);
ATTACH 'ducklake:s3://my-bucket/lakehouse/metadata.ducklake' AS cloud_lakehouse;
```

3. Configure the Spice catalog:

```yaml
catalogs:
  - from: ducklake:s3://my-bucket/lakehouse/metadata.ducklake
    name: cloud_lakehouse
```
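
Before step 2, a quick check can report which of the credential sources listed in step 1 appears to be configured. `aws_credential_source` is a hypothetical helper. It checks `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` and the shared credentials file (honoring `AWS_SHARED_CREDENTIALS_FILE`). It cannot detect an IAM instance profile, so a result of `none` is not conclusive.

```bash
# Hypothetical helper: print "env", "file", or "none" depending on which AWS
# credential source is visible. An IAM instance profile cannot be detected
# this way, so "none" does not prove credentials are unavailable.
aws_credential_source() {
  creds_file=${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "env"
  elif [ -f "$creds_file" ]; then
    echo "file"
  else
    echo "none"
  fi
}
```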

## Learn more

- [DuckLake website](https://ducklake.select/)
- [DuckLake Catalog Connector documentation](https://spiceai.org/docs/components/catalogs/ducklake)
- [DuckLake Data Connector documentation](https://spiceai.org/docs/components/data-connectors/ducklake)
- For using `spice sql`, see the [CLI reference](https://docs.spiceai.org/cli/reference/sql).