Skip to content

Latest commit

 

History

History
251 lines (189 loc) · 8.57 KB

File metadata and controls

251 lines (189 loc) · 8.57 KB

DuckLake Catalog Connector

Note: The DuckLake connector is available in Spice v2.0 or later.

The DuckLake Catalog Connector enables Spice to automatically discover and query all schemas and tables in a DuckLake catalog — an open lakehouse format that stores metadata in a SQLite-compatible database and data in Parquet files.

Prerequisites

Step 1. Create a new directory and initialize a Spicepod

mkdir ducklake-catalog-recipe
cd ducklake-catalog-recipe
spice init

Step 2. Create a DuckLake catalog with sample data

Open DuckDB and create a DuckLake catalog with TPC-H sample data:

duckdb

Install and load the DuckLake and TPC-H extensions, then create a catalog and populate it:

INSTALL ducklake;
LOAD ducklake;
INSTALL tpch;
LOAD tpch;

-- Create a DuckLake catalog with local metadata storage
ATTACH 'ducklake:metadata.ducklake' AS my_lakehouse;

-- Generate TPC-H data (scale factor 0.01 for a quick demo)
CALL dbgen(sf = 0.01, catalog = 'my_lakehouse');

Verify the tables were created:

SHOW ALL TABLES;
┌──────────────┬─────────┬──────────┬──────────────────┬───────────────────────────────────────────────────────────────┬───────────┐
│   database   │ schema  │   name   │ column_names     │ column_types                                                │ temporary │
│   varchar    │ varchar │ varchar  │    varchar[]      │             varchar[]                                       │  boolean  │
├──────────────┼─────────┼──────────┼──────────────────┼───────────────────────────────────────────────────────────────┤───────────┤
│ my_lakehouse │ main    │ customer │ [c_custkey, ...]  │ [INTEGER, VARCHAR, ...]                                     │ false     │
│ my_lakehouse │ main    │ lineitem │ [l_orderkey, ...] │ [INTEGER, INTEGER, ...]                                     │ false     │
│ my_lakehouse │ main    │ nation   │ [n_nationkey, ...]│ [INTEGER, VARCHAR, ...]                                     │ false     │
│ my_lakehouse │ main    │ orders   │ [o_orderkey, ...] │ [INTEGER, INTEGER, ...]                                     │ false     │
│ my_lakehouse │ main    │ part     │ [p_partkey, ...]  │ [INTEGER, VARCHAR, ...]                                     │ false     │
│ my_lakehouse │ main    │ partsupp │ [ps_partkey, ...] │ [INTEGER, INTEGER, ...]                                     │ false     │
│ my_lakehouse │ main    │ region   │ [r_regionkey, ...]│ [INTEGER, VARCHAR, ...]                                     │ false     │
│ my_lakehouse │ main    │ supplier │ [s_suppkey, ...]  │ [INTEGER, VARCHAR, ...]                                     │ false     │
└──────────────┴─────────┴──────────┴──────────────────┴───────────────────────────────────────────────────────────────┴───────────┘

Exit DuckDB:

.exit

Step 3. Configure the DuckLake Catalog Connector in your Spicepod

Edit spicepod.yaml to add the DuckLake catalog:

version: v1
kind: Spicepod
name: ducklake-catalog-recipe

catalogs:
  - from: ducklake:metadata.ducklake
    name: my_lakehouse

Step 4. Start the Spice runtime

spice run

Observe that Spice discovers all schemas and tables:

2026-03-02T10:00:00.000000Z  INFO runtime::init::catalog: Registering catalog 'my_lakehouse' for ducklake
2026-03-02T10:00:00.500000Z  INFO runtime::init::catalog: Registered catalog 'my_lakehouse' with 1 schema and 8 tables

Step 5. Query the DuckLake catalog

In a new terminal, start the Spice SQL REPL:

spice sql

List all discovered tables:

SHOW TABLES;
+---------------+--------------+------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------+------------+------------+
| my_lakehouse  | main         | customer   | BASE TABLE |
| my_lakehouse  | main         | lineitem   | BASE TABLE |
| my_lakehouse  | main         | nation     | BASE TABLE |
| my_lakehouse  | main         | orders     | BASE TABLE |
| my_lakehouse  | main         | part       | BASE TABLE |
| my_lakehouse  | main         | partsupp   | BASE TABLE |
| my_lakehouse  | main         | region     | BASE TABLE |
| my_lakehouse  | main         | supplier   | BASE TABLE |
| spice         | runtime      | task_history | BASE TABLE |
| spice         | runtime      | metrics    | BASE TABLE |
+---------------+--------------+------------+------------+

Query the customer table:

SELECT c_custkey, c_name, c_mktsegment, c_acctbal
FROM my_lakehouse.main.customer
LIMIT 5;
+-----------+--------------------+--------------+-----------+
| c_custkey | c_name             | c_mktsegment | c_acctbal |
+-----------+--------------------+--------------+-----------+
| 1         | Customer#000000001 | BUILDING     | 711.56    |
| 2         | Customer#000000002 | AUTOMOBILE   | 121.65    |
| 3         | Customer#000000003 | AUTOMOBILE   | 7498.12   |
| 4         | Customer#000000004 | MACHINERY    | 2866.83   |
| 5         | Customer#000000005 | HOUSEHOLD    | 794.47    |
+-----------+--------------------+--------------+-----------+

Run a cross-table query:

SELECT n.n_name AS nation, COUNT(*) AS num_customers, ROUND(AVG(c.c_acctbal), 2) AS avg_balance
FROM my_lakehouse.main.customer c
JOIN my_lakehouse.main.nation n ON c.c_nationkey = n.n_nationkey
GROUP BY n.n_name
ORDER BY num_customers DESC
LIMIT 5;

Step 6. Enable read-write access (optional)

To enable write operations, update the catalog configuration with access: read_write:

version: v1
kind: Spicepod
name: ducklake-catalog-recipe

catalogs:
  - from: ducklake:metadata.ducklake
    name: my_lakehouse
    access: read_write

Restart Spice and insert data:

spice run
spice sql
INSERT INTO my_lakehouse.main.region (r_regionkey, r_name, r_comment)
VALUES (5, 'ANTARCTICA', 'A cold and remote region');
+-------+
| count |
+-------+
| 1     |
+-------+

Verify the insert:

SELECT * FROM my_lakehouse.main.region ORDER BY r_regionkey;

Using the DuckLake Data Connector

Instead of the catalog connector (which auto-discovers all tables), you can connect to specific tables using the DuckLake data connector:

version: v1
kind: Spicepod
name: ducklake-data-connector-recipe

datasets:
  - from: ducklake:customer
    name: customer
    params:
      connection_string: metadata.ducklake
  - from: ducklake:orders
    name: orders
    params:
      connection_string: metadata.ducklake

This is useful when you only need specific tables or want to configure each dataset independently (e.g., with different acceleration settings).

Using with Cloud Storage (S3)

DuckLake supports storing metadata and data on cloud storage. To use S3:

  1. Ensure AWS credentials are available via environment variables, ~/.aws/credentials, or an IAM instance profile.

  2. Create a DuckLake catalog on S3 (via DuckDB CLI):

ATTACH 'ducklake:s3://my-bucket/lakehouse/metadata.ducklake' AS cloud_lakehouse;
  1. Configure the Spice catalog:
catalogs:
  - from: ducklake:s3://my-bucket/lakehouse/metadata.ducklake
    name: cloud_lakehouse

Learn more