A Trino connector for DuckLake, enabling SQL queries over DuckLake tables through Trino's distributed query engine.
It was largely prototyped with AI assistance, drawing on references such as:
- The official DuckLake specifications
- Existing Trino plugins like trino-iceberg and trino-snowflake
This connector integrates DuckLake with Trino by:
- Connecting to a PostgreSQL metadata store containing DuckLake schema information
- Reading Parquet data files from S3 storage
- Supporting complex data types including structs, arrays, and nested types
- Providing read-only access to DuckLake tables
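
As a rough illustration of the metadata step, the sketch below resolves the latest snapshot and lists the Parquet files visible at it. It is not the connector's code. It uses an in-memory SQLite database as a stand-in for the PostgreSQL metadata store, and it assumes the `ducklake_snapshot` and `ducklake_data_file` tables from the DuckLake specification, trimmed to a few columns. The sample rows and the helper names `latest_snapshot_id` and `live_data_files` are made up for the example.

```python
import sqlite3

# Stand-in for the PostgreSQL metadata store: an in-memory SQLite database.
# Table and column names follow the DuckLake specification, reduced to the
# columns this sketch needs (assumption: the real tables have more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ducklake_snapshot (snapshot_id INTEGER PRIMARY KEY);
    CREATE TABLE ducklake_data_file (
        data_file_id   INTEGER PRIMARY KEY,
        table_id       INTEGER,
        begin_snapshot INTEGER,
        end_snapshot   INTEGER,   -- NULL while the file is still live
        path           TEXT
    );
    INSERT INTO ducklake_snapshot VALUES (1), (2), (3);
    INSERT INTO ducklake_data_file VALUES
        (10, 1, 1, 3,    'orders/a.parquet'),  -- removed in snapshot 3
        (11, 1, 2, NULL, 'orders/b.parquet'),
        (12, 1, 3, NULL, 'orders/c.parquet'),
        (13, 2, 1, NULL, 'customers/a.parquet');
""")


def latest_snapshot_id(conn):
    """Return the newest snapshot id, i.e. MAX(snapshot_id)."""
    (snapshot_id,) = conn.execute(
        "SELECT MAX(snapshot_id) FROM ducklake_snapshot"
    ).fetchone()
    return snapshot_id


def live_data_files(conn, table_id, snapshot_id):
    """List Parquet paths of one table that are visible at the given snapshot."""
    rows = conn.execute(
        """
        SELECT path
        FROM ducklake_data_file
        WHERE table_id = ?
          AND begin_snapshot <= ?
          AND (end_snapshot IS NULL OR end_snapshot > ?)
        ORDER BY data_file_id
        """,
        (table_id, snapshot_id, snapshot_id),
    ).fetchall()
    return [path for (path,) in rows]


snapshot = latest_snapshot_id(conn)
files = live_data_files(conn, table_id=1, snapshot_id=snapshot)
print(snapshot, files)  # 3 ['orders/b.parquet', 'orders/c.parquet']
```

The returned paths are what a reader would then fetch from S3. Passing an older snapshot id to `live_data_files` shows what time travel would look like, which the connector itself does not expose.
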
The connector requires the following configuration properties:

    # PostgreSQL metadata database connection
    ducklake.metadata-url=jdbc:postgresql://host:port/database?user=username&password=password

    # S3 configuration
    ducklake.s3.endpoint=https://s3.amazonaws.com
    ducklake.s3.region=us-east-1
    ducklake.s3.bucket=your-bucket-name
    ducklake.s3.access-key=your-access-key
    ducklake.s3.secret-key=your-secret-key

    # S3 advanced configuration
    ducklake.s3.path-style-access=false
    ducklake.s3.use-ssl=true
    ducklake.s3.sse-c.key=your-sse-c-key

The connector supports the following DuckLake/Parquet data types:
| DuckLake Type | Trino Type | Notes |
|---|---|---|
| boolean | boolean | |
| int8 | tinyint | |
| int16 | smallint | |
| int32 | integer | |
| int64 | bigint | |
| uint8, uint16, uint32, uint64 | bigint | Unsigned types mapped to larger signed type |
| float32 | real | |
| float64 | double | |
| decimal(p,s) | decimal(p,s) | |
| varchar | varchar | |
| json | varchar | JSON stored as text |
| blob | varbinary | |
| date | date | |
| time | time(6) | Microsecond precision |
| timetz | time(6) with time zone | |
| timestamp | timestamp(6) | Microsecond precision |
| timestamptz | timestamp(3) with time zone | |
| timestamp_s | timestamp(0) | Second precision |
| timestamp_ms | timestamp(3) | Millisecond precision |
| timestamp_ns | timestamp(9) | Nanosecond precision |
| uuid | uuid | |
| struct | row | Nested structures |
| list | array | Arrays with element type inference |

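
The type mapping can be expressed as a simple lookup. The following is a minimal sketch in Python, not the connector's actual implementation. The dictionary `DUCKLAKE_TO_TRINO` and the function `to_trino_type` are hypothetical names, and `struct` and `list` are mapped by bare name only, without resolving their nested field or element types.

```python
import re

# Scalar and container mappings copied from the type table.
# Illustrative only; the names here are not part of the connector.
DUCKLAKE_TO_TRINO = {
    "boolean": "boolean",
    "int8": "tinyint",
    "int16": "smallint",
    "int32": "integer",
    "int64": "bigint",
    "uint8": "bigint",
    "uint16": "bigint",
    "uint32": "bigint",
    "uint64": "bigint",  # values above 2**63 - 1 do not fit in bigint
    "float32": "real",
    "float64": "double",
    "varchar": "varchar",
    "json": "varchar",
    "blob": "varbinary",
    "date": "date",
    "time": "time(6)",
    "timetz": "time(6) with time zone",
    "timestamp": "timestamp(6)",
    "timestamptz": "timestamp(3) with time zone",
    "timestamp_s": "timestamp(0)",
    "timestamp_ms": "timestamp(3)",
    "timestamp_ns": "timestamp(9)",
    "uuid": "uuid",
    "struct": "row",
    "list": "array",
}

# decimal(p,s) keeps its precision and scale, so it is handled by pattern.
_DECIMAL = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def to_trino_type(ducklake_type: str) -> str:
    """Translate a DuckLake type name into the Trino type from the table."""
    name = ducklake_type.strip().lower()
    match = _DECIMAL.fullmatch(name)
    if match:
        precision, scale = match.groups()
        return f"decimal({precision},{scale})"
    try:
        return DUCKLAKE_TO_TRINO[name]
    except KeyError:
        # Anything not listed in the table is treated as unsupported.
        raise ValueError(f"unsupported DuckLake type: {ducklake_type!r}") from None


print(to_trino_type("timestamp_ns"))    # timestamp(9)
print(to_trino_type("decimal(18, 4)"))  # decimal(18,4)
```

Raising on unknown names is a design choice of this sketch; the README does not say how the connector reacts to types outside the table.
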
The connector currently has the following limitations:

- Read-Only Operations: Only SELECT queries are supported. INSERT, UPDATE, DELETE, and DDL operations are not available.
- PostgreSQL Dependency: Requires a PostgreSQL database for metadata storage, which may not align with all DuckLake deployment patterns.
- S3-Only Storage: Currently only supports S3-compatible storage backends. Local filesystem and other storage systems are not supported.
- No Predicate Pushdown: The connector doesn't implement advanced optimizations like predicate pushdown to reduce data scanning.
- Snapshot Management: Always uses the latest snapshot (`MAX(snapshot_id)`) without support for time travel or specific snapshot querying.
- Type System Gaps:
  - Unsigned integer types are mapped to signed `bigint`, so `uint64` values above 2^63 - 1 can overflow
  - Complex nested type validation is limited
  - No support for DuckLake-specific types that don't map cleanly to Trino
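
To make the unsigned-integer gap concrete, the sketch below checks which unsigned values fit into a signed 64-bit `bigint` and shows what a raw two's-complement reinterpretation of the same bits would yield. The helper names are hypothetical. The README does not state whether the connector wraps, fails, or does something else on out-of-range `uint64` values, so the wrap-around shown here is only one possible outcome.

```python
UINT64_MAX = 2**64 - 1
BIGINT_MAX = 2**63 - 1  # upper bound of Trino's signed 64-bit bigint


def fits_in_bigint(value: int) -> bool:
    """True if a non-negative unsigned value is representable as bigint."""
    return 0 <= value <= BIGINT_MAX


def reinterpret_as_signed64(value: int) -> int:
    """Reinterpret a 64-bit unsigned bit pattern as two's-complement signed."""
    if not 0 <= value <= UINT64_MAX:
        raise ValueError("value is outside the uint64 range")
    return value - 2**64 if value > BIGINT_MAX else value


for label, value in [("uint32 max", 2**32 - 1), ("uint64 max", UINT64_MAX)]:
    print(label, fits_in_bigint(value), reinterpret_as_signed64(value))
# uint32 max True 4294967295
# uint64 max False -1
```

Every `uint8`, `uint16`, and `uint32` value fits in `bigint`, so only the upper half of the `uint64` range is at risk.
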