altertable-ai/trino-ducklake

trino-ducklake

A Trino connector for DuckLake, enabling SQL queries over DuckLake tables through Trino's distributed query engine.

⚠️ This plugin is a proof of concept intended for experimentation only. It is not production-ready.

It was largely prototyped with the help of AI based on references such as:

Overview

This connector integrates DuckLake with Trino by:

  • Connecting to a PostgreSQL metadata store containing DuckLake schema information
  • Reading Parquet data files from S3 storage
  • Supporting complex data types including structs, arrays, and nested types
  • Providing read-only access to DuckLake tables

Configuration

The connector is configured through the following properties:

Required Properties

# PostgreSQL metadata database connection
ducklake.metadata-url=jdbc:postgresql://host:port/database?user=username&password=password

# S3 configuration
ducklake.s3.endpoint=https://s3.amazonaws.com
ducklake.s3.region=us-east-1
ducklake.s3.bucket=your-bucket-name
ducklake.s3.access-key=your-access-key
ducklake.s3.secret-key=your-secret-key
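
Because the credentials are embedded in the query string of `ducklake.metadata-url`, a username or password containing characters such as `&`, `=`, or `/` would likely need to be percent-encoded. The following is a minimal sketch of how such a value could be assembled; `build_metadata_url` is a hypothetical helper and not part of the connector, and the host, database, and credentials are placeholders.

```python
from urllib.parse import quote


def build_metadata_url(host: str, port: int, database: str, user: str, password: str) -> str:
    """Return a value for ducklake.metadata-url with percent-encoded credentials.

    Hypothetical helper for illustration; the connector itself only reads the
    final property value.
    """
    # quote(..., safe="") encodes every reserved character, including '&', '='
    # and '/', so the credentials cannot split or corrupt the query string.
    return (
        f"jdbc:postgresql://{host}:{port}/{database}"
        f"?user={quote(user, safe='')}&password={quote(password, safe='')}"
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own metadata database settings.
    url = build_metadata_url("localhost", 5432, "ducklake", "trino", "p&ss=word")
    print(f"ducklake.metadata-url={url}")
```

The printed line can be pasted into the catalog properties file next to the S3 settings above.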

Optional Properties

# S3 advanced configuration
ducklake.s3.path-style-access=false
ducklake.s3.use-ssl=true
ducklake.s3.sse-c.key=your-sse-c-key
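
For readers unfamiliar with `ducklake.s3.path-style-access`, the sketch below shows the general difference between the two S3 addressing styles. It illustrates the S3 convention only; how the connector builds request URLs internally is not described here, and `s3_object_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def s3_object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build an object URL in path-style or virtual-hosted style (illustration only)."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path-style: the bucket is the first path segment.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket is a subdomain of the endpoint host.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


if __name__ == "__main__":
    for style in (False, True):
        print(style, s3_object_url("https://s3.amazonaws.com", "your-bucket-name", "data/file.parquet", style))
```

Path-style access is commonly needed for S3-compatible services that do not resolve per-bucket hostnames.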

Supported Data Types

The connector supports the following DuckLake/Parquet data types:

| DuckLake Type | Trino Type | Notes |
|---|---|---|
| boolean | boolean | |
| int8 | tinyint | |
| int16 | smallint | |
| int32 | integer | |
| int64 | bigint | |
| uint8, uint16, uint32, uint64 | bigint | Unsigned types mapped to larger signed type |
| float32 | real | |
| float64 | double | |
| decimal(p,s) | decimal(p,s) | |
| varchar | varchar | |
| json | varchar | JSON stored as text |
| blob | varbinary | |
| date | date | |
| time | time(6) | Microsecond precision |
| timetz | time(6) with time zone | |
| timestamp | timestamp(6) | Microsecond precision |
| timestamptz | timestamp(3) with time zone | |
| timestamp_s | timestamp(0) | Second precision |
| timestamp_ms | timestamp(3) | Millisecond precision |
| timestamp_ns | timestamp(9) | Nanosecond precision |
| uuid | uuid | |
| struct | row | Nested structures |
| list | array | Arrays with element type inference |
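
The unsigned mapping deserves a closer look: `uint8`, `uint16`, and `uint32` always fit in a signed 64-bit `bigint`, but the upper half of the `uint64` range does not. The sketch below works through the arithmetic. What the connector actually does with out-of-range values (error, wrap, or something else) is not stated here, so `wrap_to_signed64` only shows what a two's-complement reinterpretation would produce, as an assumption.

```python
BIGINT_MIN = -(2**63)
BIGINT_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def fits_in_bigint(value: int) -> bool:
    """Return True if value is representable as a signed 64-bit integer."""
    return BIGINT_MIN <= value <= BIGINT_MAX


def wrap_to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed.

    Illustration of one possible overflow behavior, not the connector's
    documented behavior.
    """
    value &= UINT64_MAX
    return value - 2**64 if value > BIGINT_MAX else value


if __name__ == "__main__":
    for v in (2**32 - 1, BIGINT_MAX, 2**63, UINT64_MAX):
        print(v, fits_in_bigint(v), wrap_to_signed64(v))
```

If a table stores `uint64` values above 2^63 - 1, it is worth checking query results for those columns before relying on them.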

Current Limitations (non-exhaustive)

  • Read-Only Operations: Only SELECT queries are supported. No INSERT, UPDATE, DELETE, or DDL operations.
  • PostgreSQL Dependency: Requires a PostgreSQL database for metadata storage, which may not align with all DuckLake deployment patterns.
  • S3-Only Storage: Currently only supports S3-compatible storage backends. Local filesystem and other storage systems are not supported.
  • No Predicate Pushdown: The connector doesn't implement advanced optimizations like predicate pushdown to reduce data scanning.
  • Snapshot Management: Always uses the latest snapshot (MAX(snapshot_id)) without support for time travel or specific snapshot querying.
  • Type System Gaps:
    • Unsigned integer types are mapped to signed types, potentially causing overflow issues
    • Complex nested type validation is limited
    • No support for DuckLake-specific types that don't map cleanly to Trino
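
To make the snapshot limitation concrete, the sketch below uses an in-memory SQLite database as a stand-in for the PostgreSQL metadata store. The table and column names (`ducklake_snapshot`, `ducklake_data_file`, `begin_snapshot`, `end_snapshot`) are assumptions based on the DuckLake catalog layout, and the visibility rule is an illustration rather than the connector's actual query. It shows how picking `MAX(snapshot_id)` fixes the set of data files, and how passing an older snapshot id (which the connector does not support) would yield a different set.

```python
import sqlite3

# In-memory stand-in for the metadata database, with a simplified, assumed schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ducklake_snapshot (snapshot_id INTEGER PRIMARY KEY);
    CREATE TABLE ducklake_data_file (
        data_file_id INTEGER PRIMARY KEY,
        path TEXT,
        begin_snapshot INTEGER,
        end_snapshot INTEGER  -- NULL while the file is still live
    );
    INSERT INTO ducklake_snapshot VALUES (1), (2), (3);
    INSERT INTO ducklake_data_file VALUES
        (1, 'a.parquet', 1, 3),
        (2, 'b.parquet', 2, NULL),
        (3, 'c.parquet', 3, NULL);
""")


def latest_snapshot_id(conn: sqlite3.Connection) -> int:
    """Return the newest snapshot id, mirroring the MAX(snapshot_id) rule."""
    return conn.execute("SELECT MAX(snapshot_id) FROM ducklake_snapshot").fetchone()[0]


def visible_files(conn: sqlite3.Connection, snapshot_id: int) -> list[str]:
    """Return paths of data files visible at the given snapshot."""
    rows = conn.execute(
        "SELECT path FROM ducklake_data_file "
        "WHERE begin_snapshot <= ? AND (end_snapshot IS NULL OR end_snapshot > ?) "
        "ORDER BY data_file_id",
        (snapshot_id, snapshot_id),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    latest = latest_snapshot_id(conn)
    print("latest snapshot:", latest, visible_files(conn, latest))
    print("time travel to 2:", visible_files(conn, 2))
```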
