Using AWS DSQL as metadata server? #308
Replies: 3 comments 5 replies
-
The Ducklake format itself is database agnostic, it can work with any SQL database as long as it supports primary keys and transactions. The DuckDB engine that communicates with the DuckLake, however, currently only supports PostgreSQL, MySQL and SQLite. I'm not sure about Aurora and DSQL as I have never used them, but assuming they support the PostgreSQL protocol, then yes they can be used for the ducklake meta server. I do, however, have tried and can confirm that Neon Serverless can be used as ducklake meta server since its postgres compatible even though its technically not postgres itself. |
Beta Was this translation helpful? Give feedback.
-
You could use SQLite and replicate the WAL using Litestream perhaps . Or replicate with NATS using https://github.com/wongfei2009/harmonylite I have used this way to replicate the ducklake SQLite . It will sync with every other ducklake SQLite using CRDT sync so you have a multi master replication of ducklake. It’s 100% a side car listening to the WAL of SQLite , and so need absolutely no integration with ducklake to work . |
Beta Was this translation helpful? Give feedback.
-
I looked deeper into using AWS DSQL as meta store but right now it is not possible as DSQL only suports a single DDL statement per transaction - I get the following error message: java.sql.SQLException: Invalid Error: Failed to initialize DuckLake:Failed to execute query "CREATE TABLE "public"."ducklake_snapshot"("snapshot_id" BIGINT PRIMARY KEY, "snapshot_time" TIMESTAMP WITH TIME ZONE, "schema_version" BIGINT, "next_catalog_id" BIGINT, "next_file_id" BIGINT);": ERROR: multiple ddl statements not supported in a transaction The "postgres" extension for DuckDB seem to work with DSQL (I only tried some simple selects so far though) so once AWS adds better Postgre compatibility there MAY be future hope to use it also as a Ducklake meta store... |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
I really love the idea of DuckLake and using DuckDB for local analytics. As the organisation I work with use 100% AWS for all backend storage and compute I would like use an AWS hosted database server for the meta storage. To this end I am curious about how much "postgre compatibility" that is needed? I assume the fully Postgre compatible Aurora would work (or is a duckdb postgre extension needed also on the meta server?) and the "serverless" version would be ok (as it nowadays scale down to zero hardware when there are no load) but I would have preferred the TRULY serverless DSQL that today have somewhat "thin" compatibility lacking quite a bit of POSTGRE features.
Is there some description of what SQL engine features that are required for storing the Ducklake meta data?
Anybody that have already tried runing meta server on AWS Aurora or even better on DSQL?
Beta Was this translation helpful? Give feedback.
All reactions