DuckDB + Ducklake as main DB #394
Replies: 2 comments
-
I looked into using DuckLake, but I found that it's simpler to insert data into a primary data store like DuckDB/Postgres and then periodically shift older data to S3. The tooling around this is not ideal, but it's the fastest and most flexible option.
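The tiering approach described above can be sketched in DuckDB SQL. The table name (`events`), bucket path, and 90-day cutoff are all hypothetical; the `httpfs` extension and configured S3 credentials are assumed:

```sql
-- Assumes the httpfs extension and S3 credentials are already configured.
INSTALL httpfs;
LOAD httpfs;

-- Export rows older than 90 days to Parquet on S3 (names are placeholders).
COPY (
    SELECT * FROM events
    WHERE created_at < now() - INTERVAL 90 DAY
) TO 's3://my-bucket/archive/events_batch.parquet' (FORMAT parquet);

-- Then drop the archived rows from the primary store.
DELETE FROM events
WHERE created_at < now() - INTERVAL 90 DAY;
```

Running this on a schedule (cron, Airflow, etc.) keeps the hot store small while older data stays queryable from S3 via `read_parquet`.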
-
I think DuckLake was born to be a lightweight lakehouse. You switched from MongoDB to DuckLake + DuckDB, which means you switched from a data warehouse architecture to a lakehouse architecture. In my opinion, if your source data comes from only one or two producers (your application or something similar), you should keep the data warehouse architecture with Mongo or Postgres and export data to S3 on a schedule for analytics with DuckDB. Make sure you understand that DuckLake is not just an extension; it's more than that.
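The scheduled-export pattern suggested here can be sketched with DuckDB's `postgres` extension. The connection string, table, and bucket names are placeholders:

```sql
-- Attach the operational Postgres database read-only via the postgres extension.
INSTALL postgres;
LOAD postgres;
ATTACH 'dbname=app host=db.internal user=reader' AS pg (TYPE postgres, READ_ONLY);

-- Export yesterday's data to S3 as Parquet for analytics (hypothetical names).
COPY (
    SELECT * FROM pg.public.orders
    WHERE order_date = current_date - 1
) TO 's3://analytics-bucket/orders/daily.parquet' (FORMAT parquet);
```

The warehouse stays the source of truth; DuckDB only reads from it and writes analytic copies to S3.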
-
We’re currently exploring a fairly radical shift in our backend architecture, and I’d love to get some feedback.
Our current system is based on MongoDB combined with Atlas Search. We're considering replacing it entirely with DuckDB + DuckLake, working directly on Parquet files stored in S3, without any additional database layer.
• Users can update data via the UI, which we plan to support using inline updates (DuckDB writes).
• Analytical jobs that update millions of records currently take hours – with DuckDB, we’ve seen they could take just minutes.
• All data is stored in columnar format and compressed, which significantly reduces both cost and latency for analytic workloads.
To support DuckLake, we'll be using PostgreSQL as the catalog backend, while the actual data remains in S3.
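A minimal sketch of that setup, assuming the `ducklake` extension; the catalog connection details, bucket path, table, and column names are all hypothetical:

```sql
INSTALL ducklake;
INSTALL postgres;

-- Postgres holds the DuckLake catalog; Parquet data files live in S3.
ATTACH 'ducklake:postgres:dbname=lake_catalog host=catalog.internal' AS lake
    (DATA_PATH 's3://my-bucket/lake/');
USE lake;

-- Inline updates go through normal DML; DuckLake tracks the
-- resulting data files through the catalog.
UPDATE records SET status = 'done' WHERE id = 42;
```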
The only real pain point we’re struggling with is retrieving a record by ID efficiently, which is trivial in MongoDB.
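One common mitigation for point lookups on Parquet (not specific to DuckLake) is to keep files sorted or partitioned on the lookup key, so per-file min/max statistics let the engine skip most files. A sketch with hypothetical table names:

```sql
-- Writing data ordered by id keeps each Parquet file's id range narrow,
-- so file-level min/max statistics can prune most files on a point lookup.
INSERT INTO lake.records SELECT * FROM staging ORDER BY id;

-- This scan can skip every file whose id range cannot contain the key.
SELECT * FROM lake.records WHERE id = 123456;
```

It will still be slower than a MongoDB `_id` lookup, since at least one object fetch from S3 is needed, but pruning keeps it to roughly one file read instead of a full scan.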
So here’s my question: Does it sound completely unreasonable to build a production-grade system that relies solely on Ducklake (on S3) as the primary datastore, assuming we handle write scenarios via inline updates and optimize access patterns?
Would love to hear from others who have tried something similar, or any thoughts on potential pitfalls.