Query performance: DuckDB (native) vs DuckLake (Parquet partitions) #430
tanisha-ga started this conversation in General

I ran the same query (a join across 7 tables) on DuckDB (a native database file) and on DuckLake (per-table Parquet files partitioned on one field), with identical data.

- DuckDB: finished in under 2 s, scanning ~2.37B rows directly from a ~160 GB native file and using ~4.3 GB of RAM, entirely in memory.
- DuckLake: took over 14 min on a ~103 GB Parquet dataset, even though the filter on the partition field was correctly pushed down. It used ~106 GB of RAM (about 25× more) and spilled ~312 GB of temporary data to disk.

So while both systems scanned the same rows with the same filters, DuckLake was vastly slower and more resource-intensive. DuckDB's docs note that native files allow the optimizer to pick better join orders, but I didn't expect such a dramatic gap. Is this level of difference typical for join-heavy queries over Parquet in DuckLake?
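For anyone wanting to reproduce a comparison like this, here is a minimal sketch of running the same join against both catalogs in one session. Every file, catalog, table, and column name below (native.duckdb, metadata.ducklake, fact, dim1, dim2, part_col) is a hypothetical stand-in, and the 7-table join is abbreviated to two joins:

```sql
INSTALL ducklake;
LOAD ducklake;

ATTACH 'native.duckdb' AS native;             -- native DuckDB database file
ATTACH 'ducklake:metadata.ducklake' AS lake;  -- DuckLake catalog over Parquet

SET memory_limit = '8GB';                     -- keep both runs under the same RAM cap
SET temp_directory = '/tmp/duckdb_spill';     -- spilled temp data lands here

-- Same query, same filter on the partition column, against each catalog;
-- EXPLAIN ANALYZE reports wall-clock time plus per-operator timings and row counts.
EXPLAIN ANALYZE
SELECT d1.name, d2.name, count(*)
FROM native.fact AS f
JOIN native.dim1 AS d1 USING (dim1_id)
JOIN native.dim2 AS d2 USING (dim2_id)
WHERE f.part_col = 'some_value'
GROUP BY ALL;

EXPLAIN ANALYZE
SELECT d1.name, d2.name, count(*)
FROM lake.fact AS f
JOIN lake.dim1 AS d1 USING (dim1_id)
JOIN lake.dim2 AS d2 USING (dim2_id)
WHERE f.part_col = 'some_value'
GROUP BY ALL;
```

Pinning memory_limit and temp_directory the same way for both runs makes the memory and spill numbers directly comparable.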
Replies: 2 comments 6 replies

- Is the query plan the same for both engines? (1 reply; a way to check this is sketched after the replies.)
- Hi @guillesd, … (5 replies)
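On the query-plan question: EXPLAIN without ANALYZE prints the optimized physical plan without executing the query, so join order and scan operators can be compared directly. Same hypothetical names as in the sketch above:

```sql
EXPLAIN
SELECT count(*)
FROM native.fact AS f
JOIN native.dim1 AS d1 USING (dim1_id)
WHERE f.part_col = 'some_value';

EXPLAIN
SELECT count(*)
FROM lake.fact AS f
JOIN lake.dim1 AS d1 USING (dim1_id)
WHERE f.part_col = 'some_value';

-- Things to compare between the two outputs: the join order the optimizer
-- chose, whether the leaves are native table scans or Parquet scans, and
-- whether the partition filter appears in the scan's filter list.
```

One plausible contributor to diverging plans: DuckDB maintains detailed statistics (including distinct-count estimates) for native tables that feed join-order optimization, while a Parquet scan exposes far less to the optimizer; this is consistent with the docs note quoted in the question.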