Query performance: DuckDB (native) vs DuckLake (Parquet partitions) #430
tanisha-ga started this conversation in General

I ran the same query (a join across 7 tables) on DuckDB (a native database file) and on DuckLake (per-table Parquet files partitioned on one field), with identical data.

- DuckDB: finished in under 2 s, scanning ~2.37B rows directly from a ~160 GB native file and using ~4.3 GB of RAM, entirely in memory.
- DuckLake: took over 14 min on a ~103 GB Parquet dataset, even though the filter on the partition field was correctly pushed down. It used ~106 GB of RAM (about 25× more) and spilled ~312 GB of temporary data to disk.

So while both systems scanned the same rows with the same filters, DuckLake was vastly slower and more resource-intensive. DuckDB's docs note that native files allow the optimizer to pick better join orders, but I didn't expect such a dramatic gap. Is this level of difference typical for join-heavy queries over Parquet in DuckLake?
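For anyone wanting to reproduce a comparison like this, here is a minimal sketch of running the same join against both catalogs in one session. Every file, catalog, table, and column name below (native.duckdb, metadata.ducklake, fact, dim1, dim2, part_col) is a hypothetical stand-in, and the 7-table join is abbreviated to two joins:

```sql
INSTALL ducklake;
LOAD ducklake;

ATTACH 'native.duckdb' AS native;             -- native DuckDB database file
ATTACH 'ducklake:metadata.ducklake' AS lake;  -- DuckLake catalog over Parquet

SET memory_limit = '8GB';                     -- keep both runs under the same RAM cap
SET temp_directory = '/tmp/duckdb_spill';     -- spilled temp data lands here

-- Same query, same filter on the partition column, against each catalog;
-- EXPLAIN ANALYZE reports wall-clock time plus per-operator timings and row counts.
EXPLAIN ANALYZE
SELECT d1.name, d2.name, count(*)
FROM native.fact AS f
JOIN native.dim1 AS d1 USING (dim1_id)
JOIN native.dim2 AS d2 USING (dim2_id)
WHERE f.part_col = 'some_value'
GROUP BY ALL;

EXPLAIN ANALYZE
SELECT d1.name, d2.name, count(*)
FROM lake.fact AS f
JOIN lake.dim1 AS d1 USING (dim1_id)
JOIN lake.dim2 AS d2 USING (dim2_id)
WHERE f.part_col = 'some_value'
GROUP BY ALL;
```

Pinning memory_limit and temp_directory the same way for both runs makes the memory and spill numbers directly comparable.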
Replies: 2 comments 6 replies

- Is the query plan the same for both engines? (1 reply; a way to check this is sketched after the replies.)
- Hi @guillesd, … (5 replies)
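On the query-plan question: EXPLAIN without ANALYZE prints the optimized physical plan without executing the query, so join order and scan operators can be compared directly. Same hypothetical names as in the sketch above:

```sql
EXPLAIN
SELECT count(*)
FROM native.fact AS f
JOIN native.dim1 AS d1 USING (dim1_id)
WHERE f.part_col = 'some_value';

EXPLAIN
SELECT count(*)
FROM lake.fact AS f
JOIN lake.dim1 AS d1 USING (dim1_id)
WHERE f.part_col = 'some_value';

-- Things to compare between the two outputs: the join order the optimizer
-- chose, whether the leaves are native table scans or Parquet scans, and
-- whether the partition filter appears in the scan's filter list.
```

One plausible contributor to diverging plans: DuckDB maintains detailed statistics (including distinct-count estimates) for native tables that feed join-order optimization, while a Parquet scan exposes far less to the optimizer; this is consistent with the docs note quoted in the question.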