How is DuckLake different from old Hive metastore? #63
Most of the flaws of Hive described in the article don't really apply, since DuckLake does not use Hive's partitioning structure and supports hidden partitioning and partition evolution similar to Iceberg. As for scaling, this is discussed in the manifesto and the podcast: how well DuckLake scales depends mainly on the catalog server that is used. Storing metadata in a database has worked just fine for Snowflake and BigQuery, so this architecture clearly scales to enormous data set sizes. Of course, once your metadata grows to enormous sizes, you will no longer be able to manage it with a single Postgres instance. For the majority of data set sizes and use cases, however, a Postgres instance will work just fine.
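To make the architectural point concrete, here is a toy sketch of "metadata lives in a transactional SQL database" using Python's stdlib `sqlite3`. This is an illustration of the idea, not DuckLake's actual schema; the table and column names are invented for this example.

```python
import sqlite3

# Toy model: the catalog is an ordinary SQL database whose tables describe
# snapshots and the data files belonging to each snapshot.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE snapshot (id INTEGER PRIMARY KEY, committed_at TEXT);
    CREATE TABLE data_file (
        snapshot_id INTEGER REFERENCES snapshot(id),
        path TEXT,            -- object-store location of a Parquet file
        row_count INTEGER
    );
""")

# Writer: committing a new snapshot is a single ACID transaction in the
# catalog database -- no manifest files to rewrite in object storage.
with con:
    con.execute("INSERT INTO snapshot VALUES (1, '2024-01-01T00:00:00Z')")
    con.executemany(
        "INSERT INTO data_file VALUES (1, ?, ?)",
        [("s3://bucket/part-0.parquet", 1000),
         ("s3://bucket/part-1.parquet", 2500)],
    )

# Reader: planning a scan of snapshot 1 is one SQL query against the
# catalog, instead of listing and parsing manifest files from the lake.
files = con.execute(
    "SELECT path, row_count FROM data_file WHERE snapshot_id = ?", (1,)
).fetchall()
print(files)
```

Swapping the SQLite connection for a Postgres (or any other) catalog server is what changes the scaling story: readers and writers contend on a database built for concurrent transactions, rather than on file listings in object storage.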
The idea of managing metadata solely in a metastore is not new; it's roughly how Hive + Hadoop works. How badly would DuckLake performance degrade over time? We put all manifests/checkpoints/metadata ... along with the actual data files in the same lake, so that readers wouldn't overwhelm and lock the metastore.
https://lakefs.io/blog/hive-metastore-it-didnt-age-well/