How is DuckLake different from old Hive metastore? #63
Most of the flaws of Hive described in the article don't really apply, since DuckLake does not use Hive's partitioning structure and supports hidden partitioning and partition evolution similar to Iceberg. As for scaling, this is discussed in the manifesto and the podcast: how well DuckLake scales depends mainly on the catalog server that is used. Storing metadata in a database has worked just fine for Snowflake and BigQuery, so this architecture clearly scales to enormous data set sizes. Of course, once your metadata grows to enormous sizes, you will no longer be able to manage it with a single Postgres instance. For the majority of data set sizes and use cases, however, a Postgres instance will work just fine.
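To make the architectural point concrete, here is a toy sketch of "metadata lives in a transactional SQL database" using Python's stdlib `sqlite3`. This is an illustration of the idea, not DuckLake's actual schema; the table and column names are invented for this example.

```python
import sqlite3

# Toy model: the catalog is an ordinary SQL database whose tables describe
# snapshots and the data files belonging to each snapshot.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE snapshot (id INTEGER PRIMARY KEY, committed_at TEXT);
    CREATE TABLE data_file (
        snapshot_id INTEGER REFERENCES snapshot(id),
        path TEXT,            -- object-store location of a Parquet file
        row_count INTEGER
    );
""")

# Writer: committing a new snapshot is a single ACID transaction in the
# catalog database -- no manifest files to rewrite in object storage.
with con:
    con.execute("INSERT INTO snapshot VALUES (1, '2024-01-01T00:00:00Z')")
    con.executemany(
        "INSERT INTO data_file VALUES (1, ?, ?)",
        [("s3://bucket/part-0.parquet", 1000),
         ("s3://bucket/part-1.parquet", 2500)],
    )

# Reader: planning a scan of snapshot 1 is one SQL query against the
# catalog, instead of listing and parsing manifest files from the lake.
files = con.execute(
    "SELECT path, row_count FROM data_file WHERE snapshot_id = ?", (1,)
).fetchall()
print(files)
```

Swapping the SQLite connection for a Postgres (or any other) catalog server is what changes the scaling story: readers and writers contend on a database built for concurrent transactions, rather than on file listings in object storage.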
The idea of managing metadata solely in a metastore is not new; it's roughly how Hive + Hadoop works. How badly would DuckLake performance degrade over time? We put all manifests/checkpoints/metadata ... along with the actual data files in the same lake, so that readers wouldn't overwhelm and lock the metastore.
https://lakefs.io/blog/hive-metastore-it-didnt-age-well/