Description
An Iceberg table stored on S3 consists of two folders:
- metadata
- data
Moreover, instead of pointing directly to the data folder, the metadata points to the parent folder. The Iceberg libraries look for a data folder under that parent folder to retrieve the data files.
For anyone who wants to migrate their Parquet tables to Iceberg tables, the in-place option becomes impossible. This complicates migration: if I have an existing S3 folder that contains the data files, and I have written Spark code to read Parquet data from there, that code will no longer work once I move to Iceberg, because it would need to look in the data sub-folder instead.
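To make the layout difference concrete, here is a minimal sketch in plain Python that builds the two folder structures described above in a temporary directory and lists the Parquet files found directly under the table folder. Everything in it is an assumption made for illustration: the table names, the part file names, and the helper functions `make_table` and `parquet_files_directly_under` are hypothetical, and the non-recursive listing only stands in for path-based reading code; it is not how Spark or the Iceberg libraries resolve files.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def make_table(root: Path, name: str, data_subdir: Optional[str]) -> Path:
    """Create a toy table folder (hypothetical names, empty placeholder files)."""
    table = root / name
    (table / "metadata").mkdir(parents=True)
    # data_subdir=None puts the files next to the metadata folder;
    # data_subdir="data" puts them in a data sub-folder instead.
    data_dir = table / data_subdir if data_subdir else table
    data_dir.mkdir(parents=True, exist_ok=True)
    for i in range(2):
        (data_dir / f"part-{i:05d}.parquet").touch()
    return table


def parquet_files_directly_under(folder: Path) -> List[str]:
    """List Parquet files in the folder itself, without descending into sub-folders."""
    return sorted(p.name for p in folder.glob("*.parquet"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    flat = make_table(root, "flat_table", data_subdir=None)
    nested = make_table(root, "nested_table", data_subdir="data")

    flat_files = parquet_files_directly_under(flat)
    nested_files = parquet_files_directly_under(nested)
    nested_data_files = parquet_files_directly_under(nested / "data")

print(flat_files)         # files sit next to the metadata folder
print(nested_files)       # nothing at the table root
print(nested_data_files)  # files are only visible under data/
```

In this toy setup, code that points at the table folder finds the files only in the flat layout; with the data sub-folder, the same code has to be changed to point one level deeper.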
This is in sharp contrast to the Delta Lake table format, where the table folder on S3 contains only one sub-folder:
- metadata: the metadata.json points directly to the folder containing the data, which is the parent folder of metadata.
The data files are not in a sub-folder; they live next to the metadata folder, under the same parent folder.
From an in-place migration point of view, the Delta table structure makes it much easier to migrate from Parquet snapshot tables (drop + CTAS) than the Iceberg table format does.