
Iceberg folder structure makes migration harder #123

@abhinigam

An Iceberg table stored on S3 consists of two folders:

  • metadata
  • data

Moreover, the metadata does not point directly to the data folder; it points to the parent folder. The Iceberg libraries then look for a data folder under that parent folder to retrieve the data files.
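To make the layout concrete, here is a minimal sketch of the folder structure as described above. The function name `iceberg_layout` and the bucket path are hypothetical, and the layout reflects only what this issue describes, not a statement about every Iceberg configuration.

```python
def iceberg_layout(table_location: str) -> dict:
    """Return the sub-folders of an Iceberg table as described in this issue.

    `table_location` is the parent folder that the metadata points to.
    """
    root = table_location.rstrip("/")
    return {
        "table_location": root,          # what the metadata references
        "metadata": f"{root}/metadata",  # table metadata files
        "data": f"{root}/data",          # data files are looked up here
    }


# Hypothetical table location, used only for illustration.
layout = iceberg_layout("s3://my-bucket/warehouse/events")
for name, path in layout.items():
    print(f"{name}: {path}")
```

A plain Parquet reader that used to point at the table location would have to point at the `data` entry instead.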

For anyone who wants to migrate their Parquet tables to Iceberg tables, the in-place option becomes impossible. Suppose I have an existing S3 folder that contains the data files, and Spark code that reads Parquet data from that folder. Once I move to Iceberg, that code no longer works, because it needs to read from the data sub-folder instead.

This is in sharp contrast to the Delta Lake table format, where the structure of the table on S3 consists of only one folder:

  • metadata: the metadata.json points directly to the folder containing the data, which is the parent folder of metadata.

Data files are not in a sub-folder; they live next to the metadata folder under the same parent folder.
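The difference between the two layouts can be sketched as a relocation plan for files that already sit directly under the table folder. This is an illustration under the assumptions stated in this issue; `relocation_plan`, the layout labels, and the example paths are hypothetical names and not part of either library.

```python
from typing import Dict, List


def relocation_plan(root: str, parquet_files: List[str], layout: str) -> Dict[str, str]:
    """Map each existing file path to the path it would need under `layout`.

    Assumes the layouts described in this issue:
      - "delta":   data files stay next to the metadata folder (no moves)
      - "iceberg": data files are expected under a `data` sub-folder
    """
    root = root.rstrip("/")
    if layout == "delta":
        return {}  # files can stay where they are
    if layout == "iceberg":
        return {f"{root}/{name}": f"{root}/data/{name}" for name in parquet_files}
    raise ValueError(f"unknown layout: {layout!r}")


# Hypothetical existing Parquet snapshot.
files = ["part-00000.parquet", "part-00001.parquet"]
print(relocation_plan("s3://my-bucket/warehouse/events", files, "delta"))
print(relocation_plan("s3://my-bucket/warehouse/events", files, "iceberg"))
```

An empty plan means existing readers keep working on the same folder; a non-empty plan means every reader pointed at the old folder has to change.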

From an in-place migration point of view, the Delta table structure makes it much easier to migrate from Parquet snapshot tables (drop + CTAS) than the Iceberg table format does.
