Iceberg folder structure makes migration harder #123

An Iceberg table stored on S3 consists of two folders:

- metadata
- data

Moreover, the metadata does not point directly to the data folder; it points to the parent folder. The Iceberg libraries then look for a data folder under that parent to retrieve the data files.
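To make the layout concrete, here is a minimal sketch that builds it in a local temporary directory standing in for an S3 prefix. The table name (events) and the file names are hypothetical and only illustrate where things sit; real file names are chosen by the writer.

```python
import tempfile
from pathlib import Path


def make_iceberg_style_layout(root: Path) -> Path:
    """Create a toy table root with the two sub-folders described above."""
    table = root / "events"  # hypothetical table name
    (table / "metadata").mkdir(parents=True)
    (table / "data").mkdir()
    # Illustrative file names only (assumption, not taken from a real table).
    (table / "metadata" / "v1.metadata.json").write_text("{}")
    (table / "data" / "part-00000.parquet").write_bytes(b"")
    return table


def top_level_entries(table: Path) -> list[str]:
    """Names directly under the table root, sorted for stable output."""
    return sorted(p.name for p in table.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    table = make_iceberg_style_layout(Path(tmp))
    entries = top_level_entries(table)

print(entries)  # ['data', 'metadata']
```

The point is only that the table root holds two sub-folders and no data files of its own.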

For anyone who wants to migrate Parquet tables to Iceberg tables, this makes an in-place migration impossible. Suppose I have an existing S3 folder that contains the data files, plus Spark code that reads Parquet data from that folder. Once I move to Iceberg, that code no longer works against the same path, because the data files now live in the data sub-folder.
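The path problem can be sketched without Spark. The helper below is a hypothetical stand-in for an existing job that expects Parquet files directly under the table path; it is not Spark's actual file listing logic, and the folder and file names are made up for illustration.

```python
import tempfile
from pathlib import Path


def direct_parquet_files(prefix: Path) -> list[str]:
    """Parquet files directly under prefix (non-recursive).

    Stand-in for a job written against a flat Parquet folder.
    """
    return sorted(p.name for p in prefix.glob("*.parquet"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Before: plain Parquet folder, files sit directly under the table path.
    flat = root / "flat_table"
    flat.mkdir()
    (flat / "part-00000.parquet").write_bytes(b"")

    # After: the layout described in this issue, files sit under data/.
    nested = root / "iceberg_table"
    (nested / "metadata").mkdir(parents=True)
    (nested / "data").mkdir()
    (nested / "data" / "part-00000.parquet").write_bytes(b"")

    before = direct_parquet_files(flat)
    after_same_path = direct_parquet_files(nested)
    after_data_path = direct_parquet_files(nested / "data")

print(before)           # ['part-00000.parquet']
print(after_same_path)  # []
print(after_data_path)  # ['part-00000.parquet']
```

The same read path that worked before the migration finds nothing afterwards, so every existing reader would need to be repointed at the data sub-folder.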

This is in sharp contrast to the Delta Lake table format, where a Delta table on S3 contains only one folder:

- metadata (named _delta_log in Delta Lake): its log entries reference the data files relative to the table root, which is the parent folder of the metadata folder.

Data files are not in a sub-folder; they live next to the metadata folder, directly under the table root.
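For comparison, a minimal sketch of the Delta-style layout in a local temporary directory. Delta Lake names its metadata folder _delta_log; the table name, the log file content, and the Parquet file name here are toy values assumed for illustration, and data_root_for is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def data_root_for(metadata_dir: Path) -> Path:
    """In this layout, data files live in the metadata folder's parent."""
    return metadata_dir.parent


with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "events"    # hypothetical table name
    log_dir = table / "_delta_log"  # Delta Lake's metadata folder
    log_dir.mkdir(parents=True)
    (log_dir / "00000000000000000000.json").write_text("{}")  # toy log entry
    (table / "part-00000.parquet").write_bytes(b"")            # illustrative name

    root = data_root_for(log_dir)
    same_root = root == table
    parquet_here = sorted(p.name for p in root.glob("*.parquet"))

print(same_root)     # True
print(parquet_here)  # ['part-00000.parquet']
```

Because the data files stay directly under the table root, a reader that was pointed at that path before the migration still finds them afterwards.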

From an in-place migration point of view, the Delta table structure makes it much easier to migrate Parquet snapshot tables (drop + CTAS) to Delta than to the Iceberg table format.

