Skip to content

Add more columns to Iceberg $files system table#29044

Open
oskar-szwajkowski wants to merge 2 commits intotrinodb:masterfrom
oskar-szwajkowski:osz/add-more-columns-to-iceberg-files-table
Open

Add more columns to Iceberg $files system table#29044
oskar-szwajkowski wants to merge 2 commits intotrinodb:masterfrom
oskar-szwajkowski:osz/add-more-columns-to-iceberg-files-table

Conversation

@oskar-szwajkowski
Copy link
Copy Markdown
Contributor

@oskar-szwajkowski oskar-szwajkowski commented Apr 8, 2026

Description

This PR enhances Iceberg's connector $files system table by adding additional columns like:
file_sequence_number, data_sequence_number, referenced_data_file, pos, manifest_location, first_row_id, content_offset, and content_size_in_bytes

Those are mostly useful for an insights of how does delete file relate to data files

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:

* Extend $files system table in Iceberg connector with additional columns

@cla-bot cla-bot bot added the cla-signed label Apr 8, 2026
@github-actions github-actions bot added the iceberg Iceberg connector label Apr 8, 2026
@oskar-szwajkowski oskar-szwajkowski force-pushed the osz/add-more-columns-to-iceberg-files-table branch from 24955ad to d446f21 Compare April 8, 2026 18:24
@github-actions github-actions bot added the docs label Apr 8, 2026
@oskar-szwajkowski oskar-szwajkowski force-pushed the osz/add-more-columns-to-iceberg-files-table branch from d446f21 to 3711921 Compare April 8, 2026 20:29
// readable_metrics
writeValueOrNull(pageBuilder, READABLE_METRICS_COLUMN_NAME, () -> metadataSchema.findField(MetricsUtil.READABLE_METRICS),
(blkBldr, value) -> VARCHAR.writeString(blkBldr, readableMetricsToJson(readableMetricsStruct(schema, contentFile, value.type().asStructType()), primitiveFields)));
// file_sequence_number
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's rather unclear to me the purpose of the (obvious) comments - i see the old ones as well

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me too, but if this pattern exist then its more unclear not to follow it in the middle

let me remove those in commit before

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added 1 commit before with comments removal and extracting some code to method

Comment on lines +279 to +281
writeValueOrNull(pageBuilder, FilesTable.REFERENCED_DATA_FILE_COLUMN_NAME, () -> null, VARCHAR::writeString);
writeValueOrNull(pageBuilder, FilesTable.CONTENT_OFFSET_COLUMN_NAME, () -> null, BIGINT::writeLong);
writeValueOrNull(pageBuilder, FilesTable.CONTENT_SIZE_IN_BYTES_COLUMN_NAME, () -> null, BIGINT::writeLong);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could introduce writeNull convenience method to avoid sending () -> null repeatedly

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@findinpath
Copy link
Copy Markdown
Contributor

nit:

release notes

Extend $files system table in Iceberg connector with additional columns

Please use the provided pattern in the template to ease up maintainer's work in adding the topic to the release notes PR
#28894

@oskar-szwajkowski oskar-szwajkowski force-pushed the osz/add-more-columns-to-iceberg-files-table branch from 3711921 to a3fb91b Compare April 9, 2026 09:47
@findinpath
Copy link
Copy Markdown
Contributor

@oskar-szwajkowski do note that there is as well a competing PR from @kaveti #28933

Please reconcile which one should be pushed forward.

Remove redundant comments in FilesTablePageSource

Extract code to method
Add file_sequence_number, data_sequence_number, referenced_data_file,
pos, manifest_location, first_row_id, content_offset,
and content_size_in_bytes columns to the Iceberg $files system table.

These columns expose additional metadata from Iceberg manifest entries,
including delete-file-specific field like referenced_data_file and
content_offset which are populated for deletion vectors.
@oskar-szwajkowski oskar-szwajkowski force-pushed the osz/add-more-columns-to-iceberg-files-table branch from a3fb91b to bd68773 Compare April 10, 2026 10:07
Copy link
Copy Markdown
Contributor

@chenjian2664 chenjian2664 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

* - `first_row_id`
- `BIGINT`
- The ID of the first row in the data file.
* - `content_offset`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

// sort_order_id
writeValueOrNull(pageBuilder, SORT_ORDER_ID_COLUMN_NAME, contentFile::sortOrderId,
(blkBldr, value) -> INTEGER.writeLong(blkBldr, value));
// readable_metrics
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it intentional?

// readable_metrics
writeValueOrNull(pageBuilder, READABLE_METRICS_COLUMN_NAME, () -> metadataSchema.findField(MetricsUtil.READABLE_METRICS),
(blkBldr, value) -> VARCHAR.writeString(blkBldr, readableMetricsToJson(readableMetricsStruct(schema, contentFile, value.type().asStructType()), primitiveFields)));
writeValueOrNull(pageBuilder, FilesTable.FILE_SEQUENCE_NUMBER_COLUMN_NAME, contentFile::fileSequenceNumber, BIGINT::writeLong);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import the FilesTable field

Comment on lines +1593 to +1600
* - `content_offset`
- `BIGINT`
- The offset in the file where the content starts. Only set for deletion
vectors, `NULL` for data files.
* - `content_size_in_bytes`
- `BIGINT`
- The size of the content in bytes. Only set for deletion vectors, `NULL` for
data files.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current name is misleading, as the functionality is clearly valuable for data files as well. I understand the intent behind it, but in that case, I would recommend either incorporating "delete" or "deletion" into the name for clarity - otherwise looks confusion with "file_size_in_bytes"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

4 participants