Add more columns to Iceberg $files system table#29044
Add more columns to Iceberg $files system table#29044oskar-szwajkowski wants to merge 2 commits intotrinodb:masterfrom
Conversation
24955ad to
d446f21
Compare
d446f21 to
3711921
Compare
| // readable_metrics | ||
| writeValueOrNull(pageBuilder, READABLE_METRICS_COLUMN_NAME, () -> metadataSchema.findField(MetricsUtil.READABLE_METRICS), | ||
| (blkBldr, value) -> VARCHAR.writeString(blkBldr, readableMetricsToJson(readableMetricsStruct(schema, contentFile, value.type().asStructType()), primitiveFields))); | ||
| // file_sequence_number |
There was a problem hiding this comment.
it's rather unclear to me the purpose of the (obvious) comments - i see the old ones as well
There was a problem hiding this comment.
to me too, but if this pattern exist then its more unclear not to follow it in the middle
let me remove those in commit before
There was a problem hiding this comment.
added 1 commit before with comments removal and extracting some code to method
| writeValueOrNull(pageBuilder, FilesTable.REFERENCED_DATA_FILE_COLUMN_NAME, () -> null, VARCHAR::writeString); | ||
| writeValueOrNull(pageBuilder, FilesTable.CONTENT_OFFSET_COLUMN_NAME, () -> null, BIGINT::writeLong); | ||
| writeValueOrNull(pageBuilder, FilesTable.CONTENT_SIZE_IN_BYTES_COLUMN_NAME, () -> null, BIGINT::writeLong); |
There was a problem hiding this comment.
you could introduce writeNull convenience method to avoid sending () -> null repeatedly
|
nit: release notes Please use the provided pattern in the template to ease up maintainer's work in adding the topic to the release notes PR |
3711921 to
a3fb91b
Compare
|
@oskar-szwajkowski do note that there is as well a competing PR from @kaveti #28933 Please reconcile which one should be pushed forward. |
Remove redundant comments in FilesTablePageSource Extract code to method
Add file_sequence_number, data_sequence_number, referenced_data_file, pos, manifest_location, first_row_id, content_offset, and content_size_in_bytes columns to the Iceberg $files system table. These columns expose additional metadata from Iceberg manifest entries, including delete-file-specific field like referenced_data_file and content_offset which are populated for deletion vectors.
a3fb91b to
bd68773
Compare
| * - `first_row_id` | ||
| - `BIGINT` | ||
| - The ID of the first row in the data file. | ||
| * - `content_offset` |
| // sort_order_id | ||
| writeValueOrNull(pageBuilder, SORT_ORDER_ID_COLUMN_NAME, contentFile::sortOrderId, | ||
| (blkBldr, value) -> INTEGER.writeLong(blkBldr, value)); | ||
| // readable_metrics |
| // readable_metrics | ||
| writeValueOrNull(pageBuilder, READABLE_METRICS_COLUMN_NAME, () -> metadataSchema.findField(MetricsUtil.READABLE_METRICS), | ||
| (blkBldr, value) -> VARCHAR.writeString(blkBldr, readableMetricsToJson(readableMetricsStruct(schema, contentFile, value.type().asStructType()), primitiveFields))); | ||
| writeValueOrNull(pageBuilder, FilesTable.FILE_SEQUENCE_NUMBER_COLUMN_NAME, contentFile::fileSequenceNumber, BIGINT::writeLong); |
There was a problem hiding this comment.
static import the FilesTable field
| * - `content_offset` | ||
| - `BIGINT` | ||
| - The offset in the file where the content starts. Only set for deletion | ||
| vectors, `NULL` for data files. | ||
| * - `content_size_in_bytes` | ||
| - `BIGINT` | ||
| - The size of the content in bytes. Only set for deletion vectors, `NULL` for | ||
| data files. |
There was a problem hiding this comment.
I think the current name is misleading, as the functionality is clearly valuable for data files as well. I understand the intent behind it, but in that case, I would recommend either incorporating "delete" or "deletion" into the name for clarity - otherwise looks confusion with "file_size_in_bytes"
Description
This PR enhances Iceberg's connector
$filessystem table by adding additional columns like:file_sequence_number, data_sequence_number, referenced_data_file, pos, manifest_location, first_row_id, content_offset, and content_size_in_bytes
Those are mostly useful for an insights of how does delete file relate to data files
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:
* Extend $files system table in Iceberg connector with additional columns