Description
Problem
When DuckDB writes Iceberg tables via the REST catalog (e.g., to AWS S3 Tables), the snapshot summary contains only the operation field and omits the statistics fields that some query engines require.
DuckDB-created snapshot:
"summary": {
"operation": "append"
}
Expected (e.g., Trino-created):
"summary": {
"operation": "append",
"added-data-files": "2",
"added-records": "9",
"added-files-size": "1333",
"changed-partition-count": "2",
"total-records": "9",
"total-files-size": "1333",
"total-data-files": "2",
"total-delete-files": "0",
"total-position-deletes": "0",
"total-equality-deletes": "0"
}
Impact
Amazon Redshift fails to query DuckDB-created Iceberg tables with:
Error parsing table metadata. code: 15003 context: Required field total-records missing.
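To make the failure mode concrete, here is a minimal sketch of the kind of check a strict reader might perform on a snapshot summary. Redshift's actual parser is not public, so this is only an illustration: the function name `MissingSummaryFields` is hypothetical, and the list of expected keys is an assumption taken from the `total-*` fields in the Trino-created summary above. Only `total-records` is confirmed as required by the error message.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns the total-* keys that are absent from a
// snapshot summary. The key list mirrors the Trino-created example in this
// issue; it is an assumption, not Redshift's documented requirement.
std::vector<std::string> MissingSummaryFields(const std::map<std::string, std::string> &summary) {
    static const std::vector<std::string> expected = {
        "total-records",          "total-data-files",       "total-files-size",
        "total-delete-files",     "total-position-deletes", "total-equality-deletes"};

    std::vector<std::string> missing;
    for (const auto &key : expected) {
        if (summary.find(key) == summary.end()) {
            missing.push_back(key);
        }
    }
    return missing;
}
```

Calling this on the DuckDB-created summary (`{"operation": "append"}`) reports every `total-*` key as missing, starting with `total-records`, which matches the field named in the Redshift error.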
Root Cause
In src/metadata/iceberg_snapshot.cpp, the ToRESTObject() method only sets summary.operation:
rest_api_objects::Snapshot IcebergSnapshot::ToRESTObject() const {
// ...
res.summary.operation = OperationTypeToString(operation);
// Missing: total-records, total-data-files, total-files-size, etc.
// ...
}
Reproduction
- Create an Iceberg table using DuckDB with the REST catalog (e.g., AWS S3 Tables)
- Try to query the table from Amazon Redshift
- Observe the "Required field total-records missing" error
Suggested Fix
Populate the snapshot summary with statistics during write operations:
- total-records
- total-data-files
- total-files-size
- total-delete-files
- total-position-deletes
- total-equality-deletes
- added-records (for append operations)
- added-data-files (for append operations)
- added-files-size (for append operations)
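A rough sketch of how these values could be computed for an append, assuming the writer knows the record count and size of each file it added and can read the running totals from the parent snapshot. The types `DataFileStats` and `SnapshotTotals` and the function `BuildAppendSummary` are hypothetical and are not part of the extension; wiring the result into `ToRESTObject()` is left out because it depends on the extension's internal types. It also fills `changed-partition-count`, which appears in the Trino-created summary. Values are emitted as strings, as in the Trino example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-file statistics gathered while writing data files.
struct DataFileStats {
    std::string partition_key; // e.g. "region=eu"; empty for unpartitioned tables
    int64_t record_count;
    int64_t file_size_in_bytes;
};

// Hypothetical running totals taken from the parent snapshot's summary
// (all zero when the table has no previous snapshot).
struct SnapshotTotals {
    int64_t records = 0;
    int64_t data_files = 0;
    int64_t files_size = 0;
    int64_t delete_files = 0;
    int64_t position_deletes = 0;
    int64_t equality_deletes = 0;
};

using Summary = std::map<std::string, std::string>;

// Builds the summary for an append: added-* values come from the new files,
// total-* values are the parent's totals plus what was added.
Summary BuildAppendSummary(const SnapshotTotals &parent, const std::vector<DataFileStats> &added) {
    int64_t added_records = 0;
    int64_t added_size = 0;
    std::set<std::string> partitions;
    for (const auto &file : added) {
        assert(file.record_count >= 0 && file.file_size_in_bytes >= 0);
        added_records += file.record_count;
        added_size += file.file_size_in_bytes;
        partitions.insert(file.partition_key);
    }
    const auto added_files = static_cast<int64_t>(added.size());

    Summary summary;
    summary["operation"] = "append";
    summary["added-data-files"] = std::to_string(added_files);
    summary["added-records"] = std::to_string(added_records);
    summary["added-files-size"] = std::to_string(added_size);
    summary["changed-partition-count"] = std::to_string(partitions.size());
    summary["total-records"] = std::to_string(parent.records + added_records);
    summary["total-data-files"] = std::to_string(parent.data_files + added_files);
    summary["total-files-size"] = std::to_string(parent.files_size + added_size);
    // An append adds no delete files, so these totals carry over unchanged.
    summary["total-delete-files"] = std::to_string(parent.delete_files);
    summary["total-position-deletes"] = std::to_string(parent.position_deletes);
    summary["total-equality-deletes"] = std::to_string(parent.equality_deletes);
    return summary;
}
```

With an empty parent and two added files in different partitions holding 9 records and 1333 bytes in total, this produces the same keys and values as the Trino-created summary shown earlier.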
Environment
- DuckDB Iceberg extension: latest (commit ef13fd0)
- Target: AWS S3 Tables with REST catalog
- Query engine experiencing issue: Amazon Redshift Serverless