
Snapshot summary missing required statistics (total-records, total-data-files) breaks Redshift compatibility #633

@CedricYauLBD

Description

Problem

When DuckDB writes Iceberg tables through the REST catalog (e.g., to AWS S3 Tables), the snapshot summary contains only the operation field and omits the statistics fields that some query engines require.

DuckDB-created snapshot:

"summary": {
  "operation": "append"
}

Expected (e.g., as written by Trino):

"summary": {
  "operation": "append",
  "added-data-files": "2",
  "added-records": "9",
  "added-files-size": "1333",
  "changed-partition-count": "2",
  "total-records": "9",
  "total-files-size": "1333",
  "total-data-files": "2",
  "total-delete-files": "0",
  "total-position-deletes": "0",
  "total-equality-deletes": "0"
}
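To make the gap concrete, here is a minimal C++ sketch of the kind of check a strict reader appears to perform. It models the summary as the string-to-string map seen in the JSON above. The function name MissingSummaryFields and the key list are hypothetical: only total-records is confirmed as required by the Redshift error below, and the other keys are taken from the Trino-written example as an assumption.

```cpp
#include <map>
#include <string>
#include <vector>

// Statistics keys a strict reader might expect in a snapshot summary.
// Assumption: only "total-records" is confirmed by the Redshift error; the
// rest mirror the Trino-written summary shown above.
const std::vector<std::string> kExpectedTotals = {
    "total-records",      "total-data-files",       "total-files-size",
    "total-delete-files", "total-position-deletes", "total-equality-deletes"};

// Returns the expected keys that are absent from the given snapshot summary,
// in the order they are listed in kExpectedTotals.
std::vector<std::string> MissingSummaryFields(
    const std::map<std::string, std::string>& summary) {
    std::vector<std::string> missing;
    for (const auto& key : kExpectedTotals) {
        if (summary.find(key) == summary.end()) {
            missing.push_back(key);
        }
    }
    return missing;
}
```

Applied to the DuckDB-created summary (only "operation"), all six keys are reported missing, with total-records first, matching the field Redshift names in its error.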

Impact

Amazon Redshift fails to query DuckDB-created Iceberg tables with:

Error parsing table metadata. code: 15003 context: Required field total-records missing.

Root Cause

In src/metadata/iceberg_snapshot.cpp, the ToRESTObject() method only sets summary.operation:

rest_api_objects::Snapshot IcebergSnapshot::ToRESTObject() const {
    // ...
    res.summary.operation = OperationTypeToString(operation);
    // Missing: total-records, total-data-files, total-files-size, etc.
    // ...
}

Reproduction

  1. Create an Iceberg table using DuckDB with the REST catalog (e.g., AWS S3 Tables)
  2. Try to query the table from Amazon Redshift
  3. Observe the "Required field total-records missing" error

Suggested Fix

Populate the snapshot summary with statistics during write operations:

  • total-records
  • total-data-files
  • total-files-size
  • total-delete-files
  • total-position-deletes
  • total-equality-deletes
  • added-records (for append operations)
  • added-data-files (for append operations)
  • added-files-size (for append operations)
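As a rough illustration of the fix, the following C++ sketch computes the listed fields for an append: the added-* values describe the files written by this commit, and each total-* value is the parent snapshot's total plus what was added. Everything here is hypothetical (AddedDataFile, ParentCount, BuildAppendSummary are not part of the extension), the summary is modeled as a plain string map rather than rest_api_objects::Snapshot, and changed-partition-count is left out because it depends on partition handling.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-file statistics for one data file written by an append.
struct AddedDataFile {
    int64_t record_count;
    int64_t file_size_in_bytes;
};

using Summary = std::map<std::string, std::string>;

// Reads a counter from the parent summary. A missing key is treated as 0,
// which is correct for the first snapshot; a parent written without
// statistics would need its totals recomputed from manifests (not done here).
int64_t ParentCount(const Summary& parent, const std::string& key) {
    auto it = parent.find(key);
    if (it == parent.end()) {
        return 0;
    }
    int64_t value = std::stoll(it->second);
    assert(value >= 0 && "summary counters must be non-negative");
    return value;
}

// Builds an append summary: totals carry the parent's values forward plus
// the added files. An append adds no delete files, so delete totals are
// copied from the parent unchanged.
Summary BuildAppendSummary(const Summary& parent,
                           const std::vector<AddedDataFile>& added) {
    int64_t added_records = 0;
    int64_t added_size = 0;
    for (const auto& file : added) {
        added_records += file.record_count;
        added_size += file.file_size_in_bytes;
    }
    const int64_t added_files = static_cast<int64_t>(added.size());

    Summary summary;
    summary["operation"] = "append";
    summary["added-data-files"] = std::to_string(added_files);
    summary["added-records"] = std::to_string(added_records);
    summary["added-files-size"] = std::to_string(added_size);
    summary["total-data-files"] =
        std::to_string(ParentCount(parent, "total-data-files") + added_files);
    summary["total-records"] =
        std::to_string(ParentCount(parent, "total-records") + added_records);
    summary["total-files-size"] =
        std::to_string(ParentCount(parent, "total-files-size") + added_size);
    for (const char* key : {"total-delete-files", "total-position-deletes",
                            "total-equality-deletes"}) {
        summary[key] = std::to_string(ParentCount(parent, key));
    }
    return summary;
}
```

For a first append of two files holding 4 and 5 records (600 and 733 bytes), this yields total-records "9", total-data-files "2" and total-files-size "1333", the same totals as the Trino example. Deriving totals from the parent keeps each commit cheap, but it relies on every earlier snapshot having recorded them.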

Environment

  • DuckDB Iceberg extension: latest (commit ef13fd0)
  • Target: AWS S3 Tables with REST catalog
  • Query engine experiencing issue: Amazon Redshift Serverless
