@CedricYauLBD CedricYauLBD commented Dec 19, 2025

Summary

  • Serialize snapshot summary statistics (e.g., total-records, total-data-files, added-records) to metadata JSON

Fixes #633

Problem

AWS Redshift fails to read DuckDB-created Iceberg tables and reports the following error:

Error parsing table metadata. code: 15003 context: Required field total-records missing

Root Cause

DuckDB's IcebergSnapshotSummary struct already computes these statistics in additional_properties, but they weren't being serialized to the JSON output. The CommitTableToJSON function only wrote the operation field.

Solution

Added a loop to serialize all additional_properties from the snapshot summary:

for (const auto &prop : snapshot.summary.additional_properties) {
    yyjson_mut_obj_add_strcpy(doc, summary_json, prop.first.c_str(), prop.second.c_str());
}
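To illustrate the resulting shape of the summary object, here is a minimal standard-library sketch. It is not the extension's code: `SummarySketch`, `EscapeJson`, and `SummaryToJson` are hypothetical names, `std::map` stands in for whatever container `additional_properties` actually uses, and the JSON is built by hand rather than with yyjson. Values are written as strings because the Iceberg snapshot summary is a string-to-string map.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the snapshot summary: the operation plus a
// string-to-string map of extra properties.
struct SummarySketch {
    std::string operation;
    std::map<std::string, std::string> additional_properties;
};

// Escapes only quotes and backslashes, which is enough for the simple
// keys and numeric values used here. This is not a full JSON escaper.
static std::string EscapeJson(const std::string &in) {
    std::string out;
    for (char c : in) {
        if (c == '"' || c == '\\') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Writes "operation" first, then every additional property, mirroring
// the loop in the PR.
std::string SummaryToJson(const SummarySketch &summary) {
    // "operation" is the one summary field the Iceberg spec requires.
    assert(!summary.operation.empty());
    std::string json = "{\"operation\":\"" + EscapeJson(summary.operation) + "\"";
    for (const auto &prop : summary.additional_properties) {
        json += ",\"" + EscapeJson(prop.first) + "\":\"" + EscapeJson(prop.second) + "\"";
    }
    json += "}";
    return json;
}
```

With `operation` set to `append` and properties `total-records` and `added-records` both set to `3`, the sketch yields a single JSON object holding all three keys. Before this change only the `operation` key would have been present.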

Iceberg Spec Reference

Per Iceberg spec Appendix F, these fields are optional. However, AWS Redshift treats total-records as required and rejects metadata that omits it, so the fields must be written for Redshift compatibility.

Test plan

  • Build and load extension
  • Create Iceberg table on S3 Tables
  • Verify metadata JSON contains summary statistics
  • Verify Redshift can read the table
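The metadata check in the test plan could be automated along these lines. This is only a sketch under assumptions: `MissingSummaryFields` and `AssertSummaryComplete` are hypothetical helpers, the summary is assumed to have already been parsed into a `std::map`, and the list of expected keys is supplied by the caller, since only total-records is confirmed as required by the Redshift error above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns the expected keys that are absent from a parsed snapshot
// summary, in the order they were requested.
std::vector<std::string> MissingSummaryFields(const std::map<std::string, std::string> &summary,
                                              const std::vector<std::string> &expected) {
    std::vector<std::string> missing;
    for (const auto &key : expected) {
        if (summary.find(key) == summary.end()) {
            missing.push_back(key);
        }
    }
    return missing;
}

// Convenience wrapper for a test: aborts if any expected field is absent.
void AssertSummaryComplete(const std::map<std::string, std::string> &summary,
                           const std::vector<std::string> &expected) {
    assert(MissingSummaryFields(summary, expected).empty());
}
```

A summary that contains only `operation` would report both `total-records` and `total-data-files` as missing, which corresponds to the metadata that Redshift rejected.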

🤖 Generated with Claude Code

CedricYauLBD and others added 2 commits December 18, 2025 21:11
This patch adds the required snapshot summary statistics (total-records,
total-data-files, total-files-size, etc.) that are needed for Amazon
Redshift and other query engines to read DuckDB-created Iceberg tables.

Previously, DuckDB only set the "operation" field in the snapshot summary,
which caused Redshift to fail with:
  "Error parsing table metadata. Required field total-records missing."

Changes:
- Add IcebergSnapshotSummary struct to track cumulative and delta statistics
- Update ToRESTObject() to serialize statistics to additional_properties
- Update AddSnapshot() and AddUpdateSnapshot() to compute statistics from
  manifest entries and previous snapshot totals
- Update ParseSnapshot() to parse statistics from existing metadata files
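As a rough illustration of how delta and cumulative statistics relate, the sketch below handles the append-only case: deltas are summed from the newly added files, and totals are the previous snapshot's totals plus those deltas. The types `DataFileSketch` and `TotalsSketch` and the function `BuildAppendSummary` are hypothetical and do not reflect the extension's actual structures. The property names are the ones listed in Appendix F of the Iceberg spec.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one added data file in a manifest.
struct DataFileSketch {
    int64_t record_count;
    int64_t file_size_in_bytes;
};

// Hypothetical running totals carried over from the previous snapshot.
struct TotalsSketch {
    int64_t records = 0;
    int64_t data_files = 0;
    int64_t files_size = 0;
};

// Builds the summary properties for an append: deltas come from the new
// files, totals are the previous totals plus those deltas.
std::map<std::string, std::string> BuildAppendSummary(const TotalsSketch &previous,
                                                      const std::vector<DataFileSketch> &added_files) {
    int64_t added_records = 0;
    int64_t added_size = 0;
    for (const auto &file : added_files) {
        assert(file.record_count >= 0 && file.file_size_in_bytes >= 0);
        added_records += file.record_count;
        added_size += file.file_size_in_bytes;
    }
    const int64_t added_count = static_cast<int64_t>(added_files.size());

    // Summary values are strings, so every number is converted.
    std::map<std::string, std::string> props;
    props["added-data-files"] = std::to_string(added_count);
    props["added-records"] = std::to_string(added_records);
    props["added-files-size"] = std::to_string(added_size);
    props["total-data-files"] = std::to_string(previous.data_files + added_count);
    props["total-records"] = std::to_string(previous.records + added_records);
    props["total-files-size"] = std::to_string(previous.files_size + added_size);
    return props;
}
```

Deletes and overwrites would also need the corresponding deleted and removed counters subtracted from the totals. That is left out here to keep the sketch short.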

Fixes: duckdb#633

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Per Iceberg spec Appendix F, snapshot summary fields like total-records
are optional. However, AWS Redshift requires these fields and fails with
"Required field total-records missing" when they're absent.

This change serializes the additional_properties from IcebergSnapshotSummary
to the metadata JSON, which includes statistics like total-records,
total-data-files, added-records, etc.

Reference: https://iceberg.apache.org/spec/#appendix-f-optional-snapshot-summary-fields

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Development

Successfully merging this pull request may close these issues.

Snapshot summary missing required statistics (total-records, total-data-files) breaks Redshift compatibility
