iceberg: add case_sensitive_columns option #4365
Adds a new top-level config field that, when set to false, makes
column-name matching follow iceberg's recommended case-insensitive
convention end-to-end. Defaults to true to preserve released behaviour.
When false, the option is plumbed through:
- shredder lookup (input keys → schema field names)
- schema_metadata findCommonField traversal
- partition spec column references (top-level and nested)
- struct inference (auto-inferred and metadata-driven), rejecting
nested keys that fold to the same lowercase string
- top-level table creation, rejecting case-only-duplicate keys
- sink dedup keys, so cross-message new fields differing only in
case fold to a single schema-evolution attempt
- catalogx UpdateSchema's caseSensitive flag
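The matching rule shared by these touchpoints can be sketched in Go. This is a minimal illustration under stated assumptions, not the PR's implementation: `field`, `namesMatch`, and `resolvePath` are hypothetical names, `field` is a simplified stand-in for iceberg-go's schema types, and folding uses `strings.ToLower` to mirror the "fold to the same lowercase string" rule above. It resolves a nested, partition-spec-style reference such as `Address.City` against a lowercase schema, which succeeds only when matching is case-insensitive.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical, simplified stand-in for a schema column.
// It is NOT the iceberg-go type.
type field struct {
	Name     string
	Children []field
}

// namesMatch compares a reference against a column name, folding both
// to lowercase when caseSensitive is false.
func namesMatch(ref, name string, caseSensitive bool) bool {
	if caseSensitive {
		return ref == name
	}
	return strings.ToLower(ref) == strings.ToLower(name)
}

// resolvePath walks a dot-separated reference (e.g. "Address.City")
// through nested fields and returns the schema's own spelling of the
// path. Assumes the schema has no case-only duplicate names, so the
// first match at each level is the only one.
func resolvePath(fields []field, ref string, caseSensitive bool) (string, bool) {
	var resolved []string
	level := fields
	for _, part := range strings.Split(ref, ".") {
		found := false
		for _, f := range level {
			if namesMatch(part, f.Name, caseSensitive) {
				resolved = append(resolved, f.Name)
				level = f.Children
				found = true
				break
			}
		}
		if !found {
			return "", false
		}
	}
	return strings.Join(resolved, "."), true
}

func main() {
	schema := []field{
		{Name: "id"},
		{Name: "address", Children: []field{{Name: "city"}}},
	}
	got, ok := resolvePath(schema, "Address.City", false)
	fmt.Println(got, ok) // prints "address.city true"
	_, ok = resolvePath(schema, "Address.City", true)
	fmt.Println(ok) // prints "false"
}
```

Returning the schema's canonical spelling rather than the caller's lets downstream code keep addressing columns by their stored names regardless of how the input was cased.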
The shredder errors on ambiguous case-only duplicates (two input keys
mapping to the same schema column).
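A sketch of that ambiguity check, assuming the keys at one struct level are available as a flat list of strings. `checkCaseOnlyDuplicates` is a hypothetical helper, meant to run only when case_sensitive_columns is false; in case-sensitive mode `id` and `ID` are simply two different columns and no check is needed.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// checkCaseOnlyDuplicates (hypothetical) returns an error if two distinct
// keys fold to the same lowercase string, i.e. both would map to the same
// column under case-insensitive matching.
func checkCaseOnlyDuplicates(keys []string) error {
	// Sort a copy so the reported pair does not depend on input order.
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)

	seen := make(map[string]string, len(sorted)) // folded key -> first original key
	for _, k := range sorted {
		folded := strings.ToLower(k)
		if prev, ok := seen[folded]; ok && prev != k {
			return fmt.Errorf("keys %q and %q differ only in case and map to the same column", prev, k)
		}
		seen[folded] = k
	}
	return nil
}

func main() {
	fmt.Println(checkCaseOnlyDuplicates([]string{"id", "name"})) // prints "<nil>"
	fmt.Println(checkCaseOnlyDuplicates([]string{"id", "ID"}))
	// prints: keys "ID" and "id" differ only in case and map to the same column
}
```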
Adds unit coverage for each touchpoint and four integration tests
covering the column-matching path, create-time duplicate rejection,
cross-message new-field dedup, and case-insensitive partition spec
parsing during table creation.
Commits review: LGTM
Summary
- Adds a `case_sensitive_columns` boolean to the `iceberg` output (default `true`, preserving released behaviour). When set to `false`, column-name matching follows iceberg's recommended case-insensitive convention end-to-end.
- The option is plumbed through shredder lookup, `schema_metadata` traversal, partition spec column references, struct inference (auto-inferred and metadata-driven), top-level table-creation duplicate detection, sink dedup keys for cross-message new fields, and the `caseSensitive` argument to iceberg-go's `UpdateSchema`.
- Ambiguous case-only duplicates in input records are rejected at shredding and creation.

Test plan
- `task fmt`
- `task lint` (0 issues)
- `task test:unit`
- Integration tests (`internal/impl/iceberg/integration`):
  - `CaseInsensitiveColumnMatching`: pre-created lowercase table, uppercase-keyed messages; no schema evolution occurs and data lands in the right columns
  - `CaseInsensitiveCreateTimeDuplicate`: a record with `id` and `ID` is rejected at create time
  - `CaseInsensitiveNewFieldDedupAcrossBatch`: two messages whose new fields differ only in case produce exactly one new column
  - `CaseInsensitivePartitionSpecCreate`: a partition spec with an uppercase column reference resolves against a schema inferred from lowercase keys