iceberg: add case_sensitive_columns option by Jeffail · Pull Request #4365 · redpanda-data/connect

Jeffail · 2026-04-29T16:07:09Z

Summary

Adds a top-level case_sensitive_columns boolean to the iceberg output (default true, preserving released behaviour). When set to false, column-name matching follows iceberg's recommended case-insensitive convention end-to-end.
The flag is plumbed through every site that compares iceberg field names: shredder lookup, schema_metadata traversal, partition spec column references, struct inference (auto-inferred and metadata-driven), top-level table-creation duplicate detection, sink dedup keys for cross-message new fields, and the caseSensitive argument to iceberg-go's UpdateSchema. Ambiguous case-only duplicates in input records are rejected at shredding and creation.
Resolves a customer-reported issue where messages with upper-case keys against a table with lower-case columns triggered spurious schema evolution rather than routing into the existing columns.
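
To make the reported symptom concrete, here is a minimal sketch of how key-to-column matching behaves in each mode. The names `matchColumns`, `schemaCols`, and `record` are hypothetical and are not taken from the connector's code. It only illustrates the idea of folding names before lookup and rejecting case-only duplicates.

```go
package main

import (
	"fmt"
	"strings"
)

// matchColumns maps input record keys onto existing schema column names.
// Hypothetical helper for illustration only; not the connector's implementation.
// It returns the values keyed by schema column name, plus any input keys that
// matched no column (the ones that would trigger schema evolution).
func matchColumns(schemaCols []string, record map[string]any, caseSensitive bool) (map[string]any, []string, error) {
	fold := func(s string) string {
		if caseSensitive {
			return s
		}
		return strings.ToLower(s)
	}

	// Index schema columns by their (possibly folded) name.
	byKey := make(map[string]string, len(schemaCols))
	for _, c := range schemaCols {
		byKey[fold(c)] = c
	}

	matched := make(map[string]any, len(record))
	seen := make(map[string]string, len(record)) // folded key -> original input key
	var unknown []string
	for k, v := range record {
		fk := fold(k)
		if prev, dup := seen[fk]; dup {
			// Two input keys fold to the same name: ambiguous, so reject.
			return nil, nil, fmt.Errorf("ambiguous keys %q and %q differ only by case", prev, k)
		}
		seen[fk] = k
		if col, ok := byKey[fk]; ok {
			matched[col] = v
		} else {
			unknown = append(unknown, k)
		}
	}
	return matched, unknown, nil
}

func main() {
	cols := []string{"id", "name"}
	rec := map[string]any{"ID": 1, "NAME": "a"}

	// Case-sensitive: neither key matches, so both look like new fields.
	_, unknown, _ := matchColumns(cols, rec, true)
	fmt.Println(len(unknown)) // prints 2

	// Case-insensitive: both keys route into the existing columns.
	matched, unknown, _ := matchColumns(cols, rec, false)
	fmt.Println(len(matched), len(unknown)) // prints 2 0
}
```

In the case-sensitive run the two unmatched keys are what would previously have been treated as new columns. In the case-insensitive run they land in `id` and `name`.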

Test plan

task fmt
task lint (0 issues)
task test:unit
New unit tests across the shredder, type inferrer/resolver, partition-spec parser, router, and writer sinks — covering both the case-insensitive path and the case-sensitive default
Integration tests (internal/impl/iceberg/integration):
- CaseInsensitiveColumnMatching — pre-created lowercase table, uppercase-keyed messages, no schema evolution, data lands in the right columns
- CaseInsensitiveCreateTimeDuplicate — record with id and ID rejected at create time
- CaseInsensitiveNewFieldDedupAcrossBatch — two messages with case-only-different new fields produce exactly one new column
- CaseInsensitivePartitionSpecCreate — partition spec with uppercase column reference resolves against schema inferred with lowercase keys
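
The cross-message dedup scenario can be pictured with a small sketch. The `pendingFields` type and its methods are hypothetical, and keeping the first-seen spelling of the name is an assumption of this sketch rather than something the PR states.

```go
package main

import (
	"fmt"
	"strings"
)

// pendingFields collects new column names seen across messages in a batch.
// Hypothetical type for illustration; not the connector's sink code.
type pendingFields struct {
	caseSensitive bool
	seen          map[string]struct{} // dedup keys
	names         []string            // names that would be added, in first-seen order
}

func newPendingFields(caseSensitive bool) *pendingFields {
	return &pendingFields{
		caseSensitive: caseSensitive,
		seen:          make(map[string]struct{}),
	}
}

// add records a new field name and reports whether it was not seen before.
// In case-insensitive mode the dedup key is the lowercased name.
func (p *pendingFields) add(name string) bool {
	key := name
	if !p.caseSensitive {
		key = strings.ToLower(name)
	}
	if _, ok := p.seen[key]; ok {
		return false
	}
	p.seen[key] = struct{}{}
	p.names = append(p.names, name)
	return true
}

func main() {
	p := newPendingFields(false)
	p.add("userId") // from the first message
	p.add("USERID") // from the second message, differs only by case
	fmt.Println(p.names) // prints [userId]
}
```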

Adds a new top-level config field that makes column-name matching follow iceberg's recommended case-insensitive convention end-to-end when set to false. Defaults to true to preserve released behaviour.

When false, the option is plumbed through:
- shredder lookup (input keys → schema field names)
- schema_metadata findCommonField traversal
- partition spec column references (top-level and nested)
- struct inference (auto-inferred and metadata-driven), rejecting nested keys that fold to the same lowercase string
- top-level table creation, rejecting case-only-duplicate keys
- sink dedup keys, so cross-message new fields differing only in case fold to a single schema-evolution attempt
- catalogx UpdateSchema's caseSensitive flag

The shredder errors on ambiguous case-only duplicates (two input keys mapping to the same schema column).

Adds unit coverage for each touchpoint and four integration tests covering the column-matching path, create-time duplicate rejection, cross-message new-field dedup, and case-insensitive partition spec parsing during table creation.
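
As an illustration of the partition spec touchpoint, the sketch below resolves a column reference against a nested schema. The `field` type, the `resolvePath` helper, and the dotted-path syntax for nested references are all assumptions made for this example, not the parser's actual types or grammar.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical, simplified schema node.
type field struct {
	name     string
	children []field
}

// resolvePath walks a dotted column reference through the schema and returns
// the path spelled with the schema's own field names.
func resolvePath(root []field, path string, caseSensitive bool) (string, error) {
	fold := func(s string) string {
		if caseSensitive {
			return s
		}
		return strings.ToLower(s)
	}

	fields := root
	parts := strings.Split(path, ".")
	resolved := make([]string, 0, len(parts))
	for _, part := range parts {
		var hit *field
		for i := range fields {
			if fold(fields[i].name) == fold(part) {
				hit = &fields[i]
				break
			}
		}
		if hit == nil {
			return "", fmt.Errorf("column %q not found while resolving %q", part, path)
		}
		resolved = append(resolved, hit.name)
		fields = hit.children // descend for the next path segment
	}
	return strings.Join(resolved, "."), nil
}

func main() {
	schema := []field{
		{name: "ts"},
		{name: "user", children: []field{{name: "id"}}},
	}

	got, err := resolvePath(schema, "User.ID", false)
	fmt.Println(got, err) // prints user.id <nil>

	_, err = resolvePath(schema, "User.ID", true)
	fmt.Println(err != nil) // prints true
}
```

Returning the schema's spelling rather than the caller's keeps later stages working with a single canonical name for each column.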

claude · 2026-04-29T16:11:05Z

Commits
LGTM

Review
Adds an end-to-end case_sensitive_columns option for the iceberg output, plumbed through the shredder, schema_metadata resolution, partition spec parser, struct inference, table creation, sink dedup, and catalogx UpdateSchema. Defaults to true for backwards compatibility, with explicit ambiguity detection at every layer where case-only duplicates could otherwise cause silent data loss or schema corruption. Unit tests cover each touchpoint and four integration tests exercise the column-matching path, create-time duplicate rejection, cross-message new-field dedup, and case-insensitive partition spec parsing.

LGTM

claude · 2026-04-29T16:53:51Z

Commits
LGTM

Review
Comprehensive plumbing of the new case_sensitive_columns flag through every iceberg comparison site (shredder, schema_metadata traversal, partition spec parser, type inference/resolver, sink dedup, UpdateSchema). Default true preserves released behaviour. Case-only ambiguity is rejected at the right points. Unit and integration coverage exercise both modes.

LGTM

claude · 2026-04-29T23:06:53Z

Commits
LGTM

Review
The PR threads a new case_sensitive_columns config flag (default true for backward compatibility) through the iceberg output's shredder, type inferrer, type resolver, partition spec parser, sinks, and catalogx.UpdateSchema. Case-only duplicate detection is added at table creation, struct inference, and shredding, and new-field dedup folds case in case-insensitive mode. Unit and integration coverage is comprehensive across each touchpoint.

LGTM
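
For readers following the duplicate-detection points raised in the reviews, the check at struct inference can be sketched as a recursive walk. `checkCaseDuplicates` is a hypothetical helper, and representing nested structs as `map[string]any` is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// checkCaseDuplicates rejects any object, at any nesting depth, that contains
// two keys folding to the same lowercase string. Illustrative only.
func checkCaseDuplicates(path string, obj map[string]any) error {
	seen := make(map[string]string, len(obj)) // lowercase key -> original key
	for k, v := range obj {
		lk := strings.ToLower(k)
		if prev, ok := seen[lk]; ok {
			return fmt.Errorf("%s: keys %q and %q differ only by case", path, prev, k)
		}
		seen[lk] = k
		// Recurse into nested structs.
		if child, ok := v.(map[string]any); ok {
			if err := checkCaseDuplicates(path+"."+k, child); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	rec := map[string]any{
		"user": map[string]any{"id": 1, "ID": 2},
	}
	fmt.Println(checkCaseDuplicates("$", rec) != nil) // prints true
}
```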

Update docs

865b3bf

iceberg: fix schema evolution producing non-lowercase column names

6f756d3

josephwoodward added the run-integration-tests label Apr 29, 2026

josephwoodward approved these changes Apr 29, 2026

Jeffail merged commit 757dbcd into main Apr 30, 2026
11 checks passed

Jeffail deleted the iceberg-column-names branch April 30, 2026 07:18

Jeffail merged 3 commits into main from iceberg-column-names