[iceberg] Fix duplicate records when schema change splits writes within a checkpoint
When a schema-change event arrives mid-checkpoint, the writer flushes the
affected table before applying the new schema, producing two batches for
the same table. Previously these were merged into one RowDelta and committed
as a single Iceberg snapshot. Because Iceberg equality-delete files only
suppress data with a strictly lower sequence number, same-snapshot deletes
were ineffective and both versions of a row appeared on read.
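The sequence-number rule above is the crux of the bug, so here is a minimal, self-contained model of it (not the real Iceberg API; `DataRow`, `EqDelete`, and `read` are invented for illustration): an equality delete only suppresses data rows whose data sequence number is strictly lower than the delete's sequence number.

```java
import java.util.ArrayList;
import java.util.List;

public class EqualityDeleteDemo {
    // Simplified stand-ins for Iceberg data files and equality-delete files,
    // each carrying the data sequence number of the snapshot that added them.
    record DataRow(String pk, String value, long seq) {}
    record EqDelete(String pk, long seq) {}

    /** A row is suppressed only by a delete on its PK with a STRICTLY higher seq. */
    static List<DataRow> read(List<DataRow> rows, List<EqDelete> deletes) {
        List<DataRow> visible = new ArrayList<>();
        for (DataRow r : rows) {
            boolean suppressed = deletes.stream()
                    .anyMatch(d -> d.pk().equals(r.pk()) && r.seq() < d.seq());
            if (!suppressed) visible.add(r);
        }
        return visible;
    }

    public static void main(String[] args) {
        // Old behavior: both batches merged into one snapshot, so the stale row,
        // the new row, and the equality delete all share sequence number 1.
        List<DataRow> merged = List.of(new DataRow("k1", "v1", 1), new DataRow("k1", "v2", 1));
        List<EqDelete> mergedDeletes = List.of(new EqDelete("k1", 1));
        System.out.println(read(merged, mergedDeletes).size()); // 2 — the delete suppresses nothing

        // Fixed behavior: batch 2 commits as its own snapshot, so its delete (seq 2)
        // is strictly higher than batch 1's data (seq 1) and the stale row disappears.
        List<DataRow> split = List.of(new DataRow("k1", "v1", 1), new DataRow("k1", "v2", 2));
        List<EqDelete> splitDeletes = List.of(new EqDelete("k1", 2));
        System.out.println(read(split, splitDeletes).size()); // 1 — only the latest version
    }
}
```

Running the model shows why a same-snapshot delete is a no-op: the strict inequality never holds when data and delete share a sequence number.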
- flush(boolean) is now a no-op to prevent unrelated tables from being
split into multiple batches on non-schema-change flushes
- Schema-change events call flushTableWriter(tableId) to flush only the
affected table; a per-table batchIndex increments on each flush
- Each batch is committed as a separate Iceberg snapshot so equality-deletes
in batch N have a strictly higher sequence number than data in batch M (M<N)
- flink.batch-index and flink.checkpoint-id snapshot properties enable
retry-safe idempotency: on failure, the committer resumes from the last
uncommitted batch without re-committing already-persisted files
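The retry behavior in the last bullet can be sketched as follows. This is a hedged sketch, not the real `IcebergCommitter`: the map standing in for the latest snapshot's summary and the `commitBatch` helper are invented for illustration; only the `flink.checkpoint-id` and `flink.batch-index` property names come from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommitterIdempotencySketch {
    /** Stand-in for the snapshot summary of the table's latest committed snapshot. */
    static Map<String, String> latestSnapshotSummary = new HashMap<>();
    static List<String> committed = new ArrayList<>();

    static void commitBatch(long checkpointId, int batchIndex, String files) {
        long lastCkp = Long.parseLong(
                latestSnapshotSummary.getOrDefault("flink.checkpoint-id", "-1"));
        int lastBatch = Integer.parseInt(
                latestSnapshotSummary.getOrDefault("flink.batch-index", "-1"));
        // Skip batches already persisted by a previous, partially failed attempt.
        if (checkpointId < lastCkp || (checkpointId == lastCkp && batchIndex <= lastBatch)) {
            return;
        }
        committed.add(files);
        latestSnapshotSummary.put("flink.checkpoint-id", Long.toString(checkpointId));
        latestSnapshotSummary.put("flink.batch-index", Integer.toString(batchIndex));
    }

    public static void main(String[] args) {
        // First attempt commits batch 0, then fails before batch 1.
        commitBatch(7, 0, "batch0-files");
        // The retry replays every batch of checkpoint 7; batch 0 is skipped,
        // batch 1 is committed, and nothing is double-committed.
        commitBatch(7, 0, "batch0-files");
        commitBatch(7, 1, "batch1-files");
        System.out.println(committed); // [batch0-files, batch1-files]
    }
}
```

The per-batch snapshot properties make the commit loop a monotone cursor over `(checkpoint-id, batch-index)`, which is what makes blind replay after failure safe.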
Tests added for: same-PK dedup across batches, schema-change split correctness,
retry after partial batch commit, multiple schema changes in one checkpoint,
and multi-table isolation.
Files changed:
- flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/IcebergCommitter.java
- flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/IcebergWriter.java
- flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/WriteResultWrapper.java
WriteResultWrapper.java: 21 additions & 1 deletion
```diff
@@ -40,17 +40,31 @@ public class WriteResultWrapper implements Serializable {

     private final String operatorId;

+    /** Batch index within the checkpoint for this table; increments on each schema-change flush. */
```