[iceberg] Fix batchIndex sync and parallel subtask grouping in Iceberg sink
Address parallelism issues identified during review:
- Writer: Advance tableBatchIndexMap before the writer == null guard so all subtasks stay in sync when a subtask has no data for the table at schema-change time
- Writer: Skip flushTableWriter on initial CreateTableEvent since no data has been written yet and there is nothing to split
- Committer: Group WriteResultWrappers by batchIndex using a TreeMap, so wrappers from different subtasks with the same batchIndex are merged into a single Iceberg snapshot instead of being committed separately
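The first Writer change above can be pictured with a minimal sketch. Only the name tableBatchIndexMap and the writer == null guard come from this commit message; the class, the field types, the onSchemaChange method and the flushed list are hypothetical stand-ins, not the connector's actual code. The initial CreateTableEvent case is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchIndexSyncSketch {
    // Name taken from the commit message; the value type is an assumption.
    final Map<String, Integer> tableBatchIndexMap = new HashMap<>();
    // Hypothetical stand-in for per-table writers: rows buffered since the last flush.
    final Map<String, List<String>> writers = new HashMap<>();
    // Hypothetical record of what was flushed, tagged with the batch index it closed.
    final List<String> flushed = new ArrayList<>();

    void write(String tableId, String row) {
        writers.computeIfAbsent(tableId, k -> new ArrayList<>()).add(row);
    }

    int currentBatchIndex(String tableId) {
        return tableBatchIndexMap.getOrDefault(tableId, 0);
    }

    void onSchemaChange(String tableId) {
        int closing = currentBatchIndex(tableId);
        // Advance the index BEFORE the null guard, so a subtask that has
        // no data for this table still moves on to the next batch.
        tableBatchIndexMap.put(tableId, closing + 1);

        List<String> writer = writers.remove(tableId);
        if (writer == null) {
            return; // nothing buffered on this subtask, nothing to flush
        }
        flushed.add(tableId + "#" + closing + ":" + writer);
    }

    public static void main(String[] args) {
        BatchIndexSyncSketch withData = new BatchIndexSyncSketch();
        BatchIndexSyncSketch withoutData = new BatchIndexSyncSketch();
        withData.write("db.t", "row-1");

        // Both subtasks see the same schema change event.
        withData.onSchemaChange("db.t");
        withoutData.onSchemaChange("db.t");

        System.out.println(withData.currentBatchIndex("db.t"));
        System.out.println(withoutData.currentBatchIndex("db.t"));
    }
}
```

If the increment sat after the guard, the subtask without data would keep its old index and later tag post-schema-change files with a batchIndex that other subtasks had already used for pre-schema-change files.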
Tests added:
- testBatchIndexInSyncWhenSubtaskHasNoWriterAtSchemaChange
- testNoDuplicateWithParallelSubtasksMissingPreSchemaChangeData
- testSameBatchIndexFromTwoSubtasksMergedIntoOneSnapshot
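The Committer grouping that the last test exercises might look roughly like the sketch below. The Wrapper class is a hypothetical stand-in for WriteResultWrapper, and each merged file list stands in for one Iceberg snapshot commit; no real Iceberg or Flink API is called here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchGroupingSketch {

    /** Hypothetical stand-in for WriteResultWrapper: one subtask's files for one batch. */
    static final class Wrapper {
        final int subtaskId;
        final int batchIndex;
        final List<String> dataFiles;

        Wrapper(int subtaskId, int batchIndex, List<String> dataFiles) {
            this.subtaskId = subtaskId;
            this.batchIndex = batchIndex;
            this.dataFiles = dataFiles;
        }
    }

    /** Groups wrappers by batchIndex; a TreeMap iterates its keys in ascending order. */
    static TreeMap<Integer, List<Wrapper>> groupByBatchIndex(List<Wrapper> wrappers) {
        TreeMap<Integer, List<Wrapper>> grouped = new TreeMap<>();
        for (Wrapper w : wrappers) {
            grouped.computeIfAbsent(w.batchIndex, k -> new ArrayList<>()).add(w);
        }
        return grouped;
    }

    /** Merges each group into one file list, i.e. one planned snapshot per batchIndex. */
    static List<List<String>> planSnapshots(List<Wrapper> wrappers) {
        List<List<String>> snapshots = new ArrayList<>();
        for (Map.Entry<Integer, List<Wrapper>> entry : groupByBatchIndex(wrappers).entrySet()) {
            List<String> files = new ArrayList<>();
            for (Wrapper w : entry.getValue()) {
                files.addAll(w.dataFiles);
            }
            snapshots.add(files);
        }
        return snapshots;
    }

    public static void main(String[] args) {
        List<Wrapper> wrappers = Arrays.asList(
                new Wrapper(1, 1, Collections.singletonList("s1-b1.parquet")),
                new Wrapper(0, 0, Collections.singletonList("s0-b0.parquet")),
                new Wrapper(1, 0, Collections.singletonList("s1-b0.parquet")),
                new Wrapper(0, 1, Collections.singletonList("s0-b1.parquet")));
        // Four wrappers from two subtasks collapse into two planned snapshots.
        System.out.println(planSnapshots(wrappers).size());
    }
}
```

A sorted map keeps the commit order tied to batchIndex rather than to the arrival order of the wrappers, so files written before a schema change are committed ahead of files written after it.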
flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/IcebergCommitter.java
flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-iceberg/src/main/java/org/apache/flink/cdc/connectors/iceberg/sink/v2/IcebergWriter.java