Commit 42bff6e
feat: streaming snapshot flush with persistent writer and adaptive file splitting
This commit implements the streaming snapshot flush pattern for the Iceberg sink.
Combined with the parallel incremental snapshot SPI introduced in
debezium/debezium#7362, it reduces commit overhead and memory pressure when
snapshotting large tables (measurements are in the table further down).
## Streaming snapshot flush
Instead of creating a new Iceberg writer for every batch (5K-20K rows), keep a
single writer open per table for the entire snapshot. The writer accumulates
data across chunks, and the final atomic commit happens at table completion.
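The persistent-writer pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the connector's code: `StreamingFlushSketch`, `TableWriter`, `onChunk`, and `onTableFinished` are hypothetical names, and an in-memory list stands in for a real Iceberg writer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one writer per table, kept open for the whole snapshot. */
final class StreamingFlushSketch {

    /** Stand-in for an Iceberg writer; it counts commits so the effect is visible. */
    static final class TableWriter {
        final List<String> buffered = new ArrayList<>();
        int commits = 0;
        long committedRows = 0;

        void write(List<String> rows) {
            buffered.addAll(rows);
        }

        void commit() {
            committedRows += buffered.size();
            buffered.clear();
            commits++;
        }
    }

    private final Map<String, TableWriter> openWriters = new HashMap<>();

    /** Called once per snapshot chunk: append to the open writer, never commit. */
    void onChunk(String table, List<String> rows) {
        openWriters.computeIfAbsent(table, t -> new TableWriter()).write(rows);
    }

    /** Called when the table's snapshot ends: commit once, then release the writer. */
    TableWriter onTableFinished(String table) {
        TableWriter writer = openWriters.remove(table);
        if (writer != null) {
            writer.commit();
        }
        return writer; // null if no chunk was ever written for this table
    }
}
```

In this sketch, two chunks for the same table followed by `onTableFinished` result in one commit covering all rows, instead of one commit per chunk.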
Periodic file splitting kicks in when the writer reaches a calibrated row
threshold, producing ~512MB Parquet files. After the first split-commit, the
threshold is recalibrated from actual file size (bytes-per-row) and clamped
by available heap (60% of max heap, divided by worker count, divided by an
in-memory factor of ~40x for Parquet decompression).
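One way to read the recalibration rule above is sketched below. It is an interpretation, not the actual implementation: the class and method names are hypothetical, the constants come from the figures quoted in the text (512MB target, 60% of heap, 40x factor), and the heap clamp is assumed to cap the on-disk file size a worker may build up before splitting.

```java
/** Hypothetical sketch of the adaptive split threshold; not the connector's API. */
final class SplitThresholdSketch {
    static final long TARGET_FILE_BYTES = 512L * 1024 * 1024; // ~512MB Parquet target
    static final long HEAP_PERCENT = 60;                      // share of max heap usable by writers
    static final long IN_MEMORY_FACTOR = 40;                  // assumed in-memory vs on-disk ratio

    /**
     * Returns the number of rows to write before the next split, derived from
     * the size and row count of the file produced by the previous split-commit.
     */
    static long recalibrate(long fileBytes, long fileRows, long maxHeapBytes, int workers) {
        if (fileBytes <= 0 || fileRows <= 0 || maxHeapBytes <= 0 || workers <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        // Heap clamp: 60% of max heap, per worker, divided by the in-memory factor.
        long heapBudget =
            Math.multiplyExact(maxHeapBytes, HEAP_PERCENT) / 100 / workers / IN_MEMORY_FACTOR;
        // The file may not exceed the target size or the heap-derived budget.
        long fileBudget = Math.min(TARGET_FILE_BYTES, heapBudget);
        // rows = budget / bytesPerRow, kept in integer arithmetic to avoid rounding drift.
        long rows = Math.multiplyExact(fileBudget, fileRows) / fileBytes;
        return Math.max(1L, rows);
    }
}
```

With a previous file of 100,000,000 bytes and 1,000,000 rows (100 bytes per row), an 8,000,000,000-byte heap and 4 workers, the heap clamp wins and the threshold becomes 300,000 rows. In a running JVM the heap figure could be taken from `Runtime.getRuntime().maxMemory()`.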
## Components
- `IcebergSnapshotCompletionHandler` — implements the SPI from
debezium-connector-common. Routes per-chunk events to the streaming writer
and triggers final commit on `onTableSnapshotFinished()`.
- `BatchCommitCoordinator` — accumulates events from CDC streaming path
(legacy fallback when the SPI is not available).
- `IcebergChangeConsumer.StreamingSnapshotContext` — per-table state holder:
open writer, cached schema converter, calibrated split threshold.
- `IcebergTableOperator.writeChunkToWriter()` / `commitWriter()` — write
without commit / final atomic commit + `CommitResult` for adaptive
calibration.
- `IcebergTableOperator.isSafeTypeChange()` — allows compatible type
evolution (timestamptz↔timestamp, decimal↔double, int↔long) for
pre-existing tables with legacy schemas.
- `StructEventConverter` — cached schema converter constructor, static
`fieldMappingCache` for performance.
- `EventConverter.isSnapshotEvent()` — used to skip equality-delete writes
for READ ops.
- Schema evolution + identifier field protection in
`IcebergTableOperator.applyFieldAddition()` — protect both new schema's
and the existing table's identifier fields when the key schema is unavailable
(e.g. `key.converter.schemas.enable=false`).
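The compatibility check listed for `isSafeTypeChange()` might look something like the sketch below. It is illustrative only: it uses a hypothetical `SimpleType` enum instead of Iceberg's type classes, treats each listed pair as safe in both directions (as the ↔ notation suggests), and assumes an unchanged type also counts as safe.

```java
import java.util.Set;

/** Hypothetical sketch of a type-compatibility check; not the connector's code. */
final class SafeTypeChangeSketch {

    /** Simplified stand-in for Iceberg primitive types. */
    enum SimpleType { INT, LONG, DECIMAL, DOUBLE, TIMESTAMP, TIMESTAMPTZ, STRING }

    // Unordered pairs taken from the component description above.
    private static final Set<Set<SimpleType>> COMPATIBLE_PAIRS = Set.of(
        Set.of(SimpleType.TIMESTAMPTZ, SimpleType.TIMESTAMP),
        Set.of(SimpleType.DECIMAL, SimpleType.DOUBLE),
        Set.of(SimpleType.INT, SimpleType.LONG));

    /** True if a column of type {@code existing} may receive data typed {@code incoming}. */
    static boolean isSafeTypeChange(SimpleType existing, SimpleType incoming) {
        if (existing == incoming) {
            return true; // assumption: no change is trivially safe
        }
        // Set.of(a, b) is order-independent, so the check is symmetric.
        return COMPATIBLE_PAIRS.contains(Set.of(existing, incoming));
    }
}
```

Keeping the pairs in a set of unordered pairs makes the symmetric intent explicit; a real implementation would also need to decide how precision and time zone differences are handled, which this sketch does not address.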
## Throughput / memory impact (production, PostgreSQL 16, 116 tables, ~128M rows)
| Metric | Before (per-batch writer) | After (streaming + adaptive split) |
|-------------------------|---------------------------|-------------------------------------|
| Iceberg writers / table | ~1,500 | 1 (with periodic file splits) |
| Iceberg commits / table | ~1,500 | ~6-10 (one per ~512MB Parquet file) |
| Throughput | ~14K rows/min | ~80-120K rows/min |
| Peak memory / worker | ~1.5 GB | ~200-300 MB |
## Build alignment
Pin `kafka-clients:4.2.0` (matches `connect-runtime:4.2.0` from
`debezium-bom:3.6.0-SNAPSHOT`; the `debezium-server-bom:3.5.0.Final` would
otherwise pull `kafka-clients:4.1.1` which is missing
`ConfigDef$ValidList.anyNonDuplicateValues`).
Pin `httpclient5:5.4.3` to avoid the 5.4.3+5.5 classpath duplication that
caused HEAD-request format issues against some REST catalogs (Lakekeeper).
## Dependencies
This PR depends on debezium/debezium#7362, which introduces the
`SnapshotTableCompletionHandler` SPI in `debezium-connector-common`.
The CI build will fail until that PR is merged and `debezium-bom:3.6.0-SNAPSHOT`
is published.
## Spinoff PRs (already extracted, mergeable independently before this one)
- #695 — Support nested namespaces with dot separator
- #696 — OpenLineage integration and Quarkus management interface
- #698 — Snapshot READ semantics (READ as INSERT, missing __op handling)
- #699 — Critical data loss fix in processTablesInParallel
When those are merged, this PR's diff will shrink to only the streaming
flush changes + build alignment.
Signed-off-by: ivan.senyk <ivan.senyk94@gmail.com>
1 parent 61a8857 commit 42bff6e
15 files changed
Lines changed: 1994 additions & 282 deletions
File tree
- debezium-server-iceberg-dist
- debezium-server-iceberg-sink/src/main
  - java/io/debezium/server/iceberg
    - converter
    - mapper
    - tableoperator
  - resources
    - META-INF/services
Lines changed: 259 additions & 0 deletions
debezium-server-iceberg-sink/src/main/java/io/debezium/server/iceberg/BatchConfig.java
File mode changed: 100644 → 100755
Lines changed: 27 additions & 0 deletions
Lines changed: 11 additions & 0 deletions