debezium/dbz#1829 Add parallel incremental snapshot for relational connectors #7362
MrIvv wants to merge 2 commits into
4ec06eb to
ce4c456
Compare
|
Welcome as a new contributor to Debezium, @MrIvv. Reviewers, please add missing author name(s) and alias name(s) to the COPYRIGHT.txt and Aliases.txt respectively. |
|
Hi @MrIvv, thanks for your contribution. Please prefix the commit message(s) with the debezium/dbz#xxx GitHub issue key. |
ce4c456 to
b122b28
Compare
|
Hi @MrIvv. Thank you for your valuable contribution. |
|
Hi @MrIvv, there appears to be a lot of overlap with the new Chunked Initial Snapshot behavior, just refactored to its own classes. Can you see if we can reuse that somehow? Also please prefix all your commits as debezium/dbz#1829 |
b122b28 to
f6ca68b
Compare
|
Hi @Naros, you're right. I'm already working on the refactoring and I'll update this PR soon. |
f6ca68b to
b5c046f
Compare
b5c046f to
efbd211
Compare
|
@MrIvv before you get too far along, it appears your PR is reintroducing debezium-core, which it should not. You may want to rebase on the latest main so that the classes that have been recently reorganized into various new modules are not showing as new files in the PR. |
7686f91 to
1b25f33
Compare
ca0be4c to
900b28e
Compare
900b28e to
f235572
Compare
f9a73d9 to
4069780
Compare
6cc9c98 to
c1055b7
Compare
…splitting, and data loss fix

Streaming Flush Architecture:
- Persistent IcebergWriter per table throughout incremental snapshot lifecycle
- Chunked writes via SnapshotTableCompletionHandler SPI from debezium-core
- Adaptive file splitting calibrated to actual data size and available memory
- BatchCommitCoordinator for coordinated Iceberg commits across parallel workers

Performance:
- Throughput: ~14K -> ~80-120K rows/min
- Peak memory: ~1.5GB -> ~200-300MB per worker
- Parallel multi-table snapshot processing

Critical Bug Fix:
- processTablesInParallel() was catching exceptions without re-throwing, allowing Debezium to commit offsets despite failed writes, causing permanent silent data loss

Schema Evolution Fixes:
- Handle optional PK fields from CDC schema in applyFieldAddition
- requireColumn for identifier fields widened to optional by unionByNameWith
- Treat READ operations as direct INSERT in BaseDeltaTaskWriter
- Evolve required fields to optional to prevent Parquet NPE on NULL values

Additional:
- Upgrade Debezium dependency from 3.3.1.Final to 3.6.0-SNAPSHOT
- Support nested namespaces with dot separator for Iceberg catalog
- Apply SMT chain to snapshot events for type consistency with CDC path
- Consumer done flag for external report watcher lifecycle
- OpenLineage output dataset emission integration
- Quarkus management interface enabled at build time

Depends-on: debezium/debezium#7362 (SPI classes for multi-threaded snapshot)
Tested: PostgreSQL 16, 116 tables, ~128M rows, zero data loss
5bb395b to
bb67484
Compare
…trics, RetryExecutor

Addresses code review feedback from @Naros on PR debezium#7362:

- Remove dead code: RecordTransformer (deleted), DebeziumEventFactory (deleted entirely; its single wrapSourceRecord() was redundant with ConverterBuilder.toFormat() configured with Connect format, which already does the same raw SourceRecord -> ChangeEvent wrap)
- Move IncrementalSnapshotRetryPolicy to debezium-util as RetryExecutor, generalized (no SQLException-specific handling) and reusable across Debezium components
- Extract config defaults to public static final constants with Field::isPositiveInteger / Field::isPositiveLong validation on all numeric properties
- Lower DEFAULT_INCREMENTAL_SNAPSHOT_BATCH_FLUSH_SIZE from 20_000 to 5_000 for safety on wide-row tables (VARCHAR/TEXT/JSONB rows can saturate worker memory at higher batch sizes)
- Wire PostProcessorRegistry and SnapshotProgressListener into TableSnapshotWorker so post-processors and metrics now run on snapshot events the same as on CDC events
- Refactor SnapshotRecordBuilder to use SnapshotChangeRecordEmitter + EventDispatcher.emitReadRecord() (same path as the chunked initial snapshot) instead of a custom flat builder; the builder shrinks from 126 to 66 lines and is now a thin envelope wrapper
- Revert unrelated debezium-api/pom.xml change (restore "Used for unit testing with Kafka" comment block)

The remaining SnapshotTableCompletionHandler SPI exists solely for the onTableSnapshotFinished(tableName) callback: a per-table boundary signal that EventDispatcher does not currently expose. It enables sinks with batched-write semantics to switch from writer-per-chunk to writer-per-table-with-periodic-split.

Concrete consumer implementation: memiiso/debezium-server-iceberg#693, which now uses ConverterBuilder.toFormat() to wrap raw SourceRecords into change events.

Signed-off-by: ivan.senyk <ivan.senyk94@gmail.com>
| if (value instanceof SpecialValueDecimal) { |
| value = ((SpecialValueDecimal) value).getDecimalValue().orElse(null); |
Suggested change
| if (value instanceof SpecialValueDecimal specialValueDecimal) { |
| value = specialValueDecimal.getDecimalValue().orElse(null); |
| * @param <P> the type of partition | ||
| * @param <T> the type of data collection identifier | ||
| */ | ||
| public class ParallelIncrementalSnapshotCoordinator<P extends Partition, T extends DataCollectionId> { |
Take a look at RelationalSnapshotChangeEventSource, specifically ThreadedSnapshotExecutor and PooledWork. It seems you're mixing concerns here, and while having a coordinator that facilitates a higher-level abstraction may make sense, perhaps reusing the lower-level components here would reduce code duplication?
|
Hi @Naros, force-pushed after rebase on latest main. The branch is now 5 commits. Summary of what changed in response to the review:
The remaining DCO, email, and commit prefix issues from the previous push are also fixed. |
…le splitting

This commit implements the streaming snapshot flush pattern for the Iceberg sink. Combined with the parallel incremental snapshot SPI introduced in debezium/debezium#7362, it dramatically reduces commit overhead and memory pressure during snapshot of large tables.

## Streaming snapshot flush

Instead of creating a new Iceberg writer for every batch (5K-20K rows), keep a single writer open per table for the entire snapshot. The writer accumulates data across chunks and produces a single atomic commit at table completion.

Periodic file splitting kicks in when the writer reaches a calibrated row threshold, producing ~512MB Parquet files. After the first split-commit, the threshold is recalibrated from actual file size (bytes-per-row) and clamped by available heap (60% of max heap, divided by worker count, divided by an in-memory factor of ~40x for Parquet decompression).

## Components

- `IcebergSnapshotCompletionHandler`: implements the SPI from debezium-connector-common. Routes per-chunk events to the streaming writer and triggers final commit on `onTableSnapshotFinished()`.
- `BatchCommitCoordinator`: accumulates events from CDC streaming path (legacy fallback when SPI not available).
- `IcebergChangeConsumer.StreamingSnapshotContext`: per-table state holder: open writer, cached schema converter, calibrated split threshold.
- `IcebergTableOperator.writeChunkToWriter()` / `commitWriter()`: write without commit / final atomic commit + `CommitResult` for adaptive calibration.
- `IcebergTableOperator.isSafeTypeChange()`: allows compatible type evolution (timestamptz↔timestamp, decimal↔double, int↔long) for pre-existing tables with legacy schemas.
- `StructEventConverter`: cached schema converter constructor, static `fieldMappingCache` for performance.
- `EventConverter.isSnapshotEvent()`: used to skip equality-delete writes for READ ops.
- Schema evolution + identifier field protection in `IcebergTableOperator.applyFieldAddition()`: protect both new schema's and existing table's identifier fields when key schema is unavailable (e.g. `key.converter.schemas.enable=false`).

## Throughput / memory impact (production, PostgreSQL 16, 116 tables, ~128M rows)

| Metric                  | Before (per-batch writer) | After (streaming + adaptive split)  |
|-------------------------|---------------------------|-------------------------------------|
| Iceberg writers / table | ~1,500                    | 1 (with periodic file splits)       |
| Iceberg commits / table | ~1,500                    | ~6-10 (one per ~512MB Parquet file) |
| Throughput              | ~14K rows/min             | ~80-120K rows/min                   |
| Peak memory / worker    | ~1.5 GB                   | ~200-300 MB                         |

## Build alignment

Pin `kafka-clients:4.2.0` (matches `connect-runtime:4.2.0` from `debezium-bom:3.6.0-SNAPSHOT`; the `debezium-server-bom:3.5.0.Final` would otherwise pull `kafka-clients:4.1.1` which is missing `ConfigDef$ValidList.anyNonDuplicateValues`).

Pin `httpclient5:5.4.3` to avoid the 5.4.3+5.5 classpath duplication that caused HEAD-request format issues against some REST catalogs (Lakekeeper).

## Dependencies

This PR depends on debezium/debezium#7362 which introduces the `SnapshotTableCompletionHandler` SPI in `debezium-connector-common`. The CI build will fail until that PR is merged and `debezium-bom:3.6.0-SNAPSHOT` is published.

## Spinoff PRs (already extracted, mergeable independently before this one)

- memiiso#695: Support nested namespaces with dot separator
- memiiso#696: OpenLineage integration and Quarkus management interface
- memiiso#698: Snapshot READ semantics (READ as INSERT, missing __op handling)
- memiiso#699: Critical data loss fix in processTablesInParallel

When those are merged, this PR's diff will shrink to only the streaming flush changes + build alignment.

Signed-off-by: ivan.senyk <ivan.senyk94@gmail.com>
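The recalibration arithmetic in the commit message above (bytes-per-row from the first committed file, a ~512MB file target, and a heap clamp of 60% of max heap divided by worker count and by a ~40x in-memory factor) can be worked through with a small sketch. This is one possible reading of that description, not the sink's actual code: the class and method names are hypothetical, and applying the heap budget as a cap on on-disk bytes per writer is an assumption.

```java
// Hypothetical sketch of the adaptive split threshold described above.
// Names and the exact way the heap clamp is applied are assumptions.
final class SplitThresholdSketch {
    static final long TARGET_FILE_BYTES = 512L * 1024 * 1024; // ~512MB Parquet target
    static final long HEAP_PERCENT = 60;                      // 60% of max heap
    static final long IN_MEMORY_FACTOR = 40;                  // ~40x in-memory expansion

    /**
     * Returns the number of rows after which the writer should split,
     * based on the size of the first committed file.
     * Uses integer arithmetic and assumes the products fit in a long.
     */
    static long recalibrateRowThreshold(long committedFileBytes, long committedRows,
                                        long maxHeapBytes, int workerCount) {
        if (committedFileBytes <= 0 || committedRows <= 0 || maxHeapBytes <= 0 || workerCount <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        // Rows needed to reach the target file size at the observed bytes-per-row.
        long rowsForTargetFile = TARGET_FILE_BYTES * committedRows / committedFileBytes;
        // On-disk bytes one worker may hold before its in-memory footprint exceeds its heap share.
        long heapBudgetBytes = maxHeapBytes / 100 * HEAP_PERCENT / workerCount / IN_MEMORY_FACTOR;
        long rowsForHeapBudget = heapBudgetBytes * committedRows / committedFileBytes;
        // The smaller limit wins; never return less than one row.
        return Math.max(1L, Math.min(rowsForTargetFile, rowsForHeapBudget));
    }

    public static void main(String[] args) {
        // 100 bytes per row, 4 GB heap, 4 workers: the heap clamp is the binding limit.
        System.out.println(recalibrateRowThreshold(100_000_000L, 1_000_000L, 4_000_000_000L, 4));
    }
}
```

With a small heap and several workers the heap clamp dominates; with a large heap the ~512MB file target becomes the limit instead.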
But I would argue this is accomplishable today without this SPI. What I don't understand is why the Iceberg 'ChangeEventConsumer' couldn't subscribe to one of the pre-existing notification channels about initial, blocking, and incremental snapshots, and alternate between the current event-batch handler logic and a persistent writer mode by table.

For example, Debezium emits "Incremental Snapshot started for Table ABC". The Iceberg sink receives it and adds the table "ABC" to its list using a persisted writer. As batches of events come in for table ABC, you use the persisted writer for them; they are buffered in memory or on disk, depending on your use case, and you acknowledge receipt of the events, just like any other. The only difference is they're not yet written to the target system. Once Debezium emits "Incremental Snapshot finished for Table ABC", you flush the writer to disk, close it, and transition Table ABC back to the normal batch mode it operates in today.

The only question is whether or not there is currently a place in a sink for it to register for such events, e.g. is Debezium available for it to register at a given good point. Obviously, this could be done lazily in the

This keeps what I believe are sink-related aspects where they belong, on the sink and not the producer. What you are doing right now is introducing the ability to wire something in, potentially in a non-DS environment, that won't work and makes no sense. The connector code itself should be runtime-agnostic in this regard, which is why I believe moving this work to the sink and using notifications to manage the writes makes far more architectural sense. What do you think? |
I talked with a colleague, and he reminded me about Debezium Server's lifecycle management. There is already |
Hi @MrIvv , thank you for your contribution. I went through your issue and pull request and based on your motivation:
as mentioned by @Naros, there is the custom notification channel that can help with your motivation: you can observe the event. If you can give it a try and check whether it fits your motivation, it may help to reduce the complexity of the contribution. What do you think? |
|
Hello @kmos, your suggestion is amazing! It’s a great point that simplifies things a lot. I’ll update the code today to include these parts as well. |
bb67484 to
2690d4a
Compare
|
Hi @Naros and @kmos, thanks for the patient review and for pointing me toward the right pattern. |
2690d4a to
5a79f69
Compare
…le splitting
Adds a streaming flush mode for the Iceberg sink that keeps a single
persistent writer per data collection across an entire incremental
snapshot, replacing the writer-per-chunk pattern that produced one
small Parquet file per chunk and inflated catalog metadata.
A persistent writer is opened on the first chunk of a table and stays
open across subsequent chunks, accumulating events in memory or
spilling to a rolling temporary file when adaptive thresholds are
crossed (rows-per-file, bytes-per-file, time-since-first-row). The
writer is committed and closed on the corresponding TABLE_SCAN_COMPLETED
notification emitted by Debezium core.
Per-table completion is consumed via the standard Debezium notification
channels (`SinkNotificationChannel`) — no producer-side SPI is added.
The consumer subscribes at startup, filters for
`aggregateType="Incremental Snapshot"` and
`type=TABLE_SCAN_COMPLETED`, and finalizes the per-table writer using
the `scanned_collection` field. JSON-serialized notifications (the
default with `debezium.format.value=json`) are unwrapped from the
`{schema, payload}` envelope before reading the fields.
A read-only REST endpoint `/v1/snapshot-status/incremental` exposed
on Quarkus' management interface (port 9000) reports per-table progress
(rows, files, bytes committed; current/in-progress tables) so external
orchestrators can observe completion without parsing the offset file.
The status snapshot tracks both the streaming-writer path and the
direct-commit path (via per-table counters) so the endpoint surfaces a
correct view regardless of which write path is in use.
This PR depends on the parallel incremental snapshot work in
debezium/debezium#7362 — that PR introduces the parallel scheduler
and the per-table TABLE_SCAN_COMPLETED notification semantics this
sink consumes. Until that PR is merged and a Debezium snapshot is
published, this PR cannot build against an upstream artifact and must
be built with a locally-installed `debezium-core` snapshot.
Signed-off-by: ivan.senyk <ivan.senyk94@gmail.com>
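The notification handling described in the commit message above (unwrap the `{schema, payload}` envelope, filter on `aggregateType` and `type`, read `scanned_collection`) might look roughly like the following sketch. It models a notification as a plain `Map` so it stays self-contained; the class name, the `additionalData` nesting, and the exact key spellings are assumptions for illustration and may differ from what Debezium serializes.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: decides whether a notification marks the end of a table scan.
// Key names mirror the commit message wording; the "additionalData" nesting is assumed.
final class TableScanCompletedFilter {
    static final String AGGREGATE_TYPE = "Incremental Snapshot";
    static final String TYPE = "TABLE_SCAN_COMPLETED";

    /** Returns the inner payload when the message is a {schema, payload} envelope. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> unwrap(Map<String, Object> message) {
        Object payload = message.get("payload");
        if (message.containsKey("schema") && payload instanceof Map) {
            return (Map<String, Object>) payload;
        }
        return message;
    }

    /** Returns the completed table name, or empty if the notification is not relevant. */
    static Optional<String> completedTable(Map<String, Object> message) {
        Map<String, Object> notification = unwrap(message);
        if (!AGGREGATE_TYPE.equals(notification.get("aggregateType"))
                || !TYPE.equals(notification.get("type"))) {
            return Optional.empty();
        }
        Object additional = notification.get("additionalData");
        if (!(additional instanceof Map)) {
            return Optional.empty();
        }
        Object table = ((Map<?, ?>) additional).get("scanned_collection");
        return table instanceof String ? Optional.of((String) table) : Optional.empty();
    }
}
```

A sink could call `completedTable(...)` for every notification it receives and, when a table name comes back, commit and close that table's persistent writer.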
| private boolean createDataEventsForTable(P partition, JdbcConnection connection) throws SQLException { | ||
| JdbcConnection originalConnection = this.jdbcConnection; | ||
| try { | ||
| this.jdbcConnection = connection; | ||
| return createDataEventsForTable(partition); | ||
| } | ||
| finally { | ||
| this.jdbcConnection = originalConnection; | ||
| } | ||
| } |
Are we sure this is thread-safe, and that at no point will two threads fight over which connection the member variable holds? I think it's much safer to pass the desired connection through the call where it's needed rather than relying on this hacky swap approach.
| public void shutdown() { | ||
| if (parallelCoordinator != null) { | ||
| LOGGER.info("Shutting down parallel incremental snapshot coordinator"); | ||
| parallelCoordinator.shutdown(); | ||
| } | ||
| } |
I don't see where this is ever called.
| private final List<JdbcConnection> allConnections; | ||
| private final Map<T, TableSnapshotWorker<P, T>> activeWorkers; | ||
| private final Queue<DataCollection<T>> pendingTables; | ||
| private final List<T> completedTables; |
This ArrayList is written to by multiple threads and isn't guarded.
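One way to address this, sketched below under the assumption that the list stays small and is read far more often than it is written, is to back the field with a `CopyOnWriteArrayList`. The class and method names are hypothetical stand-ins, and table identifiers are plain strings rather than the PR's `T extends DataCollectionId`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a completed-table tracker that is safe for concurrent writers.
final class CompletedTablesSketch {
    // Every add copies the backing array, so writers never corrupt each other's updates.
    private final List<String> completedTables = new CopyOnWriteArrayList<>();

    void markCompleted(String tableId) {
        completedTables.add(tableId);
    }

    /** Returns an immutable point-in-time copy for reporting. */
    List<String> completedSnapshot() {
        return List.copyOf(completedTables);
    }

    /** Simulates several workers recording completions at once and returns the final count. */
    static int recordConcurrently(int workers, int tablesPerWorker) {
        CompletedTablesSketch tracker = new CompletedTablesSketch();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int workerId = w;
            pool.submit(() -> {
                for (int t = 0; t < tablesPerWorker; t++) {
                    tracker.markCompleted("worker" + workerId + ".table" + t);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return tracker.completedSnapshot().size();
    }
}
```

If the list could grow large, a `ConcurrentLinkedQueue` or a set from `ConcurrentHashMap.newKeySet()` would avoid the copy on each write.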
| protected boolean shouldUseParallelRead() { | ||
| if (parallelCoordinator == null) { | ||
| return false; | ||
| } | ||
|
|
||
| int remainingTables = context.dataCollectionsToBeSnapshottedCount(); | ||
| if (remainingTables < 2) { | ||
| LOGGER.trace("[{}] Only {} table(s) remaining, using sequential read", | ||
| Thread.currentThread().getName(), remainingTables); | ||
| return false; | ||
| } | ||
|
|
||
| LOGGER.debug("[{}] Multiple tables available ({} remaining), using parallel read", | ||
| Thread.currentThread().getName(), remainingTables); | ||
| return true; | ||
| } |
| protected final NotificationService<P, ? extends OffsetContext> notificationService; | ||
| protected ParallelIncrementalSnapshotCoordinator<P, T> parallelCoordinator; | ||
| protected final RetryExecutor retryPolicy; | ||
| protected final PostProcessorRegistry postProcessorRegistry; |
The PostProcessorRegistry is set but never used, so it can be removed.
| try { | ||
| effectiveConnection = createSnapshotConnection(); | ||
| createdOnDemandConnection = true; | ||
| } | ||
| catch (UnsupportedOperationException e) { | ||
| LOGGER.trace("createSnapshotConnection not supported, using default connection"); | ||
| } | ||
| catch (Exception e) { | ||
| LOGGER.debug("Could not create snapshot connection, using default: {}", e.getMessage()); | ||
| } | ||
| } |
Can we avoid relying on an exception here to use the default connection? I assume this can be easily controlled by a flag or an abstract method that dictates whether the incremental snapshot operates in concurrent or single-threaded mode without the need for exception handling.
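A minimal sketch of the capability-method alternative suggested here follows. The interface, the `supportsDedicatedSnapshotConnections()` method, and the string stand-ins for connections are all hypothetical; they only illustrate choosing the connection through an explicit check instead of catching `UnsupportedOperationException`.

```java
// Hypothetical sketch: the source declares its capability up front,
// so the caller never uses an exception as control flow.
final class SnapshotConnectionChoice {
    interface ConnectionSource {
        boolean supportsDedicatedSnapshotConnections();

        String createSnapshotConnection(); // stand-in for a real connection type

        String defaultConnection();
    }

    static String chooseConnection(ConnectionSource source) {
        if (source.supportsDedicatedSnapshotConnections()) {
            return source.createSnapshotConnection();
        }
        return source.defaultConnection();
    }

    /** Small factory used to demonstrate both modes. */
    static ConnectionSource fixedSource(boolean dedicated) {
        return new ConnectionSource() {
            @Override
            public boolean supportsDedicatedSnapshotConnections() {
                return dedicated;
            }

            @Override
            public String createSnapshotConnection() {
                if (!dedicated) {
                    throw new UnsupportedOperationException("dedicated connections not supported");
                }
                return "dedicated";
            }

            @Override
            public String defaultConnection() {
                return "default";
            }
        };
    }
}
```

With this shape, a failure inside `createSnapshotConnection()` is a real error that can propagate, rather than being logged at trace level and silently converted into single-threaded mode.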
| protected boolean validateRestoredContext(IncrementalSnapshotContext<T> context) { | ||
| try { | ||
| java.lang.reflect.Method method = context.getClass().getMethod("validateRestoredContext"); | ||
| Boolean result = (Boolean) method.invoke(context); | ||
| return result != null ? result : true; | ||
| } |
What purpose does this logic serve now? It seems unnecessary, does it not?
| public static final Field INCREMENTAL_SNAPSHOT_RETRY_INITIAL_DELAY_MS = Field.create("incremental.snapshot.retry.initial.delay.ms") | ||
| .withDisplayName("Incremental snapshot retry initial delay (ms)") | ||
| .withType(Type.LONG) | ||
| .withWidth(Width.SHORT) | ||
| .withImportance(Importance.LOW) | ||
| .withDescription("Initial backoff delay before retrying a failed chunk read.") | ||
| .withDefault(DEFAULT_INCREMENTAL_SNAPSHOT_RETRY_INITIAL_DELAY_MS); | ||
|
|
||
| public static final Field INCREMENTAL_SNAPSHOT_RETRY_MAX_DELAY_MS = Field.create("incremental.snapshot.retry.max.delay.ms") | ||
| .withDisplayName("Incremental snapshot retry max delay (ms)") | ||
| .withType(Type.LONG) | ||
| .withWidth(Width.SHORT) | ||
| .withImportance(Importance.LOW) | ||
| .withDescription("Upper bound on the exponential backoff delay between retries.") | ||
| .withDefault(DEFAULT_INCREMENTAL_SNAPSHOT_RETRY_MAX_DELAY_MS); |
If these should be always positive, they should have validation applied.
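To illustrate why these two values need to be positive, here is a sketch of how an exponential backoff schedule could derive its delays from them. It is not the PR's `RetryExecutor` and does not use Debezium's `Field` API; the class and method names are hypothetical, and jitter is left out to keep the arithmetic deterministic.

```java
// Hypothetical sketch: exponential backoff delay with up-front validation.
final class BackoffScheduleSketch {
    static long requirePositive(String name, long value) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be positive, got " + value);
        }
        return value;
    }

    /**
     * Delay before retry number {@code attempt} (1-based):
     * initialDelayMs * multiplier^(attempt - 1), capped at maxDelayMs.
     */
    static long delayForAttempt(long initialDelayMs, long maxDelayMs, double multiplier, int attempt) {
        requirePositive("initialDelayMs", initialDelayMs);
        requirePositive("maxDelayMs", maxDelayMs);
        if (multiplier < 1.0) {
            throw new IllegalArgumentException("multiplier must be >= 1.0, got " + multiplier);
        }
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1, got " + attempt);
        }
        double raw = initialDelayMs * Math.pow(multiplier, attempt - 1);
        return (long) Math.min(raw, (double) maxDelayMs);
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 6; attempt++) {
            System.out.println("attempt " + attempt + " -> "
                    + delayForAttempt(1_000L, 30_000L, 2.0, attempt) + " ms");
        }
    }
}
```

A zero initial delay would make every retry fire immediately regardless of the multiplier, and a negative value would produce a negative sleep, which is why rejecting both at configuration time is safer than discovering them mid-snapshot.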
| * Returns true if any per-table window buffer (parallel mode) or the | ||
| * legacy single window (sequential mode) currently holds events. Required | ||
| * because in parallel mode the legacy {@code window} field is always empty | ||
| * — chunk reads write into the coordinator's per-table buffers — so a |
The use of emdashes is very AI-centric, let's avoid these. In fact, our AGENTS.md specifically mentions avoiding these types of characters, among others. If you're using AI, I'd suggest that it adhere to our AI Policy and Agent rules.
| final T dataCollectionId = context.currentDataCollectionId().getId(); | ||
| final Map<Struct, Object[]> currentWindow = getWindowForDataCollection(dataCollectionId); | ||
|
|
||
| LOGGER.debug("[{}] Sending {} events from window buffer for table {}", |
There are a number of use cases where Thread.currentThread().getName() is used even on the single-threaded code path, which only adds noise and I believe is less useful than the actual parallel thread paths. Can we clean these up?
…nnectors

Adds support for snapshotting multiple data collections in parallel during incremental snapshots, bounded by snapshot.max.threads.

The JDBC connection pool opens on demand when a signal arrives and is released after a configurable grace period (incremental.snapshot.pool.release.delay.ms, default 60000), so back-to-back signal bursts reuse the same pool. Pooled connections are validated on borrow and reallocated lazily.

The chunk read path is connection-threaded end to end and wrapped in a RetryExecutor with exponential backoff to absorb transient JDBC failures.

The retry budget and the pool release delay are documented per connector in the relational AsciiDoc pages. See the PR description for the full design notes.

Signed-off-by: ivan.senyk <ivan.senyk94@gmail.com>
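The grace-period release described above needs some way to cancel a pending close when a new signal arrives before the delay elapses. The sketch below shows one common approach, a generation counter checked by the delayed task. `PoolReleaseGuard` and its methods are hypothetical names, and `synchronized` is used here for brevity; it is not a copy of the coordinator's locking.

```java
// Hypothetical sketch: a delayed release that becomes a no-op if a newer signal arrived.
final class PoolReleaseGuard {
    private long generation;   // bumped on every signal arrival
    private boolean poolOpen;

    /** A signal arrived: open (or keep) the pool and invalidate any pending release. */
    synchronized void onSignalArrived() {
        generation++;
        poolOpen = true;
    }

    /**
     * Called once no signal is in flight. Returns the task to run after the grace
     * period; it closes the pool only if no signal arrived in the meantime.
     */
    synchronized Runnable prepareRelease() {
        final long scheduledGeneration = generation;
        return () -> releaseIfUnchanged(scheduledGeneration);
    }

    private synchronized void releaseIfUnchanged(long scheduledGeneration) {
        if (generation == scheduledGeneration) {
            poolOpen = false; // stand-in for closing the pooled connections
        }
    }

    synchronized boolean isPoolOpen() {
        return poolOpen;
    }
}
```

In practice the returned task would be handed to something like `ScheduledExecutorService.schedule(task, delayMs, TimeUnit.MILLISECONDS)`; the check and the close happen under the same lock so a signal cannot slip in between them.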
5a79f69 to
0a3abf1
Compare
MrIvv left a comment
Hello Naros, I've made a few changes to improve the thread safety of this code, mainly focusing on the lifecycle of the connections.
|
/packit test --labels oracle |
1304d44 to
74b2f67
Compare
|
Hello @Naros, I reviewed all CI jobs and found some regressions in my code. |
|
Thanks for the update @MrIvv, I've started CI. I'll review it once CI finishes. |
|
Hi @MrIvv, can you just check that the failure in MariaDB isn't related? |
…ycle

+ Null-guard `currentDataCollectionId()` deref in `sendWindowEvents()` and `rereadChunk()`; the field can be null after the snapshot has advanced past the last collection.
* `TableSnapshotWorker` keyless tables now fail fast with a diagnostic instead of routing to a broken `readKeylessTable`; DBLog window cannot dedup keyless rows.
- Drop `readKeylessTable` and the `keylessTableRead` flag.
+ Wire `IncrementalSnapshotChangeEventSource.shutdown()` from `EventDispatcher.close()` so parallel coordinator resources are released on connector stop. Interface gets a default no-op for backward compatibility.
* `TableSnapshotWorker.isChunkPositionComplete` now uses `Arrays.equals` instead of a per-key `compareTo` loop. The previous comparison was type-dependent (e.g. `UUID.compareTo` is signed-long-based while Postgres `ORDER BY uuid` is unsigned-lex), which caused premature completion after the first chunk on UUID-keyed tables.

Signed-off-by: ivan.senyk <ivan.senyk94@gmail.com>
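The UUID ordering mismatch mentioned in the last bullet can be reproduced in isolation. The sketch below is illustrative only: `ChunkEndCheckSketch` and its methods are hypothetical, and `compareUnsigned` merely approximates an unsigned byte-wise ordering for comparison with Java's signed `UUID.compareTo`.

```java
import java.util.Arrays;
import java.util.UUID;

// Hypothetical sketch: why an ordering-based "reached the end" check can misfire for UUID keys.
final class ChunkEndCheckSketch {
    /** Compares two UUIDs as 128-bit unsigned values (most significant half first). */
    static int compareUnsigned(UUID a, UUID b) {
        int high = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return high != 0 ? high
                : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    }

    /** Ordering-free completion check: the chunk end equals the table's maximum key. */
    static boolean reachedEnd(Object[] lastReadKey, Object[] maximumKey) {
        return Arrays.equals(lastReadKey, maximumKey);
    }

    public static void main(String[] args) {
        UUID chunkEnd = UUID.fromString("7fffffff-ffff-ffff-ffff-ffffffffffff");
        UUID tableMax = UUID.fromString("80000000-0000-0000-0000-000000000000");
        // Signed comparison claims the chunk end is already past the maximum key.
        System.out.println("compareTo:       " + chunkEnd.compareTo(tableMax));
        // Unsigned comparison agrees with a byte-wise ordering: chunk end is before the maximum.
        System.out.println("compareUnsigned: " + compareUnsigned(chunkEnd, tableMax));
        System.out.println("reachedEnd:      "
                + reachedEnd(new Object[]{ chunkEnd }, new Object[]{ tableMax }));
    }
}
```

A check of the form `lastKey.compareTo(maxKey) >= 0` would report completion for this pair even though rows remain, whereas the equality check does not depend on how either side orders the type.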
74b2f67 to
9cc2ae6
Compare
|
Hi @Naros,

On the MariaDB CI failure (BinlogReadOnlyIncrementalSnapshotIT.filteredEvents): I do not believe it is caused by this PR. The stack is entirely in MariaDB-specific code that this PR does not touch:

NPE: this.highWatermark is null

The exception is swallowed by the catch in AbstractIncrementalSnapshotChangeEventSource.addDataCollectionNamesToSnapshot ("Error while executing incremental snapshot ... skipping and continuing streaming"), so the snapshot is silently abandoned and the consumer

Happy to dig deeper if useful |
Fixes
debezium/dbz#1829
Summary
This PR adds table-level parallelism to incremental snapshots. Multiple data collections in a signal are scanned concurrently within a single DBLog watermark window, bounded by `snapshot.max.threads`. Per-chunk deduplication semantics are preserved.

Supporting changes: a retry policy with exponential backoff for transient JDBC failures during chunk reads, and a work-queue scheduler that keeps workers busy across the whole signal.
Motivation
Today's incremental snapshot scans tables strictly sequentially. On any reasonable hardware (multi-core, bonded NICs, fast storage) the bottleneck is one slow table at a time, not the total work — leaving CPU and JDBC bandwidth idle while the connector waits.
This PR removes that bottleneck without changing the semantics of the snapshot itself.
Changes
1. Parallel table scan during incremental snapshot
`AbstractIncrementalSnapshotChangeEventSource` runs each round (one DBLog watermark window) across `snapshot.max.threads` worker threads, each holding its own `TableSnapshotContext` and JDBC connection. The window is opened once for the whole round and closed once at the end; workers do not contend on the watermark, they only contend on the per-table chunk position.

`SignalBasedIncrementalSnapshotChangeEventSource` and `PostgresReadOnlyIncrementalSnapshotChangeEventSource` are wired to use the parallel coordinator when `snapshot.max.threads > 1` and degrade to the existing sequential path when `= 1` (default).

2. Work-queue scheduling (no batch-and-wait idle)

When the signal carries N tables and the pool has K workers, all N tables go into a `ConcurrentLinkedQueue`. Each worker picks the next table immediately upon completing its current one, with no `awaitCompletion()` between sub-batches.

(Comparison table not recoverable; surviving cell text: `awaitCompletion()` calls, `workerCount` total.)

3. New helper: `ParallelIncrementalSnapshotCoordinator`

Owns the JDBC connection pool, the persistent worker executor, and the per-table window buffers. Pool lifecycle is on-demand:

- Opened on demand when a signal arrives, with `snapshot.max.threads` JDBC connections.
- Released after a grace period (`incremental.snapshot.pool.release.delay.ms`, default `60_000`) once no signal is in flight. A `0` value releases immediately.
- An `AtomicLong` generation counter aborts a scheduled close if a new signal arrives mid-grace.

Pooled connections are validated on borrow via `JdbcConnection.isValid()` and lazily reallocated on failure. `CopyOnWriteArrayList` tracks live connections so concurrent eviction stays safe. The worker executor is a single `Threads.newFixedThreadPool` for the whole snapshot lifetime; it is shut down only when the coordinator itself terminates.

Connection threading is end-to-end: `ChunkQueryBuilder.readTableChunkStatement` now takes the worker's `JdbcConnection` as a parameter, so workers do not contend on a shared field or perform connection-swap dances mid-chunk.

The coordinator is annotated `@NotThreadSafe` for state changes and uses a `ReentrantLock` + `AtomicReference` pattern (consistent with `BaseSourceTask`) for the public lifecycle methods (`ensurePoolOpen`, `scheduleReleaseIfNotPending`, `shutdown`).

4. Retry policy with exponential backoff: `RetryExecutor`

New utility in `debezium-util`. Generic over `Callable`/`Runnable`, accepts any retryable predicate, exponential backoff with jitter, configurable cap. Used by the parallel chunk round to recover from transient JDBC failures (lock timeout, deadlock victim, broken connection in the pool) without aborting the whole snapshot.

Configuration

- `snapshot.max.threads`: `1` = sequential (no behavior change). `> 1` = parallel incremental + chunked snapshot
- `incremental.snapshot.pool.release.delay.ms`: `0` releases immediately.
- `incremental.snapshot.retry.max.attempts`
- `incremental.snapshot.retry.initial.delay.ms`
- `incremental.snapshot.retry.max.delay.ms`
- `incremental.snapshot.retry.backoff.multiplier`

All five new properties carry `DEFAULT_*` `public static final` constants. A new `Field::isPositiveDouble` validator is added for the backoff multiplier; the other properties use existing validators.

Testing

`snapshot.max.threads=2`, chunk size 20000. The parallel coordinator initializes the pool on first signal arrival, reuses it across back-to-back signal bursts, and releases it cleanly once the queue drains past the grace period. Workers stay continuously busy via the shared queue: verified zero idle time between sub-batches in the live logs. No leaked connections or scheduled-release races observed across multiple recovery cycles.

Backward compatibility

- `snapshot.max.threads` defaults to `1` → no behavior change unless explicitly opted in
- `aggregateType="Incremental Snapshot"` + same per-type schema)

AI usage disclosure
I used Claude (Anthropic) as a code assistant and for troubleshooting during this work. I reviewed every change, ran the test suite locally, and validated the runtime behaviour. I take full responsibility for the code submitted.
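For readers who want a concrete picture of the work-queue scheduling described under Changes (all tables in one `ConcurrentLinkedQueue`, each worker polling for the next table as soon as it finishes), here is a minimal sketch. It uses the JDK's `Executors.newFixedThreadPool` rather than Debezium's `Threads` helper, and the class and method names are hypothetical; the "scan" is reduced to recording the table name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: K workers drain a shared queue of N tables without sub-batch barriers.
final class WorkQueueSchedulingSketch {
    static List<String> snapshotAll(List<String> tables, int workerCount) {
        Queue<String> pending = new ConcurrentLinkedQueue<>(tables);
        Queue<String> done = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            pool.submit(() -> {
                String table;
                // poll() hands each table to exactly one worker; a worker that finishes
                // early simply takes the next table instead of waiting for its peers.
                while ((table = pending.poll()) != null) {
                    done.add(table); // stand-in for scanning the table chunk by chunk
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(done);
    }

    public static void main(String[] args) {
        System.out.println(snapshotAll(List.of("a", "b", "c", "d", "e"), 2));
    }
}
```

The only synchronization point is the final wait for the pool to drain, so one slow table delays at most one worker while the others keep pulling work.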