Feature Request
Description
Add support for parallel incremental snapshots, with table-level workers running concurrent chunk reads inside a single DBLog watermark window.
Motivation
The current incremental snapshot implementation has two scaling limitations:
- Single-threaded: Tables are processed one at a time, leaving CPU and I/O bandwidth underutilized on multi-core hosts.
- Idle workers in batch-and-wait scheduling: When tables within a parallel batch have different sizes, faster workers sit idle until the slowest table completes before the next batch can start.
Proposed Changes
- Parallel snapshot infrastructure: ParallelIncrementalSnapshotCoordinator (per-table window buffers + JDBC connection pool + worker activation policy), TableSnapshotContext, TableSnapshotWorker. The per-round worker pool lives in AbstractIncrementalSnapshotChangeEventSource using a fixed-size ExecutorService + CompletionService, scoped to the round and shut down at its end.
- Work-queue scheduler: Replaces batch-and-wait with a shared ConcurrentLinkedQueue: workers pick the next table immediately upon completing the current one, eliminating cross-table idle time. JDBC connections are held by the worker across all its tables.
- Retry policy with exponential backoff: New RetryExecutor in debezium-util. Generic over Callable/Runnable, accepts any retryable predicate, exponential backoff with jitter, configurable cap. Used by the parallel chunk round to recover from transient JDBC failures (lock timeout, deadlock victim, broken connection in the pool) without aborting the snapshot.
- PostgreSQL read-only support: Parallel-safe JDBC connections via createSnapshotConnection() override in PostgresReadOnlyIncrementalSnapshotChangeEventSource.
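The work-queue scheduler and the retry policy described above can be illustrated with a minimal, self-contained sketch. It is not the proposed implementation: WorkQueueSketch, TableReader, retry, and snapshot are hypothetical names, and treating SQLTransientException as the retryable case is an assumption made only for the example. It shows workers draining a shared ConcurrentLinkedQueue, with each table read wrapped in exponential backoff with jitter and a capped delay.

```java
import java.sql.SQLTransientException;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

public class WorkQueueSketch {

    /** Hypothetical stand-in for reading all chunks of one table. */
    public interface TableReader {
        void read(String table) throws Exception;
    }

    /** Retries the task while the predicate accepts the failure, up to maxAttempts tries. */
    public static <T> T retry(Callable<T> task, Predicate<Exception> retryable,
                              int maxAttempts, long baseDelayMs, long maxDelayMs) throws Exception {
        Random random = new Random();
        for (int attempt = 1;; attempt++) {
            try {
                return task.call();
            }
            catch (Exception e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                // Exponential backoff: base * 2^(attempt - 1), capped at maxDelayMs.
                long ceiling = Math.min(maxDelayMs, baseDelayMs << Math.min(attempt - 1, 20));
                // Jitter: sleep a random fraction of the ceiling to avoid synchronized retries.
                Thread.sleep((long) (random.nextDouble() * ceiling));
            }
        }
    }

    /** Runs a fixed number of workers over a shared queue; returns tables completed per worker. */
    public static List<Integer> snapshot(List<String> tables, int threads, TableReader reader) throws Exception {
        Queue<String> queue = new ConcurrentLinkedQueue<>(tables);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Integer> completion = new ExecutorCompletionService<>(pool);
        try {
            for (int i = 0; i < threads; i++) {
                completion.submit(() -> {
                    // A real worker would open its JDBC connection here and reuse it for every table.
                    int done = 0;
                    String table;
                    // poll() returns null once the queue is empty, which ends this worker.
                    while ((table = queue.poll()) != null) {
                        final String current = table;
                        retry(() -> {
                            reader.read(current);
                            return null;
                        }, e -> e instanceof SQLTransientException, 5, 10, 1000);
                        done++;
                    }
                    return done;
                });
            }
            List<Integer> perWorker = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                // take() blocks until some worker finishes; get() rethrows that worker's failure.
                perWorker.add(completion.take().get());
            }
            return perWorker;
        }
        finally {
            // The pool is scoped to this round and shut down at its end.
            pool.shutdownNow();
        }
    }
}
```

Because every worker pulls its next table from the same queue, a worker that finishes a small table moves on immediately instead of waiting for the slowest table in a batch. Non-retryable failures, and retryable ones that exhaust their attempts, propagate out of snapshot via ExecutionException.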
Backward Compatibility
- Default snapshot.max.threads=1 preserves original single-threaded behavior
- createSnapshotConnection() throws UnsupportedOperationException by default (graceful degradation for connectors that don't yet support parallel snapshots)
- Notification format unchanged (same aggregateType="Incremental Snapshot" and per-type schema)
- No breaking API changes
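One way the compatibility rules above could fit together is sketched below. The request does not say how the degradation is performed, so falling back to a single thread when createSnapshotConnection() throws is an assumption, and FallbackSketch, BaseSource, ParallelCapableSource, and effectiveThreads are hypothetical names used only for illustration.

```java
public class FallbackSketch {

    /** Hypothetical base class: connectors without parallel support inherit the throwing default. */
    public static class BaseSource {
        protected AutoCloseable createSnapshotConnection() {
            throw new UnsupportedOperationException("parallel snapshots not supported");
        }
    }

    /** Hypothetical connector that opts in by overriding the hook. */
    public static class ParallelCapableSource extends BaseSource {
        @Override
        protected AutoCloseable createSnapshotConnection() {
            // Stand-in for a real JDBC connection; closing it does nothing.
            return () -> { };
        }
    }

    /** Returns the worker count to use, given the configured snapshot.max.threads value. */
    public static int effectiveThreads(BaseSource source, int maxThreads) throws Exception {
        if (maxThreads <= 1) {
            // Default configuration: keep the original single-threaded path.
            return 1;
        }
        // Probe the hook once; the probe connection is closed immediately.
        try (AutoCloseable probe = source.createSnapshotConnection()) {
            return maxThreads;
        }
        catch (UnsupportedOperationException e) {
            // Assumed fallback: the connector has not opted in, so run single-threaded.
            return 1;
        }
    }
}
```

With this shape, a connector that does not override the hook behaves the same whether snapshot.max.threads is 1 or higher.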