
mysql_cdc: chunk large tables across workers via PK-range splitting#4342

Open
ankit481 wants to merge 3 commits into redpanda-data:main from ankit481:feat/mysql-cdc-snapshot-chunking

Conversation

ankit481 (Contributor) commented Apr 23, 2026

Summary

Closes #4341. Builds on #4320.

Adds an opt-in snapshot_chunks_per_table field to mysql_cdc. When left at the default (1) the snapshot flow is unchanged from #4320. When set higher, each table's first primary-key column is probed for MIN/MAX under the shared consistent-snapshot transaction and the resulting integer range is split into N half-open chunks that are dispatched across the existing snapshot_max_parallel_tables worker pool.

This is the intra-table parallelism piece. #4320 unblocks pipelines with many tables; this PR unblocks pipelines dominated by a single very large table — the shape behind the 400M-row reference workload in #4341.

Motivation

Inter-table parallelism alone cannot accelerate a snapshot where one table holds the bulk of the rows. Splitting that table across the worker pool is what closes the gap to AWS DMS.

Target from #4341: 400M rows in ~45 min (1h acceptable). At 16 workers each reading a chunked slice of the PK space, the required per-worker throughput is ~25M rows/hr (400M rows / 16 workers / 1h), comfortably within the ~30M rows/hr per-worker baseline the existing code path already achieves on commodity RDS hardware.

Design

Consistency model is unchanged

Every worker transaction is still opened inside the single FLUSH TABLES WITH READ LOCK window established by prepareParallelSnapshotSet. MIN/MAX probing runs inside one of those worker transactions, so boundaries computed during planning agree exactly with the state every worker subsequently reads. The binlog position captured under the lock applies uniformly to every chunk.

No new lock acquisition, no relaxation of isolation, no new handoff with the binlog stream.
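
For reference, a minimal sketch of that window, assuming database/sql connections. The FLUSH/UNLOCK pair and the consistent-snapshot worker transactions are as described here; the function name, signature, and exact statement placement are illustrative:

package mysqlsnapshot

import (
    "context"
    "database/sql"
)

// openSnapshotSet sketches the lock window described above: every worker
// transaction starts while the coordinator still holds the table read
// locks, so all workers observe identical state at one binlog position.
func openSnapshotSet(ctx context.Context, coord *sql.Conn, workers []*sql.Conn, tableList string) error {
    if _, err := coord.ExecContext(ctx, "FLUSH TABLES "+tableList+" WITH READ LOCK"); err != nil {
        return err
    }
    // Hold the lock only long enough to open every worker transaction.
    defer coord.ExecContext(ctx, "UNLOCK TABLES")

    for _, w := range workers {
        if _, err := w.ExecContext(ctx, "SET TRANSACTION ISOLATION LEVEL REPEATABLE READ"); err != nil {
            return err
        }
        if _, err := w.ExecContext(ctx, "START TRANSACTION WITH CONSISTENT SNAPSHOT"); err != nil {
            return err
        }
    }
    // The binlog position is captured here, still under the lock, and
    // applies uniformly to every chunk dispatched afterwards.
    return nil
}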

Chunking math

For each table:

  • chunks_per_table <= 1: emit one whole-table unit (no planning query).
  • First PK column is a supported integer type: compute MIN(pk), MAX(pk), split [MIN, MAX] into N half-open [lo, hi) chunks.
  • First PK column is non-numeric: emit one whole-table unit and log the fallback reason.

Outermost chunks are open-ended — the first chunk has no lower bound and the last chunk has no upper bound. This guarantees every row in [MIN, MAX] is covered without off-by-one risk and that any row outside [MIN, MAX] under the snapshot is still picked up rather than silently dropped.
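
Concretely, a sketch of that math. The PR names splitIntRange and chunkBounds; the exact signatures below are assumptions:

// chunkBounds describes one half-open [lo, hi) slice of the PK space.
// A nil bound means that side of the chunk is open-ended.
type chunkBounds struct {
    lo, hi *int64
}

// splitIntRange splits [min, max] into at most n half-open chunks whose
// outermost bounds stay open, so the chunks partition all of int64.
func splitIntRange(min, max int64, n int) []chunkBounds {
    if n <= 1 || min >= max {
        return []chunkBounds{{}} // degenerate input: one fully-open chunk
    }
    // Unsigned subtraction guards against overflow when hi-lo is near
    // the int64 limits.
    span := uint64(max) - uint64(min)
    step := span / uint64(n)
    if step == 0 {
        step = 1 // n exceeds the span: never emit zero-width chunks
    }
    var chunks []chunkBounds
    var prev *int64 // nil lower bound on the first chunk
    for i := 1; i < n; i++ {
        b := int64(uint64(min) + uint64(i)*step)
        if b >= max {
            break // the open-ended last chunk covers the remainder
        }
        bound := b
        chunks = append(chunks, chunkBounds{lo: prev, hi: &bound})
        prev = &bound
    }
    return append(chunks, chunkBounds{lo: prev}) // no upper bound
}

Because the outermost bounds stay nil, the chunks partition the whole integer domain, which is what makes the coverage guarantee above hold without endpoint special cases.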

Composite primary keys

Chunking partitions on the leading PK column only. Per-chunk keyset pagination inside querySnapshotTable continues to use the full PK tuple, so ordering and pagination remain correct for composite PKs such as (tenant_id, id).

Tradeoff: a skewed leading column produces uneven chunks. Operators with that data shape should leave snapshot_chunks_per_table at 1 and rely on snapshot_max_parallel_tables alone. This is a documented limitation, not a correctness issue — no row is ever read twice, and no row is ever missed.

SQL shape

Example: chunks_per_table=4 on an INT PK with range [0, 100), 2nd chunk, mid-pagination:

SELECT * FROM t
WHERE id >= ? AND id < ? AND (id) > (?)
ORDER BY id
LIMIT ?

Bindings: [25, 50, lastSeenID, limit].

First chunk omits the lower bound. Last chunk omits the upper bound. Middle chunks have both.
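
A sketch of the predicate construction (buildChunkPredicate per the Files list below; the signature and backtick-quoting are assumptions, and chunkBounds is the type from the sketch above):

import (
    "fmt"
    "strings"
)

// buildChunkPredicate renders a chunk's bounds as a WHERE fragment plus
// bind arguments; a nil chunk (chunks_per_table=1) yields no predicate,
// preserving the pre-chunking query shape.
func buildChunkPredicate(pkCol string, b *chunkBounds) (string, []any) {
    if b == nil {
        return "", nil
    }
    var conds []string
    var args []any
    if b.lo != nil { // omitted for the first, open-ended chunk
        conds = append(conds, fmt.Sprintf("`%s` >= ?", pkCol))
        args = append(args, *b.lo)
    }
    if b.hi != nil { // omitted for the last, open-ended chunk
        conds = append(conds, fmt.Sprintf("`%s` < ?", pkCol))
        args = append(args, *b.hi)
    }
    return strings.Join(conds, " AND "), args
}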

Files

  • internal/impl/mysql/snapshot_chunking.go (new): planSnapshotWork, splitIntRange, buildChunkPredicate, numeric-PK detection via information_schema.columns.
  • internal/impl/mysql/snapshot.go: querySnapshotTable threads *chunkBounds through the WHERE clause. The existing buildOrderByClause and keyset pagination are untouched.
  • internal/impl/mysql/input_mysql_stream.go: new snapshot_chunks_per_table field with [1, 256] validation, renamed readSnapshotTable -> readSnapshotWorkUnit, chunking plan runs inside runParallelSnapshot.
  • internal/impl/mysql/parallel_snapshot.go: distributeTablesToWorkers generalised to distributeWorkToWorkers[T any] so work units of type snapshotWorkUnit use the same fan-out code path as tables did before. Removed the internal workerCount > len(tables) cap — the caller sizes the pool against the expected work-unit count.
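
A minimal sketch of what that generic fan-out could look like, using errgroup for the fail-halt behaviour described below; the real signature and dispatch strategy may differ:

import (
    "context"

    "golang.org/x/sync/errgroup"
)

// distributeWorkToWorkers fans units out to a fixed pool of workers and
// halts everything on the first error, mirroring the fail-halt mode the
// table-level fan-out already had.
func distributeWorkToWorkers[T any](ctx context.Context, workers int, units []T, run func(context.Context, T) error) error {
    ch := make(chan T)
    eg, ctx := errgroup.WithContext(ctx)
    for i := 0; i < workers; i++ {
        eg.Go(func() error {
            for u := range ch {
                if err := run(ctx, u); err != nil {
                    return err // cancels the group context
                }
            }
            return nil
        })
    }
    for _, u := range units {
        select {
        case ch <- u:
        case <-ctx.Done(): // a worker failed; stop feeding
            close(ch)
            return eg.Wait()
        }
    }
    close(ch)
    return eg.Wait()
}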

Dispatch

startMySQLSync now routes to runParallelSnapshot whenever either snapshot_max_parallel_tables > 1 or snapshot_chunks_per_table > 1. When both are 1 (default) the original sequential path runs unchanged.
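
For illustration, a config that takes the parallel path on both axes. The two snapshot fields are the ones named in this PR and #4320; the remaining field names and values are assumed for the sketch:

input:
  mysql_cdc:
    dsn: "user:pass@tcp(mysql:3306)/mydb"  # assumed DSN shape
    tables: [events]
    snapshot_max_parallel_tables: 4        # inter-table workers (#4320)
    snapshot_chunks_per_table: 8           # PK-range chunks per table (this PR)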

Backwards compatibility

Default snapshot_chunks_per_table: 1 produces byte-identical behaviour to #4320.

  • The config spec adds one Advanced() int field. Existing YAML is unaffected.
  • runSequentialSnapshot is untouched.
  • When the parallel path is taken with chunks_per_table=1, every work unit has bounds: nil, so querySnapshotTable emits the same WHERE-less query as before (just via a slightly different code path).
  • Existing integration tests (TestIntegrationMySQLSnapshotAndCDC, TestIntegrationMySQLSnapshotConsistency, TestIntegrationMySQLCDCWithCompositePrimaryKeys, TestIntegrationMySQLCDCSchemaMetadata, TestIntegrationMySQLParallelSnapshot) all pass unchanged.

Tests added

Unit (snapshot_chunking_test.go)

Pure-function coverage of the chunking math and SQL predicate:

  • SingleChunkWhenNLEOne — n of 0, 1, -3 all produce one fully-open chunk.
  • SingleChunkWhenRangeCollapsed — lo == hi and reversed ranges degenerate to one chunk.
  • OutermostChunksAreOpenEnded — first chunk lo==nil, last chunk hi==nil.
  • ChunksCoverAllIntegersExactlyOnce — enumerates every integer in [lo, hi] for several n and asserts single-chunk membership under half-open semantics.
  • WhenNExceedsSpanStepIsAtLeastOne — short ranges asked for many chunks still cover every value.
  • LargeSpanDoesNotOverflow — hi-lo near the int64 limits; guards the uint64 cast in splitIntRange.
  • BuildChunkPredicate_* — nil, both-bounds, lower-only, upper-only, fully-open variants produce the expected SQL fragment and arg list.
  • DistributeWorkToWorkers_SnapshotWorkUnitInstantiation — the generic fan-out helper accepts the new work-unit type and visits every item exactly once.

Existing distributeTablesToWorkers tests continue to pass — they now exercise distributeWorkToWorkers at T = string.
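
As a flavour of the core invariant, a condensed version of what ChunksCoverAllIntegersExactlyOnce asserts, reusing the splitIntRange sketch above (the actual test body may differ):

import "testing"

// Every integer in [lo, hi] must fall in exactly one half-open chunk,
// for several chunk counts, including n larger than the span.
func TestChunksCoverAllIntegersExactlyOnce(t *testing.T) {
    lo, hi := int64(-7), int64(40)
    for _, n := range []int{2, 3, 8, 64} {
        chunks := splitIntRange(lo, hi, n)
        for v := lo; v <= hi; v++ {
            hits := 0
            for _, c := range chunks {
                if (c.lo == nil || v >= *c.lo) && (c.hi == nil || v < *c.hi) {
                    hits++
                }
            }
            if hits != 1 {
                t.Fatalf("n=%d: value %d matched %d chunks", n, v, hits)
            }
        }
    }
}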

Config (config_test.go)

  • TestConfig_SnapshotChunksPerTable_DefaultAndExplicit — default of 1, explicit 16 round-trips through the spec.
  • TestConfig_SnapshotChunksPerTable_InvalidValuesRejected — zero, negative, above-cap, and absurdly-large values all violate the constructor's validation predicate.

Integration (integration_test.go)

  • TestIntegrationMySQLChunkedSnapshot — MySQL 8.0 via testcontainers. Creates one INT PK table and one composite (tenant_id, id) PK table, each loaded with 2000 rows. Runs mysql_cdc with snapshot_max_parallel_tables: 4, snapshot_chunks_per_table: 8. Asserts: every row emitted exactly once, no duplicates from overlapping chunk ranges (tracked via a sync.Map of observed PKs), post-snapshot inserts are picked up by the binlog stream.
  • TestIntegrationMySQLChunkedSnapshotNonNumericPKFallback — VARCHAR PK table with chunks_per_table: 8. Verifies the fallback path reads the whole table without error and emits every row.

Local test results

Unit (whole package, race + shuffle):

ok  internal/impl/mysql  7.170s   (-race -shuffle=on)

Integration — new tests:

--- PASS: TestIntegrationMySQLChunkedSnapshot                         (29.29s)
--- PASS: TestIntegrationMySQLChunkedSnapshotNonNumericPKFallback     (13.29s)
--- PASS: TestIntegrationMySQLParallelSnapshot                        (23.30s)
ok  internal/impl/mysql  32.223s

Integration — existing sequential-path regressions (backwards-compat sanity check):

--- PASS: TestIntegrationMySQLCDCSchemaMetadata                       (16.67s)
--- PASS: TestIntegrationMySQLSnapshotConsistency                     (20.84s)
--- PASS: TestIntegrationMySQLSnapshotAndCDC                          (28.12s)
--- PASS: TestIntegrationMySQLCDCWithCompositePrimaryKeys             (36.05s)
ok  internal/impl/mysql  39.084s

gofmt and go vet clean.

Log excerpt from TestIntegrationMySQLChunkedSnapshot confirming the planner emits 16 work units (2 tables x 8 chunks) across 4 workers, with correct open-ended outermost chunks and full-tuple keyset pagination for composite PKs:

Acquiring table-level read locks for parallel snapshot (4 workers): FLUSH TABLES `single_pk`, `composite_pk` WITH READ LOCK
Parallel snapshot planned: 2 tables -> 16 work units across 4 workers
Querying snapshot: SELECT * FROM single_pk WHERE `id` < ? ORDER BY id LIMIT ?                                                  (first chunk - no lower bound)
Querying snapshot: SELECT * FROM single_pk WHERE `id` >= ? AND `id` < ? ORDER BY id LIMIT ?                                    (middle chunk)
Querying snapshot: SELECT * FROM single_pk WHERE `id` >= ? ORDER BY id LIMIT ?                                                 (last chunk - no upper bound)
Querying snapshot: SELECT * FROM single_pk WHERE `id` >= ? AND `id` < ? AND (id) > (?) ORDER BY id LIMIT ?                     (mid-pagination within a chunk)
Querying snapshot: SELECT * FROM composite_pk WHERE `tenant_id` >= ? AND `tenant_id` < ? AND (tenant_id, id) > (?, ?) ORDER BY tenant_id, id LIMIT ?
starting MySQL CDC stream from binlog mysql-bin.000003 at offset 1218440

Out of scope / follow-ups

  • Non-numeric first-column PKs (UUID, VARCHAR, binary). Needs sampling-based or OFFSET-based boundary discovery; material complexity best kept behind its own config flag in a future PR.
  • Intra-table chunk skew handling. The documented workaround (leave chunks_per_table=1) is sufficient for the common case; adaptive partitioning is a separate feature.
  • Adaptive chunk sizing based on table size. Fixed N is simpler and predictable; adaptive can follow.

Test plan

  • Run unit tests for internal/impl/mysql with -race -shuffle=on
  • Run new integration tests for chunked snapshot (single PK, composite PK, non-numeric PK fallback)
  • Re-run existing sequential-path integration tests for regression
  • Verify gofmt/go vet cleanliness
  • Maintainer review, especially of the MIN/MAX + half-open chunk reasoning and the fallback for non-numeric first PK columns
  • CI integration matrix (MySQL 5.7, 8.0, 8.4 + MariaDB)
  • Production validation against the 400M-row reference workload from #4341 (mysql_cdc: single-table snapshots are not parallelised, making very large tables the bottleneck)

Commits

Adds an opt-in `snapshot_max_parallel_tables` field to the `mysql_cdc`
input. When left at the default (`1`) the snapshot flow is the existing
single-transaction, single-goroutine path: bit-for-bit unchanged.

When set above `1`, N REPEATABLE READ / CONSISTENT SNAPSHOT transactions
are opened on independent connections under a single brief FLUSH
TABLES ... WITH READ LOCK window. Every worker observes identical state
at the same binlog position, and the configured tables are fanned out
across the workers via an errgroup. This preserves the existing global
consistent-snapshot invariant and the existing fail-halt failure mode,
while removing the per-table serial bottleneck for pipelines with many
tables.

The inner per-table loop is extracted into readSnapshotTable so both
paths share identical semantics. The sequential path is moved into
runSequentialSnapshot (unchanged body); the parallel path lives in
runParallelSnapshot and parallel_snapshot.go.

Defense-in-depth against a mis-typed config value that would otherwise
try to open thousands of MySQL connections at snapshot time. 256 sits
well above any realistic pipeline (the existing cap at len(tables) is
the more common practical bound) and well below the range where a typo
(e.g. 10000) would cause a connection storm before MySQL's own
max_connections kicked in.

Surfaces as a clear configuration error at Connect time rather than a
runtime too-many-connections from the server.

Adds an opt-in snapshot_chunks_per_table field to mysql_cdc. When left at
its default (1) the snapshot flow is unchanged. When set higher, each
table's first primary-key column is probed for MIN and MAX under the
shared consistent-snapshot transaction and the resulting integer range is
split into N half-open chunks that are dispatched across the existing
snapshot_max_parallel_tables worker pool.

This is a follow-up to the inter-table parallelism introduced in the
mysql_cdc: parallelise snapshot reads across tables change. Inter-table
parallelism alone cannot accelerate a snapshot dominated by a single very
large table, which is the most common shape for message/event tables.
Chunking splits that single-table work across the worker pool instead.

Chunking is supported for tables whose first primary-key column is an
integer type (tinyint/smallint/mediumint/int/integer/bigint, signed or
unsigned). Composite primary keys are supported - chunking partitions on
the leading column only, and per-chunk keyset pagination continues to
respect the full PK ordering. Tables with non-numeric first PK columns
fall back to a whole-table read with an informational log line so mixed
workloads keep working.

Consistency model is unchanged. All worker transactions still begin
under one FLUSH TABLES WITH READ LOCK window so every chunk observes
identical state at the same binlog position. Planning runs inside one
worker's snapshot transaction so MIN/MAX agree with what every worker
subsequently reads.

The outermost chunks in each table are open-ended (no lower bound on
the first chunk, no upper bound on the last) so rows at the exact
MIN/MAX endpoints and any rows outside [MIN, MAX] are captured rather
than silently dropped.

The fan-out helper (previously distributeTablesToWorkers) is generalised
to a generic distributeWorkToWorkers so the parallel path can dispatch
chunk-typed work units while the existing fan-out tests keep passing
with string inputs.

Field cap: snapshot_chunks_per_table is validated at config time to be
within [1, 256], matching the pattern established for
snapshot_max_parallel_tables.

Tests added:

- snapshot_chunking_test.go: splitIntRange coverage and overflow,
  buildChunkPredicate shapes, and generic fan-out against
  snapshotWorkUnit.
- config_test.go: default, explicit, and out-of-range values for
  snapshot_chunks_per_table.
- integration_test.go: TestIntegrationMySQLChunkedSnapshot exercises an
  int PK table and a composite (int, int) PK table with chunks=8 and
  asserts no duplicates across overlapping chunk ranges;
  TestIntegrationMySQLChunkedSnapshotNonNumericPKFallback confirms the
  VARCHAR-PK fallback reads the whole table without error.
