[FLINK-39207][mysql] Fix mysql cdc could get stuck in backfill binlog reading when reuse snapshot split reader #4311

Merged
lvyanquan merged 4 commits into apache:master from chengcongchina:FLINK-39207
Mar 20, 2026
@chengcongchina
Contributor

@chengcongchina chengcongchina commented Mar 10, 2026

This closes FLINK-39207.

What is the purpose of the change

This PR fixes a bug where MySqlSourceReader could get stuck in the backfill phase after a failover during the snapshot phase.

When MySqlSourceReader processes multiple snapshot splits sequentially (typically after a failover), it reuses the same SnapshotSplitReader instance. However, the changeEventSourceContext (which controls the running state of the backfill binlog reading) is not properly reset to the "running" state when submitting the next split. This causes the backfill task for the subsequent split to exit immediately upon checking context.isRunning(), leading to the reader hanging indefinitely.
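The failure mode described above can be reproduced in a small model. This is only a sketch under stated assumptions: `SketchContext` and `SketchReader` are hypothetical stand-ins, not the Flink CDC classes, and the "backfill" here is a plain loop over a list. It shows how a running flag that is cleared at the end of one split, and never set again, makes the next split on the same reader instance exit before consuming anything.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Simplified model of the bug. All names are illustrative, not Flink CDC APIs. */
public class StuckBackfillSketch {

    /** Hypothetical stand-in for the context that gates backfill binlog reading. */
    static final class SketchContext {
        private final AtomicBoolean running = new AtomicBoolean(true);

        boolean isRunning() {
            return running.get();
        }

        void stop() {
            running.set(false);
        }
    }

    /** Hypothetical stand-in for a reader instance reused across snapshot splits. */
    static final class SketchReader {
        // One context per reader instance, shared by every split it processes.
        private final SketchContext context = new SketchContext();

        /** Returns how many backfill events were consumed for this split. */
        int submitSplit(List<String> backfillEvents) {
            int consumed = 0;
            for (String event : backfillEvents) {
                if (!context.isRunning()) {
                    break; // flag left over from the previous split ends the loop early
                }
                consumed++;
            }
            context.stop(); // the split is done, so the backfill is stopped
            return consumed;
        }
    }

    public static void main(String[] args) {
        SketchReader reader = new SketchReader();
        List<String> events = List.of("e1", "e2", "e3");
        System.out.println("first split consumed:  " + reader.submitSplit(events));  // 3
        System.out.println("second split consumed: " + reader.submitSplit(events));  // 0
    }
}
```

In this model the second split simply returns zero; in the real reader, per the description above, the backfill task exits without finishing its work and the reader waits indefinitely.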

Brief change log

  • Update StoppableChangeEventSourceContext to include a startChangeEventSource() method to reset the isRunning flag to true.
  • Update SnapshotSplitReader#submitSplit to explicitly invoke the context reset method before starting the task, ensuring the reader is in a valid state for the new split.
  • Add a new unit test testMultipleSplitsWithBackfill in SnapshotSplitReaderTest to verify the fix by simulating multiple splits with a forced backfill phase.
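The first two items in the change log can be sketched as follows. This is a minimal illustration, not the code merged in this PR: only the method name `startChangeEventSource()` and the `isRunning` flag come from the description; the class names `SketchStoppableContext` and `SketchSnapshotReader`, the `stopChangeEventSource()` method, and the counting loop are assumptions made for the example.

```java
/** Sketch of the reset-before-submit fix. Class names are hypothetical. */
public class ResettableContextSketch {

    /** Hypothetical model of a stoppable context with an explicit reset method. */
    static final class SketchStoppableContext {
        private volatile boolean isRunning = true;

        /** Resets the flag so a new split can start its backfill. */
        void startChangeEventSource() {
            isRunning = true;
        }

        /** Assumed counterpart that ends the current backfill. */
        void stopChangeEventSource() {
            isRunning = false;
        }

        boolean isRunning() {
            return isRunning;
        }
    }

    /** Hypothetical model of a reader that is reused for several splits. */
    static final class SketchSnapshotReader {
        private final SketchStoppableContext context = new SketchStoppableContext();

        /** Returns how many backfill events were consumed for this split. */
        int submitSplit(int backfillEventCount) {
            // Reset the context before starting the task for the new split.
            context.startChangeEventSource();
            int consumed = 0;
            while (consumed < backfillEventCount && context.isRunning()) {
                consumed++;
            }
            context.stopChangeEventSource();
            return consumed;
        }
    }

    public static void main(String[] args) {
        SketchSnapshotReader reader = new SketchSnapshotReader();
        System.out.println("first split consumed:  " + reader.submitSplit(3));  // 3
        System.out.println("second split consumed: " + reader.submitSplit(3));  // 3
    }
}
```

Resetting inside `submitSplit` rather than at construction time means every split starts from a known state, regardless of how the previous split ended.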

Verifying this change

This change is verified by the newly added unit test:

  • SnapshotSplitReaderTest#testMultipleSplitsWithBackfill

This test simulates a scenario where multiple splits are processed sequentially with a backfill phase, ensuring that the second split can be processed correctly without hanging.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@chengcongchina chengcongchina changed the title from "[FLINK-39200][mysql] Fix mysql cdc could get stuck in backfill binlog reading when reuse snapshot split reader" to "[FLINK-39207][mysql] Fix mysql cdc could get stuck in backfill binlog reading when reuse snapshot split reader" on Mar 10, 2026
@lvyanquan
Contributor

Please check if the failed MySQL test is related to this change.

@chengcongchina
Contributor Author

> Please check if the failed MySQL test is related to this change.

@lvyanquan Yes, it is. The newly added test modifies the test table, so the tests that run after it are affected: their assumption about the original records in the test table no longer holds. I'll fix it.

Contributor

@lvyanquan lvyanquan left a comment

+1.

Contributor

@ruanhang1993 ruanhang1993 left a comment

LGTM.

@lvyanquan lvyanquan merged commit 78d6adf into apache:master Mar 20, 2026
33 of 35 checks passed