[FLINK-39149][mysql-cdc] Implement LATEST mode GTID merging to prevent replaying pre-checkpoint transactions with non-contiguous ranges#4286

Merged: lvyanquan merged 3 commits into apache:master from Hisoka-X:FLINK-39149-mysql-gtid on Mar 19, 2026

@Hisoka-X
Member

This closes https://issues.apache.org/jira/browse/FLINK-39149

This pull request addresses a critical bug in GTID merging logic for MySQL CDC connectors, specifically in "LATEST" new-channel-position mode. The changes ensure that, when recovering from a checkpoint with non-contiguous GTID ranges, MySQL does not replay pre-checkpoint transactions. The fix is implemented by introducing a new method, computeLatestModeGtidSet, and updating the merging logic to use it.

Contributor

Copilot AI left a comment

Pull request overview

This pull request fixes a critical bug (FLINK-39149) in MySQL CDC's GTID merging logic for "LATEST" mode. When recovering from a checkpoint with non-contiguous GTID ranges (e.g., aaa-111:5000-8000 instead of aaa-111:1-8000), the previous implementation would cause MySQL to replay transactions that occurred before the checkpoint, leading to duplicate data processing.

Changes:

  • Introduced GtidUtils.computeLatestModeGtidSet() method to properly handle GTID merging for LATEST mode with non-contiguous ranges
  • Updated MySqlStreamingChangeEventSource.filterGtidSet() in both mysql-cdc and oceanbase-cdc connectors to use the new method
  • Added comprehensive unit and integration tests to prevent regression
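
The example in the overview (aaa-111:5000-8000 where aaa-111:1-8000 was expected) can be read as a leading-gap problem. The sketch below illustrates that reading only. It is not the code of `computeLatestModeGtidSet`, and the class `LeadingGapSketch` and its methods are hypothetical names. It assumes GTID sequence numbers for a UUID start at 1 and that everything before the first checkpointed interval should count as already consumed.

```java
/** Illustrative only: intervals of one server UUID as inclusive {start, end} pairs. */
final class LeadingGapSketch {

    /** Hypothetical helper: pulls the first interval's start back to 1; later gaps are kept. */
    static long[][] fillLeadingGap(long[][] checkpoint) {
        long[][] fixed = new long[checkpoint.length][];
        for (int i = 0; i < checkpoint.length; i++) {
            fixed[i] = checkpoint[i].clone();
        }
        if (fixed.length > 0 && fixed[0][0] > 1) {
            fixed[0][0] = 1; // assumption: transactions before the checkpoint start were already seen
        }
        return fixed;
    }

    /** Renders intervals in MySQL's "start-end:start-end" notation. */
    static String format(long[][] intervals) {
        StringBuilder sb = new StringBuilder();
        for (long[] interval : intervals) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            sb.append(interval[0]);
            if (interval[1] > interval[0]) {
                sb.append('-').append(interval[1]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Checkpoint aaa-111:5000-8000 becomes aaa-111:1-8000 under this reading.
        System.out.println(format(fillLeadingGap(new long[][] {{5000, 8000}})));
        // A set that already starts at 1 is left unchanged, gaps included.
        System.out.println(format(fillLeadingGap(new long[][] {{1, 100}, {105, 200}})));
    }
}
```

Under these assumptions only the leading gap is closed. Gaps in the middle of the set are preserved, so transactions that were genuinely never consumed can still be requested from the server.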

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| flink-connector-mysql-cdc/src/main/java/io/debezium/connector/mysql/GtidUtils.java | Adds new computeLatestModeGtidSet method that fixes old channels' GTID ranges and uses full server GTID for new channels |
| flink-connector-mysql-cdc/src/main/java/io/debezium/connector/mysql/MySqlStreamingChangeEventSource.java | Updates LATEST mode branch in filterGtidSet to use new computeLatestModeGtidSet method |
| flink-connector-oceanbase-cdc/src/main/java/io/debezium/connector/mysql/MySqlStreamingChangeEventSource.java | Mirrors the mysql-cdc changes for consistency across connectors |
| flink-connector-mysql-cdc/src/test/java/io/debezium/connector/mysql/GtidUtilsTest.java | Adds parameterized tests for various GTID scenarios including purged GTIDs and source filters |
| flink-connector-mysql-cdc/src/test/java/io/debezium/connector/mysql/FilterGtidSetTest.java | New integration test file ensuring the fix works end-to-end through filterGtidSet method |
| flink-connector-mysql-cdc/pom.xml | Adds Mockito 3.12.4 dependency for testing |

@mielientiev
Contributor

What happens if the GTID set in the checkpoint is abc:1-100:105-200 and the server reports abc:1-230?

@Hisoka-X
Member Author

> What happens if the GTID set in the checkpoint is abc:1-100:105-200 and the server reports abc:1-230?

@mielientiev The job restarts from abc:1-100:105-200, and MySQL will then send abc:101-104:201-230.
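
The arithmetic behind this answer is an interval subtraction: under GTID-based replication, the server sends the transactions it has executed that are missing from the set the client presents. A minimal sketch for a single UUID follows. The class `GtidDiffSketch` and its methods are hypothetical and are not part of the connector or of Debezium's `GtidSet`.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: intervals of one server UUID as inclusive {start, end} pairs. */
final class GtidDiffSketch {

    /** Parses "1-100:105-200" into {start, end} pairs; a bare "7" means {7, 7}. */
    static List<long[]> parse(String intervals) {
        List<long[]> result = new ArrayList<>();
        for (String part : intervals.split(":")) {
            String[] bounds = part.split("-");
            long start = Long.parseLong(bounds[0]);
            long end = bounds.length > 1 ? Long.parseLong(bounds[1]) : start;
            result.add(new long[] {start, end});
        }
        return result;
    }

    /** Returns what is in server but not in client; both must be sorted and non-overlapping. */
    static List<long[]> subtract(List<long[]> server, List<long[]> client) {
        List<long[]> missing = new ArrayList<>();
        for (long[] s : server) {
            long cursor = s[0]; // first transaction not yet accounted for
            for (long[] c : client) {
                if (c[1] < cursor) continue; // client interval lies before the cursor
                if (c[0] > s[1]) break;      // client interval lies after this server interval
                if (c[0] > cursor) missing.add(new long[] {cursor, c[0] - 1});
                cursor = c[1] + 1;
                if (cursor > s[1]) break;
            }
            if (cursor <= s[1]) missing.add(new long[] {cursor, s[1]});
        }
        return missing;
    }

    /** Renders intervals in MySQL's "start-end:start-end" notation. */
    static String format(List<long[]> intervals) {
        StringBuilder sb = new StringBuilder();
        for (long[] interval : intervals) {
            if (sb.length() > 0) sb.append(':');
            sb.append(interval[0]);
            if (interval[1] > interval[0]) sb.append('-').append(interval[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Server has abc:1-230, checkpoint holds abc:1-100:105-200.
        System.out.println(format(subtract(parse("1-230"), parse("1-100:105-200"))));
        // prints 101-104:201-230
    }
}
```

The same subtraction shows the original bug: with a hypothetical server set of 1-9000 and a checkpoint of 5000-8000, the result is 1-4999:8001-9000, so the pre-checkpoint range 1-4999 would be replayed.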

Contributor

@ruanhang1993 ruanhang1993 left a comment

LGTM. Could this issue be covered by a test to verify the reliability of the code fix?

@Hisoka-X
Member Author

> LGTM. Could this issue be covered by a test to verify the reliability of the code fix?

Covered by
https://github.com/apache/flink-cdc/pull/4286/changes#diff-df50648e2087467ac0dcb5c41f29f16695d090494c35e65ee5b419e48e555950

Contributor

@lvyanquan lvyanquan left a comment

+1.

@lvyanquan lvyanquan merged commit c4b698b into apache:master Mar 19, 2026
17 checks passed
@Hisoka-X Hisoka-X deleted the FLINK-39149-mysql-gtid branch March 19, 2026 13:05
Mrart pushed a commit to Mrart/flink-cdc that referenced this pull request Mar 26, 2026
…t replaying pre-checkpoint transactions with non-contiguous ranges (apache#4286)
ThorneANN pushed a commit to ThorneANN/flink-cdc that referenced this pull request Mar 31, 2026
…t replaying pre-checkpoint transactions with non-contiguous ranges (apache#4286)