eventBroker: remove two-stage syncpoint by asddongmen · Pull Request #4807 · pingcap/ticdc

asddongmen · 2026-04-13T06:39:45Z

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Remove the two-stage syncpoint, which is too complicated to maintain.
Add a checkpoint-based cap for scan end when syncpoint is enabled:
scanEnd = min(scanEnd, checkpointTs + multiplier*syncPointInterval).
Default multiplier is 2.
Apply this cap in:
1. normal scan range calculation,
2. pending-DDL local-advance fallback,
3. table-trigger DDL/resolved-ts path.
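
The cap formula above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `capScanEnd`, `composeTS`, and `addDuration` are hypothetical helpers, and the TSO layout (physical milliseconds shifted left by 18 bits) is assumed here so the example runs without the `oracle` package.

```go
package main

import (
	"fmt"
	"time"
)

// physicalShiftBits is the assumed TSO layout: physical milliseconds in the
// high bits, an 18-bit logical counter in the low bits.
const physicalShiftBits = 18

// composeTS builds a TSO from physical milliseconds with a zero logical part.
func composeTS(physicalMs uint64) uint64 { return physicalMs << physicalShiftBits }

// addDuration advances a TSO by d and drops the logical part.
func addDuration(ts uint64, d time.Duration) uint64 {
	return composeTS((ts >> physicalShiftBits) + uint64(d.Milliseconds()))
}

// capScanEnd (hypothetical) returns min(scanEnd, checkpointTs + multiplier*interval)
// and reports whether the cap was applied.
func capScanEnd(scanEnd, checkpointTs uint64, multiplier int, interval time.Duration) (uint64, bool) {
	if multiplier <= 0 || interval <= 0 {
		return scanEnd, false // treat a non-positive setting as "no cap"
	}
	limit := addDuration(checkpointTs, time.Duration(multiplier)*interval)
	if scanEnd > limit {
		return limit, true
	}
	return scanEnd, false
}

func main() {
	checkpoint := composeTS(1_000)
	interval := 10 * time.Minute

	// A scan end one hour ahead of the checkpoint is pulled back to 20 minutes.
	capped, hit := capScanEnd(addDuration(checkpoint, time.Hour), checkpoint, 2, interval)
	fmt.Println(capped == addDuration(checkpoint, 20*time.Minute), hit)

	// A scan end only five minutes ahead is left untouched.
	near := addDuration(checkpoint, 5*time.Minute)
	same, hit := capScanEnd(near, checkpoint, 2, interval)
	fmt.Println(same == near, hit)
}
```

With the default multiplier of 2, a dispatcher can scan at most two syncpoint intervals past its checkpoint.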
Add lag-based syncpoint suppression in emitSyncPointEventIfNeeded:
- suppress when lag(sentResolvedTs, checkpointTs) > 20m,
- resume when lag <= 15m (hysteresis),
- always advance nextSyncPoint even when emission is suppressed.
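
A minimal sketch of the suppress/resume hysteresis described above. The type `syncPointGate` and its fields are hypothetical names chosen for illustration. Only the 20m and 15m thresholds and the "always advance" rule come from the description.

```go
package main

import (
	"fmt"
	"time"
)

// syncPointGate (hypothetical) tracks whether syncpoint emission is suppressed.
type syncPointGate struct {
	suppressThreshold time.Duration // suppress when lag is strictly above this
	resumeThreshold   time.Duration // resume when lag is at or below this
	suppressed        bool
}

// newSyncPointGate uses the defaults listed in the PR description.
func newSyncPointGate() *syncPointGate {
	return &syncPointGate{
		suppressThreshold: 20 * time.Minute,
		resumeThreshold:   15 * time.Minute,
	}
}

// shouldEmit updates the suppression state for the given lag and reports
// whether a syncpoint should be emitted. Between the two thresholds the
// previous state is kept, which is what prevents flapping.
func (g *syncPointGate) shouldEmit(lag time.Duration) bool {
	switch {
	case !g.suppressed && lag > g.suppressThreshold:
		g.suppressed = true
	case g.suppressed && lag <= g.resumeThreshold:
		g.suppressed = false
	}
	return !g.suppressed
}

func main() {
	g := newSyncPointGate()
	next := 0 // stand-in for nextSyncPoint: advanced on every step
	lags := []time.Duration{10 * time.Minute, 21 * time.Minute, 17 * time.Minute, 15 * time.Minute}
	for _, lag := range lags {
		emit := g.shouldEmit(lag)
		next++ // advance even when emission is suppressed
		fmt.Printf("lag=%v emit=%v next=%d\n", lag, emit, next)
	}
}
```

The 17m step stays suppressed because it falls between the two thresholds. Emission resumes only once lag drops to 15m.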
Add debug config knobs:
- sync-point-checkpoint-cap-multiplier (default 2)
- sync-point-lag-suppress-threshold (default 20m)
- sync-point-lag-resume-threshold (default 15m)
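
For orientation, the three knobs could be modeled like this. The struct name, field names, and the validation rules are assumptions made for the sketch. Only the key names and default values are taken from the list above, and where these keys actually live in the TiCDC config is not stated here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// syncPointDebugConfig is a hypothetical container for the three knobs.
type syncPointDebugConfig struct {
	CheckpointCapMultiplier int           `toml:"sync-point-checkpoint-cap-multiplier"`
	LagSuppressThreshold    time.Duration `toml:"sync-point-lag-suppress-threshold"`
	LagResumeThreshold      time.Duration `toml:"sync-point-lag-resume-threshold"`
}

// defaultSyncPointDebugConfig returns the defaults named in the description.
func defaultSyncPointDebugConfig() syncPointDebugConfig {
	return syncPointDebugConfig{
		CheckpointCapMultiplier: 2,
		LagSuppressThreshold:    20 * time.Minute,
		LagResumeThreshold:      15 * time.Minute,
	}
}

// validate applies assumed sanity checks: hysteresis only makes sense when
// the resume threshold does not exceed the suppress threshold.
func (c syncPointDebugConfig) validate() error {
	if c.CheckpointCapMultiplier <= 0 {
		return errors.New("cap multiplier must be positive")
	}
	if c.LagResumeThreshold > c.LagSuppressThreshold {
		return errors.New("resume threshold must not exceed suppress threshold")
	}
	return nil
}

func main() {
	cfg := defaultSyncPointDebugConfig()
	fmt.Println(cfg.validate() == nil)

	cfg.LagResumeThreshold = 30 * time.Minute
	fmt.Println(cfg.validate())
}
```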
Add metrics:
- syncpoint_lag_seconds
- syncpoint_suppressed_count
- scan_capped_by_checkpoint_count
Add focused unit tests for scan capping and syncpoint suppress/resume behavior.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Signed-off-by: dongmen <414110582@qq.com>

- cap scan upper bound by checkpointTs + 2*syncPointInterval when syncpoint is enabled
- suppress syncpoint emission when dispatcher lag exceeds threshold, while still advancing nextSyncPoint
- resume syncpoint emission with hysteresis to avoid flapping
- apply checkpoint cap to normal scan path, pending-DDL local advance, and table-trigger DDL path
- add metrics for syncpoint lag, suppression count, and checkpoint-cap hits
- add unit tests for checkpoint cap and suppress/resume behavior

Signed-off-by: dongmen <414110582@qq.com>

ti-chi-bot · 2026-04-13T06:39:50Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

ti-chi-bot · 2026-04-13T06:39:50Z

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

ti-chi-bot · 2026-04-13T06:39:52Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign flowbehappy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS
pkg/config/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-04-13T06:39:55Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 425727a6-a327-4742-8e44-1dc3fd20ea89

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


gemini-code-assist

Code Review

This pull request refactors the syncpoint handling logic by removing the two-stage prepare/commit state machine and introducing lag-based suppression and checkpoint-based scan capping. New configuration options and metrics are added to support these features. Review feedback identifies critical issues including a race condition in syncpoint emission, an incorrect timestamp comparison that delays syncpoints, the loss of event type validation in action matching, and a reversal of the required DDL-to-syncpoint emission order.

gemini-code-assist · 2026-04-13T06:42:33Z

downstreamadapter/dispatcher/helper.go


-	pendingIsSyncPoint := b.blockPendingEvent.GetType() == commonEvent.TypeSyncPointEvent
-	return b.blockCommitTs == action.CommitTs && pendingIsSyncPoint == action.IsSyncPoint
+	return b.blockCommitTs == action.CommitTs


The check for action.IsSyncPoint was removed. If a DDL event and a SyncPoint event share the same CommitTs, the dispatcher might incorrectly match an action intended for one event to the other. This can lead to incorrect processing, such as passing a DDL that should have been written to the downstream.

	pendingIsSyncPoint := b.blockPendingEvent.GetType() == commonEvent.TypeSyncPointEvent
	return b.blockCommitTs == action.CommitTs && pendingIsSyncPoint == action.IsSyncPoint
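
To make the reviewer's concern concrete, here is a small self-contained sketch. The types `pendingEvent` and `action` are simplified stand-ins, not the dispatcher's real structures. It compares a timestamp-only match with a match that also checks the event type.

```go
package main

import "fmt"

type eventType int

const (
	typeDDL eventType = iota
	typeSyncPoint
)

// pendingEvent is a simplified stand-in for the blocked event held by a dispatcher.
type pendingEvent struct {
	commitTs uint64
	typ      eventType
}

// action is a simplified stand-in for the action sent back to the dispatcher.
type action struct {
	CommitTs    uint64
	IsSyncPoint bool
}

// matchByTsOnly mirrors the check after the change: timestamp equality only.
func matchByTsOnly(p pendingEvent, a action) bool {
	return p.commitTs == a.CommitTs
}

// matchByTsAndType mirrors the check before the change: the action must also
// target the same kind of event.
func matchByTsAndType(p pendingEvent, a action) bool {
	pendingIsSyncPoint := p.typ == typeSyncPoint
	return p.commitTs == a.CommitTs && pendingIsSyncPoint == a.IsSyncPoint
}

func main() {
	pendingDDL := pendingEvent{commitTs: 100, typ: typeDDL}
	syncPointAction := action{CommitTs: 100, IsSyncPoint: true}

	// A syncpoint action at the same commit ts is accepted for a pending DDL
	// when only the timestamp is compared.
	fmt.Println(matchByTsOnly(pendingDDL, syncPointAction))
	fmt.Println(matchByTsAndType(pendingDDL, syncPointAction))
}
```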

gemini-code-assist · 2026-04-13T06:42:33Z

pkg/eventservice/event_broker.go

+	for d.enableSyncPoint && ts > d.nextSyncPoint.Load() {
 		commitTs := d.nextSyncPoint.Load()
-		if !d.changefeedStat.isSyncPointInCommitStage(commitTs) {
-			if ts <= commitTs {
-				return
-			}
-			d.changefeedStat.tryEnterSyncPointPrepare(commitTs)
-			if !d.changefeedStat.isSyncPointInCommitStage(commitTs) {
-				return
-			}
-		} else if ts < commitTs {
-			return
-		}
-
-		nextSyncPoint := oracle.GoTimeToTS(oracle.GetTimeFromTS(commitTs).Add(d.syncPointInterval))
-		// Advance nextSyncPoint with CAS so concurrent send paths cannot emit the same
-		// syncpoint twice or move nextSyncPoint backward.
-		if !d.nextSyncPoint.CompareAndSwap(commitTs, nextSyncPoint) {
+		d.nextSyncPoint.Store(oracle.GoTimeToTS(oracle.GetTimeFromTS(commitTs).Add(d.syncPointInterval)))


There are two critical issues in this loop logic:

The condition ts > d.nextSyncPoint.Load() prevents a syncpoint from being emitted when the current timestamp ts is exactly equal to the syncpoint timestamp. This causes syncpoints to be delayed until the next event with a strictly greater timestamp arrives, or missed entirely if the stream stops at the syncpoint timestamp. It should be ts >= d.nextSyncPoint.Load().

Using d.nextSyncPoint.Store instead of CompareAndSwap introduces a race condition. Since emitSyncPointEventIfNeeded can be called concurrently (e.g., from a scan worker and the table trigger ticker), multiple goroutines might advance nextSyncPoint simultaneously, leading to duplicate syncpoint emissions.

Suggested change

	for d.enableSyncPoint {
		commitTs := d.nextSyncPoint.Load()
		if ts < commitTs {
			return
		}
		nextSyncPoint := oracle.GoTimeToTS(oracle.GetTimeFromTS(commitTs).Add(d.syncPointInterval))
		if !d.nextSyncPoint.CompareAndSwap(commitTs, nextSyncPoint) {
			continue
		}
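
The CAS pattern in this suggestion can be exercised in isolation. The sketch below is an assumption-laden illustration: it uses plain integer timestamps instead of TSO values, and `dispatcher`, `claim`, and `runConcurrent` are hypothetical names. It shows that when several goroutines race, each syncpoint is claimed by exactly one of them, including the one whose timestamp equals `ts`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// dispatcher is a simplified stand-in holding only the syncpoint cursor.
type dispatcher struct {
	next     atomic.Uint64
	interval uint64
}

// claim returns every syncpoint <= ts that this caller won via CAS.
// A caller that loses the CAS reloads the cursor and retries.
func (d *dispatcher) claim(ts uint64) []uint64 {
	var claimed []uint64
	for {
		commitTs := d.next.Load()
		if ts < commitTs {
			return claimed
		}
		if !d.next.CompareAndSwap(commitTs, commitTs+d.interval) {
			continue // another goroutine advanced the cursor first
		}
		claimed = append(claimed, commitTs)
	}
}

// runConcurrent races several workers over the same dispatcher and counts
// how many times each syncpoint was claimed.
func runConcurrent(workers int, ts uint64) map[uint64]int {
	d := &dispatcher{interval: 10}
	d.next.Store(10)

	var mu sync.Mutex
	counts := make(map[uint64]int)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, c := range d.claim(ts) {
				mu.Lock()
				counts[c]++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counts
}

func main() {
	// Syncpoints at 10, 20, 30, 40, 50 are each claimed exactly once.
	fmt.Println(runConcurrent(8, 50))
}
```

With a plain `Store`, two goroutines that both loaded the same `commitTs` would each proceed to emit it. The CAS lets only one of them win.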

gemini-code-assist · 2026-04-13T06:42:33Z

pkg/eventservice/event_broker.go

-			return
-		}
-		c.emitSyncPointEventIfNeeded(e.FinishedTs, d, remoteID)
+	c.emitSyncPointEventIfNeeded(e.FinishedTs, d, remoteID)


The order of DDL and SyncPoint emission has been reversed. Previously, DDL events were sent before SyncPoint events if they shared the same timestamp, which is the expected order for the maintainer and dispatcher. Now, emitSyncPointEventIfNeeded is called before sending the DDL event. If e.FinishedTs matches the next syncpoint, the syncpoint will be emitted first (assuming the loop condition is fixed to >=).
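
As a rough illustration of the ordering the reviewer describes (DDL before syncpoint when both carry the same timestamp), the following sketch sorts a mixed batch. The `event` type and `orderForEmission` are hypothetical. The real broker emits events inline rather than sorting a slice.

```go
package main

import (
	"fmt"
	"sort"
)

type kind int

const (
	kindDDL kind = iota // ordered before syncpoints at the same ts
	kindSyncPoint
)

// event is a simplified stand-in for an emitted event.
type event struct {
	ts   uint64
	kind kind
}

// orderForEmission sorts by timestamp, breaking ties so that a DDL is
// emitted before a syncpoint with the same timestamp.
func orderForEmission(events []event) {
	sort.SliceStable(events, func(i, j int) bool {
		if events[i].ts != events[j].ts {
			return events[i].ts < events[j].ts
		}
		return events[i].kind < events[j].kind
	})
}

func main() {
	events := []event{
		{ts: 100, kind: kindSyncPoint},
		{ts: 100, kind: kindDDL},
		{ts: 90, kind: kindSyncPoint},
	}
	orderForEmission(events)
	fmt.Println(events)
}
```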

asddongmen · 2026-04-13T06:59:00Z

/test all

ti-chi-bot · 2026-04-13T08:48:18Z

@asddongmen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-error-log-review	`e38bc9d`	link	true	`/test pull-error-log-review`
pull-cdc-mysql-integration-light	`e38bc9d`	link	true	`/test pull-cdc-mysql-integration-light`
pull-cdc-storage-integration-heavy	`e38bc9d`	link	true	`/test pull-cdc-storage-integration-heavy`
pull-cdc-mysql-integration-heavy	`e38bc9d`	link	true	`/test pull-cdc-mysql-integration-heavy`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

asddongmen added 3 commits April 13, 2026 13:26

revert two stage syncpoint

b00d689

Signed-off-by: dongmen <414110582@qq.com>

eventBroker: sent signal resolvedTs for table trigger dispatcher

1600a5c

Signed-off-by: dongmen <414110582@qq.com>

ti-chi-bot bot added the do-not-merge/needs-linked-issue label Apr 13, 2026

ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 13, 2026

ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 13, 2026

gemini-code-assist bot reviewed Apr 13, 2026

asddongmen wants to merge 3 commits into pingcap:master from asddongmen:0413-remove-two-sgate-syncpoint
