During maintainer failover, TiCDC may create duplicate dispatchers for the same table span and startTs. One dispatcher is owned by the new maintainer, while another appears to be a delayed/orphaned dispatcher request from the previous maintainer. The orphan dispatcher is removed after the new maintainer observes it, but it may already have entered Working state and pushed DMLs into the sink, causing downstream write conflicts.
In the incident around 2026-05-11 23:08:45, new maintainer fba6e585-1710-4b74-9102-fcae45514fff bootstrapped with nodeCount=1, spanCount=17, and checkpoint/startTs 466233824429998495.
For tableID 121 (workload.sbtest20), the current maintainer created dispatcher 36949683455125699276247693423456213681. Another dispatcher, 106138415981049979221359258994415068339, was later created for the same full span and the same startTs, but no corresponding maintainer span/operator was found. The maintainer then logged "no span found, remove it" while the dispatcher still had tableProgressLen=378.
For tableID 142 (workload.sbtest27), the same pattern happened between dispatcher 354956623857882180211263781318113896091 and orphan dispatcher 1172258264747904389658988347929528985; the orphan had tableProgressLen=380 when removal started.
Shortly after both duplicated dispatcher pairs handshook, TiCDC reported downstream Error 9007 Write conflict and retried DMLs.
Expected behavior: for one changefeed/mode/table span, only one active dispatcher should be able to write to the downstream sink during failover.
Suspected cause: bootstrap only reconstructs state from alive dispatcher managers. A delayed create request from the previous maintainer can still be processed by a dispatcher manager after the new maintainer has already recreated the same table span. There is no global fence preventing the orphan dispatcher from writing before it is detected and drained.
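To make the missing fence concrete, here is a minimal Go sketch of one possible approach: the dispatcher manager remembers the highest maintainer epoch it has seen per changefeed and rejects create requests that carry an older epoch or that target a span which already has an active dispatcher. This is not TiCDC code. `Fence`, `SpanKey`, `Bootstrap`, `AdmitCreate`, and the notion of a per-maintainer epoch are all hypothetical names and assumptions introduced only for illustration; the real request types and the way an epoch would be derived are not described in this report.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// SpanKey is a hypothetical identity for one changefeed/mode/table span.
type SpanKey struct {
	Changefeed string
	Mode       int
	TableID    int64
}

var (
	// ErrStaleEpoch rejects requests from a maintainer that has been superseded.
	ErrStaleEpoch = errors.New("stale maintainer epoch")
	// ErrDuplicateSpan rejects a second dispatcher for an already owned span.
	ErrDuplicateSpan = errors.New("span already has an active dispatcher")
)

// Fence is a hypothetical per-dispatcher-manager guard. It assumes every
// maintainer incarnation carries a monotonically increasing epoch.
type Fence struct {
	mu     sync.Mutex
	epochs map[string]uint64  // changefeed -> highest epoch seen
	active map[SpanKey]string // span -> dispatcher ID allowed to write
}

func NewFence() *Fence {
	return &Fence{epochs: map[string]uint64{}, active: map[SpanKey]string{}}
}

// Bootstrap records that a maintainer with the given epoch took over.
func (f *Fence) Bootstrap(changefeed string, epoch uint64) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	if epoch < f.epochs[changefeed] {
		return ErrStaleEpoch
	}
	f.epochs[changefeed] = epoch
	return nil
}

// AdmitCreate decides whether a create-dispatcher request may proceed.
// It must be called before the dispatcher is allowed to enter Working state.
func (f *Fence) AdmitCreate(key SpanKey, epoch uint64, dispatcherID string) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	if epoch < f.epochs[key.Changefeed] {
		return ErrStaleEpoch // delayed request from a previous maintainer
	}
	f.epochs[key.Changefeed] = epoch
	if owner, ok := f.active[key]; ok && owner != dispatcherID {
		return ErrDuplicateSpan
	}
	f.active[key] = dispatcherID
	return nil
}

func main() {
	f := NewFence()
	span := SpanKey{Changefeed: "cf", Mode: 0, TableID: 121}

	// The new maintainer (assumed epoch 2) bootstraps and creates its dispatcher.
	_ = f.Bootstrap("cf", 2)
	fmt.Println(f.AdmitCreate(span, 2, "36949683455125699276247693423456213681"))

	// A delayed create from the previous maintainer (assumed epoch 1) arrives late.
	fmt.Println(f.AdmitCreate(span, 1, "106138415981049979221359258994415068339"))
}
```

In this sketch the orphan request for tableID 121 is refused at admission time, before it can reach Working state, rather than being removed after the new maintainer observes it. The check only helps if the epoch is persisted or otherwise agreed on across nodes; with a purely local counter, a restarted dispatcher manager would forget it.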