Prevent stale election cleanup from deleting a new leader key #10635
Open
Labels: affects-7.1, affects-7.5, affects-8.1, affects-8.5, component/election, component/mcs, severity/major, triage/accept, type/bug
Problem
Election recovery can call `DeleteLeaderKey` after a participant observes that the persisted leader/primary is itself. The current delete removes the election key unconditionally. If a different participant has already written a newer leader/primary key, the stale cleanup can delete that newer key.
This can affect PD leader election and microservice primary election, including scheduling service primary discovery. Once the primary key is removed, PD watchers can clear the cached service primary even though service instances are still registered.
Root cause
`pkg/election/leadership.go` deletes `leaderKey` without comparing the observed key revision or leader value.
`pkg/member/member.go` and `pkg/member/participant.go` already read the leader/primary `mod_revision`, but the revision is not used when cleaning up the key.
Expected behavior
Election cleanup should only delete the exact key version that was observed. If the key has already been replaced, cleanup should preserve the new key and return a conflict so the election loop can retry.
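To make the expected behavior concrete, here is a minimal sketch of a revision-guarded delete. It uses an in-memory stand-in for a revisioned key-value store rather than the real PD or etcd code; `store`, `deleteIfRevision`, `ErrLeaderChanged`, and the key `/election/leader` are hypothetical names chosen for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrLeaderChanged is a hypothetical conflict error for this sketch.
var ErrLeaderChanged = errors.New("leader key changed since it was observed")

// entry pairs a value with the store revision at which it was last written.
type entry struct {
	value       string
	modRevision int64
}

// store is an in-memory stand-in for a revisioned key-value store.
// It is an assumption of this sketch, not the PD or etcd API.
type store struct {
	revision int64
	data     map[string]entry
}

func newStore() *store { return &store{data: make(map[string]entry)} }

// put writes the key and returns the new mod revision.
func (s *store) put(key, value string) int64 {
	s.revision++
	s.data[key] = entry{value: value, modRevision: s.revision}
	return s.revision
}

func (s *store) get(key string) (entry, bool) {
	e, ok := s.data[key]
	return e, ok
}

// deleteIfRevision removes the key only when its current mod revision
// equals the observed one. A missing or rewritten key is reported as a
// conflict so the caller can retry its election loop.
func (s *store) deleteIfRevision(key string, observed int64) error {
	e, ok := s.data[key]
	if !ok || e.modRevision != observed {
		return ErrLeaderChanged
	}
	s.revision++
	delete(s.data, key)
	return nil
}

func main() {
	const leaderKey = "/election/leader" // hypothetical key
	s := newStore()

	// member-a observes its own stale key and remembers the revision.
	observed := s.put(leaderKey, "member-a")
	// member-b wins the election before member-a's cleanup runs.
	s.put(leaderKey, "member-b")

	// The stale cleanup is rejected and member-b's key is preserved.
	err := s.deleteIfRevision(leaderKey, observed)
	cur, _ := s.get(leaderKey)
	fmt.Println("cleanup error:", err)
	fmt.Println("current leader:", cur.value)
}
```

Against a real etcd-backed store, the same check would presumably be expressed as a transaction that compares the key's mod revision before deleting; the sketch only shows the intended semantics.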
Fix direction
- Compare `mod_revision` when member/participant recovery deletes its own stale key.
- Make `DeleteLeaderKey` guarded by the current leader value for existing callers.