Prevent stale election cleanup from deleting a new leader key #10635

@okJiang

Problem

Election recovery can call DeleteLeaderKey after a participant observes that the persisted leader/primary is itself. The delete is unconditional: it removes the election key without checking who currently holds it. If a different participant has written a newer leader/primary key between the read and the delete, the stale cleanup deletes that newer key.

This can affect PD leader election and microservice primary election, including scheduling service primary discovery. Once the primary key is removed, PD watchers can clear the cached service primary even though service instances are still registered.

Root cause

pkg/election/leadership.go deletes leaderKey without comparing the observed key revision or leader value. pkg/member/member.go and pkg/member/participant.go already read the leader/primary mod_revision, but the revision is not used when cleaning up the key.
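
The interleaving can be reproduced with a toy in-memory store. This is only a sketch of the race, not PD or etcd code: the `kv` type, its methods, the key name `leader`, and the member names are all hypothetical stand-ins chosen for illustration.

```go
package main

import "fmt"

// kv is a toy, single-goroutine stand-in for the election store.
// It tracks a global revision counter and a per-key mod revision.
type kv struct {
	rev  int64
	vals map[string]string
	revs map[string]int64
}

func newKV() *kv {
	return &kv{vals: map[string]string{}, revs: map[string]int64{}}
}

// put writes the key and returns its new mod revision.
func (s *kv) put(key, val string) int64 {
	s.rev++
	s.vals[key] = val
	s.revs[key] = s.rev
	return s.rev
}

// deleteUnconditional removes the key without checking its revision or
// value, which is the behavior described in the root cause above.
func (s *kv) deleteUnconditional(key string) {
	if _, ok := s.vals[key]; ok {
		s.rev++
		delete(s.vals, key)
		delete(s.revs, key)
	}
}

func main() {
	const leaderKey = "leader" // hypothetical key name
	s := newKV()

	// 1. member-a's key is persisted; member-a later reads it back and
	//    decides to clean it up.
	observedRev := s.put(leaderKey, "member-a")

	// 2. Before the cleanup runs, member-b campaigns and replaces the key.
	newRev := s.put(leaderKey, "member-b")

	// 3. member-a's stale cleanup runs and removes member-b's key.
	s.deleteUnconditional(leaderKey)

	_, present := s.vals[leaderKey]
	fmt.Printf("observed rev %d, new rev %d, key present after cleanup: %v\n",
		observedRev, newRev, present)
}
```

The observed revision is already available at step 1, so the cleanup in step 3 has enough information to detect that the key changed; it simply does not use it.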

Expected behavior

Election cleanup should only delete the exact key version that was observed. If the key has already been replaced, cleanup should preserve the new key and return a conflict so the election loop can retry.

Fix direction

  • Add a revision-protected delete helper in the election layer.
  • Use the observed mod_revision when member/participant recovery deletes its own stale key.
  • Keep DeleteLeaderKey guarded by the current leader value for existing callers.
  • Add a regression test proving an old revision cannot delete a newly campaigned leader key.
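
The two guards in the list above can be modelled as follows. This is a minimal sketch under stated assumptions, not the actual patch: `Store`, `DeleteIfRevision`, `DeleteIfValue`, and `ErrConflict` are hypothetical names, and the in-memory map stands in for the real backend. Treating a missing key as a conflict is also a choice made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is a hypothetical error: the key no longer matches what
// the caller observed, so the election loop should retry.
var ErrConflict = errors.New("election key changed since it was observed")

type entry struct {
	value       string
	modRevision int64
}

// Store is a toy, single-goroutine model of a revisioned key-value store.
type Store struct {
	rev  int64
	data map[string]entry
}

func NewStore() *Store { return &Store{data: make(map[string]entry)} }

// Put writes the key and returns its new mod revision.
func (s *Store) Put(key, value string) int64 {
	s.rev++
	s.data[key] = entry{value: value, modRevision: s.rev}
	return s.rev
}

// Get returns the current entry for key, if any.
func (s *Store) Get(key string) (entry, bool) {
	e, ok := s.data[key]
	return e, ok
}

// DeleteIfRevision deletes key only when its mod revision still equals
// the revision the caller observed.
func (s *Store) DeleteIfRevision(key string, observedRev int64) error {
	e, ok := s.data[key]
	if !ok || e.modRevision != observedRev {
		return ErrConflict
	}
	delete(s.data, key)
	s.rev++
	return nil
}

// DeleteIfValue deletes key only when it still holds the expected value.
func (s *Store) DeleteIfValue(key, expected string) error {
	e, ok := s.data[key]
	if !ok || e.value != expected {
		return ErrConflict
	}
	delete(s.data, key)
	s.rev++
	return nil
}

func main() {
	const leaderKey = "leader" // hypothetical key name
	s := NewStore()
	staleRev := s.Put(leaderKey, "member-a")
	freshRev := s.Put(leaderKey, "member-b") // a newer campaign wins

	// Regression scenario: the old revision must not delete the new key.
	err := s.DeleteIfRevision(leaderKey, staleRev)
	cur, ok := s.Get(leaderKey)
	fmt.Println(errors.Is(err, ErrConflict), ok, cur.value)

	// The value guard also rejects a delete issued by the old leader.
	fmt.Println(errors.Is(s.DeleteIfValue(leaderKey, "member-a"), ErrConflict))

	// A delete with the current revision succeeds.
	fmt.Println(s.DeleteIfRevision(leaderKey, freshRev) == nil)
}
```

Against a real etcd backend, the same check is typically expressed as a transaction that compares the key's mod_revision (or value) and deletes only when the comparison holds. The revision guard is the stricter of the two, since a value comparison cannot tell apart two writes of the same value.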

Labels

  • affects-7.1: This bug affects the 7.1.x (LTS) versions.
  • affects-7.5: This bug affects the 7.5.x (LTS) versions.
  • affects-8.1: This bug affects the 8.1.x (LTS) versions.
  • affects-8.5: This bug affects the 8.5.x (LTS) versions.
  • component/election: Election related logic.
  • component/mcs: Microservice.
  • severity/major
  • triage/accept
  • type/bug: The issue is confirmed as a bug.
