@roman-mazhut
Contributor

This PR adds automatic validation of leader placement in the aggregator election manager to prevent scenarios where removed leaders continue operating and block new node elections.

Problem

When a leader node is removed from placement but the process continues running, it can cause new nodes to get stuck in PendingFollowerState indefinitely. This happens because:

  1. The removed leader continues to report itself as the active leader
  2. New nodes detect this "leader" but find it's not in their placement
  3. The verification logic in verifyPendingFollower waits for a leader change, but none occurs
  4. New nodes remain stuck and cannot transition to FollowerState

Solution

Added periodic leader placement validation that:

  • Runs automatically: A new validateLeaderInPlacementLoop() goroutine checks leader validity every campaignStateCheckInterval
  • Validates placement: Confirms that the current leader exists in the placement via placementManager.Placement()
  • Auto-resigns invalid leaders: Calls Resign() when the leader is not found in the placement
  • Triggers re-election: Allows a proper leader election among valid placement members
  • Only affects leaders: Validation runs only when the current instance is in LeaderState
