An Improvement Idea for Leader Election in the Raft Algorithm #19559

ss-torres · 2025-03-09T13:44:20Z

ss-torres
Mar 9, 2025

For the Raft algorithm, the leader election process is as follows:

Node Roles
Nodes in the Raft algorithm have three roles:
(1) Leader: Responsible for handling client requests, managing log replication, and sending heartbeats.
(2) Follower: Passively accepts logs and heartbeats from the Leader and does not initiate requests.
(3) Candidate: During the election process, a Follower can transition to a Candidate and initiate an election.
Election Trigger Conditions
Leader elections are typically triggered under the following conditions:
(1) Leader failure: A Follower does not receive a heartbeat from the Leader within a certain period (election timeout).
(2) New node joining: A newly joined node may trigger an election.
(3) Partition recovery: An election may be triggered after network partition recovery.
Election Process
The election process consists of the following steps:
(1) Follower Transitions to Candidate
When a Follower does not receive a heartbeat from the Leader within the election timeout period, it considers the Leader to have failed.
The Follower transitions its role to Candidate and initiates a new election.
(2) Candidate Initiates Voting
The Candidate first increments its term number (Term) by 1, indicating a new round of elections.
The Candidate sends a RequestVote RPC to other nodes in the cluster to request votes.
The Candidate votes for itself.
(3) Other Nodes Vote
Upon receiving the RequestVote RPC, nodes check the following conditions:
• Whether the Candidate's Term is greater than or equal to their own Term.
• Whether they have already voted for another Candidate.
• Whether the Candidate's log is at least as up-to-date as their own.
If the conditions are met, the node votes for the Candidate and resets its election timeout.
(4) Election Result
If the Candidate receives votes from a majority of nodes, it becomes the new Leader.
The new Leader immediately sends heartbeats to other nodes to prevent them from initiating new elections.
(5) Election Failure
If the Candidate does not receive enough votes within the election timeout period, the election fails.
The Candidate waits for a random period before re-initiating the election (to avoid split votes caused by multiple nodes initiating elections simultaneously).
Election Timeout
(1) To prevent multiple nodes from initiating elections simultaneously, Raft uses randomized election timeout periods (typically 150ms-300ms).
(2) Randomized timeout periods reduce the probability of election conflicts.
Log Consistency Guarantee
(1) During the election process, Raft ensures that only nodes with sufficiently up-to-date logs can become Leaders by comparing the Term and Index of logs.
(2) This guarantees that the Leader's log contains all committed log entries, ensuring consistency.
Election Optimization
(1) Pre-Vote Mechanism: Before formally initiating an election, a Candidate can first send a Pre-Vote request to confirm whether it has a chance to win the election, avoiding unnecessary Term increments.
(2) Leader Transfer: In certain situations, the Leader can proactively transfer leadership to another node to avoid frequent elections.

Modified Election Process:
The election process is divided into the following steps:
(1) Follower Transitions to Candidate (unchanged)

When a Follower does not receive a heartbeat from the Leader within the election timeout period, it considers the Leader to have failed.
The Follower transitions its role to Candidate and initiates a new election.
(2) Candidate Initiates Voting (modified)
The Candidate first increments its term number (Term) by 1 and generates a random int64 number, randPriority, indicating a new round of elections.
The Candidate sends a RequestVote RPC to other nodes in the cluster (this RPC request carries the random number randPriority) to request votes.
The Candidate votes for itself.
(3) Other Nodes Vote
Upon receiving the RequestVote RPC, nodes check the following conditions:
• Whether the Candidate's Term is greater than or equal to their own Term.
• Whether they have already voted for another Candidate.
• Whether the Candidate's log is at least as up-to-date as their own.
If the conditions are met, the node votes for the Candidate and resets its election timeout.
Upon receiving the RequestVote RPC, nodes record the maximum value maxNodePriority (first by log offset, then by term, then by the random number randPriority) among Candidate nodes that meet the following conditions:
• The Candidate's Term is greater than or equal to their own Term.
• The Candidate's log offset is at least as up-to-date as their own.
If a node refuses to vote for the Candidate (conditions not met or already voted for another node), it will return a message informing the Candidate.
(4) Voting Result (Receiving Rejected Votes)
If the Candidate receives rejected vote messages and the following conditions are met:
• If the message informs the voting node that the log is outdated,
• If the message informs the voting node that it has already voted for another node, then the Candidate node will record these nodes in a list (VoteOtherList) until the number of rejected nodes exceeds a majority (meaning the node cannot be elected according to the standard Raft algorithm, in which case it needs to self-rescue). If the Candidate's nodePriority is less than the recorded maxNodePriority or less than the maximum nodePriority in VoteOtherList, the node will initiate a specific node re-voting process.
Specific Node Re-Voting Process:
• Actively relinquish the Candidate role for this round.
• Send a message to all nodes that voted for it, requesting them to re-vote (the message includes the Candidate node with the maximum nodePriority recorded, obtained from VoteOtherList and maxNodePriority).
• Upon receiving this message, nodes compare the maxNodePriority in the message with the locally recorded maxNodePriority and send a vote to the node with the larger value.
• The specific node re-voting process aims to concentrate votes on the node with the highest nodePriority, thereby completing the election process.
(5) Voting Result (Receiving an Accepted Vote)
If the Candidate receives votes from a majority of nodes, it becomes the new Leader.
The new Leader immediately sends heartbeats to other nodes to prevent them from initiating new elections.

(6) Election Failure

If the Candidate does not receive enough votes within the election timeout period, the election fails.
The Candidate waits for a random period before re-initiating the election.

Leader Election Examples
Example 1: Raft Election Succeeds in One Round
Assume a 5-node cluster (A, B, C, D, E):

Initially, A is the Leader, and the other nodes are Followers.
After A fails, B and C's election timeout expires, and they transition to Candidates.
B and C initiate elections, sending RequestVote RPCs to other nodes.
Assume B's log is more up-to-date than C's, and D and E vote for B.
B receives 3 votes (including its own) and becomes the new Leader.

Example 2: Raft Election Fails in One Round

Initially, A is the Leader, and the other nodes are Followers.
After A fails, B, C, D, and E's election timeout expires, and they transition to Candidates.
B, C, D, and E initiate elections, sending RequestVote RPCs to other nodes. B, C, D, and E receive random numbers 100, 90, 80, and 70, respectively.
Assume B, C, D, and E each receive 1 vote. Since C, D, and E each receive three rejected vote messages (exceeding a majority), these three nodes relinquish their Candidate roles for this round and send messages to themselves (these three nodes vote for themselves), instructing them to transfer their votes to node B.
Node B receives 4 votes, exceeding a majority, and completes the election process.

Example 3: Another Raft Scenario

Initially, A is the Leader, and the other nodes are Followers.
After A fails, B and C's election timeout expires, and they transition to Candidates.
B and C initiate elections, sending RequestVote RPCs to other nodes, with random numbers 900 and 1000, respectively.
Assume B and C have the same log offset and receive votes from D and E, respectively. Here, a heuristic approach can be considered: since node A cannot vote, both B and C believe that under the existing Raft algorithm logic, neither will become the Leader. Since node C has a larger offset, node C will not initiate specific node re-voting. Node B will relinquish its Candidate role for this round and instruct nodes B and D, which voted for it, to transfer their votes to node C.
Node C receives 4 votes, exceeding a majority, and completes the election process.

Correctness of the Idea
At any moment, each node will only vote for one Candidate, so at any moment, there is at most one node with votes from a majority of nodes.

Efficiency
Under normal Raft operation, the new modifications have no impact. However, when the Raft algorithm cannot elect a Leader in one election round, the new modifications may facilitate the election of a Leader.

Thank you for your opinions, and welcome to talk about it.

karthikmurakonda · 2025-04-28T20:38:09Z

karthikmurakonda
Apr 28, 2025

What if the node with high random number is failed just before starting re-election i.e in your Specific Node Re-Voting Process?

3 replies

ss-torres Apr 29, 2025
Author

Hi, @karthikmurakonda, the random number is created when one node starting re-election. And it would change in a different re-election process. You could use uint64 to avoid the confliction. And in fact, if there is a confliction, it would not influence the the correctness of the protocol.

karthikmurakonda Apr 29, 2025

Okay, I got your point: This special revoting is done in single election phase only. But what are the advantages of this modified leader election algorithm?

ss-torres Apr 30, 2025
Author

Hi, @karthikmurakonda. If there are some nodes who apply for being the leader, and they apply for the votes at a similar time. Then they all would get some votes, but all of them could not get the enough votes to be the leader. With the revoting, some of the nodes who apply for being the leader could give up, and the node who have the highest random number probably get most of the votes and become the leader. The advantage is speeding up the voting process, and make it success in one round in most of the cases.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

An Improvement Idea for Leader Election in the Raft Algorithm #19559

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 3 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

An Improvement Idea for Leader Election in the Raft Algorithm #19559

Uh oh!

ss-torres Mar 9, 2025

Replies: 1 comment · 3 replies

Uh oh!

karthikmurakonda Apr 28, 2025

Uh oh!

ss-torres Apr 29, 2025 Author

Uh oh!

karthikmurakonda Apr 29, 2025

Uh oh!

ss-torres Apr 30, 2025 Author

ss-torres
Mar 9, 2025

Replies: 1 comment 3 replies

karthikmurakonda
Apr 28, 2025

ss-torres Apr 29, 2025
Author

ss-torres Apr 30, 2025
Author