-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
NRG: Decouple Raft transport layer #7670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
sciascid
wants to merge
2
commits into
main
Choose a base branch
from
raft-transport-refactoring
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
+612
−148
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
7776f56 to
2d9baba
Compare
1f309fd to
5650240
Compare
3a7dde7 to
1e3d5d1
Compare
MauriceVanVeen
approved these changes
Jan 6, 2026
Member
MauriceVanVeen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
server/raft_test.go
Outdated
|
|
||
| // Remove the follower while the leader is partitioned away | ||
| hub.partition(leader.node().ID(), 1) | ||
| leader.node().ProposeRemovePeer(followers[0].node().ID()) |
Member
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could optionally store f := followers[0].node() to save a couple repetitions.
The Raft implementation was tightly coupled to the server's internal client and send queue for the RPC communication. This makes it difficult to test scenarios like network partitions in deterministic manner. The primary benefit of this change is improved testability. A new mockTransport is introduced for testing, which allows for simulating network partitions and for injecting behavior after a message is sent. Signed-off-by: Daniele Sciascia <[email protected]>
1e3d5d1 to
6cc63ce
Compare
…ions This commit fixes the following bugs: - Inconsistent Cluster Size: When a leader was partitioned from the cluster, immediately after proposing a EntryAddPeer. The remaining nodes could end up with different view of the cluster size and quorum. So followers could have cluster size and would not match the number of peers in the peer set. A subsequent leader election, electing one of the followers, could break the quorum system. - Incorrect Leader Election: It was possible for a new leader to be elected without a proper quorum. This could happen if a partition occurred after a new peer was proposed but before that change was committed. A follower could add the uncommitted peer to its peer set but would not update its cluster size and quorum, leading to an invalid election. Both issues are solved by making sure that when a peer is added or removed from the membership, the cluster size and quorum are adjusted accordingly, at the same time. Followers would first add peers when receiving the EntryAddPeer, and then adjusting the cluster size only after commit. This patch changes this behavior such that the cluster size and quorum are recomputed upon receiving the EntryAddPeer / EntryRemovePeer proposals. This is inline with the membership protocol proposed in Ongaro's dissertation, section 4.1. This patch also removes the concept of a "known" peer from the Raft layer. A node would add a peer to its peer set when first receiving the corresponding appendEntry, and on commit it would be marked as "known". This distinction no longer applies. Signed-off-by: Daniele Sciascia <[email protected]>
6cc63ce to
bb1c801
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The Raft implementation was tightly coupled to the server's internal client and send queue for the RPC communication. This makes it difficult to test scenarios like network partitions in deterministic manner.
The primary benefit of this change is improved testability. A new mockTransport is introduced for testing, which allows for simulating network partitions and for injecting behavior after a message is sent.
Signed-off-by: Daniele Sciascia [email protected]