[FIXED] Stream leader can catchup from snapshot if required #8021
neilalexander merged 1 commit into main
Claude finished @MauriceVanVeen's task in 3m 44s (PR review)
The fix is correct and well-targeted. Here's the breakdown:
Force-pushed ff7fc19 to 7f40103.
@claude, brief re-review please.
Claude finished @MauriceVanVeen's task in 1m 6s (re-review)
The change since the last review is the fix for the inline comment concern:

```go
// If we are currently not a follower, make sure we step down.
if n.State() != Follower {
	n.stepdownLocked(noLeader)
}
```

Previously this only stepped down for … The fix is minimal and correct. It doesn't change the behavior for the normal path …

Overall: The original fix is solid, and the follow-up addresses the noted edge case cleanly. LGTM.
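As a minimal illustration of the `n.State() != Follower` guard, the Go sketch below models only the role check: any non-follower role (leader or candidate) steps down before catchup. `RaftState`, `node`, `prepareForCatchup` and the argument-less `stepdownLocked` are hypothetical simplifications, not the nats-server types; the real `stepdownLocked` takes a new-leader argument (`noLeader` in the snippet) and does considerably more work.

```go
package main

import "fmt"

// RaftState is a hypothetical, simplified stand-in for a Raft node's role.
type RaftState int

const (
	Follower RaftState = iota
	Candidate
	Leader
)

func (s RaftState) String() string {
	switch s {
	case Follower:
		return "Follower"
	case Candidate:
		return "Candidate"
	case Leader:
		return "Leader"
	}
	return "Unknown"
}

// node is a hypothetical minimal Raft node, not the nats-server type.
type node struct {
	state RaftState
}

// stepdownLocked moves the node back to Follower (hypothetical simplification).
func (n *node) stepdownLocked() { n.state = Follower }

// prepareForCatchup mirrors the guard: any role other than Follower steps down.
func (n *node) prepareForCatchup() {
	if n.state != Follower {
		n.stepdownLocked()
	}
}

func main() {
	for _, s := range []RaftState{Follower, Candidate, Leader} {
		n := &node{state: s}
		n.prepareForCatchup()
		fmt.Printf("%v -> %v\n", s, n.state)
	}
}
```

Checking `!= Follower` rather than `== Leader` means a candidate that has not yet won an election is also forced back to follower before it catches up.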
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Force-pushed 7f40103 to 3643968.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3643968638
```diff
-if err := mset.processSnapshot(ss, ce.Index); err != nil && err != errAlreadyLeader {
-	return 0, err
-}
+if err := mset.processSnapshot(ss, ce.Index); err != nil {
```
Skip stale snapshot state when leader is already ahead
Calling `processSnapshot()` unconditionally on leaders re-applies snapshot metadata even when the leader does not need catchup. In `processSnapshot`, `mset.setCLFS(snap.Failed)` runs before the up-to-date check, so a `SendSnapshot` captured slightly behind the current applied state can rewind CLFS on the leader. If intervening applied entries had already incremented CLFS, the next clustered message apply can hit the `lseq != mset.lseq + clfs` check in `processJetStreamMsgWithBatch` and return `errLastSeqMismatch`, which drives the cluster-reset path. This only appears under scale-up/leader traffic races, but it is a real regression from the previous behavior of skipping leader-side snapshot processing.
After scaling up a stream, a follower could have received a snapshot (through `SendSnapshot`) but not have caught up from it. Based on its log, it could already become the new stream leader. When it got to `processSnapshot`, it would error with `errAlreadyLeader` due to `n.PauseApply()` and then skip catchup.

This PR fixes that by always processing the incoming snapshot, even if we're leader, since we check that we're up-to-date first and otherwise step down and perform catchup. This will usually not happen on a leader, but it can happen in certain edge cases during scale up.
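The before-and-after control flow described above can be sketched in Go. This is a hypothetical model: `replica`, `processSnapshotOld`, `processSnapshotNew` and the `lastSeq` comparison are illustrative stand-ins, and a plain assignment stands in for the real catchup. Only `errAlreadyLeader` and the caller's `err != errAlreadyLeader` filter come from the diff.

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyLeader = errors.New("already leader")

// replica is a hypothetical stand-in for a stream replica.
type replica struct {
	leader  bool
	lastSeq uint64 // highest sequence this replica has stored
}

// processSnapshotOld models the pre-fix flow: a leader is refused up front
// (like PauseApply returning errAlreadyLeader) before checking if it is behind.
func (r *replica) processSnapshotOld(snapSeq uint64) error {
	if r.leader {
		return errAlreadyLeader
	}
	if r.lastSeq < snapSeq {
		r.lastSeq = snapSeq // stands in for a real catchup
	}
	return nil
}

// processSnapshotNew models the fixed flow: check up-to-date first; if behind,
// step down (when leading) and then catch up.
func (r *replica) processSnapshotNew(snapSeq uint64) error {
	if r.lastSeq >= snapSeq {
		return nil // already up to date, leader or not
	}
	if r.leader {
		r.leader = false // step down before catching up
	}
	r.lastSeq = snapSeq // stands in for a real catchup
	return nil
}

func main() {
	// A new leader that received a snapshot up to seq 100 but only stored 40.
	old := &replica{leader: true, lastSeq: 40}
	if err := old.processSnapshotOld(100); err != nil && err != errAlreadyLeader {
		panic(err)
	}
	fmt.Println("old:", old.leader, old.lastSeq) // old: true 40

	fixed := &replica{leader: true, lastSeq: 40}
	if err := fixed.processSnapshotNew(100); err != nil {
		panic(err)
	}
	fmt.Println("new:", fixed.leader, fixed.lastSeq) // new: false 100
}
```

In the old flow the caller swallows `errAlreadyLeader`, so the lagging leader keeps leading at sequence 40. Comparing sequences alone is a simplification; the actual up-to-date check in `processSnapshot` is likely more involved.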
Resolves #8020