
[FIXED] Stream leader can catchup from snapshot if required #8021

Merged
neilalexander merged 1 commit into main from maurice/stream-leader-catchup on Apr 9, 2026
Conversation

@MauriceVanVeen
Member

After scaling up a stream, a follower could have received a snapshot (through SendSnapshot) without yet having caught up from it. Based on its log, it could already become the new stream leader. When it then reached processSnapshot, it would fail with errAlreadyLeader from n.PauseApply() and skip catchup.

This PR fixes that by always processing the incoming snapshot, even if we're leader: we first check that we're up to date, and otherwise step down and perform catchup. This will usually not happen on a leader, but can occur under certain edge cases during scale-up.
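The fixed control flow can be sketched roughly as follows. This is a simplified stand-in, not the actual server code: the `node` and `snapshot` types and the `calculateSyncRequest`/`processSnapshot` helpers here are illustrative only, modeling just the "early-return if up to date, otherwise step down and catch up" shape described above.

```go
package main

import "fmt"

// node is an illustrative stand-in for a stream's raft node.
type node struct {
	isLeader bool
	applied  uint64 // last applied index
}

// snapshot is an illustrative stand-in for an incoming stream snapshot.
type snapshot struct {
	lastIndex uint64
}

// calculateSyncRequest mirrors the idea that an up-to-date node
// needs no catchup: only a node behind the snapshot must sync.
func calculateSyncRequest(n *node, snap snapshot) bool {
	return n.applied < snap.lastIndex
}

// processSnapshot sketches the fixed flow: always process the snapshot,
// return early with no side effects if already up to date, and step
// down first if a behind leader must catch up.
func processSnapshot(n *node, snap snapshot) string {
	if !calculateSyncRequest(n, snap) {
		return "up-to-date, no catchup"
	}
	if n.isLeader {
		n.isLeader = false // step down before pausing applies
	}
	n.applied = snap.lastIndex // stand-in for the actual catchup
	return "caught up from snapshot"
}

func main() {
	// A leader that received a snapshot it has not applied: previously it
	// skipped catchup with errAlreadyLeader; now it steps down and syncs.
	behindLeader := &node{isLeader: true, applied: 5}
	fmt.Println(processSnapshot(behindLeader, snapshot{lastIndex: 10}))
	fmt.Println("still leader:", behindLeader.isLeader)

	// An up-to-date leader returns early with no side effects.
	upToDate := &node{isLeader: true, applied: 10}
	fmt.Println(processSnapshot(upToDate, snapshot{lastIndex: 10}))
}
```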

Resolves #8020

Signed-off-by: Maurice van Veen github@mauricevanveen.com

@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner April 8, 2026 22:49
@synadia-claude-reviewer

synadia-claude-reviewer bot commented Apr 8, 2026

Claude finished @MauriceVanVeen's task in 3m 44s.


PR Review

  • Read CLAUDE.md for repo-specific guidelines
  • Examine PR diff and changed files
  • Review processSnapshot logic in jetstream_cluster.go
  • Review test changes in jetstream_cluster_3_test.go
  • Review store.go changes
  • Analyze correctness, concurrency safety, and raft/JetStream clustering implications
  • Post inline comments on specific issues
  • Post overall review summary

The fix is correct and well-targeted. Here's the breakdown:

applyStreamEntries change (jetstream_cluster.go)

Removing the isRecovering || !mset.IsLeader() guard is the right call. The original guard was meant to avoid running processSnapshot on an active leader, but it broke the edge case described in the PR: a node that became leader while still holding an uncommitted snapshot. The new code unconditionally calls processSnapshot, relying on the function itself to handle the leader case safely via calculateSyncRequest — if the leader is already up-to-date, sreq == nil and processSnapshot returns early with no side effects.

Removing the err != errAlreadyLeader swallowing is also correct — that was masking the error rather than fixing the root cause.

processSnapshot leader handling (jetstream_cluster.go:9924–9933)

The step-down-and-retry logic is sound. StepDown() unconditionally transitions us to follower state (returns errNotLeader only if we're already not leader, not if there are no healthy peers), so the second PauseApply() should succeed in virtually all cases. The _ = n.StepDown() error discard is intentional and appropriate — if we've already stepped down by the time we call it, that's fine.

Note that processSnapshotDeletes runs at the top of processSnapshot before the leader check, so it can compact the store and sync deletes even on a leader. For the described bug scenario (truncated/empty store), this is harmless. For a leader that is already up-to-date, calculateSyncRequest returns nil and the function returns early before PauseApply is ever reached, so there is no unintended mutation.

Left an inline comment on the second PauseApply failure path — it's a safety net for an extremely unlikely race condition, but the DrainAndReplaySnapshot fallback it triggers could place a Raft leader into paused=true state. The subsequent resetClusteredState handles recovery in that edge case, so it's not a blocking concern.
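The step-down-and-retry pattern around PauseApply discussed above can be sketched as a toy model. The `raftNode` type and `pauseForCatchup` helper here are hypothetical stand-ins, modeling only the behavior relevant to the pattern: a leader refuses to pause applies, so the caller steps down and retries.

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyLeader = errors.New("already leader")

// raftNode is a stand-in for the server's raft node; only the behavior
// relevant to the step-down-and-retry pattern is modeled here.
type raftNode struct {
	leader bool
	paused bool
}

// PauseApply refuses to pause applies while we are leader.
func (n *raftNode) PauseApply() error {
	if n.leader {
		return errAlreadyLeader
	}
	n.paused = true
	return nil
}

// StepDown unconditionally transitions the node to follower state.
func (n *raftNode) StepDown() { n.leader = false }

// pauseForCatchup sketches the fixed pattern: if pausing fails because
// we are leader, step down and retry the pause.
func pauseForCatchup(n *raftNode) error {
	if err := n.PauseApply(); err != nil {
		if err != errAlreadyLeader {
			return err
		}
		n.StepDown()
		// The retry should succeed in virtually all cases; a failure
		// here is the unlikely path covered by the reset fallback.
		if err := n.PauseApply(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	n := &raftNode{leader: true}
	if err := pauseForCatchup(n); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("leader:", n.leader, "paused:", n.paused)
}
```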

isClusterResetErr / monitorStream fallback (store.go, jetstream_cluster.go:3188)

Adding errAlreadyLeader to isClusterResetErr and the DrainAndReplaySnapshot retry path is a reasonable defensive addition. If the second PauseApply somehow fails, the system will attempt to re-apply the snapshot before falling back to a full cluster state reset.

Test (jetstream_cluster_3_test.go)

The test directly exercises the described scenario: truncate the leader's store to empty, inject a snapshot with the original state, and assert cluster convergence. Using mset.store.Truncate(0) to bypass normal stream operations is appropriate for an artificially-crafted regression test. The 10s timeout for convergence is generous enough for CI flakiness.

Overall: The fix is clean, minimal, and targets the exact root cause. No blocking issues found.

@MauriceVanVeen MauriceVanVeen force-pushed the maurice/stream-leader-catchup branch from ff7fc19 to 7f40103 on April 8, 2026 at 22:57
@MauriceVanVeen
Member Author

@claude, brief re-review please.

@synadia-claude-reviewer

synadia-claude-reviewer bot commented Apr 8, 2026

Claude finished @MauriceVanVeen's task in 1m 6s.


Re-review

  • Check diff since last review
  • Review raft.go pauseApplyLocked change
  • Post re-review summary

The change since the last review is the fix for the inline comment concern — the pauseApplyLocked update in raft.go.

pauseApplyLocked (raft.go:1095–1098)

// If we are currently not a follower, make sure we step down.
if n.State() != Follower {
    n.stepdownLocked(noLeader)
}

Previously this only stepped down for Candidate state. Now it also handles Leader, which directly addresses the flagged concern: DrainAndReplaySnapshot calls pauseApplyLocked directly (bypassing the leader-guard in PauseApply). In the extreme edge case where DrainAndReplaySnapshot runs while the node is still leader (step-down failed due to no healthy peers), it will now correctly step the node down before setting paused=true, rather than leaving a leader in a paused state.

The fix is minimal and correct. It doesn't change the behavior for the normal path — PauseApply still guards with errAlreadyLeader for leader state, so ordinary callers go through processSnapshot's step-down-and-retry logic as before.

Overall: The original fix is solid, and the follow-up addresses the noted edge case cleanly. LGTM.

Member

@neilalexander neilalexander left a comment


LGTM

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/stream-leader-catchup branch from 7f40103 to 3643968 on April 9, 2026 at 13:34

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3643968638


-	if err := mset.processSnapshot(ss, ce.Index); err != nil && err != errAlreadyLeader {
+	if err := mset.processSnapshot(ss, ce.Index); err != nil {
 		return 0, err
 	}

P2: Skip stale snapshot state when leader is already ahead

Calling processSnapshot() unconditionally on leaders re-applies snapshot metadata even when the leader does not need catchup. In processSnapshot, mset.setCLFS(snap.Failed) runs before the up-to-date check, so a SendSnapshot captured slightly behind current applied state can rewind CLFS on the leader. If intervening applied entries had already incremented CLFS, the next clustered message apply can hit the lseq != mset.lseq + clfs check in processJetStreamMsgWithBatch and return errLastSeqMismatch, which drives the cluster-reset path. This only appears under scale-up/leader traffic races, but it is a real regression from previously skipping leader-side snapshot processing.
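The sequence-mismatch concern can be illustrated with a toy check. The `seqMatches` helper and the numbers below are hypothetical, not the actual server code; they only model the invariant that an incoming entry's leader-assigned sequence must equal the local store's last sequence plus CLFS.

```go
package main

import "fmt"

// seqMatches models the invariant checked on the clustered apply path:
// the incoming leader-assigned sequence must equal the local store's
// last sequence plus the cluster-level failed/skipped count (CLFS).
func seqMatches(lseqIncoming, storeLseq, clfs uint64) bool {
	return lseqIncoming == storeLseq+clfs
}

func main() {
	// With store lseq=100 and clfs=10, the next entry is expected at 110.
	fmt.Println("match before rewind:", seqMatches(110, 100, 10))

	// If a stale snapshot rewinds CLFS to 8, the same entry no longer
	// matches — the failure mode the review flags as errLastSeqMismatch.
	fmt.Println("match after rewind:", seqMatches(110, 100, 8))
}
```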


@neilalexander neilalexander merged commit a7e148e into main Apr 9, 2026
72 of 73 checks passed
@neilalexander neilalexander deleted the maurice/stream-leader-catchup branch April 9, 2026 14:29

Development

Successfully merging this pull request may close these issues.

Stream filestore divergence after rapid leader transition
