(2.14) Asynchronous stream snapshots #7876
Conversation
This pull request tries to reduce the tail latency of publishing to a stream. Without the optimization in this pull request we have the following results:

Under low load (a single client publishing one message at a time), latency varies from ~61 microseconds up to ~29 milliseconds for unlucky messages. Under high load (a single client publishing 500 messages at a time), latency goes from ~1ms all the way to ~46ms. One cause of the high tail latency is the periodic snapshotting of the stream. Taking a stream snapshot involves writing a file and syncing it to disk, and doing so safely requires more than one call to fsync (710407c). With the changes in this pull request, tail latency improves significantly:
I used the "lats" benchmark client: https://github.com/synadia-labs/lats
Force-pushed 6e05f0f to e32926e
Force-pushed e32926e to 2f79166
Claude finished @neilalexander's task in 4m 0s

Code Review: Asynchronous Stream Snapshots

Summary

This PR moves the file write and Raft log compaction of stream snapshots off the …

Observations

Concurrency and mutex usage — looks correct.

Behavior change: in the old code,

```go
// old
} else if err != errNoSnapAvailable && err != errNodeClosed && err != errCatchupsRunning {
	s.RateLimitWarnf(...)
	failedSnapshots++
}
```

In the new code,

```go
// new (server/jetstream_cluster.go ~L3020)
if err != errNoSnapAvailable && err != errNodeClosed {
	s.Warnf("Failed to install snapshot for '%s > %s' [%s]: %v", ...)
	failedSnapshots++
}
```

This means that while a catchup is in progress, each snapshot tick will log a warning and increment failedSnapshots.

Shutdown with an in-flight async snapshot: …

Redundant …

Double lock/unlock in the timer case (minor):

```go
// server/jetstream_cluster.go ~L3366
case <-t.C:
	snapMu.Lock()
	forceIfFailed := failedSnapshots > 0
	snapMu.Unlock()
	doSnapshot(forceIfFailed) // acquires snapMu again
```

If the server is already shutting down and …

Verdict

The implementation is correct and safe. The only item worth a second look is the …
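The double lock/unlock in the timer case can be avoided with a locked variant of the snapshot function, as in this illustrative sketch (`doSnapshotLocked` is a hypothetical name, not the server's actual API; the bodies are stand-ins):

```go
// Sketch of eliminating the unlock/re-lock between reading failedSnapshots
// and taking the snapshot, by calling a *Locked variant with the mutex held.
package main

import (
	"fmt"
	"sync"
)

var (
	snapMu          sync.Mutex
	failedSnapshots int
	snapshots       int
)

// doSnapshot acquires snapMu itself, as in the reviewed code.
func doSnapshot(force bool) {
	snapMu.Lock()
	defer snapMu.Unlock()
	doSnapshotLocked(force)
}

// doSnapshotLocked assumes snapMu is already held, avoiding the double
// lock/unlock in the timer path.
func doSnapshotLocked(force bool) {
	if force {
		failedSnapshots = 0
	}
	snapshots++
}

func main() {
	// Timer-tick path, without releasing the mutex between read and use.
	snapMu.Lock()
	forceIfFailed := failedSnapshots > 0
	doSnapshotLocked(forceIfFailed)
	snapMu.Unlock()
	fmt.Println(snapshots) // 1
}
```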
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2f79166212
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
server/jetstream_cluster.go (outdated)

```go
if err != errNoSnapAvailable && err != errNodeClosed {
	s.Warnf("Failed to install snapshot for '%s > %s' [%s]: %v",
```
Ignore catchup-in-progress on snapshot checkpoint creation
When CreateSnapshotCheckpoint fails because a peer catchup is active (errCatchupsRunning from raft.createSnapshotCheckpointLocked), this branch treats it as a hard failure, increments failedSnapshots, and shortens the compaction ticker. On busy clusters this turns normal catchup behavior into repeated failure accounting and eventually forced snapshots, which is a behavioral regression from the previous stream path that explicitly ignored errCatchupsRunning and can trigger unnecessary blocking snapshots/log churn.
This would previously happen on the n.InstallSnapshot() call. This PR "splits" that call into n.CreateSnapshotCheckpoint() followed by InstallSnapshot on that checkpoint.
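The split described here can be sketched as follows. The `checkpoint` type and the method signatures are illustrative stand-ins, not the actual Raft interface from the PR: the cheap checkpoint capture stays synchronous, and the expensive install (file write plus log compaction) moves to a separate goroutine.

```go
// Sketch of splitting a blocking snapshot install into a synchronous
// checkpoint step and an asynchronous install step.
package main

import (
	"fmt"
	"sync"
)

type checkpoint struct{ data []byte }

// CreateSnapshotCheckpoint captures the state to snapshot; cheap and synchronous.
func CreateSnapshotCheckpoint(state []byte) *checkpoint {
	cp := make([]byte, len(state))
	copy(cp, state)
	return &checkpoint{data: cp}
}

// InstallSnapshot stands in for the slow part (file write + compaction);
// it only needs the captured checkpoint, so it is safe to run off the hot path.
func (c *checkpoint) InstallSnapshot(installed *[]byte) {
	*installed = c.data
}

func main() {
	var (
		installed []byte
		wg        sync.WaitGroup
	)
	c := CreateSnapshotCheckpoint([]byte("stream state"))
	wg.Add(1)
	go func() { // off the publish path, as the PR does
		defer wg.Done()
		c.InstallSnapshot(&installed)
	}()
	wg.Wait()
	fmt.Println(string(installed))
}
```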
needs rebase?
Force-pushed 2f79166 to 5885726
done
Force-pushed 5885726 to e800634
💡 Codex Review
Reviewed commit: e8006345c5
```go
// to force it through. We might have been catching up a peer for a long period, and this
// protects our log size from growing indefinitely.
forceSnapshot := failedSnapshots > 4
c, err := n.CreateSnapshotCheckpoint(forceSnapshot)
```
Re-check catchup state before installing checkpoint
CreateSnapshotCheckpoint is invoked before flushAllPending() and stateSnapshot(), but the Raft catchup guard (len(n.progress) == 0) is only enforced at checkpoint creation time, not in checkpoint.InstallSnapshot. If a follower catchup starts in that window, this path will still install/compact a non-forced snapshot, which bypasses the previous safety behavior and can interrupt active catchups with avoidable retry/snapshot churn.
I think this can be fixed easily, but the fix would likely belong in a separate PR (the fix would be in the raft code, and it would affect meta snapshots as well).

Having said that, I think this is not such a big issue. In unlucky situations InstallSnapshot may interrupt a catchup that started after the call to CreateSnapshotCheckpoint. In that case, there wouldn't be a lot of work wasted in the first place.
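A minimal sketch of the install-time re-check such a fix might add (the `node` type, `progress` map, and `installCheckpoint` are illustrative, not the actual raft code): the same catchup guard applied at checkpoint creation is repeated at install time, unless the snapshot is forced.

```go
// Sketch: re-check whether a peer catchup started between checkpoint
// creation and install, mirroring the creation-time guard.
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errCatchupsRunning = errors.New("catchups running")

type node struct {
	mu       sync.Mutex
	progress map[string]bool // illustrative stand-in for per-peer catchup state
}

func (n *node) installCheckpoint(force bool) error {
	n.mu.Lock()
	defer n.mu.Unlock()
	// Re-check at install time: refuse a non-forced install while a
	// follower catchup is active.
	if !force && len(n.progress) > 0 {
		return errCatchupsRunning
	}
	return nil // compact and install here
}

func main() {
	n := &node{progress: map[string]bool{"peer1": true}}
	fmt.Println(n.installCheckpoint(false)) // catchup active: refused
	fmt.Println(n.installCheckpoint(true))  // forced: proceeds
}
```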
Commit 0033a15 extended Raft's interface with the ability to create snapshot checkpoints which can be used to install snapshots asynchronously. This commit reuses the same interface to make stream snapshots asynchronous: the writing of the snapshot file and Raft's log compaction are done in a separate goroutine.

Signed-off-by: Daniele Sciascia <daniele@nats.io>
Force-pushed e800634 to 2a42738
💡 Codex Review
Reviewed commit: 2a42738886
```go
// If this is the first failure, reduce the interval of the snapshot timer.
// This ensures we're not waiting too long for snapshotting to eventually become forced.
if failedSnapshots == 0 {
	t.Reset(compactMinInterval)
```
Guard ticker reset after monitor shutdown
handleInstallResult runs from the async snapshot goroutine, but monitorStream can exit on mqch/qch and run defer t.Stop() before that goroutine completes. If that late callback reaches this t.Reset(...) path (for example when the in-flight snapshot fails because the node/store is being stopped), it re-arms a ticker that no longer has a consumer, leaving an orphaned periodic timer after the monitor is gone. This can accumulate background wakeups during stream moves/scale-downs when snapshots are in flight.
I wonder if this is a real concern. Sure, if t is Reset after Stop, it may keep ticking unnecessarily. It also means that it is no longer referenced, so it will eventually stop when collected. Not sure it is worth introducing complexity for a ticker that will tick every 15 seconds, in the worst case.
@claude review again
Claude finished @sciascid's task in 3m 53s

Code Review: Asynchronous Stream Snapshots (re-review)

Summary

The implementation is sound and the design is clean. The async goroutine is correctly bounded by …

Open Items Revisited

1. This was flagged in both the previous Claude review and by Codex. The old code excluded errCatchupsRunning:

```go
// old
} else if err != errNoSnapAvailable && err != errNodeClosed && err != errCatchupsRunning {
	failedSnapshots++
}
```

The new code does not exclude it at checkpoint creation, so each tick during a catchup will log a warning and increment failedSnapshots. sciascid's response ("This would previously happen on the n.InstallSnapshot() call") … If the old silent-ignore behaviour for streams is preferred, the fix is one line: …

2. Catchup guard only at checkpoint creation, not at install time (…). Codex flagged that a catchup could start in the window between …

3. Ticker …

4. Redundant …

5. Shutdown with in-flight async snapshot: when …

Verdict

The implementation is correct and safe. The only item worth explicit confirmation before merge is whether the …
@claude We should eventually force …
Claude finished @sciascid's task in 47s

Clarification Acknowledged

The …

```go
// If we had a significant number of failed snapshots, start relaxing Raft-layer checks
// to force it through. We might have been catching up a peer for a long period, and this
// protects our log size from growing indefinitely.
forceSnapshot := failedSnapshots > 4
```

The design is: each …

All open items from prior reviews are now resolved: …

The implementation is correct and safe. Ready to merge.
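The escalation policy discussed in this thread can be sketched as a pure function (the interval constants are illustrative defaults, not the server's actual values, and `nextAttempt` is a hypothetical helper): the first failure shortens the timer interval, and after more than four failures the next attempt is forced through the Raft-layer checks.

```go
// Sketch of the failure-count escalation: shorten the interval on failure,
// and force the snapshot once failures exceed four.
package main

import (
	"fmt"
	"time"
)

const (
	compactInterval    = 2 * time.Minute  // illustrative default
	compactMinInterval = 15 * time.Second // illustrative default
)

// nextAttempt returns whether the next snapshot is forced and which timer
// interval to use, given how many attempts have failed so far.
func nextAttempt(failedSnapshots int) (force bool, interval time.Duration) {
	force = failedSnapshots > 4
	if failedSnapshots > 0 {
		return force, compactMinInterval
	}
	return force, compactInterval
}

func main() {
	for _, failed := range []int{0, 1, 5} {
		force, iv := nextAttempt(failed)
		fmt.Println(failed, force, iv)
	}
}
```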