Commit 06e72ac

reduce FSM backpressure from blocked evals queue
The coarse-grained lock on the blocked evals queue can cause backpressure on the FSM when a large number of evals are being unblocked and a large number of scheduler goroutines are contending for this lock. The `watchCapacity` goroutine in the blocked evals queue has a large buffered channel for unblock operations, but it takes the same lock that's used by the unblock methods called from the FSM. Meanwhile, `Eval.Reblock` RPCs arriving from scheduler workers attempt to take this same lock, and we end up with a backlog waiting on this mutex.

This PR moves all the operations for the blocked evals queue onto a single goroutine that receives work from a large buffered channel. The `Eval.Reblock` RPCs and the `Unblock` methods called from the FSM push work onto this channel and immediately return. This prevents them from blocking except during leader transitions, when we flush the blocked evals queue; at that point we should not be making `Unblock` method calls from the FSM anyway.

This also allows us to move the tracking of stats into one goroutine, so we no longer need to copy the stats on each update. This reduces memory allocation and GC pressure significantly.

Ref: https://hashicorp.atlassian.net/browse/NMD-1045
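
The single-goroutine pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `BlockedQueue`, `op`, `Stats`, and every method name here are hypothetical, and the real queue tracks far more state. Producers push work onto a buffered channel and return; one goroutine owns the map and the stats, so no mutex is needed and stats are copied only when a caller asks for them.

```go
package main

import "fmt"

// All names below are hypothetical and chosen for illustration only.
type opKind int

const (
	opBlock opKind = iota
	opUnblock
	opStats
	opFlush
)

// Stats is owned by the run goroutine and copied out only on request.
type Stats struct {
	Blocked   int
	Unblocked int
}

type op struct {
	kind   opKind
	evalID string
	class  string
	reply  chan Stats // set only for opStats and opFlush
}

type BlockedQueue struct {
	ops chan op
}

func NewBlockedQueue(buf int) *BlockedQueue {
	q := &BlockedQueue{ops: make(chan op, buf)}
	go q.run()
	return q
}

// run is the only goroutine that touches the queue state.
func (q *BlockedQueue) run() {
	blocked := map[string]string{} // evalID -> class
	var stats Stats
	for o := range q.ops {
		switch o.kind {
		case opBlock:
			blocked[o.evalID] = o.class
		case opUnblock:
			for id, class := range blocked {
				if class == o.class {
					delete(blocked, id)
					stats.Unblocked++
				}
			}
		case opFlush:
			blocked = map[string]string{}
		}
		stats.Blocked = len(blocked)
		if o.reply != nil {
			o.reply <- stats
		}
	}
}

// Block and Unblock enqueue work and return; they wait only if the buffer is full.
func (q *BlockedQueue) Block(evalID, class string) {
	q.ops <- op{kind: opBlock, evalID: evalID, class: class}
}

func (q *BlockedQueue) Unblock(class string) {
	q.ops <- op{kind: opUnblock, class: class}
}

// Stats and Flush are synchronous: they wait for the run goroutine to reply.
func (q *BlockedQueue) Stats() Stats {
	reply := make(chan Stats, 1)
	q.ops <- op{kind: opStats, reply: reply}
	return <-reply
}

func (q *BlockedQueue) Flush() Stats {
	reply := make(chan Stats, 1)
	q.ops <- op{kind: opFlush, reply: reply}
	return <-reply
}

func main() {
	q := NewBlockedQueue(1024)
	q.Block("eval-1", "class-a")
	q.Block("eval-2", "class-b")
	q.Unblock("class-a")
	fmt.Printf("%+v\n", q.Stats())
}
```

Because the channel is FIFO and has a single consumer, operations sent from one goroutine are applied in the order they were sent, which is why the `Stats` call in `main` observes the earlier `Block` and `Unblock` calls. The trade-off in a design like this is that callers no longer learn the outcome of an operation synchronously unless they pass a reply channel.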
1 parent f9ce228 commit 06e72ac

5 files changed, +488 -224 lines changed
