Commit 06e72ac
reduce FSM backpressure from blocked evals queue
The coarse-grained lock on the blocked evals queue can cause backpressure on the
FSM when a large number of evals are being unblocked while a large number of
scheduler goroutines contend for that lock. The `watchCapacity` goroutine in the
blocked evals queue has a large buffered channel for unblock operations, but it
takes the same lock that's used by the unblock methods called from the FSM.
Meanwhile, `Eval.Reblock` RPCs arriving from scheduler workers attempt to take
this same lock, and we end up with a backlog waiting on this mutex.
This PR moves all the operations for the blocked evals queue onto a single
goroutine that receives work from a large buffered channel. The `Eval.Reblock`
RPCs and the `Unblock` methods called from the FSM push work onto this channel
and return immediately. This prevents them from blocking except during leader
transitions, when we flush the blocked evals queue; at that point we should not
be making `Unblock` method calls from the FSM anyway.
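A minimal Go sketch of the single-goroutine pattern described above, assuming heavily simplified types. The names `evaluation`, `queueOp`, `blockedEvals`, `Block`, `Unblock`, and `Stop` are hypothetical, not Nomad's actual API, and leader-transition flushing is omitted. Callers only send on a buffered channel, so they block only if the buffer fills; all queue state is owned by `run`, so no mutex is needed.

```go
package main

// evaluation is a hypothetical, heavily simplified stand-in for a Nomad eval.
type evaluation struct {
	ID    string
	Class string // hypothetical: the node class this eval is blocked on
}

type opKind int

const (
	opBlock   opKind = iota // Eval.Reblock-style request: park an eval
	opUnblock               // FSM-style request: capacity changed for a class
)

// queueOp is one unit of work handed to the owner goroutine (hypothetical).
type queueOp struct {
	kind  opKind
	eval  *evaluation
	class string
}

// blockedEvals owns all of its state from a single goroutine, so no mutex is needed.
type blockedEvals struct {
	ops  chan queueOp
	done chan struct{}

	// Owned by run(). Other goroutines may read these only after Stop returns.
	blocked  map[string]*evaluation
	released []*evaluation
}

func newBlockedEvals(bufSize int) *blockedEvals {
	b := &blockedEvals{
		ops:     make(chan queueOp, bufSize),
		done:    make(chan struct{}),
		blocked: make(map[string]*evaluation),
	}
	go b.run()
	return b
}

// Block enqueues work and returns immediately unless the buffer is full.
func (b *blockedEvals) Block(e *evaluation) {
	b.ops <- queueOp{kind: opBlock, eval: e}
}

// Unblock is what the FSM would call; it also just enqueues and returns.
func (b *blockedEvals) Unblock(class string) {
	b.ops <- queueOp{kind: opUnblock, class: class}
}

// Stop closes the work channel and waits for run to drain it.
func (b *blockedEvals) Stop() {
	close(b.ops)
	<-b.done
}

// run is the only goroutine that touches blocked and released.
func (b *blockedEvals) run() {
	defer close(b.done)
	for op := range b.ops {
		switch op.kind {
		case opBlock:
			b.blocked[op.eval.ID] = op.eval
		case opUnblock:
			for id, e := range b.blocked {
				if e.Class == op.class {
					delete(b.blocked, id) // deleting during range is allowed in Go
					b.released = append(b.released, e)
				}
			}
		}
	}
}
```

Because `Block` and `Unblock` share one channel, work submitted from a single caller is applied in submission order. The trade-off is that reads of queue state must also go through the owner goroutine, or wait until it has stopped, as `Stop` allows here.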
This also allows us to move the tracking of stats into one goroutine so we no
longer need to copy the stats on each update. This reduces memory allocation and
GC pressure significantly.
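Moving stats onto the owner goroutine can be sketched the same way. In this sketch, `blockedStats`, `statsTracker`, `Record`, and `Snapshot` are hypothetical names, not Nomad's types. The owner goroutine mutates a single stats value in place and copies it only when a reader asks, rather than on every update. Snapshot requests travel on the same channel as updates, so a reader sees every update it submitted before asking.

```go
package main

// blockedStats is a hypothetical, simplified version of blocked-evals stats.
type blockedStats struct {
	TotalBlocked   int
	TotalUnblocked int
}

type statsEvent int

const (
	evBlocked statsEvent = iota
	evUnblocked
)

// statsOp is either an update (reply == nil) or a snapshot request (hypothetical).
// Sending both kinds over one channel keeps them in FIFO order.
type statsOp struct {
	event statsEvent
	reply chan blockedStats
}

type statsTracker struct {
	ops  chan statsOp
	done chan struct{}
}

func newStatsTracker(bufSize int) *statsTracker {
	tr := &statsTracker{ops: make(chan statsOp, bufSize), done: make(chan struct{})}
	go tr.run()
	return tr
}

// Record enqueues an update and returns; no lock and no copy of the stats.
func (tr *statsTracker) Record(ev statsEvent) { tr.ops <- statsOp{event: ev} }

// Snapshot is the only place the stats are copied. Must not be called after Stop.
func (tr *statsTracker) Snapshot() blockedStats {
	reply := make(chan blockedStats, 1)
	tr.ops <- statsOp{reply: reply}
	return <-reply
}

// Stop closes the op channel and waits for run to finish.
func (tr *statsTracker) Stop() {
	close(tr.ops)
	<-tr.done
}

func (tr *statsTracker) run() {
	defer close(tr.done)
	var s blockedStats // only this goroutine ever mutates s
	for op := range tr.ops {
		if op.reply != nil {
			op.reply <- s // copy the struct value for the reader
			continue
		}
		switch op.event {
		case evBlocked:
			s.TotalBlocked++
		case evUnblocked:
			s.TotalBlocked--
			s.TotalUnblocked++
		}
	}
}
```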
Ref: https://hashicorp.atlassian.net/browse/NMD-10451
Parent: f9ce228
5 files changed (+488, -224 lines) in nomad