Commit 06e72ac


reduce FSM backpressure from blocked evals queue

The coarse-grained lock on the blocked evals queue can cause backpressure on the FSM when a large number of evals are being unblocked and a large number of scheduler goroutines contend for that lock. The `watchCapacity` goroutine in the blocked evals queue has a large buffered channel for unblock operations, but it takes the same lock that the unblock methods called from the FSM use. Meanwhile, `Eval.Reblock` RPCs arriving from scheduler workers try to take this same lock, so a backlog builds up waiting on the mutex.

This PR moves all operations on the blocked evals queue onto a single goroutine that receives work from a large buffered channel. The `Eval.Reblock` RPCs and the `Unblock` methods called from the FSM push work onto this channel and return immediately. This prevents them from blocking, except during leader transitions when we flush the blocked evals queue; at that point the FSM should not be making `Unblock` calls anyway.

This also moves stats tracking onto that one goroutine, so we no longer need to copy the stats on each update. This significantly reduces memory allocation and GC pressure.

Ref: https://hashicorp.atlassian.net/browse/NMD-1045
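The single-owner design described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Nomad's actual code: the names `BlockedEvals`, `op`, `Stats`, `Block`, `Unblock`, and `Close` are hypothetical, and matching an unblock to blocked evals is simplified to a single string "class" key. The point it shows is that only the `run` goroutine ever touches the blocked set and the stats, so callers never take a mutex; they enqueue work on a buffered channel and return unless that buffer is full.

```go
package main

import "fmt"

// opKind identifies the kind of work sent to the owner goroutine (hypothetical).
type opKind int

const (
	opBlock opKind = iota
	opUnblock
	opStats
)

// op is one unit of work; reply is only used by opStats.
type op struct {
	kind   opKind
	evalID string
	class  string // simplified stand-in for whatever unblock matching uses
	reply  chan Stats
}

// Stats is owned by the run goroutine, so it is never copied on update.
type Stats struct {
	TotalBlocked int
}

// BlockedEvals is a hypothetical queue whose state lives in one goroutine.
type BlockedEvals struct {
	work chan op
	done chan struct{}
}

func NewBlockedEvals(buf int) *BlockedEvals {
	b := &BlockedEvals{work: make(chan op, buf), done: make(chan struct{})}
	go b.run()
	return b
}

// run is the only goroutine that reads or writes blocked and stats,
// so no mutex is needed around them.
func (b *BlockedEvals) run() {
	defer close(b.done)
	blocked := map[string]string{} // evalID -> class
	var stats Stats
	for o := range b.work {
		switch o.kind {
		case opBlock:
			if _, ok := blocked[o.evalID]; !ok {
				blocked[o.evalID] = o.class
				stats.TotalBlocked++
			}
		case opUnblock:
			// Deleting during range is safe for Go maps.
			for id, class := range blocked {
				if class == o.class {
					delete(blocked, id)
					stats.TotalBlocked--
				}
			}
		case opStats:
			o.reply <- stats // copy only when a caller asks
		}
	}
}

// Block and Unblock enqueue work and return without waiting for it to run;
// they only block if the channel buffer is full.
func (b *BlockedEvals) Block(evalID, class string) {
	b.work <- op{kind: opBlock, evalID: evalID, class: class}
}

func (b *BlockedEvals) Unblock(class string) {
	b.work <- op{kind: opUnblock, class: class}
}

// Stats is processed in FIFO order, after all previously enqueued work.
func (b *BlockedEvals) Stats() Stats {
	reply := make(chan Stats, 1)
	b.work <- op{kind: opStats, reply: reply}
	return <-reply
}

// Close stops the owner goroutine and waits for it to exit.
func (b *BlockedEvals) Close() {
	close(b.work)
	<-b.done
}

func main() {
	q := NewBlockedEvals(1024)
	q.Block("eval-1", "class-a")
	q.Block("eval-2", "class-b")
	q.Unblock("class-a")
	fmt.Println(q.Stats().TotalBlocked) // prints 1
	q.Close()
}
```

Because every request, including the stats read, goes through the same channel, operations are applied in the order they were enqueued, which replaces the ordering guarantees the shared lock used to provide.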

1 parent f9ce228 commit 06e72ac

5 files changed, +488 -224 lines changed

nomad
