Description
This is a follow-up to #4424.
One issue with unspilling (asynchronous or not) is that, whenever you start executing a task whose inputs need unspilling, the CPU is under-utilized: the task is in the executing state but is actually busy doing I/O.
When unspilling becomes asynchronous, we gain the option to pre-load the dependencies from disk.
Barring very complex solutions where the worker state machine becomes unspill-aware (that would require a new "unspilling-inputs" task state in between ready and executing, a new asynchronous instruction to match, and a wealth of new transitions), I would like to suggest a simpler, greedy design.
Proposed design
When a task reaches the top of the ready or constrained heap, but can't transition immediately to executing, the worker state machine fires an async_get command (#4424 (comment)) to the SpillBuffer with the list of dependencies. This brings all inputs necessary for the task to the top of the LRU and out of disk.
The output of the command is discarded.
When Worker.execute finally runs, it will call async_get again, with exactly the same keys. If enough time has passed, all keys are now in fast. If they are still in the middle of unspilling, the SpillBuffer will just return a reference to the already-existing Futures.
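
To make the deduplication concrete, here is a minimal sketch of the idea. It is not the real SpillBuffer: ToySpillBuffer, _unspilling, _future_for, disk_reads and demo are hypothetical names invented for illustration, the disk is faked with a dict plus asyncio.sleep, and the actual signature of async_get is whatever #4424 ends up defining. The sketch only shows that a discarded prefetch call and the later call from execute can share the same in-flight futures, so each key is read from disk once.

```python
import asyncio
from collections import OrderedDict


class ToySpillBuffer:
    """Hypothetical stand-in for SpillBuffer; NOT the real distributed API."""

    def __init__(self, fast, slow):
        self.fast = OrderedDict(fast)  # in-memory values, least recently used first
        self.slow = dict(slow)         # stand-in for values spilled to disk
        self._unspilling = {}          # key -> in-flight asyncio.Task (assumed design)
        self.disk_reads = 0            # counter used only by the demo

    async def _unspill(self, key):
        await asyncio.sleep(0.01)      # simulate slow disk I/O
        self.disk_reads += 1
        value = self.slow.pop(key)
        self.fast[key] = value         # newly loaded keys land at the top of the LRU
        del self._unspilling[key]
        return value

    def _future_for(self, key):
        if key in self.fast:
            # Already in memory: refresh its LRU position, return a finished future.
            self.fast.move_to_end(key)
            fut = asyncio.get_running_loop().create_future()
            fut.set_result(self.fast[key])
            return fut
        if key not in self._unspilling:
            # First request for a spilled key: start loading it.
            self._unspilling[key] = asyncio.ensure_future(self._unspill(key))
        # Later requests reuse the already-existing future.
        return self._unspilling[key]

    async def async_get(self, keys):
        futures = [self._future_for(k) for k in keys]
        values = await asyncio.gather(*futures)
        return dict(zip(keys, values))


async def demo():
    buf = ToySpillBuffer(fast={"x": 1}, slow={"y": 2, "z": 3})
    keys = ["x", "y", "z"]
    # Prefetch when the task reaches the top of the heap; the output is not used.
    prefetch = asyncio.ensure_future(buf.async_get(keys))
    await asyncio.sleep(0)             # let the prefetch start unspilling
    # Later, the equivalent of Worker.execute asks for exactly the same keys.
    result = await buf.async_get(keys)
    await prefetch
    return result, buf.disk_reads


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy version the second async_get call arrives while y and z are still loading, so it awaits the same two tasks rather than starting new reads, and disk_reads ends at 2 rather than 4.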