Async disk access: thaw dependencies ahead of execute #7643

This is a follow-up to #4424.

One issue with unspilling (asynchronous or not) is that whenever you start executing a task whose inputs need unspilling, the CPU is under-utilized: the task is in the executing state but is actually busy doing I/O.
When unspilling becomes asynchronous, we gain the option to pre-load the dependencies from disk.

Barring very complex solutions where the worker state machine becomes unspill-aware (that would require a new "unspilling-inputs" task state in between ready and executing, a new asynchronous instruction to match, and a wealth of new transitions), I would like to suggest a simpler, greedy design.

# Proposed design
When a task reaches the top of the ready or constrained heap but can't transition to executing immediately, the worker state machine fires an `async_get` command (https://github.com/dask/distributed/issues/4424#issuecomment-1464185143) to the SpillBuffer with the list of the task's dependencies. This moves all inputs the task needs to the top of the LRU and off disk.

The output of the command is discarded.
When `Worker.execute` finally runs, it calls `async_get` again with exactly the same keys. If enough time has passed, all keys are already in `fast` (in memory). If they are still being unspilled, the SpillBuffer simply returns references to the already-existing Futures.
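
To make the intended behaviour concrete, here is a minimal, self-contained sketch of the two-call pattern. It is not the real SpillBuffer: `ToySpillBuffer`, `_inflight`, `_unspill` and `disk_reads` are hypothetical names, the return type of `async_get` (a dict of key to Future) is an assumption, and LRU ordering is not modelled. It only illustrates that a discarded prefetch and a later call with the same keys share the same in-flight Futures, so each key is read from disk once.

```python
import asyncio


class ToySpillBuffer:
    """Hypothetical stand-in for SpillBuffer; NOT the distributed API."""

    def __init__(self, slow):
        self.fast = {}           # values currently in memory
        self.slow = dict(slow)   # values currently "on disk"
        self._inflight = {}      # key -> Task for an ongoing unspill
        self.disk_reads = 0      # how many simulated disk reads happened

    async def _unspill(self, key):
        await asyncio.sleep(0.01)  # simulated disk latency
        self.disk_reads += 1
        value = self.slow.pop(key)
        self.fast[key] = value
        del self._inflight[key]
        return value

    def async_get(self, keys):
        """Start unspilling where needed and return {key: Future}.

        Assumed signature; must be called from within a running event loop.
        """
        loop = asyncio.get_running_loop()
        out = {}
        for key in keys:
            if key in self.fast:
                # Already in memory: hand back a resolved Future.
                fut = loop.create_future()
                fut.set_result(self.fast[key])
            elif key in self._inflight:
                # Unspill already started: reuse the existing Future.
                fut = self._inflight[key]
            else:
                fut = asyncio.create_task(self._unspill(key))
                self._inflight[key] = fut
            out[key] = fut
        return out


async def demo():
    buf = ToySpillBuffer(slow={"x": 1, "y": 2})
    deps = ["x", "y"]

    # Task is at the top of the ready heap but cannot start yet:
    # fire the prefetch and discard its output.
    buf.async_get(deps)

    # Later, the execute step asks for exactly the same keys.
    futures = buf.async_get(deps)
    shared = all(futures[k] is buf._inflight[k] for k in deps)
    values = await asyncio.gather(*futures.values())
    return shared, values, buf.disk_reads, sorted(buf.fast)


result = asyncio.run(demo())
```

In this sketch the second call finds both keys still in flight, so it returns the Futures created by the prefetch rather than starting new reads. Had the prefetch finished first, the same call would have returned already-resolved Futures for values in `fast`.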
