Description
This has been discussed in a few other places, but I think it's important enough (and a separate enough task) to warrant its own issue for tracking purposes. Related to:
Once #5900 is done, when one worker requests data from another, and that data is currently spilled to disk, we should not require slurping all the data into memory just to send it. Ideally, we'd stream the serialized bytes from disk directly over the network socket, without copying into userspace at all, via something like `socket.sendfile`/`sendfile(2)`. If this is not deemed possible, copying into a small userspace buffer and doing incremental sends would still be an improvement (but not ideal).
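As a rough illustration, here's a minimal sketch of the sending side, assuming the spilled bytes for a value live in a plain file; the path and the blocking socket are hypothetical stand-ins (the real worker runs on non-blocking sockets under asyncio/Tornado, where `loop.sock_sendfile` would be the closer analogue):

```python
import socket


def send_spilled_value(sock: socket.socket, path: str) -> None:
    """Stream one spilled value from disk to a peer without loading it into memory."""
    with open(path, "rb") as f:
        # socket.sendfile() uses sendfile(2) where the platform supports it
        # (zero-copy: the kernel moves bytes from the page cache straight to
        # the socket), and otherwise falls back internally to a read()/send()
        # loop through a small buffer; i.e. exactly the two options above.
        # Note it requires a blocking socket, so async code would use
        # asyncio's loop.sock_sendfile() instead.
        sock.sendfile(f)
```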
(As a bonus, it would be nice if workers could also receive data directly to disk. That might let us keep fetching dependencies even when workers are under memory pressure, and could make computations complete faster in that situation; but it probably won't make the difference between succeeding and failing the way the sending side does, so it's less important for now.)
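For that receiving side, a sketch under the assumption that the comms layer tells us the payload length up front (the framing, path, and raw socket here are all hypothetical):

```python
import socket

CHUNK_SIZE = 64 * 1024  # small, fixed userspace buffer


def receive_to_disk(sock: socket.socket, path: str, nbytes: int) -> None:
    """Write an incoming payload straight to a spill file, chunk by chunk."""
    with open(path, "wb") as f:
        remaining = nbytes
        while remaining > 0:
            chunk = sock.recv(min(CHUNK_SIZE, remaining))
            if not chunk:
                raise ConnectionError("peer closed the connection mid-transfer")
            f.write(chunk)
            remaining -= len(chunk)
```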
The biggest challenge is that the comms interface doesn't currently support streams, only individual messages. The `get_data` response can also contain multiple keys, which further complicates the interface.
From a technical perspective, doing `sendfile` over a socket should be easy. The challenge is just making that feasible within our comms interface.
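To make the shape of the problem concrete, here's one hedged sketch of what a streaming extension might look like. Nothing in it exists today: `comm.write_stream`, the header message, and the per-key size framing are all invented for illustration, and handling the multi-key `get_data` response as a sizes header followed by each payload in order is just one possible design:

```python
import os
from typing import Dict


async def stream_get_data_response(comm, spilled: Dict[str, str]) -> None:
    """Hypothetical: send a multi-key get_data response as a header message
    plus streamed payloads. `spilled` maps each requested key to the on-disk
    path of its already-serialized bytes."""
    sizes = {key: os.path.getsize(path) for key, path in spilled.items()}
    # An ordinary (in-memory) message first, so the receiver knows which
    # keys are coming and how many bytes each payload occupies.
    await comm.write({"op": "get_data_stream", "sizes": sizes})
    for key, path in spilled.items():
        with open(path, "rb") as f:
            # Invented streaming primitive: hand the comm a file object and
            # let the transport move `sizes[key]` bytes via sendfile(2) (or
            # buffered incremental sends) without materializing them.
            await comm.write_stream(f, nbytes=sizes[key])
```

The receiver could then pair this with the receive-to-disk loop above, writing each payload straight into its own spill file.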
I believe this is one of the highest-impact changes we can make to improve memory performance and reduce out-of-memory failures. Spill-to-disk currently has a few flaws, but to me this is the biggest: transfer-heavy graphs basically work against spilling to disk, because workers end up un-spilling some or most of the data they just spilled in order to send it to their peers.