Description
While working on #8257, I noticed that Client.submit and Client.compute embed the whole dask graph into a single, monolithic msgpack blob, without extracting buffers. This can cause a substantial slowdown in the (fairly common) case where the graph embeds large-ish (>1 MiB) constants.
It should be straightforward to change this so that buffers are sent as separate frames instead of being embedded in msgpack, like worker-to-worker comms already do.
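For illustration, here's a minimal sketch of the difference (not the actual client code path; the payload name and the "constant" key are made up for the example), contrasting out-of-band serialization via distributed.protocol.serialize with embedding the constant in a msgpack blob:

```python
import msgpack
from distributed.protocol import serialize

payload = b"0" * 2**20

# Out-of-band, as worker-to-worker comms do: a small header describes the
# object and the 1 MiB buffer travels as its own frame, without being copied
# into a serialized stream.
header, frames = serialize(payload)
print([len(f) for f in frames])  # the buffer is a standalone ~1 MiB frame

# Embedded: the whole constant is copied into a single, monolithic msgpack blob.
blob = msgpack.packb({"constant": payload})
print(len(blob))  # ~1 MiB blob that now contains a copy of the constant
```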
Reproducer
data = b"0" * 2**20
c.submit(len, data)
but also
```python
import dask.array as da
import numpy

data = numpy.zeros(2**20, dtype='u1')
da.from_array(data, chunks=-1).sum().compute()  # uses the client created above
```
Expected behaviour
The client->scheduler comms should temporarily create and transfer over the network a tiny msgpack object, which embeds the pickled callable and little else, plus a 1 MiB buffer which is never deep-copied.
The same should hold for the scheduler->worker comms.
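A rough sketch of what that could look like on the wire, using distributed.protocol.dumps and to_serialize; the message shape below is invented for illustration and is not the actual update-graph payload:

```python
import os
from distributed.protocol import dumps, to_serialize

data = os.urandom(2**20)  # incompressible, so frame sizes stay meaningful

# A message loosely shaped like a task submission: tiny metadata plus the
# constant marked for out-of-band serialization.
msg = {"op": "submit", "function": "len", "args": to_serialize(data)}
frames = dumps(msg)

# Expect one small msgpack frame for the metadata (plus a small sub-header),
# while the ~1 MiB buffer sits in its own frame that comms can send as-is.
print([len(f) for f in frames])
```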
Actual behaviour
The client->scheduler comms make a temporary deep copy of the whole 1 MiB constant on the client host, which is then sent over the network as a monolithic 1 MiB msgpack stream and deep-copied once again when msgpack is unpacked on the scheduler.
I'm not sure about the scheduler->worker leg of the trip.
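Not the real code path, but a minimal sketch of where the two extra copies come from when a large constant is embedded directly into msgpack (the "constant" key is made up for the example):

```python
import msgpack

data = b"0" * 2**20

# Copy 1: packing embeds the whole constant into a monolithic ~1 MiB blob.
blob = msgpack.packb({"constant": data})

# Copy 2: unpacking allocates a fresh bytes object for the embedded constant.
msg = msgpack.unpackb(blob)
assert msg["constant"] == data and msg["constant"] is not data
```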