Description
Currently, each input partition is split into output_number_partitions (n_out) chunks, which are added to the shuffler. If there are n input partitions in each worker, there will be n*n_out chunks in each worker. With p workers, on average n*n_out/p of these chunks are sent as messages to each remote worker. These messages could potentially be very small.
Concatenating these small chunks into a batch could improve communication performance. Since the shuffler receives metadata and data buffers separately, this concatenation/batching would have to be done outside of cudf utils. The overhead of this approach is increased memory pressure.
This approach is discussed further in this document.
ChunkBatch
A ChunkBatch encapsulates a batch of chunks added to the shuffler, intended to be sent to a particular target worker/rank. Just like a Chunk, a ChunkBatch contains a uid. It also contains two buffers: the concatenated metadata and the concatenated payloads.
The most trivial way to create a batch is to copy each chunk's metadata into the concatenated buffer one after the other, and to do the same for the payloads. This allows us to traverse a batch using a forward iterator. However, random access is not possible, as we wouldn't know the offset of the i-th chunk in the batch.
If we were to add a random access iterator to the batch, we would need more information, such as the prefix sums of the metadata and payload buffer sizes of each chunk.