Description
Describe the bug
Partitioning APIs that partition a table into `n` partitions, like `hash_partition` or `round_robin_partition`, return a single table and a vector of `n+1` offsets. Each offset points to the beginning of a partition, and the size of any partition `i` can be determined by `offsets[i+1] - offsets[i]`.
For example:
partitioned_table = {7}, {}, {3, 8, 9}, {42};
offsets = [0, 1, 1, 4, 5]
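To make the offsets convention concrete, here is a minimal sketch that recovers the partition sizes from an offsets vector. It uses plain `std::vector<int>` rather than any library type, and `partition_sizes` is a hypothetical helper name chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a library function): given the n+1 offsets
// returned by a partitioning API, compute the size of each of the n
// partitions as offsets[i+1] - offsets[i].
std::vector<int> partition_sizes(std::vector<int> const& offsets)
{
  std::vector<int> sizes;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    sizes.push_back(offsets[i + 1] - offsets[i]);
  }
  return sizes;
}
```

For the example above, `partition_sizes({0, 1, 1, 4, 5})` yields one size per partition, including a zero for the empty second partition.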
I would expect to be able to trivially pass the output of a partitioning API into an API like `split` or `slice` in order to get a vector of zero-copy `table_view`s for each partition.
However, this is not possible because the expected inputs for `split` or `slice` are incompatible with the `offsets` vector returned from a partitioning API.
`slice` expects a flat vector of index pairs, where each pair is a half-open `[begin, end)` range:
input: [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28},
{50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
indices: {1, 3, 5, 9, 2, 4, 8, 8}
output: [{{12, 14}, {20, 22, 24, 26}, {14, 16}, {}},
{{52, 54}, {60, 62, 64, 66}, {54, 56}, {}}]
`split` expects a vector of split points:
input: {10, 12, 14, 16, 18, 20, 22, 24, 26, 28}
splits: {2, 5, 9}
output: {{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}
Neither of these is trivially compatible with the output of a partitioning API. `split` is the closest: you can obtain the `splits` vector from the `offsets` vector by dropping the first and last elements of `offsets` (for the example above, `[0, 1, 1, 4, 5]` becomes `{1, 1, 4}`). However, that is inconvenient.
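As an illustration of the glue code a caller has to write today, the sketch below converts an offsets vector into the two input shapes described above. Both function names are hypothetical, and the vectors are plain `std::vector<int>` stand-ins for the library's index type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: drop the first and last offsets to obtain the
// split points expected by a split-style API.
std::vector<int> offsets_to_splits(std::vector<int> const& offsets)
{
  if (offsets.size() < 2) { return {}; }
  return std::vector<int>(offsets.begin() + 1, offsets.end() - 1);
}

// Hypothetical helper: expand offsets into the flat list of
// [begin, end) index pairs expected by a slice-style API.
std::vector<int> offsets_to_slice_indices(std::vector<int> const& offsets)
{
  std::vector<int> indices;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    indices.push_back(offsets[i]);      // begin of partition i
    indices.push_back(offsets[i + 1]);  // end of partition i (exclusive)
  }
  return indices;
}
```

Either conversion is short, but every consumer of a partitioning API has to repeat it, which is the inconvenience this issue describes.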
Expected behavior
There should be an API that accepts the vector of offsets from a partitioning API as-is and returns a vector of zero-copy views, one per partition.
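A rough sketch of the requested behavior, modeled on a single in-memory column rather than a real table: `column_view_sketch` and `split_by_offsets` are hypothetical names invented for this example, and the pointer-plus-size struct only stands in for a zero-copy view type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a zero-copy view: a pointer into the
// original storage plus a length. No elements are copied.
struct column_view_sketch {
  int const* data;
  std::size_t size;
};

// Hypothetical API: take the n+1 offsets exactly as a partitioning API
// returns them and produce one view per partition.
std::vector<column_view_sketch> split_by_offsets(std::vector<int> const& column,
                                                 std::vector<int> const& offsets)
{
  std::vector<column_view_sketch> views;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    auto const begin = static_cast<std::size_t>(offsets[i]);
    auto const end   = static_cast<std::size_t>(offsets[i + 1]);
    views.push_back({column.data() + begin, end - begin});
  }
  return views;
}
```

With the partitioned data `{7, 3, 8, 9, 42}` and offsets `[0, 1, 1, 4, 5]`, this returns four views, the second of which is empty, and the caller never has to trim or reshape the offsets.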