[BUG] split/slice APIs do not align with partitioning APIs #4607

**Describe the bug**

Partitioning APIs that partition a table into `n` partitions, like [`hash_partition`](https://github.com/rapidsai/cudf/blob/branch-0.14/cpp/include/cudf/partitioning.hpp#L43) or [`round_robin_partition`](https://github.com/rapidsai/cudf/blob/branch-0.14/cpp/include/cudf/partitioning.hpp#L185), return a single table together with a vector of `n+1` offsets. `offsets[i]` is the first row of partition `i`, the last offset is the total number of rows, and the size of any partition `i` is `offsets[i+1] - offsets[i]`.

For example:
```
partitioned_table = {7}, {}, {3, 8, 9}, {42};
offsets = [0, 1, 1, 4, 5]
```
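As a minimal host-side sketch of that relationship (the helper name `partition_sizes` is hypothetical, and `int` stands in for `cudf::size_type`), the per-partition sizes fall straight out of adjacent offsets:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given the n+1 offsets returned with a partitioned table,
// compute the row count of each of the n partitions as offsets[i+1] - offsets[i].
std::vector<int> partition_sizes(std::vector<int> const& offsets)
{
  std::vector<int> sizes;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    assert(offsets[i] <= offsets[i + 1]);  // offsets are expected to be non-decreasing
    sizes.push_back(offsets[i + 1] - offsets[i]);
  }
  return sizes;
}
```

For the example above, `partition_sizes({0, 1, 1, 4, 5})` yields `{1, 0, 3, 1}`, matching `{7}, {}, {3, 8, 9}, {42}`.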


I would expect to be able to pass the output of a partitioning API directly into an API like [`split`](https://github.com/rapidsai/cudf/blob/branch-0.13/cpp/include/cudf/copying.hpp#L389) or [`slice`](https://github.com/rapidsai/cudf/blob/branch-0.13/cpp/include/cudf/copying.hpp#L358) to get a vector of zero-copy `table_view`s, one per partition.

However, this is not possible because the expected inputs for `split` or `slice` are incompatible with the `offsets` vector returned from a partitioning API.

`slice` expects a flat vector of `[begin, end)` index pairs:
```
 input:   [{10, 12, 14, 16, 18, 20, 22, 24, 26, 28},
           {50, 52, 54, 56, 58, 60, 62, 64, 66, 68}]
 indices: {1, 3, 5, 9, 2, 4, 8, 8}
 output:  [{{12, 14}, {20, 22, 24, 26}, {14, 16}, {}},
           {{52, 54}, {60, 62, 64, 66}, {54, 56}, {}}]
```
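To feed partitioning output into `slice` today, each interior offset has to be duplicated so that it serves as both the end of one partition and the begin of the next. A minimal sketch of that conversion, assuming `int` in place of `cudf::size_type` (the function name `offsets_to_slice_indices` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: expand n+1 partition offsets into the 2n begin/end values
// slice expects: {offsets[0], offsets[1], offsets[1], offsets[2], ..., offsets[n]}.
std::vector<int> offsets_to_slice_indices(std::vector<int> const& offsets)
{
  std::vector<int> indices;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    assert(offsets[i] <= offsets[i + 1]);  // each pair must satisfy begin <= end
    indices.push_back(offsets[i]);      // begin of partition i (inclusive)
    indices.push_back(offsets[i + 1]);  // end of partition i (exclusive)
  }
  return indices;
}
```

For offsets `[0, 1, 1, 4, 5]` this produces `{0, 1, 1, 1, 1, 4, 4, 5}`, which is almost twice as long as the original offsets vector.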

`split` expects a vector of the split points:
```
 input:   {10, 12, 14, 16, 18, 20, 22, 24, 26, 28}
 splits:  {2, 5, 9}
 output:  {{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}
```
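The `split` semantics in this example can be modeled on the host as follows. This is an illustrative sketch only: `split_model` is a hypothetical name, it copies data instead of returning zero-copy views, and `int` stands in for `cudf::size_type`. The key point is that `k` split points yield `k+1` pieces, with an implicit `0` before the first split and the row count after the last.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of split: piece j covers [splits[j-1], splits[j]),
// with an implicit 0 before the first split point and input.size() after the last.
std::vector<std::vector<int>> split_model(std::vector<int> const& input,
                                          std::vector<int> const& splits)
{
  std::vector<std::vector<int>> pieces;
  int begin = 0;
  for (int end : splits) {
    assert(begin <= end && end <= static_cast<int>(input.size()));
    pieces.emplace_back(input.begin() + begin, input.begin() + end);
    begin = end;
  }
  pieces.emplace_back(input.begin() + begin, input.end());  // last piece runs to the end
  return pieces;
}
```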

Neither of these is directly compatible with the output of a partitioning API.

`split` is the closest. You can obtain the `splits` vector from the `offsets` vector by dropping the first and last elements of `offsets`. However, that is inconvenient.
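A minimal sketch of that workaround (the helper name `offsets_to_splits` is hypothetical, and `int` stands in for `cudf::size_type`); it assumes `offsets` starts at `0` and ends at the total row count, as in the example above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: drop the leading 0 and the trailing total row count from
// the n+1 offsets, leaving the n-1 interior split points that split expects.
std::vector<int> offsets_to_splits(std::vector<int> const& offsets)
{
  if (offsets.size() < 2) { return {}; }  // fewer than two offsets: no interior points
  assert(offsets.front() == 0);           // assumes partitions start at row 0
  return std::vector<int>(offsets.begin() + 1, offsets.end() - 1);
}
```

For offsets `[0, 1, 1, 4, 5]` this gives splits `{1, 1, 4}`, which on a 5-row table delimits `[0, 1)`, `[1, 1)`, `[1, 4)`, and `[4, 5)`, i.e. the four original partitions.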


**Expected behavior**

There should be an API that accepts the vector of offsets from a partitioning API as-is and returns a vector of zero-copy views, one per partition.
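One possible shape for such an API, sketched on the host under stated assumptions: `split_by_offsets` and `host_view` are hypothetical names (not part of libcudf), a single `std::vector<int>` stands in for a column, and `int` stands in for `cudf::size_type`. The function takes the `n+1` offsets unchanged and returns `n` non-owning views into the existing storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a zero-copy view: pointer into existing storage plus a length.
struct host_view {
  int const* data;
  std::size_t size;
};

// Hypothetical API shape: accept the n+1 offsets from a partitioning call as-is and
// return n non-owning views, one per partition. No element data is copied.
std::vector<host_view> split_by_offsets(std::vector<int> const& column,
                                        std::vector<int> const& offsets)
{
  std::vector<host_view> views;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    assert(0 <= offsets[i] && offsets[i] <= offsets[i + 1]);
    assert(offsets[i + 1] <= static_cast<int>(column.size()));
    views.push_back({column.data() + offsets[i],
                     static_cast<std::size_t>(offsets[i + 1] - offsets[i])});
  }
  return views;
}
```

With the partitioned column `{7, 3, 8, 9, 42}` and offsets `[0, 1, 1, 4, 5]`, this returns four views of sizes 1, 0, 3, and 1, each pointing into the original buffer. An equivalent libcudf API would presumably return `table_view`s instead.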

