reduce stream syncs due to `cudf::detail::gather`

50% of the time in the tpch1 benchmark is spent syncing in [cudf::detail::gather](https://docs.rapids.ai/api/libcudf/legacy/group__copy__gather#files). The majority of that time is in `table_device_view::create`/`column_device_view::create`.



<img width="1677" height="785" alt="Image" src="https://github.com/user-attachments/assets/e6dbfe94-3b7c-46c9-9338-9d9f9bcc4c79" />

cudf calls `table_device_view::create`/`column_device_view::create` to copy the host `column_view` and `table_view` to the device. Notably, `column_device_view::create` has a fast path and a slow path:
https://github.com/rapidsai/cudf/blob/363920c83694ee88f2af12568241250d81983144/cpp/src/column/column_device_view.cu#L110-L120
```cpp
std::unique_ptr<column_device_view, std::function<void(column_device_view*)>>
column_device_view::create(column_view source, rmm::cuda_stream_view stream)
{
  size_type num_children = source.num_children();
  if (num_children == 0) {
    // Can't use make_unique since the ctor is protected
    return std::unique_ptr<column_device_view>(new column_device_view(source));
  }

  return create_device_view_from_view<column_view, column_device_view>(source, stream);
}
```
The slow path is taken for types that have children, such as string, struct, list, and dictionary. It would be nice to avoid the slow path, which incurs expensive stream syncs.
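To make the branch above concrete, here is a minimal host-only sketch that counts how many columns of a table would fall into the slow path. It does not use cudf at all: `toy_column`, `takes_slow_path`, and `count_slow_path_columns` are hypothetical names invented for illustration, and the only thing modeled is the `num_children == 0` check from the snippet.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a host-side column view. This is NOT
// cudf::column_view; it only records a type name and child columns.
struct toy_column {
  std::string type_name;
  std::vector<toy_column> children;
};

// Models the branch in column_device_view::create quoted above:
// a column with no children takes the fast path, anything with
// children goes through the slow path.
inline bool takes_slow_path(toy_column const& col) { return !col.children.empty(); }

// Counts the top-level columns of a table that would take the slow path.
inline std::size_t count_slow_path_columns(std::vector<toy_column> const& table)
{
  std::size_t n = 0;
  for (auto const& col : table) {
    if (takes_slow_path(col)) { ++n; }
  }
  return n;
}
```

A count like this could help estimate how many slow-path calls a given table would trigger, assuming each one costs at least one stream sync as the profile suggests.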

These profiles were collected on commit 07d9afff525ad0e44426493599d6aeaaf94ab94c:
[tpch_q1_cudf_parquet_10iters_gpu_samply.json.syms.json](https://github.com/user-attachments/files/28689489/tpch_q1_cudf_parquet_10iters_gpu_samply.json.syms.json)
[tpch_q1_cudf_parquet_10iters_gpu_samply.json.gz](https://github.com/user-attachments/files/28689488/tpch_q1_cudf_parquet_10iters_gpu_samply.json.gz)
