Add env DeviceMemcpy::Batched and tests #7966

Open
gonidelis wants to merge 1 commit into NVIDIA:main from gonidelis:memcopy_env

Conversation

@gonidelis
Member

fixes #7540

Adds an environment-taking overload for DeviceMemcpy::Batched, which is the only overload DeviceMemcpy has.

No determinism guarantees are specified.

@gonidelis gonidelis requested a review from a team as a code owner March 10, 2026 01:49
@gonidelis gonidelis requested a review from gevtushenko March 10, 2026 01:49
@github-project-automation github-project-automation bot moved this to Todo in CCCL Mar 10, 2026
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Mar 10, 2026
@gonidelis gonidelis enabled auto-merge (squash) March 10, 2026 03:22
@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 5h 37m: Pass: 100%/249 | Total: 1d 22h | Max: 1h 03m | Hits: 98%/156109

See results here.

{
// 3 buffers: [10, 20], [30, 40, 50], [60]
auto d_src = c2h::device_vector<int>{10, 20, 30, 40, 50, 60};
auto d_dst = c2h::device_vector<int>(6, 0);
Contributor

Suggestion:

Suggested change
- auto d_dst = c2h::device_vector<int>(6, 0);
+ auto d_dst = c2h::device_vector<int>(6);


int num_buffers = 3;

thrust::counting_iterator<int> iota(0);
Contributor

Important: Please prefer cuda iterators over thrust iterators.

Comment on lines +116 to +117
cudaStream_t custom_stream;
REQUIRE(cudaSuccess == cudaStreamCreate(&custom_stream));
Contributor

Important: Please use cuda::stream to increase its test coverage

}
};

#if TEST_LAUNCH == 0
Contributor

Q: What prevents us from running the below unit test for launch id 1 and 2?

Comment on lines +43 to +68
// 3 buffers of different sizes: [10, 20], [30, 40, 50], [60]
auto d_src = thrust::device_vector<int>{10, 20, 30, 40, 50, 60};
auto d_dst = thrust::device_vector<int>(6, 0);
auto d_offsets = thrust::device_vector<int>{0, 2, 5, 6};

int num_buffers = 3;

thrust::counting_iterator<int> iota(0);
auto input_it = thrust::make_transform_iterator(
  iota, index_to_ptr<const int>{thrust::raw_pointer_cast(d_src.data()), thrust::raw_pointer_cast(d_offsets.data())});
auto output_it = thrust::make_transform_iterator(
  iota, index_to_ptr<int>{thrust::raw_pointer_cast(d_dst.data()), thrust::raw_pointer_cast(d_offsets.data())});
auto sizes = thrust::make_transform_iterator(iota, get_size{thrust::raw_pointer_cast(d_offsets.data())});

cuda::stream stream{cuda::devices[0]};
cuda::stream_ref stream_ref{stream};
auto env = cuda::std::execution::env{stream_ref};

auto error = cub::DeviceMemcpy::Batched(input_it, output_it, sizes, num_buffers, env);
if (error != cudaSuccess)
{
  std::cerr << "cub::DeviceMemcpy::Batched failed with status: " << error << std::endl;
}

thrust::device_vector<int> expected{10, 20, 30, 40, 50, 60};
// example-end memcpy-batched-env
Contributor

Suggestion: I don't think this is a great API example. We should focus on the API's design of taking an iterator to pointers. Something like:

  auto d_src     = thrust::device_vector<int>{10, 20, 30, 40, 50, 60};
  auto d_src_pointers = thrust::device_vector<int*>{d_src[0], d_src[2], d_src[5]};
plus the raw pointer casts. You could even consider writing the results to two different buffers to highlight the API's flexibility.
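
To make the shape of that suggestion concrete, here is a minimal host-only sketch of the semantics being discussed: a table of source pointers, a table of destination pointers that spans two separate buffers, and per-buffer sizes in bytes. It is a CPU model for illustration, not the CUB implementation and not the code proposed in this PR. The names `batched_memcpy_model`, `ExampleResult`, and `run_example` are hypothetical, and `std::vector` stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side model: for each buffer i, copy sizes[i] BYTES
// from srcs[i] to dsts[i]. The three arguments are random-access iterators
// over pointers and byte counts, mirroring the iterator-of-pointers design.
template <class SrcPtrIt, class DstPtrIt, class SizeIt>
void batched_memcpy_model(SrcPtrIt srcs, DstPtrIt dsts, SizeIt sizes, std::size_t num_buffers)
{
  for (std::size_t i = 0; i < num_buffers; ++i)
  {
    std::memcpy(dsts[i], srcs[i], sizes[i]);
  }
}

// Holds the two destination buffers so a caller can inspect them.
struct ExampleResult
{
  std::vector<int> dst_a;
  std::vector<int> dst_b;
};

ExampleResult run_example()
{
  // 3 source buffers inside one allocation: [10, 20], [30, 40, 50], [60]
  std::vector<int> src{10, 20, 30, 40, 50, 60};

  // Two independent destination allocations (assumption for illustration).
  ExampleResult out{std::vector<int>(2), std::vector<int>(4)};

  // Pointer tables: buffer 0 goes to dst_a, buffers 1 and 2 go to dst_b.
  std::vector<const int*> src_ptrs{&src[0], &src[2], &src[5]};
  std::vector<int*> dst_ptrs{&out.dst_a[0], &out.dst_b[0], &out.dst_b[3]};

  // Sizes are expressed in bytes, not in elements.
  std::vector<std::size_t> sizes{2 * sizeof(int), 3 * sizeof(int), 1 * sizeof(int)};

  assert(src_ptrs.size() == dst_ptrs.size() && src_ptrs.size() == sizes.size());
  batched_memcpy_model(src_ptrs.begin(), dst_ptrs.begin(), sizes.begin(), src_ptrs.size());
  return out;
}
```

In this model, calling `run_example()` leaves `dst_a` holding the first buffer and `dst_b` holding the second and third back to back. A device version would keep the same three tables but store them in device memory and pass them to `cub::DeviceMemcpy::Batched` together with the environment.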

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Add env-based API for cub::DeviceMemcpy

2 participants