Expose PSTL algorithms through `<cuda/std/algorithm>` and `<cuda/std/numeric>` #7931

miscco wants to merge 3 commits into NVIDIA:main

Conversation
```diff
  cuda::stream stream{cuda::device_ref{0}};
  cuda::device_memory_pool_ref device_resource = cuda::device_default_memory_pool(stream.device());
- const auto policy = cuda::execution::__cub_par_unseq.with_memory_resource(device_resource).with_stream(stream);
+ const auto policy = cuda::execution::gpu.with_memory_resource(device_resource).with_stream(stream);
```
Remark: looking at this line, it may sound a bit better as:
```diff
- const auto policy = cuda::execution::gpu.with_memory_resource(device_resource).with_stream(stream);
+ const auto policy = cuda::execution::gpu.with_memory_resource(device_resource).on_stream(stream);
```
But I guess nobody wants to do another bulk rename?
There might be another rename coming, but not that one.
```cpp
// parallel algorithms
#if _CCCL_HAS_PSTL_BACKEND()
#  include <cuda/std/__pstl/adjacent_find.h>
#  include <cuda/std/__pstl/all_of.h>
#  include <cuda/std/__pstl/any_of.h>
#  include <cuda/std/__pstl/copy.h>
#  include <cuda/std/__pstl/copy_if.h>
#  include <cuda/std/__pstl/copy_n.h>
```
Q: I thought many standard libraries would expose PSTL algorithms through the <execution> header and not <algorithm>. This would make the inclusion of <algorithm> cheaper.
Discussed this with @miscco offline and it seems the C++ standard requires the overloads to be in <algorithm>. However, it may not be observable to the common user, since they need to include <execution> anyway to supply an execution policy.
If it's not observable, then I would prefer exposing it in the <execution> header to avoid bloating <algorithm>.
I do not believe that is a correct statement.
<execution> can include it all and be fine, but then <algorithm> would not have it.
The point is that the pstl headers effectively pull in all of <algorithm>.
> <execution> can include it all and be fine, but then <algorithm> would not have it.

What is the advantage of <algorithm> having an overload that cannot be called if a user does not also include <execution>?

> The point is that the pstl headers effectively pull in all of <algorithm>.

This is fine IMO; including a PSTL header can be more expensive.
@miscco could you please measure the compile time of

```cpp
#include <cuda/std/algorithm>

int main() {
  return cuda::std::min(0, 2);
}
```

before and after this PR? I would be curious how much of an impact pulling in most of CUB has ;)
🥳 CI Workflow Results 🟩 Finished in 4h 24m: Pass: 100%/156 | Total: 7d 07h | Max: 4h 23m | Hits: 62%/369717
We discussed this internally and we are happy with the results of the parallel CUDA backend. So we want to expose this now rather than waiting for all algorithms to be implemented.
There are certain caveats:

- We require random access iterators for the CUDA backend.
- We only expose a CUDA backend, through `cuda::execution::gpu`. Standard execution policies will currently `static_assert` that there is a missing backend.
- We do not provide any fallback serial implementation. This would be dangerous, because the serial implementation would naively run on host and not on device.