Add env based `DeviceFor::*` algorithms by gonidelis · Pull Request #7798 · NVIDIA/cccl

gonidelis · 2026-02-26T02:11:46Z

Adds env based DeviceFor::*

closed #7541

No determinism guarantees are imposed on the DeviceFor::* algorithms.

copy-pr-bot · 2026-02-26T02:11:50Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cub/cub/device/device_for.cuh

gonidelis · 2026-02-26T20:52:49Z

One thing that might break backwards compatibility when unifying the env overloads with the old ones is that we tend to mark new overloads [[nodiscard]]. Adding it to the "new" env overloads (which simply replace the old ones) might cause compile errors for users out of the blue. @bernhardmgruber @gevtushenko should I avoid making the overloads in this PR [[nodiscard]]?

gonidelis · 2026-02-26T20:59:19Z

todo: add docs about memory_resource (?)

bernhardmgruber · 2026-02-27T15:07:05Z

One thing that might break backwards compatibility when unifying the env overloads with the old ones is that we tend to mark new overloads [[nodiscard]]. Adding it to the "new" env overloads (which simply replace the old ones) might cause compile errors for users out of the blue. @bernhardmgruber @gevtushenko should I avoid making the overloads in this PR [[nodiscard]]?

Good point. Please omit the [[nodiscard]] in this case.
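
For readers following the [[nodiscard]] concern, here is a minimal sketch of why adding the attribute to a replacement overload can break existing callers. It uses plain C++ with no CUDA dependency. The `status` enum and both function names are hypothetical stand-ins, not the CUB API.

```cpp
#include <cassert>

// Hypothetical stand-in for cudaError_t; not the real CUDA type.
enum class status { success, invalid_value };

// Legacy-style function: callers may drop the result without any diagnostic.
inline status for_each_legacy(int num_items)
{
  return num_items >= 0 ? status::success : status::invalid_value;
}

// The same function marked [[nodiscard]]: dropping the result now draws a
// warning, which turns into a hard error in builds that use -Werror or /WX.
[[nodiscard]] inline status for_each_nodiscard(int num_items)
{
  return num_items >= 0 ? status::success : status::invalid_value;
}

// Counts the failures that the caller actually inspects.
inline int count_checked_failures()
{
  int failures = 0;
  for_each_legacy(-1);            // result silently dropped, compiles cleanly
  // for_each_nodiscard(-1);      // would warn: ignoring a nodiscard return value
  (void) for_each_nodiscard(-1);  // an explicit discard is still accepted
  if (for_each_nodiscard(-1) != status::success)
  {
    ++failures;
  }
  return failures;
}
```

Existing code that calls the old overload as a bare statement would start to warn as soon as the replacement carries the attribute, which is the behavior change the comment above asks to avoid.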

cub/cub/device/device_for.cuh

cub/test/catch2_test_device_for_api.cu

cub/test/catch2_test_device_for_env.cu

cub/test/catch2_test_device_for_env_api.cu

gonidelis · 2026-03-06T02:02:14Z

Since we replaced the cudaStream_t stream parameter with EnvT env, we broke backwards compatibility and callers that relied on implicit conversion from nvbench::cuda_stream to cudaStream_t. The extents benchmark showcases that. I see three options:

a) We go back to only adding env overloads for DeviceFor, without converting the old ones
b) We check internally whether the third argument (env) is an nvbench::cuda_stream and do the conversion explicitly inside the body
c) We accept the break and ask users who relied on the implicit conversion to call .get_stream() explicitly
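
The break described above can be sketched in plain C++ without CUDA. Everything here is a hypothetical stand-in: `raw_stream` plays the role of cudaStream_t, `stream_wrapper` plays the role of a type like nvbench::cuda_stream, and the two functions only mimic the old and new parameter shapes. The branch inside `run_with_env` illustrates the idea behind option (b) and is not the code in this PR.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for cudaStream_t.
struct raw_stream
{
  int id;
};

// Hypothetical stand-in for a wrapper such as nvbench::cuda_stream.
struct stream_wrapper
{
  raw_stream s;
  operator raw_stream() const { return s; }    // implicit conversion
  raw_stream get_stream() const { return s; }  // explicit accessor
};

// Old shape: the parameter type is fixed, so a wrapper converts implicitly.
inline int run_with_stream(raw_stream stream)
{
  return stream.id;
}

// New shape: EnvT is deduced from the argument, so no implicit conversion is
// considered and the body receives the wrapper type itself.
template <class EnvT>
int run_with_env(EnvT env)
{
  if constexpr (std::is_convertible_v<EnvT, raw_stream>)
  {
    // Idea behind option (b): detect stream-like arguments, convert explicitly.
    return static_cast<raw_stream>(env).id;
  }
  else
  {
    return -1;  // a real implementation would query the environment here
  }
}

inline void usage_conversion()
{
  stream_wrapper w{{7}};
  assert(run_with_stream(w) == 7);            // old shape: implicit conversion
  assert(run_with_env(w) == 7);               // option (b): converted in the body
  assert(run_with_env(w.get_stream()) == 7);  // option (c): caller converts
}
```

Without the convertibility branch, `run_with_env(w)` would treat the wrapper as an environment rather than as a stream, which is why the template parameter changes behavior for such callers.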

bernhardmgruber · 2026-03-06T08:58:14Z

Since we replaced the cudaStream_t stream parameter with EnvT env, we broke backwards compatibility and callers that relied on implicit conversion from nvbench::cuda_stream to cudaStream_t. The extents benchmark showcases that.

That's interesting, because we hit the same case in cub::DeviceTransform with RMM's stream type. We solved this in #7266 and added tests in #7278. What we apparently missed are types that are convertible to cudaStream_t and are not copyable, because then you can't pass them to the environment parameter.
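
A minimal sketch of the non-copyable case follows, again in plain C++ with hypothetical stand-ins (`raw_stream` for cudaStream_t, `move_only_stream` for a move-only wrapper). Binding by const reference is shown as one possible remedy. That is an assumption for illustration and not necessarily what the fix in this PR does.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for cudaStream_t.
struct raw_stream
{
  int id;
};

// Hypothetical move-only wrapper that converts to the raw stream type.
struct move_only_stream
{
  explicit move_only_stream(int id) : s{id} {}
  move_only_stream(const move_only_stream&) = delete;
  move_only_stream(move_only_stream&&)      = default;
  operator raw_stream() const { return s; }
  raw_stream s;
};

static_assert(!std::is_copy_constructible_v<move_only_stream>, "wrapper is move-only");

// By-value environment parameter: passing an lvalue requires a copy.
template <class EnvT>
int by_value(EnvT env)
{
  return static_cast<raw_stream>(env).id;
}

// Possible remedy (assumption): bind by const reference so no copy is made.
template <class EnvT>
int by_const_ref(const EnvT& env)
{
  return static_cast<raw_stream>(env).id;
}

inline void usage_move_only()
{
  move_only_stream stream{3};
  // by_value(stream);  // would not compile: needs the deleted copy constructor
  assert(by_const_ref(stream) == 3);         // lvalue binds without copying
  assert(by_value(std::move(stream)) == 3);  // by-value works only via a move
}
```

With the old fixed stream parameter, such a wrapper could be passed as an lvalue because only the conversion operator was invoked. With a deduced by-value parameter, the wrapper itself has to be copied first.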

Let me ask @miscco

bernhardmgruber · 2026-03-06T11:18:36Z

Let me ask @miscco

Started an internal Slack thread.

In the meantime, I came up with a fix and pushed a commit.

cub/cub/device/device_for.cuh

bernhardmgruber · 2026-03-09T08:10:07Z

/ok to test 712a5f3

bernhardmgruber · 2026-03-09T12:34:51Z

pre-commit.ci autofix

copy-pr-bot · 2026-03-09T12:36:08Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

bernhardmgruber · 2026-03-09T12:37:53Z

/ok to test ceda482

github-actions · 2026-03-09T16:29:48Z

🥳 CI Workflow Results

🟩 Finished in 3h 49m: Pass: 100%/249 | Total: 7d 02h | Max: 3h 48m | Hits: 84%/155556

See results here.

gonidelis · 2026-03-12T21:25:48Z

cub/cub/device/device_for.cuh

@@ -926,6 +1127,10 @@ public:
  //! .. versionadded:: 2.4.0
  //!    First appears in CUDA Toolkit 12.5.


😡😡😡😡😡

github-project-automation bot added this to CCCL Feb 26, 2026

github-project-automation bot moved this to Todo in CCCL Feb 26, 2026

cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Feb 26, 2026

gonidelis force-pushed the for_env branch 2 times, most recently from af9d8de to 59ab137 Compare February 26, 2026 02:13

bernhardmgruber reviewed Feb 26, 2026

cub/cub/device/device_for.cuh (outdated)

gonidelis marked this pull request as ready for review February 26, 2026 20:39

gonidelis requested a review from a team as a code owner February 26, 2026 20:39

gonidelis requested a review from davebayer February 26, 2026 20:39

cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Feb 26, 2026

gonidelis force-pushed the for_env branch from 78dea63 to f018a41 Compare February 26, 2026 20:53

bernhardmgruber reviewed Feb 27, 2026

gonidelis added 8 commits March 5, 2026 11:49

Add env DeviceFor

98f7bfe

Unify env overloads with old non-temp-storage overloads

f2661cd

Fix disregarding stream in old APIs introduced bug and add ForEachInLayout/Extents

0069333

Revert return codes as they were to conform to upstream/main

94a0258

Remove nodiscard as it breaks backwards compatibility

f2c6924

Check return code in non env api test and fix stream calls

c3c816f

Remove 'above header needs to be included first' comment regarding NVTX range guard in unit tests

3a3dc7e

Add environment literalinclude examples in the docs

b99d57b

gonidelis force-pushed the for_env branch from c999be5 to b99d57b Compare March 5, 2026 20:11

gonidelis requested a review from bernhardmgruber March 5, 2026 20:12

Support non-copyable stream types

ee7311c

bernhardmgruber reviewed Mar 6, 2026

cub/cub/device/device_for.cuh (outdated)

Apply suggestion from @bernhardmgruber

712a5f3

bernhardmgruber approved these changes Mar 9, 2026

[pre-commit.ci] auto code formatting

ceda482

gonidelis merged commit d0befe5 into NVIDIA:main Mar 9, 2026
266 checks passed

github-project-automation bot moved this from In Review to Done in CCCL Mar 9, 2026

gonidelis commented Mar 12, 2026

This was referenced Mar 12, 2026

Fix versionadd's in DeviceFor #8016

Merged

Revamp enable_if constraints in all env overloads #8033

Open
