
Implement cccl-rt kernel launch patterns example#5892

Open
davebayer wants to merge 1 commit into NVIDIA:main from davebayer:cudax_ex1

Conversation

@davebayer
Contributor

@davebayer davebayer commented Sep 16, 2025

Closes #5707.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Sep 16, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Sep 16, 2025
@davebayer davebayer force-pushed the cudax_ex1 branch 3 times, most recently from 0883fc1 to 3ce97db on September 19, 2025 16:21
Contributor

@miscco miscco left a comment


Looks great 🎉

Comment on lines +23 to +28
target_compile_options(${example_target} PRIVATE
$<$<COMPILE_LANG_AND_ID:CUDA,NVIDIA>:--extended-lambda>
)
target_compile_definitions(${example_target} PRIVATE
"LIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE"
)
Contributor


Shouldn't those already be part of the cudax global target?

Comment on lines +39 to +40
template <cuda::std::size_t N>
name_buffer(const char (&str)[N])
Contributor


Nitpick: Should this rather be a constructor from a cuda::std::span?

@github-project-automation github-project-automation bot moved this from In Progress to In Review in CCCL Sep 22, 2025
};

#if defined(__CUDACC_EXTENDED_LAMBDA__)
// Kernel lambda is another form of the kernel functor. It can optionally take the kernel_config as the first argument.
Contributor


Unfortunately, due to extended lambda restrictions, it has to take the configuration as the first argument. We have no way to inspect the lambda signature to check whether it does, so I decided to require the configuration to be passed.
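For illustration, the restriction described here might look like the following. This is a sketch, not code from the PR: it assumes `cudax::make_config` and `cudax::launch` as used elsewhere in this example, plus `make_hierarchy`/`grid_dims`/`block_dims` and `dims.rank` from the cudax hierarchy utilities:

```cuda
#include <cstdio>

#include <cuda/experimental/launch.cuh>

namespace cudax = cuda::experimental;

// Sketch of the restriction: an extended __device__ lambda must list the
// config as its first parameter, because cudax::launch cannot inspect a
// lambda's signature to decide whether to prepend it.
void launch_lambda_sketch(cuda::stream_ref stream)
{
  auto dims   = cudax::make_hierarchy(cudax::grid_dims(4), cudax::block_dims<256>());
  auto config = cudax::make_config(dims);

  cudax::launch(stream, config, [] __device__ (auto config, int value) {
    // config carries the (possibly statically known) extents of the launch
    if (config.dims.rank(cudax::thread, cudax::grid) == 0)
    {
      printf("value = %d\n", value);
    }
  }, 42);
}
```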


// The main advantage of using kernel config instead of the ordinary kernel parameters is that the config can carry
// statically defined extents. That means it is easier to generate kernels specialized for certain block sizes.
static_assert(
Collaborator


suggestion: I'd love to see an example that uses C++20 concepts/requires clause to statically constrain the operator() to a specific block size. I think this is where the ability to statically specify and query grid extents really shines.

Comment on lines +183 to +187
// Launch a kernel functor that takes a cudax::kernel_config. Note that the kernel config is passed automatically as
// the first argument by the cudax::launch function.
const auto config =
cudax::make_config(hierarchy, cudax::dynamic_shared_memory<kernel_functor_with_config::dynamic_smem_layout>());
cudax::launch(stream, config, kernel_functor_with_config{}, name_buffer{"kernel functor with config"});
Collaborator


suggestion: It's not clear what the purpose of the cudax::dynamic_shared_memory config option is. I think it needs more explanation both here and in the kernel itself.

#include <stdexcept>

#include <cuda.h>

Collaborator


suggestion: Similar comment to the other examples, add a block summarizing what this example is for and what it does.

Comment on lines +208 to +213
// Launch a kernel functor that takes a cudax::kernel_config. Note that the kernel config is passed automatically as
// the first argument by the cudax::launch function.
//
// The kernel functor requires dynamic memory, so we need to create the kernel configuration with dynamic shared
// memory option. The config remembers the type passed inside the option and makes it the return type of the
// cudax::dynamic_smem_ref(config) call inside the device code. See demo_dynamic_shared_memory for more information.
Collaborator


suggestion: Let's fork off the dynamic_shared_memory to a separate example where we can dive into the nuance required here. To properly motivate this, we first need to explain why you can't just write code like this:

    template <typename T>
    __global__ void foo(T i)
    {
        extern __shared__ T dyn_shmem[];
    }
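For context on why the naive version is rejected: all extern __shared__ declarations refer to the same underlying storage, so instantiating foo<int> and foo<double> would redeclare the same symbol with conflicting types and nvcc refuses to compile it. The classic workaround (a sketch, not the cudax mechanism) funnels every instantiation through a single raw byte buffer:

```cuda
// Hypothetical helper (not part of cudax): hand back dynamic shared memory
// as a T*. The extern declaration always uses unsigned char, so every
// instantiation refers to the same consistently typed symbol; the start of
// dynamic shared memory is suitably aligned for any built-in type.
template <typename T>
__device__ T* dynamic_smem()
{
  extern __shared__ unsigned char raw_smem[];
  return reinterpret_cast<T*>(raw_smem);
}

template <typename T>
__global__ void foo(T value)
{
  T* dyn_shmem = dynamic_smem<T>();
  dyn_shmem[threadIdx.x] = value;
}
```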

@@ -0,0 +1,254 @@
//===----------------------------------------------------------------------===//
Collaborator


suggestion: On second thought, to make these examples easier to digest, I would suggest breaking this up into separate examples, one for each type of kernel launch. Otherwise for first time readers, it can be overwhelming and confusing to understand what parts are relevant.

Collaborator

@jrhemstad jrhemstad Oct 29, 2025


I'd suggest adding a cccl-rt/kernel_launch/ directory and add the separate example .cu files there.

namespace cudax = cuda::experimental;

// A helper type for storing the kernel launch pattern name.
struct name_buffer
Collaborator


suggestion: I get the motivation for this, but I believe in an example any extra noise should be eliminated as much as possible to avoid distracting from the thing we are trying to teach. I would suggest getting rid of the name_buffer thing.

Contributor Author


I extracted this to a common.cuh header.

@jrhemstad
Collaborator

suggestion: For the launch pattern examples, I think users will appreciate opinionated guidance on which option is "best". For example, we'd want to explain that writing the kernel to take the config as its first argument is the best option, and what benefits that gives them.

@pciolkosz
Contributor

I would add two more cases, one that shows how to use the dynamic_shared_memory() option and one with default config attached to the kernel functor

@davebayer davebayer marked this pull request as ready for review March 5, 2026 18:30
@davebayer davebayer requested a review from a team as a code owner March 5, 2026 18:30
@github-actions
Contributor

github-actions bot commented Mar 5, 2026

😬 CI Workflow Results

🟥 Finished in 20m 09s: Pass: 11%/9 | Total: 27m 56s | Max: 4m 15s

See results here.

@@ -0,0 +1,3 @@
# Kernel Launch Patterns

This example showcases how kernels and kernel functors can be launched using the `cuda::launch` function.
Collaborator


suggestion: It would help to enumerate and link to the different examples and provide a succinct summary of each example.

@bernhardmgruber bernhardmgruber changed the title Implement cccl-rt kernel launch paterns example Implement cccl-rt kernel launch patterns example Mar 10, 2026


Development

Successfully merging this pull request may close these issues.

[FEA]: Examples of Different Kernel Launch Patterns in cccl-runtime
