Commit 9ed35ab

Refactor call_device with C++17 Fold Expressions and Fix launch_global Declaration Order (#5480)
## Summary

This PR refactors the variadic device-side kernel helper `call_device` using C++17 fold expressions and reorders its definition in `Src/Base/AMReX_GpuLaunch.H`. Specifically:

1. Moved the definition of `call_device` above `launch_global` so that it is visible in the first phase of two-phase name lookup, when the `launch_global` template definition is parsed.
2. Replaced the traditional recursive template overloads of `call_device` with a native C++17 unary right fold expression over the comma operator (`(fs(), ...);`).

## Additional background

Currently, `launch_global` invokes `call_device(fs...)` before `call_device` has been declared or defined. This layout makes the code brittle under **two-phase name lookup**: because no declaration is visible at the point where the template is defined, the call can only be resolved by Argument-Dependent Lookup (ADL) at instantiation. If a user passes multiple lambdas that belong to the global/anonymous namespace (which is common in isolated unit tests or external mock environments), ADL does not search namespace `amrex`, so the compiler fails to resolve `call_device` and reports a compilation error.

```cpp
auto f1 = [=] __device__ () { ... };
auto f2 = [=] __device__ () { ... };
auto f3 = [=] __device__ () { ... };
AMREX_LAUNCH_KERNEL_NOBOUND(1, 1, 0, 0, f1, f2, f3); // Fails to compile!
```

ERROR: call to function "call_device" that is neither visible in the template definition nor found by argument-dependent lookup

This defect did not surface within AMReX before because almost all internal production codes pass a **single** lambda function to `AMREX_LAUNCH_KERNEL`, which causes the compiler to optimize out or skip the recursive invocation during instantiation.

The new C++17 fold expression handles both empty parameter packs (which safely evaluate to `void()`) and multiple lambdas, while preserving the original signature of `launch_global`. The fix has been verified under CUDA 12.0+.
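As a rough illustration of the fold expression described in the summary, here is a host-only C++17 sketch. The names `call_all` and `run_order` are hypothetical and not part of AMReX; the AMReX GPU qualifiers are omitted so that the snippet compiles with an ordinary C++17 compiler.

```cpp
#include <cassert>
#include <utility>

// Hypothetical host-only stand-in for the device helper:
// invoke every callable in the pack, left to right.
template <typename... Fs>
void call_all (Fs&&... fs)
{
    // Unary right fold over the built-in comma operator.
    // With an empty pack this expression evaluates to void().
    (std::forward<Fs>(fs)(), ...);
}

// Hypothetical demo: encodes the call order as decimal digits.
inline int run_order ()
{
    int log = 0;
    call_all([&] { log = log * 10 + 1; },
             [&] { log = log * 10 + 2; },
             [&] { log = log * 10 + 3; });
    call_all();  // empty pack: compiles and does nothing
    assert(log == 123);
    return log;
}
```

The comma operator sequences its left operand before its right operand, so the callables run in argument order even though the fold associates to the right.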
## Checklist

The proposed changes:

- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate
1 parent 61f7201 commit 9ed35ab

1 file changed

Lines changed: 9 additions & 8 deletions

Src/Base/AMReX_GpuLaunch.H

@@ -53,19 +53,20 @@ namespace amrex {
 // Variadic lambda function wrappers for C++ CUDA/HIP Kernel calls.

 #if defined(AMREX_USE_CUDA) || defined(AMREX_USE_HIP)
-template<class L, class... Lambdas>
-AMREX_GPU_GLOBAL void launch_global (L f0, Lambdas... fs) { f0(); call_device(fs...); }

 /// \cond DOXYGEN_IGNORE
-template<class L>
-AMREX_GPU_DEVICE void call_device (L&& f0) noexcept { f0(); }

-template<class L, class... Lambdas>
-AMREX_GPU_DEVICE void call_device (L&& f0, Lambdas&&... fs) noexcept {
-    f0();
-    call_device(std::forward<Lambdas>(fs)...);
+template <typename... Lambdas>
+AMREX_GPU_DEVICE void call_device (Lambdas&&... fs) noexcept
+{
+    (std::forward<Lambdas>(fs)(), ...);
 }
+
 /// \endcond
+
+template<class L, class... Lambdas>
+AMREX_GPU_GLOBAL void launch_global (L f0, Lambdas... fs) { f0(); call_device(fs...); }
+
 #endif

 // CPU variation
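
The declaration-order half of this change can be sketched on the host as well. The snippet below is a minimal illustration under stated assumptions, not AMReX code: the namespace `demo` and the names `call_each`, `launch`, and `global_scope_demo` are hypothetical, and plain host lambdas stand in for `__device__` lambdas.

```cpp
#include <cassert>
#include <utility>

namespace demo {  // hypothetical namespace standing in for amrex

// Declared before its caller, so ordinary unqualified lookup at the
// point of the template definition already finds it.
template <typename... Fs>
void call_each (Fs&&... fs)
{
    (std::forward<Fs>(fs)(), ...);
}

// If call_each were declared only after this template, the dependent
// call below could be resolved solely by ADL at instantiation, and ADL
// searches only the namespaces associated with the argument types.
template <class F0, class... Fs>
void launch (F0 f0, Fs... fs) { f0(); call_each(fs...); }

}  // namespace demo

// The closure types below are created in a global-namespace function,
// so namespace demo is not among their associated namespaces.
inline int global_scope_demo ()
{
    int n = 0;
    auto f1 = [&] { n = n * 10 + 1; };
    auto f2 = [&] { n = n * 10 + 2; };
    auto f3 = [&] { n = n * 10 + 3; };
    demo::launch(f1, f2, f3);
    assert(n == 123);
    return n;
}
```

With the helper declared first, the call compiles regardless of which namespace the lambdas come from, which is the property the reordering in this diff relies on.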
