### Describe the bug

Any use of `shfl_sync` throws an error saying `shfl_recurse` is a dynamic function.
### To reproduce

The Minimal Working Example (MWE) for this bug, attempting a stream compaction:
```julia
using CUDA

# define a new array of 64 elements and fill it with random ones and zeros
a = rand(0:1, 64)
a_gpu = CuArray(a)
b_gpu = CUDA.zeros(Int64, 64)
count = CUDA.zeros(Int64, 1)

function mykernel!(in, out, count)
    threadNum = threadIdx().x + blockDim().x * (blockIdx().x - 1) # 1-indexed
    warpNum = (threadIdx().x - 1) ÷ 32 # 0-indexed
    laneNum = (threadIdx().x - 1) % 32 # 0-indexed
    shared_count = CuDynamicSharedArray(Int64, 1)
    if threadNum == 1
        shared_count[1] = 0
    end
    sync_threads()
    if threadNum <= 64
        is_nonzero = in[threadNum] != 0
        mask = CUDA.vote_ballot_sync(0xffffffff, is_nonzero)
        warp_count = count_ones(mask)
        warp_offset = 0
        if laneNum == 0
            warp_offset = CUDA.atomic_add!(pointer(shared_count, 1), warp_count)
        end
        warp_offset = CUDA.shfl_sync(0xffffffff, warp_offset, Int32(0)) # <<<<< This is the BUG code.
        if is_nonzero
            index = count_ones(mask & ((UInt32(1) << laneNum) - 1)) + warp_offset
            out[index + 1] = threadNum
        end
    end
    sync_threads()
    if threadIdx().x == 1
        CUDA.atomic_add!(CUDA.pointer(count), shared_count[1])
    end
    return
end

@cuda threads=64 blocks=1 shmem=sizeof(Int64) mykernel!(a_gpu, b_gpu, count)
println("nonzeros: $(collect(count))")
println(collect(b_gpu))
```
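For what it's worth, the index arithmetic the kernel relies on (a ballot mask plus an exclusive per-lane bit count) can be checked on the CPU, independently of the shuffle. This is just an illustrative sketch; `compact_indices` is a made-up helper, not part of CUDA.jl:

```julia
# CPU sketch of the warp-level compaction index math used in the kernel.
# `ballot` plays the role of vote_ballot_sync's result: bit i is set when
# lane i (0-indexed) holds a nonzero element.
function compact_indices(flags::Vector{Bool})
    @assert length(flags) == 32
    ballot = UInt32(0)
    for (lane, f) in enumerate(flags)
        f && (ballot |= UInt32(1) << (lane - 1))
    end
    # exclusive prefix count for each lane: number of set bits strictly below it,
    # i.e. count_ones(ballot & ((UInt32(1) << lane) - 1)) as in the kernel
    return [count_ones(ballot & ((UInt32(1) << lane) - 1)) for lane in 0:31]
end

# lanes 0, 2, and 3 hold nonzero elements
flags = falses(32); flags[1] = true; flags[3] = true; flags[4] = true
offsets = compact_indices(collect(flags))
```

Each active lane's offset is the number of active lanes before it, which is exactly the slot it should write into within the warp's chunk of `out`.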
### Manifest.toml

Package versions:

```
Status `~/.julia/environments/v1.11/Project.toml`
[052768ef] CUDA v5.5.2
```
CUDA details:

```
CUDA runtime version: 12.6.0
CUDA driver version: 12.6.0
CUDA capability: 9.0.0
```
### Expected behavior

The shuffle call should not throw an error, and all zeros in `a` should be removed when the data is compacted into `b`.
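To make the expectation concrete, here is a plain-CPU reference for the result the kernel should produce (`compact_reference` is a hypothetical helper for illustration, not part of the MWE): `out` holds the 1-based indices of the nonzero entries, and the count equals the number of nonzeros.

```julia
# CPU reference for the kernel's intended output: the kernel writes each
# nonzero element's thread index (1-based) into the next free slot of `out`
# and accumulates the total in `count`.
function compact_reference(a::Vector{<:Integer})
    out = zeros(Int, length(a))
    n = 0
    for (i, v) in enumerate(a)
        if v != 0
            n += 1
            out[n] = i
        end
    end
    return out, n
end

out, n = compact_reference([0, 1, 1, 0, 1])
```

(The GPU version is only guaranteed to match this ordering within a warp; the reference just states which indices must appear and how many.)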
### Version info

Details on Julia:

```
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Platinum 8462Y+
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, sapphirerapids)
Threads: 1 default, 0 interactive, 1 GC (on 64 virtual cores)
```
Details on CUDA:

```
CUDA driver 12.6
NVIDIA driver 550.90.7

CUDA libraries:
- CUBLAS: 12.6.3
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+550.90.7

Julia packages:
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.3+0
- CUDA_Runtime_jll: 0.15.3+0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

1 device:
  0: NVIDIA H100 80GB HBM3 (sm_90, 77.409 GiB / 79.647 GiB available)
```