Describe the bug
Reductions over broadcasted GPU arrays error when differentiated with Enzyme. @wsmoses suggested I open an issue here.
To reproduce
The Minimal Working Example (MWE) for this bug:
using Enzyme, CUDA
f(x, y) = sum(x .+ y)
x = CuArray(rand(5))
y = CuArray(rand(5))
dx = CuArray([1.0, 0.0, 0.0, 0.0, 0.0])
autodiff(Reverse, f, Active, Duplicated(x, dx), Const(y))
ERROR: Enzyme execution failed.
Enzyme compilation failed.
No create nofree of empty function (jl_gc_safe_enter) jl_gc_safe_enter)
at context: call fastcc void @julia__launch_configuration_979_4373([2 x i64]* noalias nocapture nofree noundef nonnull writeonly sret([2 x i64]) align 8 dereferenceable(16) %7, i64 noundef signext 0, { i64, {} addrspace(10)* } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(32) %45) #715, !dbg !1090 (julia__launch_configuration_979_4373)
Stacktrace:
[1] launch_configuration
@ ~/.julia/dev/CUDA/lib/cudadrv/occupancy.jl:56
[2] #launch_heuristic#1204
@ ~/.julia/dev/CUDA/src/gpuarrays.jl:22
[3] launch_heuristic
@ ~/.julia/dev/CUDA/src/gpuarrays.jl:15
[4] _copyto!
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:78
[5] copyto!
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:44
[6] copy
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:29
[7] materialize
@ ./broadcast.jl:903
[8] f
@ ./REPL[2]:1
Stacktrace:
[1] throwerr(cstr::Cstring)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:1797
[2] launch_configuration
@ ~/.julia/dev/CUDA/lib/cudadrv/occupancy.jl:56 [inlined]
[3] #launch_heuristic#1204
@ ~/.julia/dev/CUDA/src/gpuarrays.jl:22 [inlined]
[4] launch_heuristic
@ ~/.julia/dev/CUDA/src/gpuarrays.jl:15 [inlined]
[5] _copyto!
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:78 [inlined]
[6] copyto!
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:44 [inlined]
[7] copy
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:29 [inlined]
[8] materialize
@ ./broadcast.jl:903 [inlined]
[9] f
@ ./REPL[2]:1 [inlined]
[10] diffejulia_f_2820wrap
@ ./REPL[2]:0
[11] macro expansion
@ ~/.julia/dev/Enzyme/src/compiler.jl:6819 [inlined]
[12] enzyme_call
@ ~/.julia/dev/Enzyme/src/compiler.jl:6419 [inlined]
[13] CombinedAdjointThunk
@ ~/.julia/dev/Enzyme/src/compiler.jl:6296 [inlined]
[14] autodiff
@ ~/.julia/dev/Enzyme/src/Enzyme.jl:314 [inlined]
[15] autodiff(::ReverseMode{…}, ::typeof(f), ::Type{…}, ::Duplicated{…}, ::Const{…})
@ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:326
[16] top-level scope
@ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.
Forward mode also fails. This is with Julia 1.10.3, Enzyme 0.12.26, GPUCompiler 0.26.7 and CUDA d7077da.
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 36 × Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 18 default, 0 interactive, 9 GC (on 36 virtual cores)
Environment:
LD_LIBRARY_PATH = /usr/local/gromacs/lib
Details on CUDA:
CUDA runtime 12.5, artifact installation
CUDA driver 12.5
NVIDIA driver 535.183.1, originally for CUDA 12.2
CUDA libraries:
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+535.183.1
Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.1+1
- CUDA_Runtime_jll: 0.14.1+0
Toolchain:
- Julia: 1.10.3
- LLVM: 15.0.7
2 devices:
0: NVIDIA RTX A6000 (sm_86, 46.970 GiB / 47.988 GiB available)
1: NVIDIA RTX A6000 (sm_86, 4.046 GiB / 47.988 GiB available)