Enzyme: Support for reductions with GPU broadcasting #2455

Open
@jgreener64

Describe the bug

Reductions with GPU broadcasting error with Enzyme. @wsmoses suggested I open an issue here.

To reproduce

The Minimal Working Example (MWE) for this bug:

using Enzyme, CUDA
f(x, y) = sum(x .+ y)
x = CuArray(rand(5))
y = CuArray(rand(5))
dx = CuArray([1.0, 0.0, 0.0, 0.0, 0.0])
autodiff(Reverse, f, Active, Duplicated(x, dx), Const(y))

ERROR: Enzyme execution failed.
Enzyme compilation failed.

No create nofree of empty function (jl_gc_safe_enter) jl_gc_safe_enter)
 at context:   call fastcc void @julia__launch_configuration_979_4373([2 x i64]* noalias nocapture nofree noundef nonnull writeonly sret([2 x i64]) align 8 dereferenceable(16) %7, i64 noundef signext 0, { i64, {} addrspace(10)* } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(32) %45) #715, !dbg !1090 (julia__launch_configuration_979_4373)

Stacktrace:
 [1] launch_configuration
   @ ~/.julia/dev/CUDA/lib/cudadrv/occupancy.jl:56
 [2] #launch_heuristic#1204
   @ ~/.julia/dev/CUDA/src/gpuarrays.jl:22
 [3] launch_heuristic
   @ ~/.julia/dev/CUDA/src/gpuarrays.jl:15
 [4] _copyto!
   @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:78
 [5] copyto!
   @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:44
 [6] copy
   @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:29
 [7] materialize
   @ ./broadcast.jl:903
 [8] f
   @ ./REPL[2]:1


Stacktrace:
  [1] throwerr(cstr::Cstring)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:1797
  [2] launch_configuration
    @ ~/.julia/dev/CUDA/lib/cudadrv/occupancy.jl:56 [inlined]
  [3] #launch_heuristic#1204
    @ ~/.julia/dev/CUDA/src/gpuarrays.jl:22 [inlined]
  [4] launch_heuristic
    @ ~/.julia/dev/CUDA/src/gpuarrays.jl:15 [inlined]
  [5] _copyto!
    @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:78 [inlined]
  [6] copyto!
    @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:44 [inlined]
  [7] copy
    @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:29 [inlined]
  [8] materialize
    @ ./broadcast.jl:903 [inlined]
  [9] f
    @ ./REPL[2]:1 [inlined]
 [10] diffejulia_f_2820wrap
    @ ./REPL[2]:0
 [11] macro expansion
    @ ~/.julia/dev/Enzyme/src/compiler.jl:6819 [inlined]
 [12] enzyme_call
    @ ~/.julia/dev/Enzyme/src/compiler.jl:6419 [inlined]
 [13] CombinedAdjointThunk
    @ ~/.julia/dev/Enzyme/src/compiler.jl:6296 [inlined]
 [14] autodiff
    @ ~/.julia/dev/Enzyme/src/Enzyme.jl:314 [inlined]
 [15] autodiff(::ReverseMode{…}, ::typeof(f), ::Type{…}, ::Duplicated{…}, ::Const{…})
    @ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:326
 [16] top-level scope
    @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.

Forward mode also fails. This is with Julia 1.10.3, Enzyme 0.12.26, GPUCompiler 0.26.7, and CUDA.jl at commit d7077da.

Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 36 × Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 18 default, 0 interactive, 9 GC (on 36 virtual cores)
Environment:
  LD_LIBRARY_PATH = /usr/local/gromacs/lib

Details on CUDA:

CUDA runtime 12.5, artifact installation
CUDA driver 12.5
NVIDIA driver 535.183.1, originally for CUDA 12.2

CUDA libraries: 
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+535.183.1

Julia packages: 
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.1+1
- CUDA_Runtime_jll: 0.14.1+0

Toolchain:
- Julia: 1.10.3
- LLVM: 15.0.7

2 devices:
  0: NVIDIA RTX A6000 (sm_86, 46.970 GiB / 47.988 GiB available)
  1: NVIDIA RTX A6000 (sm_86, 4.046 GiB / 47.988 GiB available)

Labels

enhancement (New feature or request), extensions (Stuff about package extensions)
