Hi!
Using Flux, I want to scatter the following vector of SVectors with NNlib.scatter:
using Flux, StaticArrays  # StaticArrays provides SVector
NNlib.scatter(+, [SVector(1,1,1),SVector(1,1,1),SVector(1,1,1)], [3,1,2])
3-element Vector{SVector{3, Int64}}:
[1, 1, 1]
[1, 1, 1]
[1, 1, 1]
This works without problems. If I move only the source array (the middle argument) to the GPU as a CuArray, it still works, but in my tests it is slow for large arrays (about 60k elements):
using CUDA
NNlib.scatter(+, CuArray([SVector(1,1,1),SVector(1,1,1),SVector(1,1,1)]), [3,1,2])
3-element CuArray{SVector{3, Int64}, 1, CUDA.Mem.DeviceBuffer}:
[1, 1, 1]
[1, 1, 1]
[1, 1, 1]
If I try to do everything on the GPU, i.e. also pass the index array as a CuArray:
NNlib.scatter(+, CuArray([SVector(1,1,1),SVector(1,1,1),SVector(1,1,1)]), CuArray([3,1,2]))
ERROR: InvalidIRError: compiling kernel #scatter_kernel!(typeof(+), CuDeviceVector{SVector{3, Int64}, 1}, CuDeviceVector{SVector{3, Int64}, 1}, CuDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to atomic_cas!)
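Is this a bug? It looks like the GPU scatter kernel cannot perform the atomic update when the array elements are SVector{3, Int64} rather than plain numbers.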
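A possible workaround, sketched here on the CPU only and not verified on a GPU: store the vectors as the columns of a plain 3×N Int64 matrix, so that a GPU scatter would only have to update Int64 scalars instead of SVector elements. This relies on NNlib.scatter scattering along the trailing dimension (column j of the source is combined into column idx[j] of the result); the variable names below are illustrative only.

```julia
using NNlib, StaticArrays

# Source data as in the report: a vector of SVector{3, Int64}.
src = [SVector(1, 1, 1), SVector(1, 1, 1), SVector(1, 1, 1)]
idx = [3, 1, 2]

# Flatten to a plain 3×N Int64 matrix; column j holds the components of src[j].
src_mat = [src[j][i] for i in 1:3, j in eachindex(src)]

# Scatter along the last dimension: column j is added into column idx[j].
# Assumption (untested here): with CUDA.jl loaded, the same call on
# CuArray(src_mat) and CuArray(idx) only needs atomic adds on Int64 scalars.
dst_mat = NNlib.scatter(+, src_mat, idx)

# Convert the 3×M result back to a vector of SVectors.
dst = [SVector{3}(col...) for col in eachcol(dst_mat)]
```

On the CPU this gives the same result as scattering the vector of SVectors directly; whether it also avoids the atomic_cas! error on the GPU would still need to be checked.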
More info: https://discourse.julialang.org/t/how-to-reduce-an-array/92945/14
Kind regards