Description
I am using NNlibCUDA.maxpool to calculate a sliding window maximum (I know there may be other/better ways of doing it). Unfortunately it produces wrong results in some interesting cases: the second @assert in the MWE below fails. The MWE uses an (8, 3, 1, 1) CuArray and a (5, 3) kernel, but in reality I use a (320001, 32) CuArray and a (2049, 3) kernel. I do not see the same behaviour when using NNlib with native (CPU) arrays.
using CUDA
using NNlib
using NNlibCUDA

N = (8, 3, 1, 1)   # input size (spatial, spatial, channels, batch)
K = (5, 3)         # pooling window
x = rand(N...)          # CPU input
x_c = CUDA.rand(N...)   # GPU input (independent random values)

# "same"-style pooling: pad = k ÷ 2 with stride 1, so the output keeps the input's spatial size
nnlib = maxpool(x, K; pad=Tuple(k÷2 for k ∈ K), stride=(1, 1))
nnlib_cuda = maxpool(x_c, K; pad=Tuple(k÷2 for k ∈ K), stride=(1, 1))

# every element lies in at least one window, so the pooled maximum must equal the global maximum
@assert maximum(nnlib) == maximum(x)          # passes
@assert maximum(nnlib_cuda) == maximum(x_c)   # fails
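
For reference, this is the semantics I expect from the pad = k ÷ 2, stride = 1 call: a centred maximum over each (5, 3) neighbourhood, clipped at the array borders. A rough CPU-only sketch of that (windowed_max is just an illustrative helper assuming odd kernel sizes, not part of the failing code):

function windowed_max(x, K)
    pad = K .÷ 2
    out = similar(x)
    for I in CartesianIndices(x)
        # clip the centred window to the first two (spatial) dimensions of x
        lo = max.(Tuple(I)[1:2] .- pad, 1)
        hi = min.(Tuple(I)[1:2] .+ pad, size(x)[1:2])
        out[I] = maximum(@view x[lo[1]:hi[1], lo[2]:hi[2], I[3], I[4]])
    end
    return out
end

@assert maximum(windowed_max(x, K)) == maximum(x)  # always holds: every window contains its own centre

Since every window contains its own centre and, as far as I understand, the padded border never wins the max, the pooled maximum must equal maximum(x); that is all the two @assert lines above check.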
From Project.toml:
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
NNlibCUDA = "a00861dc-f156-4864-bf3c-e6376f28a68d"
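
If the exact resolved versions are more useful than the UUIDs above, I can post the output of, e.g.:

using Pkg
Pkg.status()   # prints the versions of CUDA, NNlib and NNlibCUDA resolved in this environment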
Please let me know if you need any further information.