According to the SnoopCompile report, NNlibCUDA commits type piracy on Base.Broadcast.materialize, which causes latency problems after the package is loaded:
[1] materialize(bc::Base.Broadcast.Broadcasted{<:Any, <:Any, typeof(NNlib.relu), <:Tuple{CUDA.CuArray{<:Union{Float16, Float32, Float64}}}})
@ NNlibCUDA ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/activations.jl:28
[2] materialize(bc::Base.Broadcast.Broadcasted{<:Any, <:Any, typeof(NNlib.elu), <:Tuple{CUDA.CuArray{<:Union{Float16, Float32, Float64}}}})
@ NNlibCUDA ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/activations.jl:28
[3] materialize(bc::Base.Broadcast.Broadcasted{<:Any, <:Any, typeof(NNlib.σ), <:Tuple{CUDA.CuArray{<:Union{Float16, Float32, Float64}}}})
@ NNlibCUDA ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/activations.jl:28
[4] materialize(bc::Base.Broadcast.Broadcasted{<:Any, <:Any, typeof(tanh), <:Tuple{CUDA.CuArray{<:Union{Float16, Float32, Float64}}}})
@ NNlibCUDA ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/activations.jl:28
However, according to the docs, the function intended for overloading is broadcasted, not materialize:
https://docs.julialang.org/en/v1/manual/interfaces/#man-interfaces-broadcasting
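
To illustrate what that would look like, here is a minimal sketch that runs without CUDA. Everything in it is an assumption for illustration: DeviceVector is a hypothetical stand-in for CuArray, myrelu stands in for NNlib.relu, and fast_relu stands in for the cuDNN-backed kernel. It is not the NNlibCUDA code.

```julia
# Hypothetical stand-in for a GPU array type, so the sketch runs on the CPU.
struct DeviceVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::DeviceVector) = size(v.data)
Base.getindex(v::DeviceVector, i::Int) = v.data[i]

# Stand-in activation (plays the role of NNlib.relu).
myrelu(x) = max(x, zero(x))

# Hypothetical specialized kernel (plays the role of the cuDNN call).
fast_relu(v::DeviceVector) = DeviceVector(map(myrelu, v.data))

# Overload `broadcasted` rather than `materialize`: `myrelu.(v)` lowers to
# `materialize(broadcasted(myrelu, v))`, and `materialize` returns anything
# that is not a `Broadcasted` object unchanged.
Base.Broadcast.broadcasted(::typeof(myrelu), v::DeviceVector) = fast_relu(v)

y = myrelu.(DeviceVector([-1.0, 2.0]))  # dispatches to fast_relu
```

In this sketch the defining module owns both the function and the array type, so the method is not piracy. NNlibCUDA owns neither NNlib.relu nor CuArray, so whether moving to broadcasted fully resolves the piracy there is a separate question.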