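I seemed to remember that `finalize` is slow, and that this is why we implemented our own refcounting and provided `unsafe_free!`. However, the cost seems manageable (median of roughly 19.5 ns for `finalize` versus roughly 3.1 ns for `CUDA.unsafe_free!` on a one-element `CuArray`):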
```julia-repl
julia> @benchmark finalize(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  18.506 ns … 36.669 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.458 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.489 ns ±  0.536 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
▂▅█
▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▁▂▂▂▂▁▂▂▂▂▃▇▇▄▃███▅▃▃▃▃▂▂▂▂▂▁▂▁▂ ▃
 18.5 ns         Histogram: frequency by time        19.8 ns <
 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark CUDA.unsafe_free!(a) setup=(a=CuArray([1]))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  3.010 ns … 18.370 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.080 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.093 ns ±  0.292 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
█ ▅ ▁
▂▁▁▁▂▁▁▁▂▁▁▁▂▃▁▁▁▃▁▁▁▂▆▁▁▁█▁▁▁▂█▁▁▁█▁▁▁▂▆▁▁▁▄▁▁▁▂▃▁▁▁▃▁▁▁▂ ▂
 3.01 ns         Histogram: frequency by time        3.14 ns <
 Memory estimate: 0 bytes, allocs estimate: 0.
```
@gbaraldi @vchuravy Thoughts? Does the cost perhaps only show up when the GC is under load?
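One way to probe that question without a GPU is a CPU-only sketch like the one below. It rests on an unverified assumption: that an explicit `finalize` call might have to search the registered finalizers, so its cost could grow with the number of other finalizer-carrying objects that are alive. `Dummy`, `FREED`, `make_dummy` and `time_finalize` are hypothetical names introduced only for this sketch, and `Dummy` is merely a stand-in for `CuArray`. It uses only Base (`@elapsed` rather than BenchmarkTools), so the numbers are coarser than the trials above.

```julia
# CPU-only sketch; all names here are hypothetical. `Dummy` stands in for CuArray.
mutable struct Dummy
    x::Int
end

# Counts how many of our finalizers have actually run.
const FREED = Ref(0)

# Attach a cheap finalizer; `finalizer(f, x)` registers `f` and returns `x`.
make_dummy() = finalizer(d -> (FREED[] += 1; nothing), Dummy(1))

# Mean time per explicit `finalize` call (in ns) while `n_live` other
# finalizer-carrying objects are kept alive as background load.
# Assumption being tested: this grows with `n_live`.
function time_finalize(n_live::Int; reps::Int = 1_000)
    live = [make_dummy() for _ in 1:n_live]  # background finalizers
    objs = [make_dummy() for _ in 1:reps]     # objects finalized explicitly
    t = GC.@preserve live begin               # keep background objects reachable
        @elapsed for o in objs
            finalize(o)  # runs o's registered finalizer immediately
        end
    end
    return t / reps * 1e9
end

time_finalize(10)  # warm-up, so compilation is not included in later timings
for n in (10, 10_000, 100_000)
    println(n, " live finalizers: ", round(time_finalize(n); digits = 1), " ns/finalize")
end
```

If the per-call time stays flat as `n_live` grows, the single-object trials above are probably representative. If it climbs, the cost of `finalize` would depend on how many finalizers are registered, which a microbenchmark with a single `CuArray` per sample would not show.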