`@inbounds` applied to the kernel function definition has no effect. Additionally, `@inbounds` does not propagate through function calls within a kernel, for example when calling `zip()`.
The following benchmarks from https://github.com/torrance/AMDGPU-MWE/blob/main/inbounds.jl demonstrate the performance penalty. Note that the third benchmark is likely doubly penalised since the call to `zip()` isn't inlined.
- function `@inbounds` => `@inbounds` annotated at function definition
- internal `@inbounds` => `@inbounds` annotated at lines with indexing operations
- using `zip()` => using a `zip()` to iterate and index into arrays
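For readers who do not want to open the linked file, the three variants can be sketched roughly as follows. This is a minimal CPU-side illustration only: the function names `add_outer!`, `add_inner!` and `add_zip!` are hypothetical, the bodies are assumed stand-ins rather than the kernels from the repository, and nothing here launches on a GPU. It only shows where the `@inbounds` annotation sits in each case.

```julia
# Hypothetical stand-ins for the three benchmarked styles.
# These are NOT the kernels from the linked repository.

# "function @inbounds": the macro wraps the whole function definition.
@inbounds function add_outer!(c, a, b)
    for i in eachindex(c, a, b)
        c[i] = a[i] + b[i]
    end
    return c
end

# "internal @inbounds": the macro is placed on the line doing the indexing.
function add_inner!(c, a, b)
    for i in eachindex(c, a, b)
        @inbounds c[i] = a[i] + b[i]
    end
    return c
end

# "using zip()": the inputs are iterated through zip(), the output is indexed.
function add_zip!(c, a, b)
    for (i, (x, y)) in enumerate(zip(a, b))
        @inbounds c[i] = x + y
    end
    return c
end

a = Float32[1, 2, 3]
b = Float32[10, 20, 30]

# All three variants compute the same result; only the annotation placement differs.
c_outer = add_outer!(similar(a), a, b)
c_inner = add_inner!(similar(a), a, b)
c_zip   = add_zip!(similar(a), a, b)
```

All three compute the same values, so any timing difference between them comes from bounds checking and inlining rather than from the arithmetic.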
Function @inbounds
BenchmarkTools.Trial: 18 samples with 1 evaluation.
Range (min … max): 283.219 ms … 287.235 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 283.964 ms ┊ GC (median): 0.00%
Time (mean ± σ): 284.278 ms ± 874.447 μs ┊ GC (mean ± σ): 0.10% ± 0.29%
▁█
▄▁▁▁▁▁▁▁▄▄██▁▁▁▁▄▄▁▁▄▁▁▁▄▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
283 ms Histogram: frequency by time 287 ms <
Memory estimate: 6.21 MiB, allocs estimate: 406760.
Internal @inbounds
BenchmarkTools.Trial: 36 samples with 1 evaluation.
Range (min … max): 141.340 ms … 141.616 ms ┊ GC (min … max): 1.78% … 0.00%
Time (median): 141.471 ms ┊ GC (median): 0.00%
Time (mean ± σ): 141.469 ms ± 69.181 μs ┊ GC (mean ± σ): 0.10% ± 0.42%
▃ ▃▃ ▃ ▃▃ █ ▃
▇▁▁▁▁▁▁▇▇█▁▇▇▁▁▁██▇█▁▁▇▇▁▁▁▁██▇█▇▁▁▁▁▇▁▁█▇▁▇▇▁▁▁▇▁▇▁▇▁▁▁▁▇▁▁▇ ▁
141 ms Histogram: frequency by time 142 ms <
Memory estimate: 3.06 MiB, allocs estimate: 200490.
Using zip()
BenchmarkTools.Trial: 16 samples with 1 evaluation.
Range (min … max): 318.848 ms … 319.049 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 318.942 ms ┊ GC (median): 0.00%
Time (mean ± σ): 318.950 ms ± 61.016 μs ┊ GC (mean ± σ): 0.10% ± 0.28%
▁ ▁ ▁ █▁ ▁ ▁ ▁ ▁ ▁█ ▁ ▁▁
█▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁██▁▁▁█▁█▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁█▁▁▁▁▁██ ▁
319 ms Histogram: frequency by time 319 ms <