@inbounds not propagating correctly #342

Open
@torrance

Applying @inbounds to the kernel function definition has no effect.

Additionally, @inbounds does not propagate through function calls made inside a kernel, for example when iterating with zip().

The following benchmarks from https://github.com/torrance/AMDGPU-MWE/blob/main/inbounds.jl demonstrate the performance penalty. Note that the third benchmark is likely doubly penalised, since the call to zip() isn't inlined.

Function @inbounds => @inbounds annotated at the function definition
Internal @inbounds => @inbounds annotated on the lines that perform indexing operations
Using zip() => using zip() to iterate over the arrays instead of indexing into them directly
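
For readers without the linked repository at hand, the three annotation styles can be sketched roughly as follows. This is a hypothetical, CPU-only illustration and not the actual MWE: the function names and the dot-product body are assumptions made for the example, and a real reproduction would launch equivalent kernels through AMDGPU.jl.

```julia
# Hypothetical CPU-side sketch of the three annotation styles being compared.
# None of these names come from the linked MWE.

# "Function @inbounds": the macro wraps the whole function definition.
@inbounds function dot_function_inbounds(xs, ys)
    acc = zero(eltype(xs))
    for i in eachindex(xs, ys)
        acc += xs[i] * ys[i]
    end
    return acc
end

# "Internal @inbounds": the macro is placed on the line that indexes.
function dot_internal_inbounds(xs, ys)
    acc = zero(eltype(xs))
    for i in eachindex(xs, ys)
        @inbounds acc += xs[i] * ys[i]
    end
    return acc
end

# "Using zip()": indexing happens inside the zip iterator, so whether bounds
# checks are elided depends on @inbounds reaching the inlined iterate() calls.
function dot_zip(xs, ys)
    acc = zero(eltype(xs))
    @inbounds for (x, y) in zip(xs, ys)
        acc += x * y
    end
    return acc
end
```

All three variants compute the same result; only the placement of @inbounds differs, which is what the benchmarks below compare.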

Function @inbounds
BenchmarkTools.Trial: 18 samples with 1 evaluation.
 Range (min … max):  283.219 ms … 287.235 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     283.964 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   284.278 ms ± 874.447 μs  ┊ GC (mean ± σ):  0.10% ± 0.29%

            ▁█                                                   
  ▄▁▁▁▁▁▁▁▄▄██▁▁▁▁▄▄▁▁▄▁▁▁▄▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  283 ms           Histogram: frequency by time          287 ms <

 Memory estimate: 6.21 MiB, allocs estimate: 406760.

Internal @inbounds
BenchmarkTools.Trial: 36 samples with 1 evaluation.
 Range (min … max):  141.340 ms … 141.616 ms  ┊ GC (min … max): 1.78% … 0.00%
 Time  (median):     141.471 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   141.469 ms ±  69.181 μs  ┊ GC (mean ± σ):  0.10% ± 0.42%

           ▃      ▃▃ ▃        ▃▃ █        ▃                      
  ▇▁▁▁▁▁▁▇▇█▁▇▇▁▁▁██▇█▁▁▇▇▁▁▁▁██▇█▇▁▁▁▁▇▁▁█▇▁▇▇▁▁▁▇▁▇▁▇▁▁▁▁▇▁▁▇ ▁
  141 ms           Histogram: frequency by time          142 ms <

 Memory estimate: 3.06 MiB, allocs estimate: 200490.

Using zip()
BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min … max):  318.848 ms … 319.049 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     318.942 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   318.950 ms ±  61.016 μs  ┊ GC (mean ± σ):  0.10% ± 0.28%

  ▁       ▁    ▁   █▁   ▁ ▁       ▁  ▁       ▁█        ▁     ▁▁  
  █▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁██▁▁▁█▁█▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁█▁▁▁▁▁██ ▁
  319 ms           Histogram: frequency by time          319 ms <
