Add inline and inbounds annotations to unsafe_dot (#639)
Conversation
Codecov Report

All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff           @@
##           master     #639   +/-   ##
=======================================
  Coverage   98.13%   98.13%
=======================================
  Files          19       19
  Lines        3277     3277
=======================================
  Hits         3216     3216
  Misses         61       61
```

☔ View full report in Codecov by Sentry.
Any idea how much difference the `@inline` and the `@inbounds` each make?
It is mostly the inline. Rerunning the benchmark:

Without the `@inbounds`:

```
[benchmark histogram]
Memory estimate: 16 bytes, allocs estimate: 1.
```

Without the `@inline`:

```
julia> @benchmark filt!(output, interp_fil, data)
[benchmark histogram]
Memory estimate: 16 bytes, allocs estimate: 1.
```
Thanks. In my opinion, overriding the compiler heuristics needs good justification, and those benchmarks surely qualify. The additional […]

But if this helps only interpolation, maybe we can use call-site inlining for just that?
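The call-site inlining suggested above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `naive_dot` and `interp_step` are hypothetical stand-ins for `unsafe_dot` and the interpolation path, and call-site `@inline` assumes Julia 1.8 or later.

```julia
# Hypothetical stand-in for unsafe_dot: a plain dot-product kernel
# with no @inline annotation on the definition itself.
naive_dot(a, b) = sum(a[i] * b[i] for i in eachindex(a, b))

# Hypothetical stand-in for the interpolation hot path.
function interp_step(a, b)
    # Call-site @inline (Julia ≥ 1.8) overrides the compiler's inlining
    # heuristic for this call only; other call sites keep the default
    # cost model, so no other users of the function are affected.
    return @inline naive_dot(a, b)
end
```

This keeps the definition unannotated while still forcing inlining where the benchmarks showed it matters.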
Good point. Can you try that, @jacobga1998?

Seems to give the same benchmarks; I uploaded it.
wheeheee left a comment:
Just a style preference: I like the `@inline` macro right beside the function call. Other than that, LGTM. Also, a little benchmarking suggests that `BLAS.dot` is slower than the Julia version for small kernel lengths, but that's a separate issue and there is no urgent need to change it.
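The small-kernel comparison mentioned above can be illustrated like this. It is a rough sketch, not the DSP.jl code: `loop_dot` is a hypothetical hand-written kernel, and the point is only that a plain Julia loop avoids the fixed overhead of a BLAS call when the vectors are short.

```julia
using LinearAlgebra  # for BLAS.dot

# Hand-written dot product; for short vectors this tends to beat
# BLAS.dot because it has no library-call overhead.
function loop_dot(a, b)
    s = zero(eltype(a))
    @inbounds @simd for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    return s
end

a = randn(8); b = randn(8)           # a "small kernel"-sized pair
loop_dot(a, b) ≈ BLAS.dot(length(a), a, 1, b, 1)  # same result; only timing differs
```

Timing the two with BenchmarkTools at various lengths would show where the crossover lies, but as noted, that is a separate issue from this PR.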
Co-authored-by: wheeheee <104880306+wheeheee@users.noreply.github.com>
Agreed, that looks nicer. Added the suggestions.
Hello,

I was using a FIR interpolation filter and found that it was a bottleneck in my code. While profiling I realized that the compiler wasn't inlining the `unsafe_dot` function, and that the first multiplication was missing an `@inbounds`. In the benchmark below, the execution time after adding `@inline` and `@inbounds` is about half. I also tried longer filters and decimation kernels and didn't find any difference in execution time, but I added the annotations to the other functions that did not already have them, in case there is a scenario where it matters.

Let me know if you think this is worth merging.
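The change described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the exact DSP.jl source: the signature and indexing of `unsafe_dot` are assumptions made for illustration, and the key points are the `@inline` on the definition and the `@inbounds` now also covering the first multiplication.

```julia
# Hypothetical sketch of an annotated dot-product kernel: multiplies the
# coefficients `a` against the window of `b` ending at index `bLastIdx`.
# Previously the first multiplication sat outside @inbounds and paid a
# bounds check; the loop and the first product are now both covered.
@inline function unsafe_dot(a::AbstractVector, b::AbstractVector, bLastIdx::Integer)
    off = bLastIdx - length(a)
    @inbounds begin
        dotprod = a[1] * b[off + 1]
        @simd for i in 2:length(a)
            dotprod += a[i] * b[off + i]
        end
    end
    return dotprod
end
```

As with any `@inbounds` use, callers must guarantee `bLastIdx ≥ length(a)` and `bLastIdx ≤ length(b)`, which is why the function carries the `unsafe_` prefix.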
Before PR (Julia 1.11)
After PR