Conversation
|
benchmark? |
|
I did one and it turned out the same! Working on it... |
|
Maybe that means the operations can't benefit from further compiler optimization. As for multithreading (the other feature of `@..`), you can use `@.. thread=true` to make it multithreaded: https://github.com/YingboMa/FastBroadcast.jl But even that may not yield a speedup, because it depends on the relative cost of transforms vs. broadcasts. |
|
I think I may know part of the reason. For example, on my 5900X (8 threads):

```julia
using BenchmarkTools
using FastBroadcast

# "Small" Array Test
A = abs.(rand(100,100,100));
B = copy(A);
C = copy(A);
@btime @. A = B + C;
# 185.929 μs (2 allocations: 64 bytes)
@btime @.. thread=true A = B + C;
# 28.839 μs (2 allocations: 64 bytes)

# "Large" Array Test
A = abs.(rand(300,300,300));
B = copy(A);
C = copy(A);
@btime @. A = B + C;
# 21.720 ms (2 allocations: 64 bytes)
@btime @.. thread=true A = B + C;
# 20.708 ms (2 allocations: 64 bytes)
```

However, if we are in the compute-bound regime, such as

```julia
@btime @. A = 0.1*log10(B) + 0.25*log(C);
# 240.630 ms (10 allocations: 288 bytes)
@btime @.. thread=true A = 0.1*log10(B) + 0.25*log(C);
# 35.817 ms (10 allocations: 288 bytes)
```

Nonetheless, for small 2D problems, I think |
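The large-array `A = B + C` timings are consistent with the memory-bound explanation. A back-of-envelope estimate of the implied bandwidth (a sketch: it assumes exactly 3 × 8 bytes of traffic per element, i.e. read `B`, read `C`, write `A`, and ignores any write-allocate traffic; the timing is the `@.` figure quoted above):

```julia
# Implied memory bandwidth of the single-threaded large-array add.
n = 300^3              # elements per array
bytes = 3 * n * 8      # read B, read C, write A (Float64 = 8 bytes each)
t = 21.7e-3            # measured @. time in seconds (from the benchmark above)
bw = bytes / t / 1e9   # implied bandwidth in GB/s
```

This works out to roughly 30 GB/s, in the ballpark of typical dual-channel DDR4 bandwidth, so the single-threaded loop is already saturating memory and extra threads have nothing left to gain.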
|
@doraemonho thank you for that very clear explanation! It's fortunate there isn't a slowdown in the memory-bound regime... Either way, I suppose this shows that this package may be more important downstream (e.g. GeophysicalFlows or users of GeophysicalFlows), where we might encounter more complex operations like |
Closes #337