For some time we've lamented that while FFTW is multithreaded, broadcasting operations such as
    @. sol += clock.dt * (equation.L * sol + ts.N)
are not.
FastBroadcast.jl provides an answer:
https://github.com/YingboMa/FastBroadcast.jl
It seems that even without multithreading, it can speed things up.
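As a rough sketch of what the switch could look like, here is the same kind of update written with FastBroadcast's `@..` macro. The arrays `sol`, `L`, `N` and the scalar `dt` are hypothetical stand-ins for `sol`, `equation.L`, `ts.N` and `clock.dt`; the sizes and values are made up for illustration.

```julia
using FastBroadcast

# Hypothetical stand-ins for sol, equation.L, ts.N and clock.dt
n   = 256
sol = rand(ComplexF64, n, n)
L   = rand(ComplexF64, n, n)
N   = rand(ComplexF64, n, n)
dt  = 0.01

# Reference result using Base broadcasting (allocates a new array)
expected = @. sol + dt * (L * sol + N)

# Same update, in place, using FastBroadcast's @.. macro
@.. sol = sol + dt * (L * sol + N)

# The FastBroadcast README also documents a threaded form; depending on the
# package version this may require loading an additional package first
# (an assumption to check before relying on it):
# @.. thread=true sol = sol + dt * (L * sol + N)
```

Whether this is actually faster for our array sizes would need to be checked with a benchmark, for example with BenchmarkTools.jl.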