Description
Hi all! Thank you for developing this library. It is super useful in multiple projects!
I noticed something that might be an issue, but I am not sure.
I have code that multiplies a complex-valued array (a) by a real-valued array (ker).
Basically, each complex element of 'a' needs two multiplications (real and imaginary part) by the same element of 'ker'.
My code is as follows:
auto func(real a1, real a2, complex ker):
// This trick halves the number of loads for ker. It is also the reason why I use a1 and a2 instead of a.
const auto low = xsimd::zip_lo(ker, ker);
const auto high = xsimd::zip_hi(ker, ker);
const auto res0 = a1 * low;
const auto res1 = a2 * high;
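To make the lane layout of the trick above easier to follow, here is a minimal scalar sketch. It assumes that zip_lo interleaves the lower halves of its two arguments and zip_hi the upper halves. The names `zip_lo_model`, `zip_hi_model` and `mul_model` are hypothetical stand-ins written with `std::array`. They are not xsimd functions and say nothing about how xsimd implements them.

```cpp
#include <array>
#include <cstddef>

// Scalar stand-in for zip_lo: interleave the LOWER halves of x and y.
template <typename T, std::size_t N>
std::array<T, N> zip_lo_model(const std::array<T, N>& x, const std::array<T, N>& y) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        out[2 * i] = x[i];
        out[2 * i + 1] = y[i];
    }
    return out;
}

// Scalar stand-in for zip_hi: interleave the UPPER halves of x and y.
template <typename T, std::size_t N>
std::array<T, N> zip_hi_model(const std::array<T, N>& x, const std::array<T, N>& y) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        out[2 * i] = x[N / 2 + i];
        out[2 * i + 1] = y[N / 2 + i];
    }
    return out;
}

// Lane-wise product, standing in for `a1 * low` on batches.
template <typename T, std::size_t N>
std::array<T, N> mul_model(const std::array<T, N>& a, const std::array<T, N>& b) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = a[i] * b[i];
    }
    return out;
}
```

With ker = {2, 3, 4, 5}, zipping ker with itself gives {2, 2, 3, 3} and {4, 4, 5, 5}, so one load of ker covers two batches of interleaved (re, im) values.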
What I noticed is that the original implementation of reduce_add can be optimized on my machine. Would it be possible to have a split function that returns the low and high halves of a batch? By doing split + add multiple times, my code is 7 times faster.
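The split + add idea can be sketched in scalar form as follows. This only illustrates the reduction pattern and assumes a power-of-two lane count. `low_half`, `high_half` and `reduce_add_model` are hypothetical names operating on `std::array`. They are not existing xsimd API, and they are not the code from the linked branch.

```cpp
#include <array>
#include <cstddef>

// Hypothetical "split": return the low half of the lanes.
template <typename T, std::size_t N>
std::array<T, N / 2> low_half(const std::array<T, N>& v) {
    std::array<T, N / 2> out{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        out[i] = v[i];
    }
    return out;
}

// Hypothetical "split": return the high half of the lanes.
template <typename T, std::size_t N>
std::array<T, N / 2> high_half(const std::array<T, N>& v) {
    std::array<T, N / 2> out{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        out[i] = v[N / 2 + i];
    }
    return out;
}

// Base case: a single lane is already the sum.
template <typename T>
T reduce_add_model(const std::array<T, 1>& v) {
    return v[0];
}

// Split into halves, add them lane-wise, and recurse on the half-width result.
template <typename T, std::size_t N>
T reduce_add_model(const std::array<T, N>& v) {
    static_assert(N >= 2 && N % 2 == 0, "lane count must be a power of two");
    const std::array<T, N / 2> lo = low_half(v);
    const std::array<T, N / 2> hi = high_half(v);
    std::array<T, N / 2> sum{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        sum[i] = lo[i] + hi[i];
    }
    return reduce_add_model(sum);
}
```

For an 8-lane input this takes three split + add steps (8 to 4 to 2 to 1 lanes). On real hardware each step would be a register extract plus one vector add, which is where a split function returning both halves would be used.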
I have pushed the benchmarks here:
https://github.com/DiamonDinoia/cpp-learning/tree/master/xsimd
They give the following performance:
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
6.96 | 143,690,879.59 | 0.6% | 19.00 | 21.47 | 0.885 | 0.00 | 0.0% | 0.01 | add+store |
2.31 | 432,949,727.65 | 0.6% | 24.00 | 7.11 | 3.374 | 0.00 | 0.0% | 0.01 | hsum |
3.81 | 262,211,901.24 | 0.1% | 36.00 | 11.75 | 3.064 | 2.00 | 0.0% | 0.01 | reduce_add |
2.59 | 385,491,672.62 | 0.2% | 20.00 | 7.99 | 2.503 | 0.00 | 0.0% | 0.01 | union pun |
1.18 | 846,618,297.70 | 0.9% | 17.00 | 3.64 | 4.672 | 0.00 | 0.0% | 0.01 | double union pun |
I tweaked master a bit in https://github.com/DiamonDinoia/xsimd/tree/hadd-tweaks
and I got:
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
7.00 | 142,933,991.35 | 0.9% | 19.00 | 21.50 | 0.884 | 0.00 | 0.0% | 0.01 | add+store |
2.27 | 439,741,444.70 | 0.9% | 24.00 | 6.99 | 3.434 | 0.00 | 0.0% | 0.01 | hsum |
2.99 | 334,267,996.40 | 1.5% | 36.00 | 9.15 | 3.935 | 2.00 | 0.0% | 0.01 | reduce_add |
2.09 | 478,101,632.03 | 1.2% | 28.00 | 6.44 | 4.346 | 2.00 | 0.0% | 0.01 | union pun |
1.05 | 956,625,856.43 | 1.6% | 17.00 | 3.21 | 5.289 | 0.00 | 0.0% | 0.01 | double union pun |
Thanks,
Marco