 
Description
I'm running macOS 15.7.1 on an Apple M1 Max.
I tried using the _mm256_fmadd_pd intrinsic, but it ends up being compiled as separate 128-bit multiplies and adds (two of each) instead of fused multiply-adds.
This is strange, because the _mm_fmadd_pd intrinsic does compile properly to a single FMA instruction.
This is problematic because it costs performance.
It is also questionable to me whether this is an acceptable translation, given the different rounding characteristics: a fused multiply-add rounds once, whereas a separate multiply and add round twice. Applications that rely on the exact rounding behavior of FMA can therefore get very unexpected results.
My current workaround is as follows:
#ifdef _mm256_fmadd_pd
#undef _mm256_fmadd_pd
#endif

/* Compute a 256-bit FMA as two 128-bit FMAs, one per half, so every
   lane keeps the single rounding of a fused multiply-add. */
static inline __m256d
arm_fmadd_pd(__m256d a, __m256d b, __m256d c)
{
    simde__m256d_private
        r_,
        a_ = simde__m256d_to_private(a),
        b_ = simde__m256d_to_private(b),
        c_ = simde__m256d_to_private(c);

    /* Low and high 128-bit halves, each lowered to a single FMA. */
    r_.m128d[0] = _mm_fmadd_pd(a_.m128d[0], b_.m128d[0], c_.m128d[0]);
    r_.m128d[1] = _mm_fmadd_pd(a_.m128d[1], b_.m128d[1], c_.m128d[1]);

    return simde__m256d_from_private(r_);
}
#define _mm256_fmadd_pd(a, b, c) arm_fmadd_pd(a, b, c)
It's not a great solution, because it relies on undocumented internal interfaces of SIMDE.
This seems like something that could be fixed easily.
If it were, I could #ifdef my code so that it only uses this hack with older versions of SIMDE.
Note that I'm using a slightly older version of SIMDE from Homebrew, but the problem also exists in the current SIMDE version on GitHub.
Specifically, see line 86 of x86/fma.h.
The above code sequence could be incorporated there.
In fact, it seems you already do this for _mm256_fnmadd_pd; I'm not sure why it is done there but not for _mm256_fmadd_pd.
Thanks!