Support for _mm256_fmadd_pd on Apple Silicon #1349

@victorshoup

Description

I'm running macOS 15.7.1 on an Apple M1 Max.
I tried using the _mm256_fmadd_pd intrinsic, but it ends up being compiled to a pair of 128-bit multiplies and adds rather than fused multiply-adds.
This is strange, because the _mm_fmadd_pd intrinsic does properly compile to a single FMA.
This is a problem because it costs performance.
I also question whether this is an acceptable translation, given the different rounding characteristics: an FMA rounds once, whereas a separate multiply and add round twice. In applications that rely on the precise rounding behavior of FMA, this can lead to very unexpected results.

My current workaround is as follows:

#ifdef _mm256_fmadd_pd
#undef _mm256_fmadd_pd
#endif

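/* Split each 256-bit vector into its two 128-bit halves using SIMDe's
   private (undocumented) representation, then apply _mm_fmadd_pd, which
   does compile to a single FMA, to each half. */
inline static __m256d
arm_fmadd_pd(__m256d a, __m256d b, __m256d c)
{
   simde__m256d_private
   r_,
   a_ = simde__m256d_to_private(a),
   b_ = simde__m256d_to_private(b),
   c_ = simde__m256d_to_private(c);
   r_.m128d[0] = _mm_fmadd_pd(a_.m128d[0], b_.m128d[0], c_.m128d[0]);  /* low half */
   r_.m128d[1] = _mm_fmadd_pd(a_.m128d[1], b_.m128d[1], c_.m128d[1]);  /* high half */
   return simde__m256d_from_private(r_);
}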


#define _mm256_fmadd_pd(a, b, c) arm_fmadd_pd(a, b, c)  

It's not a great solution, because I'm relying on undocumented interfaces within SIMDE.

This seems like something that could easily be fixed.
If it were, then I could ifdef my code so that it only uses this hack for older versions of SIMDE.
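A rough sketch of what such a guard might look like, assuming the installed SIMDe headers define `SIMDE_VERSION_MAJOR`, `SIMDE_VERSION_MINOR`, and `SIMDE_VERSION_MICRO`. The `MY_*` macros are hypothetical names for this example, and the "fixed in" version is a placeholder, since no release containing a fix exists yet.

```c
/* Fallbacks so this snippet compiles standalone; in real code these
   values are assumed to come from the SIMDe headers. */
#ifndef SIMDE_VERSION_MAJOR
#define SIMDE_VERSION_MAJOR 0
#define SIMDE_VERSION_MINOR 0
#define SIMDE_VERSION_MICRO 0
#endif

/* Hypothetical helper: pack a version triple into one comparable integer. */
#define MY_VERSION_ENCODE(maj, min, mic) \
    (((maj) * 1000000) + ((min) * 1000) + (mic))

#define MY_SIMDE_VERSION \
    MY_VERSION_ENCODE(SIMDE_VERSION_MAJOR, SIMDE_VERSION_MINOR, SIMDE_VERSION_MICRO)

/* Placeholder: replace with the first release that contains the fix. */
#define MY_SIMDE_FMADD_FIXED MY_VERSION_ENCODE(99, 0, 0)

#if MY_SIMDE_VERSION < MY_SIMDE_FMADD_FIXED
#define MY_NEED_FMADD_WORKAROUND 1   /* older SIMDe: keep the hack */
#else
#define MY_NEED_FMADD_WORKAROUND 0   /* fixed SIMDe: use the intrinsic as-is */
#endif
```

The workaround definition and the `#define _mm256_fmadd_pd(...)` override would then sit inside `#if MY_NEED_FMADD_WORKAROUND ... #endif`.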

Note that I'm using a slightly older version of SIMDE from Homebrew, but the problem also exists in the current SIMDE version on GitHub.
Specifically, see line 86 of x86/fma.h.
The above code sequence could be incorporated there.
In fact, it seems you already do this for _mm256_fnmadd_pd; I'm not sure why it is done there but not for _mm256_fmadd_pd.

Thanks!
