aarch64 op: Enable two‑stage SVE detection in component configuration #13203
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This pull request refactors the build configuration for the OpenMPI aarch64 op component, aligning it with the existing approach used by the avx component.
Currently, the avx component configuration systematically tests SIMD instruction support (e.g., AVX2, AVX512) by incrementally applying compiler flags until compilation succeeds, independent from the host CPU's capabilities. In contrast, the existing aarch64 configuration lacks this mechanism, performing only basic compilation checks without utilizing additional flags or function attributes.
To address this gap, this pull request enhances the aarch64 configuration by incorporating checks using compiler function attributes (specifically,
__attribute__((__target__("+sve")))
). This enables SVE code and includes it in OpenMPI regardless of compiler flags provided explicitly via the command line, just like the avx component already does.If compilation via function attribute is successful, the attribute is automatically inserted via macro into the SVE function templates within
op_aarch64_functions.c
. Runtime detection of processor capabilities (NEON or SVE) remains unchanged. No API's have been changed, only the build systems is modified along with a macro definition in the source files.There is a discussion that goes into more detail in the developer user group.