
Clarify Armv9 Support #696

@jserv


To clarify Armv9 support, I reviewed the current architecture detection logic in SSE2NEON, focusing on Armv9-A. The goal was to determine whether the existing preprocessor conditions need to change to support the newer architecture.

  1. Armv9-A (AArch64) is already fully supported by the current logic; no changes are required.
  2. Armv8-A (AArch32) detection is too restrictive, because it matches only __ARM_ARCH == 8. A minor adjustment is proposed for forward compatibility with the updated ACLE encoding used for Armv8.1-A and later.

The findings are as follows:

  • Execution state: Armv9-A primarily targets the AArch64 execution state. Several Armv9 cores (e.g., Cortex-A715, Cortex-X3, Cortex-X4) drop AArch32 support entirely, while others retain it only as an optional legacy feature.
  • Compiler behavior: When targeting Armv9-A, GCC and Clang do not define __arm__. They define __aarch64__ (and, on Apple toolchains, __arm64__), exactly as they do for Armv8-A in 64-bit mode.

Consequently, the existing detection logic in SSE2NEON remains valid for Armv9-A:

#if defined(__aarch64__) || defined(__arm64__)
    /* Correctly captures Armv8-A and Armv9-A in 64-bit mode */
#endif
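The claim that AArch64 compilers never define __arm__ can be checked on any toolchain with a small translation unit. This is a minimal sketch, not SSE2NEON code; the function name target_execution_state is hypothetical, and the #error fires only if a compiler defines both an AArch64 macro and __arm__, which would contradict the behavior described above.

```c
/* Sanity check: GCC and Clang targeting AArch64 (Armv8-A or Armv9-A) are
 * expected NOT to define __arm__ alongside __aarch64__ / __arm64__. */
#if (defined(__aarch64__) || defined(__arm64__)) && defined(__arm__)
#  error "Unexpected: both an AArch64 macro and __arm__ are defined"
#endif

/* Hypothetical helper: reports which execution state the compiler targets,
 * using the same macro test as the SSE2NEON AArch64 branch. */
static const char *target_execution_state(void)
{
#if defined(__aarch64__) || defined(__arm64__)
    return "AArch64"; /* Armv8-A or Armv9-A, 64-bit */
#elif defined(__arm__)
    return "AArch32"; /* 32-bit Arm, any architecture version */
#else
    return "non-Arm";
#endif
}
```

Building this with -march=armv9-a for an AArch64 target is expected to return "AArch64", the same result as -march=armv8-a.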

While the AArch64 logic is robust, the legacy AArch32 detection logic has a limitation.

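According to the Arm C Language Extensions (ACLE), the __ARM_ARCH macro encodes the architecture version as an integer (e.g., 7 for Armv7, 8 for Armv8.0). From Armv8.1-A onwards, ACLE scales the value to include the minor version as X * 100 + Y for ArmvX.Y, so Armv8.1 becomes 801. Compilers that implement this encoding therefore report values other than 8 for Armv8.1-A and later.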
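The encoding rule can be expressed as a small decoder. This is an illustrative sketch only; decode_arm_arch and arm_arch_version are hypothetical names, and it assumes that values below 100 use the legacy single-number form while larger values follow the ACLE X * 100 + Y formula.

```c
/* Hypothetical type: major/minor architecture version, e.g. {8, 1}. */
typedef struct {
    int major;
    int minor;
} arm_arch_version;

/* Decodes an ACLE __ARM_ARCH value.
 * Legacy form (below 100): the value is the major version (7, 8, ...).
 * Scaled form (Armv8.1-A onwards): value = major * 100 + minor. */
static arm_arch_version decode_arm_arch(long value)
{
    arm_arch_version v;
    if (value < 100) {
        v.major = (int) value;
        v.minor = 0;
    } else {
        v.major = (int) (value / 100);
        v.minor = (int) (value % 100);
    }
    return v;
}
```

For example, decode_arm_arch(801) yields major 8 and minor 1, and decode_arm_arch(8) yields major 8 and minor 0. Both describe an Armv8 target, but only the second satisfies __ARM_ARCH == 8.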

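| Architecture | __ARM_ARCH value | Current status in SSE2NEON |
|--------------|------------------|----------------------------|
| Armv8.0-A    | 8                | ✅ Detected                |
| Armv8.1-A    | 801              | Ignored                    |
| Armv8.2-A    | 802              | Ignored                    |
| Armv9.x-A    | 9xx              | N/A (uses AArch64 path)    |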

The current check is overly strict:

/* Current Logic */
__ARM_ARCH == 8

This condition excludes Armv8.1-A and later AArch32 targets whenever the compiler reports the scaled value (801, 802, ...). While this does not affect Armv9-A (which uses the AArch64 path), it leaves a compatibility gap for 32-bit builds with toolchains that implement the updated ACLE encoding.
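To make the gap concrete, both conditions can be modeled as ordinary functions and evaluated against the values in the table. This is a sketch for illustration; the function names are hypothetical, since SSE2NEON evaluates these conditions in the preprocessor, where an undefined __ARM_ARCH is treated as 0 and therefore fails both checks.

```c
/* Hypothetical model of the current SSE2NEON AArch32 condition. */
static int aarch32_check_current(long arm_arch)
{
    return arm_arch == 8; /* matches Armv8.0-A only */
}

/* Hypothetical model of the proposed condition. */
static int aarch32_check_proposed(long arm_arch)
{
    return arm_arch >= 8; /* matches 8, 801, 802, ... and 9xx */
}
```

Both functions reject 7 (Armv7) and accept 8, so the proposed check only widens acceptance to the scaled Armv8.1+ values.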

To improve ACLE compliance and ensure forward compatibility for 32-bit builds, I propose relaxing the equality check to a lower-bound check.

Proposed change:

Replace:

__ARM_ARCH == 8

With:

__ARM_ARCH >= 8

The following diff illustrates the minimal changes required to implement this fix.

diff --git a/sse2neon.h b/sse2neon.h
index 1234567..abcdef0 100644
--- a/sse2neon.h
+++ b/sse2neon.h
@@ -45,7 +45,7 @@
 #elif SSE2NEON_ARCH_AARCH64
     /* AArch64 NEON path */
 
-#elif __ARM_ARCH == 8
+#elif __ARM_ARCH >= 8
 #  if !defined(__ARM_NEON) && !defined(__ARM_NEON__)
 #    error "You must enable NEON instructions (e.g. -mfpu=neon-fp-armv8) to use SSE2NEON."
 #  endif
@@ -220,7 +220,7 @@
 #include <math.h>
 
 /* Include arm_acle.h only for Armv8-A AArch32 */
-#if (!SSE2NEON_ARCH_AARCH64) && (__ARM_ARCH == 8)
+#if (!SSE2NEON_ARCH_AARCH64) && (__ARM_ARCH >= 8)
 #  if defined __has_include && __has_include(<arm_acle.h>)
 #    include <arm_acle.h>
 #  endif
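For reference, the patched detection chain can be approximated in a standalone file that compiles on any host. This is a minimal sketch, not the actual sse2neon.h structure: the macro names SKETCH_NEON_PATH and SKETCH_HAVE_ARM_ACLE and the accessor functions are hypothetical. It nests the __has_include test inside a defined __has_include check, the form GCC's documentation recommends for compilers that may lack the operator.

```c
/* Hypothetical macros; they mirror the order of checks in the diff. */
#if defined(__aarch64__) || defined(__arm64__)
#  define SKETCH_NEON_PATH "aarch64"            /* Armv8-A and Armv9-A */
#elif defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 8
/* Armv8.0 reports 8; Armv8.1+ may report 801, 802, ... under ACLE. */
#  define SKETCH_NEON_PATH "aarch32-armv8-or-later"
#  if defined __has_include
#    if __has_include(<arm_acle.h>)
#      include <arm_acle.h>
#      define SKETCH_HAVE_ARM_ACLE 1
#    endif
#  endif
#elif defined(__arm__)
#  define SKETCH_NEON_PATH "aarch32-pre-armv8"
#else
#  define SKETCH_NEON_PATH "non-arm"
#endif

#ifndef SKETCH_HAVE_ARM_ACLE
#  define SKETCH_HAVE_ARM_ACLE 0
#endif

/* Accessors so the selected path can be inspected at run time. */
static const char *sketch_neon_path(void) { return SKETCH_NEON_PATH; }
static int sketch_have_arm_acle(void) { return SKETCH_HAVE_ARM_ACLE; }
```

On an x86 host this selects "non-arm"; a 32-bit Arm build targeting Armv8.1-A with a compiler that reports 801 would select "aarch32-armv8-or-later" instead of falling through.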
