Description
To consolidate Armv9 support, I reviewed the current architecture detection logic with a specific focus on Armv9-A. The objective was to determine whether the existing preprocessor conditions need modification to support the newer architecture.
- Armv9-A (AArch64) is already fully supported by the current logic; no changes are required.
- Armv8-A (AArch32) detection is currently too restrictive. A minor adjustment is proposed to ensure forward compatibility with updated ACLE specifications (Armv8.1+).
The findings are as follows:
- Execution State: Armv9-A primarily targets the AArch64 execution state. Several recent Armv9 cores (e.g., Cortex-A715, Cortex-X3, Cortex-X4) have dropped AArch32 support entirely, while others treat it as optional or legacy.
- Compiler Behavior: When targeting Armv9-A in the AArch64 state, standard toolchains do not define __arm__. Instead, they define __aarch64__ or __arm64__, mirroring Armv8-A behavior.
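The compiler behavior described above can be checked with a small probe. This is a minimal sketch, not code from sse2neon.h: the enum `exec_state` and the functions `detect_exec_state` and `exec_state_name` are hypothetical names introduced only for illustration. It relies solely on the predefined macros named in the findings.

```c
/* Illustrative probe; the enum and function names are hypothetical
 * and are not part of sse2neon.h. */
enum exec_state { EXEC_AARCH64, EXEC_AARCH32, EXEC_NOT_ARM };

/* Classify the compilation target using only predefined macros. */
enum exec_state detect_exec_state(void)
{
#if defined(__aarch64__) || defined(__arm64__)
    return EXEC_AARCH64; /* 64-bit state: Armv8-A and Armv9-A both land here */
#elif defined(__arm__)
    return EXEC_AARCH32; /* 32-bit Arm state */
#else
    return EXEC_NOT_ARM; /* e.g. a build for an x86 host */
#endif
}

/* Human-readable label for a detected state. */
const char *exec_state_name(enum exec_state s)
{
    switch (s) {
    case EXEC_AARCH64: return "AArch64";
    case EXEC_AARCH32: return "AArch32";
    default:           return "not Arm";
    }
}
```

Building this once per target and printing `exec_state_name(detect_exec_state())` is one way to confirm which branch a given toolchain selects.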
Consequently, the existing detection logic in SSE2NEON remains valid for Armv9-A:
#if defined(__aarch64__) || defined(__arm64__)
/* Correctly captures Armv8-A and Armv9-A in 64-bit mode */
#endif
While the AArch64 logic is robust, a limitation was identified in the legacy AArch32 detection logic.
According to the Arm C Language Extensions (ACLE), the __ARM_ARCH macro encodes the architecture version as an integer. For Armv8.0 the value is simply 8, but from Armv8.1 onward the value is computed as major * 100 + minor (e.g., Armv8.1 becomes 801).
| Architecture | __ARM_ARCH Value | Current Status in SSE2NEON |
|---|---|---|
| Armv8.0-A | 8 | ✅ Detected |
| Armv8.1-A | 801 | ❌ Ignored |
| Armv8.2-A | 802 | ❌ Ignored |
| Armv9.x | 9xx | N/A (Uses AArch64 path) |
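To make the encoding in the table concrete, the sketch below splits an __ARM_ARCH value into major and minor parts. It assumes the major * 100 + minor pattern that the 801 and 802 values suggest, and treats any value below 100 as a plain major version. The helper names `arm_arch_major` and `arm_arch_minor` are hypothetical and do not exist in sse2neon.h or ACLE.

```c
/* Hypothetical helpers for illustration only.
 * Assumption: values >= 100 follow major * 100 + minor;
 * smaller values are a bare major version (e.g. 7 or 8). */

/* Major architecture version: 8 -> 8, 801 -> 8, 902 -> 9. */
int arm_arch_major(int arm_arch)
{
    return arm_arch >= 100 ? arm_arch / 100 : arm_arch;
}

/* Minor architecture version: 8 -> 0, 801 -> 1, 802 -> 2. */
int arm_arch_minor(int arm_arch)
{
    return arm_arch >= 100 ? arm_arch % 100 : 0;
}
```

With these helpers, a check written as `arm_arch_major(__ARM_ARCH) == 8` would accept both 8 and 801, which is the distinction a bare equality test against 8 misses.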
The current check is overly strict:
/* Current Logic */
__ARM_ARCH == 8
This condition implicitly excludes Armv8.1 through Armv8.6 AArch32 targets. While this does not impact Armv9 (which uses the AArch64 path), it is a compatibility gap for updated 32-bit toolchains.
To improve ACLE compliance and ensure forward compatibility for 32-bit builds, I propose relaxing the equality check to a range check.
Proposed Change:
Replace:
__ARM_ARCH == 8
With:
__ARM_ARCH >= 8
The following diff illustrates the minimal changes required to implement this fix.
diff --git a/sse2neon.h b/sse2neon.h
index 1234567..abcdef0 100644
--- a/sse2neon.h
+++ b/sse2neon.h
@@ -45,7 +45,7 @@
#elif SSE2NEON_ARCH_AARCH64
/* AArch64 NEON path */
-#elif __ARM_ARCH == 8
+#elif __ARM_ARCH >= 8
# if !defined(__ARM_NEON) && !defined(__ARM_NEON__)
# error "You must enable NEON instructions (e.g. -mfpu=neon-fp-armv8) to use SSE2NEON."
# endif
@@ -220,7 +220,7 @@
#include <math.h>
/* Include arm_acle.h only for Armv8-A AArch32 */
-#if (!SSE2NEON_ARCH_AARCH64) && (__ARM_ARCH == 8)
+#if (!SSE2NEON_ARCH_AARCH64) && (__ARM_ARCH >= 8)
# if defined __has_include && __has_include(<arm_acle.h>)
# include <arm_acle.h>
# endif
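
The effect of the change can also be sanity-checked outside the preprocessor. The sketch below restates the two conditions as ordinary functions so they can be evaluated against the __ARM_ARCH values listed in the table. The function names are hypothetical and only mirror the expressions in the diff; they are not part of sse2neon.h.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the preprocessor conditions in the diff. */

/* Current condition: only the literal value 8 matches. */
bool matches_current_check(int arm_arch)
{
    return arm_arch == 8;
}

/* Proposed condition: 8 and any larger encoding (801, 802, ...) match. */
bool matches_proposed_check(int arm_arch)
{
    return arm_arch >= 8;
}
```

For the value 801, `matches_current_check` returns false while `matches_proposed_check` returns true; for 7, both return false, so pre-Armv8 targets remain excluded either way.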