Description
Relaxed SIMD adds a set of useful instructions that introduce local non-determinism: the results of these instructions may vary based on hardware support.
The SIMD proposal focuses on a set of SIMD instructions that speed up real-world use cases while staying true to the deterministic core of the language. However, some instructions that could unlock even more performance were not included because of their architecture-dependent semantics. These instructions include:
- Fused Multiply Add (single rounding if hardware supports it, double rounding if not)
- Approximate reciprocal/reciprocal sqrt
- Relaxed Swizzle (implementation defined out of bounds behavior)
- Relaxed Rounding Q-format Multiplication (optional saturation)
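As a rough illustration of what "implementation defined" and "optional" could mean for the last two items, the sketch below models two candidate behaviors for each in plain Python. The function names are hypothetical, the lane lists are shortened for readability, and the choice of behaviors (zeroing versus an x86 pshufb-style lookup for swizzle, saturating versus wrapping for the Q15 multiply) is an assumption made for illustration, not something this description specifies.

```python
def swizzle_zero_oob(lanes, indices):
    """Model: any index outside 0..15 selects 0."""
    return [lanes[i] if i < 16 else 0 for i in indices]


def swizzle_pshufb_like(lanes, indices):
    """Model: x86 pshufb-style lookup. If bit 7 of the index is set the
    result is 0, otherwise only the low four bits of the index are used."""
    return [0 if i & 0x80 else lanes[i & 0x0F] for i in indices]


def q15_mulr(a, b, saturate):
    """Model: rounding Q15 multiply of two signed 16-bit integers."""
    product = (a * b + 0x4000) >> 15
    if saturate:
        # Clamp to the signed 16-bit range.
        return max(-32768, min(32767, product))
    # Wrap to 16-bit two's complement instead of clamping.
    return (product + 0x8000) % 0x10000 - 0x8000


lanes = list(range(100, 116))          # 16 byte lanes: 100..115
indices = [0, 15, 16, 31, 128, 255]    # in-range and out-of-range indices

zeroed = swizzle_zero_oob(lanes, indices)
wrapped = swizzle_pshufb_like(lanes, indices)

# The only Q15 input pair that overflows is (-32768, -32768).
saturated = q15_mulr(-32768, -32768, saturate=True)
unsaturated = q15_mulr(-32768, -32768, saturate=False)
```

In this model the two swizzle variants agree for in-range indices and differ only for indices 16 and 31, and the two multiply variants differ only for the single overflowing input pair. That is the sense in which the non-determinism is local to individual edge cases.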
These instructions have been suggested in multiple places: FMA (1, 2, 3), approximate reciprocal/reciprocal sqrt (1). Such instructions have also been mentioned as part of future features.
There is a soft dependency on the feature-detection proposal, which will allow code to determine whether certain instructions are supported by the hardware and can therefore be safely relied on.
Non-determinism: The non-determinism in this proposal is limited to the result of an individual instruction and is consistent across runs. There are no global controls or flags involved. This means that, given the following pseudocode:
w = fma(x, y, z)
w can have different values depending on available hardware support. Multiple uses of the instruction will return the same result w, so the instruction is internally consistent.
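To make the two possible values of w concrete, here is a minimal sketch in Python using double-precision floats. The helper names `fma_fused` and `fma_unfused` are hypothetical. The fused variant is emulated with exact rational arithmetic followed by a single rounding, which is an illustration device and not how an engine would implement the instruction.

```python
from fractions import Fraction


def fma_fused(x, y, z):
    """Single rounding: compute x*y + z exactly, then round once to a float."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def fma_unfused(x, y, z):
    """Double rounding: the product is rounded, then the sum is rounded."""
    return x * y + z


# Inputs chosen so that the exact product, 1 - 2**-60, is not representable
# as a double and rounds to 1.0.
x = 1.0 + 2.0**-30
y = 1.0 - 2.0**-30
z = -1.0

fused = fma_fused(x, y, z)      # -2**-60: the small term survives
unfused = fma_unfused(x, y, z)  # 0.0: the small term is lost when x*y rounds

print(fused, unfused)
```

Both results are valid outcomes for the same inputs. Which one a program observes would depend on whether the hardware provides a fused multiply-add.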
Initial prototypes indicate a performance improvement of ~30% on modern CPU architectures. The alternative, providing a deterministic FMA result through emulation, would be too slow to be of any use.
Potential extension: introduce a relaxed mode for existing SIMD instructions. Such a mode would be tied to the feature-detection proposal: if relaxed mode is supported, the existing SIMD instructions would have non-deterministic behavior, e.g. around NaN canonicalization and the FP IEEE compliance modes used by developers (e.g. no-honor-inf, no-signed-zeros, no-trapping-math).
Keywords (for SEO): Fast SIMD
Co-champions: Marat Dukhan (@Maratyszcza) and Zhi An Ng (@ngzhian)