Add mul_sub() fused multiply-add operation #492
Open
GrigoryEvko wants to merge 8 commits into rust-lang:master from
Implements fused multiply-add (FMA) operations for SimdFloat, the #1 most critical missing feature in portable-simd based on analysis of PyTorch's SIMD implementation.

Methods:
- mul_add(a, b) - computes (self * a) + b with single rounding
- mul_sub(a, b) - computes (self * a) - b with single rounding

Benefits:
- Improved accuracy: single rounding error vs two separate roundings
- Better performance: 2 operations in 1 instruction on modern CPUs
- Universal hardware support: FMA3 (x86), NEON vfma (ARM), RISC-V F extension

Implementation:
- Delegates to core::intrinsics::simd::simd_fma LLVM intrinsic
- Zero-cost abstraction with #[inline]
- mul_sub implemented as mul_add(a, -b)

Testing (14 tests):
- 3 accuracy tests proving FMA superiority:
  * Catastrophic cancellation: (1+ε)(1-ε) - 1
  * Discriminant calculation: b² - 4ac (quadratic formula)
  * Polynomial evaluation with Horner's method
- Basic operations (f32x4, f64x4, mul_add, mul_sub)
- Special values (infinity, NaN, MAX, MIN, subnormals)
- Size variations (f32x2, f32x8)
- Negative values

Example demonstrates:
- Basic FMA usage
- Polynomial evaluation (Horner's method)
- Dot product accumulation
- Accuracy comparison

Use cases:
- Neural networks (dot products, matrix multiply)
- Scientific computing (polynomial evaluation, numerical stability)
- Graphics (lighting calculations, transformations)
- Physics simulations (force calculations, integration)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
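The single-rounding claim and the catastrophic cancellation test can be illustrated with a minimal scalar sketch. It is not the PR's code: it uses the stable f64::mul_add from std rather than the nightly portable-simd API, and the helper names mul_sub, cancellation_unfused and cancellation_fused are hypothetical, chosen for illustration only. The scalar mul_sub mirrors the mul_add(a, -b) strategy described above.

```rust
/// Hypothetical scalar stand-in for the proposed mul_sub:
/// computes (x * a) - b with a single rounding, via mul_add(a, -b).
fn mul_sub(x: f64, a: f64, b: f64) -> f64 {
    x.mul_add(a, -b)
}

/// (1 + eps)(1 - eps) - 1 with a separate multiply and subtract.
/// The product is rounded before the subtraction happens.
fn cancellation_unfused(eps: f64) -> f64 {
    (1.0 + eps) * (1.0 - eps) - 1.0
}

/// The same expression with the multiply and subtract fused,
/// so only the final result is rounded.
fn cancellation_fused(eps: f64) -> f64 {
    mul_sub(1.0 + eps, 1.0 - eps, 1.0)
}

fn main() {
    // eps = 2^-30, so the exact answer -eps^2 = -2^-60 is representable in f64.
    let eps = 2.0_f64.powi(-30);
    println!("unfused: {:e}", cancellation_unfused(eps));
    println!("fused:   {:e}", cancellation_fused(eps));
}
```

With eps = 2^-30 the unfused product 1 - 2^-60 rounds to 1.0, so the subtraction yields zero, while the fused version keeps the exact -eps². The SIMD methods in this PR are expected to behave the same way per lane, assuming the target lowers simd_fma to a true fused operation.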
ARM NEON uses flush-to-zero (FTZ) for subnormal values in SIMD operations. Updated test to accept either the correct subnormal result or zero.
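A check that tolerates flush-to-zero could look roughly like the sketch below. This is an assumption about the shape of the updated test, not the test from the PR; the helper name matches_or_flushed is hypothetical, and it works on a single f32 lane rather than a SIMD vector.

```rust
/// Hypothetical helper: accepts `actual` when it equals the expected value,
/// or when the expected value is subnormal and the hardware flushed it to zero.
fn matches_or_flushed(actual: f32, expected: f32) -> bool {
    actual == expected || (expected.is_subnormal() && actual == 0.0)
}

fn main() {
    // Smallest positive subnormal f32.
    let tiny = f32::from_bits(1);
    println!("exact result accepted:   {}", matches_or_flushed(tiny, tiny));
    println!("flushed result accepted: {}", matches_or_flushed(0.0, tiny));
    println!("wrong result accepted:   {}", matches_or_flushed(1.0, tiny));
}
```

Restricting the zero case to subnormal expected values keeps the check strict for normal results, so a lane that wrongly returns zero for a normal value still fails.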
StdFloat already provides mul_add. This PR now only adds mul_sub.
Author
Maybe move mul_add from std_float::StdFloat to core_simd/src/simd/num/float.rs, as I accidentally tried?
Both mul_add and mul_sub now live in StdFloat for consistency.
This PR adds the mul_sub() method to the SimdFloat trait. The operation computes (self * a) - b, using a single fused multiply-add instruction where available.