SIMD: AVX-512 VBMI as primary path for all byte/sbyte sizes #107
Merged
Make AVX-512 VBMI the primary SIMD path for all byte/sbyte sizes (8-64), with AVX2 as a fallback for sizes 8-32 on hardware without VBMI support.

Previously, byte/sbyte sizes ≤32 only had an AVX2 path using complex cross-lane shuffling (Permute2x128 + dual vpshufb + Or). Sizes 33-64 already used the simpler VBMI PermuteVar64x8. This change makes VBMI the preferred path for all sizes, following the same primary/fallback pattern used for short (AVX-512 BW → AVX2) and double (AVX-512F → AVX2).

Changes in SimdX86Emitter.cs:
- GetGuardCondition: always returns Avx512Vbmi for 1-byte types
- CanEmitAvx2Fallback: add byte/sbyte support for sizes 8-32
- EmitAvx2Fallback: route byte types to new EmitByteAvx2 method
- Emit: route all byte sizes to EmitByteAvx512Vbmi
- Rename EmitByte → EmitByteAvx2 (AVX2 fallback, SortSimdAvx2_ naming)
- Extend EmitByteAvx512Vbmi to handle sizes 8-32 (Vector128/256 → Vector512)

Fixes #29

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
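To make the "complex cross-lane shuffling" concrete, here is a scalar Python model of the two approaches named in this commit message. It is a sketch under stated assumptions, not code from SimdX86Emitter.cs: the instruction semantics follow the x86 definitions of vpshufb and vperm2i128, while the helper names (`vpshufb_256`, `swap_lanes`, `permute_avx2`, `permute_vbmi`) are made up for illustration.

```python
# Scalar model of applying an arbitrary permutation to 32 bytes.
# Helper names are illustrative only; they do not appear in the PR.

LANE = 16  # bytes per 128-bit lane


def vpshufb_256(src, ctrl):
    """Model of AVX2 vpshufb: each 128-bit lane is shuffled independently.
    A control byte with bit 7 set produces 0; otherwise its low 4 bits
    select a byte from the same lane of src."""
    out = []
    for i, c in enumerate(ctrl):
        lane_base = (i // LANE) * LANE
        out.append(0 if c & 0x80 else src[lane_base + (c & 0x0F)])
    return out


def swap_lanes(src):
    """Model of Permute2x128(v, v, 0x01): swap the two 128-bit lanes."""
    return src[LANE:] + src[:LANE]


def permute_avx2(src, idx):
    """Full 32-byte permute built from Permute2x128 + dual vpshufb + Or."""
    # Mask for source bytes that already sit in the destination's lane.
    same = [(j & 0x0F) if j // LANE == i // LANE else 0x80
            for i, j in enumerate(idx)]
    # Mask for source bytes that must come from the other lane.
    cross = [(j & 0x0F) if j // LANE != i // LANE else 0x80
             for i, j in enumerate(idx)]
    a = vpshufb_256(src, same)
    b = vpshufb_256(swap_lanes(src), cross)
    # Each output byte is non-zero in at most one of a, b, so Or merges them.
    return [x | y for x, y in zip(a, b)]


def permute_vbmi(src, idx):
    """Model of a vpermb-style permute: one index per byte, no lane limit."""
    return [src[j] for j in idx]


data = list(range(32))
reverse = list(range(31, -1, -1))
assert permute_avx2(data, reverse) == permute_vbmi(data, reverse)
```

The AVX2 version needs two precomputed masks, a lane swap, two shuffles, and a merge per permutation, whereas the VBMI model is a single indexed gather, which is the simplification the commit message refers to.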
Keep our updated case 1 comment (VBMI primary for all sizes) and take main's updated case 2 (64 elements with PermuteVar32x16x2). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR updates the x86 SIMD code generator so byte/sbyte sorting networks prefer AVX-512 VBMI across the full supported size range, while still generating AVX2 fallbacks for smaller sizes on machines without VBMI. It fits into the generator’s existing pattern of “newer AVX-512 primary path, older AVX2 fallback” used for other element widths.
Changes:
- Switched byte/sbyte primary x86 guard/dispatch from size-dependent AVX2-or-VBMI logic to VBMI for all supported byte widths.
- Added AVX2 fallback emission for byte/sbyte sizes 8-32, including distinct SortSimdAvx2_* method generation.
- Extended the VBMI byte emitter to load/store sizes 8-32 by zero-extending smaller vectors into Vector512<byte>.
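The zero-extension idea in the last item can be sketched with a scalar Python model. This is only an illustration under assumptions: the compare-exchange layout (permute, min/max, select) and the odd-even transposition network used here are stand-ins chosen for brevity and are not claimed to match what the generator emits; `zero_extend`, `compare_exchange_layer`, and `sort_small` are hypothetical names.

```python
# Scalar model: sort n <= 64 bytes inside a 64-lane vector.
# The network below (odd-even transposition) is a stand-in, not the
# network produced by the generator.

WIDTH = 64  # lanes in a Vector512<byte>


def zero_extend(values, width=WIDTH):
    """Place the input in the low lanes and fill the rest with zeros."""
    return list(values) + [0] * (width - len(values))


def compare_exchange_layer(vec, pairs):
    """Apply one layer: for each (lo, hi) pair, min goes to lo, max to hi."""
    partner = list(range(len(vec)))   # identity index for untouched lanes
    take_min = [True] * len(vec)
    for lo, hi in pairs:
        partner[lo], partner[hi] = hi, lo
        take_min[hi] = False
    permuted = [vec[p] for p in partner]  # models a full 64-byte permute
    return [min(a, b) if m else max(a, b)
            for a, b, m in zip(vec, permuted, take_min)]


def sort_small(values):
    """Sort len(values) elements; padding lanes are never paired with data."""
    n = len(values)
    vec = zero_extend(values)
    for rnd in range(n):
        pairs = [(i, i + 1) for i in range(rnd % 2, n - 1, 2)]
        vec = compare_exchange_layer(vec, pairs)
    return vec[:n]  # store only the lanes that hold real data


assert sort_small([9, 2, 7, 4, 1, 8, 3, 6]) == [1, 2, 3, 4, 6, 7, 8, 9]
```

Because the padding lanes always use an identity index, they compare only with themselves and stay zero, so the zero fill cannot leak into the sorted result in this model.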
- Add SimdCode_8Bit_HasAvx2Fallback generator test for byte/sbyte AVX2 fallback (sizes 8, 16, 28, 32) verifying both VBMI and AVX2 dispatch
- Add (32, byte) to SimdCode_Compiles InlineData
- Add SortingNetwork(32) for byte and sbyte in GeneratedSorters.cs
- Add Sort_32Elements_Byte and Sort_32Elements_SByte stress tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep size 32 sbyte from branch, add sizes 48/64 sbyte from main. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Update SIMD example to show AVX-512 VBMI PermuteVar64x8 (was AVX2)
- Update Design section: VBMI primary, AVX2 fallback for byte/sbyte
- Update AVX-512 benchmarks with VBMI results across sizes 23-38

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Make AVX-512 VBMI the primary SIMD path for all byte/sbyte sizes (8-64), with AVX2 as a fallback for sizes 8-32 on hardware without VBMI support.
Fixes #29
Motivation
Previously, byte/sbyte sizes ≤32 only had an AVX2 path using complex cross-lane shuffling (Permute2x128 + dual vpshufb + Or). Sizes 33-64 already used the much simpler VBMI PermuteVar64x8. This change makes VBMI the preferred path for all sizes, following the same primary/fallback pattern already used for:
- short (AVX-512 BW → AVX2)
- double (AVX-512F → AVX2)

Changes
All changes are in SimdX86Emitter.cs:
- GetGuardCondition: always returns Avx512Vbmi.IsSupported for 1-byte types (was conditional on size > 32)
- CanEmitAvx2Fallback: adds byte/sbyte support for sizes 8-32
- EmitAvx2Fallback: routes byte types to the new EmitByteAvx2 method
- Emit(): routes all byte sizes to EmitByteAvx512Vbmi
- EmitByte → EmitByteAvx2: renamed; now generates the SortSimdAvx2_ fallback methods
- EmitByteAvx512Vbmi: extended to handle sizes 8-32 (Vector128/256 → Vector512)

Generated dispatch (example)
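As a rough model of the path selection that the changes above describe (not the generated C# itself), the following Python sketch picks a path for 1-byte element types. The function name `select_byte_path`, the string labels, and the final non-SIMD branch are assumptions made for illustration; only the size ranges and the VBMI-before-AVX2 order come from this PR.

```python
# Hypothetical model of byte/sbyte path selection after this change.
# Only the size ranges and the VBMI-then-AVX2 order are taken from the PR;
# the names and the final "non-simd" branch are illustrative assumptions.

def select_byte_path(count, has_vbmi, has_avx2):
    """Return which path would handle `count` byte/sbyte elements."""
    if 8 <= count <= 64 and has_vbmi:
        return "avx512-vbmi"   # primary path for all supported sizes
    if 8 <= count <= 32 and has_avx2:
        return "avx2"          # fallback, only emitted for sizes 8-32
    return "non-simd"          # assumption: some other path handles the rest


# Sizes up to 32 fall back to AVX2 when VBMI is unavailable.
assert select_byte_path(16, has_vbmi=False, has_avx2=True) == "avx2"
# Sizes 33-64 have no AVX2 fallback in this model.
assert select_byte_path(48, has_vbmi=False, has_avx2=True) == "non-simd"
```

Before this change, the equivalent model would have returned the AVX2 path for sizes ≤32 even on VBMI-capable hardware, since the guard depended on size > 32.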
Benchmark Results (AMD EPYC 9V74, AVX-512 VBMI)
byte
sbyte
Testing
All 455 tests pass across all four CI platforms (ubuntu x64, ubuntu ARM, windows, macOS).