SIMD: AVX-512 VBMI as primary path for all byte/sbyte sizes by jonathanpeppers · Pull Request #107 · jonathanpeppers/SortingNetworks

jonathanpeppers · 2026-05-03T01:01:00Z

Summary

Make AVX-512 VBMI the primary SIMD path for all byte/sbyte sizes (8-64), with AVX2 as a fallback for sizes 8-32 on hardware without VBMI support.

Fixes #29

Motivation

Previously, byte/sbyte sizes ≤32 only had an AVX2 path using complex cross-lane shuffling (Permute2x128 + dual vpshufb + Or). Sizes 33-64 already used the much simpler VBMI PermuteVar64x8. This change makes VBMI the preferred path for all sizes, following the same primary/fallback pattern already used for:

- short/ushort/char: AVX-512 BW primary → AVX2 fallback
- double: AVX-512F primary → AVX2 fallback
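
To make the difference between the two shuffles concrete, here is a minimal sketch (not the generator's actual output) of a full 32-byte permute written both ways. The class and method names are hypothetical, the index bytes are assumed to be in the range 0-31, and the control vectors are computed at runtime here only for illustration; a generator would presumably emit them as constants. It requires .NET 8 or later.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical illustration class; not part of SortingNetworks.
static class BytePermuteSketch
{
    // AVX2: vpshufb only shuffles within each 128-bit lane, so a full 32-byte
    // permute needs a lane-swapped copy, two shuffles, and an OR.
    // Assumes every byte of idx is in 0..31.
    public static Vector256<byte> PermuteAvx2(Vector256<byte> v, Vector256<byte> idx)
    {
        // Swap the two 128-bit lanes of v.
        Vector256<byte> swapped = Avx2.Permute2x128(v, v, 0x01);

        // Bit 4 of an index says which lane the source byte lives in.
        // Compare it with the lane of the destination byte.
        Vector256<byte> destLane = Vector256.Create(Vector128<byte>.Zero, Vector128.Create((byte)0x10));
        Vector256<byte> sameLane = Vector256.Equals(idx & Vector256.Create((byte)0x10), destLane);

        // vpshufb writes zero when bit 7 of the control byte is set, so each
        // shuffle contributes only the bytes it is responsible for.
        Vector256<byte> low4 = idx & Vector256.Create((byte)0x0F);
        Vector256<byte> zeroBit = Vector256.Create((byte)0x80);
        Vector256<byte> ctrlSame = low4 | (~sameLane & zeroBit);
        Vector256<byte> ctrlCross = low4 | (sameLane & zeroBit);

        return Avx2.Or(Avx2.Shuffle(v, ctrlSame), Avx2.Shuffle(swapped, ctrlCross));
    }

    // AVX-512 VBMI: vpermb permutes all 64 bytes in one instruction.
    // The 256-bit inputs are zero-extended to 512 bits first.
    public static Vector256<byte> PermuteVbmi(Vector256<byte> v, Vector256<byte> idx) =>
        Avx512Vbmi.PermuteVar64x8(v.ToVector512(), idx.ToVector512()).GetLower();

    // Scalar reference used to check the SIMD versions.
    public static byte[] PermuteScalar(byte[] src, byte[] idx)
    {
        var dst = new byte[src.Length];
        for (int i = 0; i < dst.Length; i++) dst[i] = src[idx[i]];
        return dst;
    }
}
```

Callers should check `Avx2.IsSupported` or `Avx512Vbmi.IsSupported` before invoking the corresponding method, as the generated dispatch does.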

Changes

All changes are in SimdX86Emitter.cs:

Area	Change
`GetGuardCondition`	Always returns `Avx512Vbmi.IsSupported` for 1-byte types (was conditional on `size > 32`)
`CanEmitAvx2Fallback`	Added byte/sbyte support for sizes 8-32
`EmitAvx2Fallback`	Routes byte types to new `EmitByteAvx2` method
`Emit()`	Routes all byte sizes to `EmitByteAvx512Vbmi`
`EmitByte` → `EmitByteAvx2`	Renamed; generates `SortSimdAvx2_` fallback methods
`EmitByteAvx512Vbmi`	Extended to handle sizes 8-32 (Vector128/256 → Vector512 zero-extension)
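
The zero-extension mentioned in the last row can be sketched as follows. This is an assumption about the general shape, not the emitter's actual output: the class and method names are hypothetical, and only the 16- and 32-byte cases are shown because the PR text does not say how other sizes are loaded. Since a sorting network for n elements only compares lanes below n, the zero padding in the upper lanes is never mixed into the result. Requires .NET 8 or later.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical illustration class; not part of SortingNetworks.
static class ZeroExtendSketch
{
    // Load 16 bytes and widen to 512 bits; the upper 48 bytes become zero.
    public static Vector512<byte> Load16(ReadOnlySpan<byte> span) =>
        Vector128.Create<byte>(span).ToVector256().ToVector512();

    // Load 32 bytes and widen to 512 bits; the upper 32 bytes become zero.
    public static Vector512<byte> Load32(ReadOnlySpan<byte> span) =>
        Vector256.Create<byte>(span).ToVector512();

    // Store only the low 16 bytes back to the span.
    public static void Store16(Vector512<byte> v, Span<byte> span) =>
        v.GetLower().GetLower().CopyTo(span);

    // Store only the low 32 bytes back to the span.
    public static void Store32(Vector512<byte> v, Span<byte> span) =>
        v.GetLower().CopyTo(span);
}
```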

Generated dispatch (example)

if (Avx512Vbmi.IsSupported) {
    if (n == 8)  { SortSimd8_byte(span); return; }   // VBMI Vector512
    if (n == 16) { SortSimd16_byte(span); return; }  // VBMI Vector512
    if (n == 48) { SortSimd48_byte(span); return; }  // VBMI Vector512
}
else if (Avx2.IsSupported) {
    if (n == 8)  { SortSimdAvx2_8_byte(span); return; }   // AVX2 Vector256
    if (n == 16) { SortSimdAvx2_16_byte(span); return; }  // AVX2 Vector256
}
if (AdvSimd.Arm64.IsSupported) { ... }
// scalar fallback

Benchmark Results (AMD EPYC 9V74, AVX-512 VBMI)

byte

Size	ArraySort	GeneratedSort	Speedup
23	1,028 ns	55 ns	19x
27	1,250 ns	53 ns	24x
28	1,415 ns	54 ns	26x
32	1,516 ns	54 ns	28x
34	1,759 ns	64 ns	27x

sbyte

Size	ArraySort	GeneratedSort	Speedup
27	1,355 ns	57 ns	24x
28	1,495 ns	58 ns	26x
32	1,598 ns	58 ns	28x
38	2,160 ns	68 ns	32x

All elements fit in a single Vector512<byte> with PermuteVar64x8 shuffles. Zero allocations. On CPUs without VBMI, sizes 8-32 fall back to AVX2.
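
As a rough illustration of how one layer of a sorting network can run on a single `Vector512<byte>`, the sketch below permutes each lane's partner into place, takes the lane-wise min and max, and selects between them with a mask. Everything here is hypothetical: the names, the 4-element network, and the runtime construction of index and mask vectors (generated code would more plausibly use constants). It falls back to `Vector512.Shuffle` when VBMI is unavailable so it can run on any machine with .NET 8 or later.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical illustration class; not part of SortingNetworks.
static class CompareExchangeSketch
{
    // One byte permute across the whole 512-bit vector.
    static Vector512<byte> Permute(Vector512<byte> v, Vector512<byte> idx) =>
        Avx512Vbmi.IsSupported ? Avx512Vbmi.PermuteVar64x8(v, idx) : Vector512.Shuffle(v, idx);

    // Apply one layer of independent compare-exchange pairs (lo < hi).
    static Vector512<byte> Layer(Vector512<byte> v, (int lo, int hi)[] pairs)
    {
        Span<byte> partner = stackalloc byte[64];
        Span<byte> takeMax = stackalloc byte[64];
        takeMax.Clear();
        for (int i = 0; i < 64; i++) partner[i] = (byte)i; // untouched lanes pair with themselves

        foreach (var (lo, hi) in pairs)
        {
            partner[lo] = (byte)hi;
            partner[hi] = (byte)lo;
            takeMax[hi] = 0xFF; // the higher index keeps the larger value
        }

        Vector512<byte> other = Permute(v, Vector512.Create<byte>(partner));
        return Vector512.ConditionalSelect(
            Vector512.Create<byte>(takeMax),
            Vector512.Max(v, other),
            Vector512.Min(v, other));
    }

    // Sort the first 4 bytes with the standard 5-comparator network.
    public static byte[] Sort4(byte[] input)
    {
        Span<byte> buf = stackalloc byte[64];
        buf.Clear();
        input.AsSpan(0, 4).CopyTo(buf);

        Vector512<byte> v = Vector512.Create<byte>(buf);
        v = Layer(v, new[] { (0, 1), (2, 3) });
        v = Layer(v, new[] { (0, 2), (1, 3) });
        v = Layer(v, new[] { (1, 2) });

        v.CopyTo(buf);
        return buf[..4].ToArray();
    }
}
```

For example, `CompareExchangeSketch.Sort4(new byte[] { 3, 1, 4, 2 })` returns `{ 1, 2, 3, 4 }`. Each layer costs one permute, one min, one max, and one select regardless of how many pairs it contains, which is consistent with the nearly flat timings across sizes in the tables above.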

Testing

All 455 tests pass across all four CI platforms (ubuntu x64, ubuntu ARM, windows, macOS).

Make AVX-512 VBMI the primary SIMD path for all byte/sbyte sizes (8-64), with AVX2 as a fallback for sizes 8-32 on hardware without VBMI support.

Previously, byte/sbyte sizes ≤32 only had an AVX2 path using complex cross-lane shuffling (Permute2x128 + dual vpshufb + Or). Sizes 33-64 already used the simpler VBMI PermuteVar64x8. This change makes VBMI the preferred path for all sizes, following the same primary/fallback pattern used for short (AVX-512 BW → AVX2) and double (AVX-512F → AVX2).

Changes in SimdX86Emitter.cs:
- GetGuardCondition: always returns Avx512Vbmi for 1-byte types
- CanEmitAvx2Fallback: add byte/sbyte support for sizes 8-32
- EmitAvx2Fallback: route byte types to new EmitByteAvx2 method
- Emit: route all byte sizes to EmitByteAvx512Vbmi
- Rename EmitByte → EmitByteAvx2 (AVX2 fallback, SortSimdAvx2_ naming)
- Extend EmitByteAvx512Vbmi to handle sizes 8-32 (Vector128/256 → Vector512)

Fixes #29

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR updates the x86 SIMD code generator so byte/sbyte sorting networks prefer AVX-512 VBMI across the full supported size range, while still generating AVX2 fallbacks for smaller sizes on machines without VBMI. It fits into the generator’s existing pattern of “newer AVX-512 primary path, older AVX2 fallback” used for other element widths.

Changes:

- Switched byte/sbyte primary x86 guard/dispatch from size-dependent AVX2-or-VBMI logic to VBMI for all supported byte widths.
- Added AVX2 fallback emission for byte/sbyte sizes 8-32, including distinct SortSimdAvx2_* method generation.
- Extended the VBMI byte emitter to load/store sizes 8-32 by zero-extending smaller vectors into Vector512<byte>.

- Add SimdCode_8Bit_HasAvx2Fallback generator test for byte/sbyte AVX2 fallback (sizes 8, 16, 28, 32) verifying both VBMI and AVX2 dispatch
- Add (32, byte) to SimdCode_Compiles InlineData
- Add SortingNetwork(32) for byte and sbyte in GeneratedSorters.cs
- Add Sort_32Elements_Byte and Sort_32Elements_SByte stress tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Update SIMD example to show AVX-512 VBMI PermuteVar64x8 (was AVX2)
- Update Design section: VBMI primary, AVX2 fallback for byte/sbyte
- Update AVX-512 benchmarks with VBMI results across sizes 23-38

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 3, 2026 01:01

Copilot started reviewing on behalf of jonathanpeppers May 3, 2026 01:01

Merge origin/main, resolve conflict in MaxElements comment

ee51dc4

Keep our updated case 1 comment (VBMI primary for all sizes) and take main's updated case 2 (64 elements with PermuteVar32x16x2). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI reviewed May 3, 2026

jonathanpeppers and others added 3 commits May 2, 2026 20:10

Merge origin/main, resolve conflict in GeneratedSorters.cs

d9def8f

Keep size 32 sbyte from branch, add sizes 48/64 sbyte from main. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jonathanpeppers merged commit 403d245 into main May 3, 2026
6 checks passed

jonathanpeppers deleted the jonathanpeppers/avx512-vbmi-byte-sbyte branch May 3, 2026 02:36

jonathanpeppers merged 5 commits into main from jonathanpeppers/avx512-vbmi-byte-sbyte
