Improve load, store, gather, scatter codegen for targets without masked load/store instructions #494

@okaneco

On at least the baseline x86_64 target (SSE2), I'd like the generated code to be closer to that of the manual scalar implementation. As it stands, I end up writing out the manual loops and avoiding the load/store functions for performance.

https://rust.godbolt.org/z/57hsoTPbd

portable_simd

#![feature(portable_simd)]

use std::simd::*;

const N: usize = 8;

#[inline(never)]
pub fn load_or_default_simd(slice: &[f32]) -> Simd<f32, N> {
    Simd::load_or_default(slice)
}

#[inline(never)]
pub fn store_simd(data: Simd<f32, N>, slice: &mut [f32]) {
    data.store_select(slice, Mask::splat(true))
}

#[inline(never)]
pub fn gather_or_default_simd(slice: &[f32], idxs: Simd<usize, N>) -> Simd<f32, N> {
    Simd::gather_or_default(slice, idxs)
}

#[inline(never)]
pub fn scatter_simd(data: Simd<f32, N>, slice: &mut [f32], idxs: Simd<usize, N>) {
    data.scatter(slice, idxs)
}

scalar

#![feature(portable_simd)]

use std::simd::*;

const N: usize = 8;

#[inline(never)]
pub fn load_or_default_scalar(slice: &[f32]) -> Simd<f32, N> {
    let mut result = [0.0; N];
    for (&s, r) in slice.iter().zip(result.iter_mut()) {
        *r = s;
    }
    Simd::from(result)
}

#[inline(never)]
pub fn store_scalar(data: Simd<f32, N>, slice: &mut [f32]) {
    for (s, &d) in slice.iter_mut().zip(data.as_array().iter()) {
        *s = d;
    }
}

#[inline(never)]
pub fn gather_or_default_scalar(slice: &[f32], idxs: Simd<usize, N>) -> Simd<f32, N> {
    let mut result = [0.0; N];
    for (&i, r) in idxs.as_array().iter().zip(result.iter_mut()) {
        *r = *slice.get(i).unwrap_or(&0.0);
    }
    Simd::from(result)
}

#[inline(never)]
pub fn scatter_scalar(data: Simd<f32, N>, slice: &mut [f32], idxs: Simd<usize, N>) {
    for (&d, &idx) in data.as_array().iter().zip(idxs.as_array().iter()) {
        if let Some(s) = slice.get_mut(idx) {
            *s = d;
        }
    }
}

I'm mainly concerned about load/store; I don't expect much from gather/scatter.

I'm not sure whether this should be solved on the Rust side or in LLVM, for the following intrinsics that are present in the IR:

  • llvm.masked.load.*
  • llvm.masked.store.*
  • llvm.masked.gather.*
  • llvm.masked.scatter.*
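
To illustrate why the load case in particular looks like it could match the scalar loop, here is a minimal sketch in stable Rust over plain arrays. It is a model, not the `std::simd` implementation or LLVM's lowering. It assumes that the mask behind `load_or_default` enables exactly the lanes below `slice.len()` (a prefix mask); the names `masked_load_model`, `prefix_mask`, and `prefix_copy_load` are hypothetical and exist only for this example.

```rust
const N: usize = 8;

/// Per-lane model of a masked load: lane `i` reads `src[i]` only when
/// `mask[i]` is set and `i` is in bounds; otherwise it keeps `or[i]`.
/// This is the shape a target without masked-load instructions has to
/// emulate lane by lane when the mask is arbitrary.
pub fn masked_load_model(src: &[f32], mask: [bool; N], or: [f32; N]) -> [f32; N] {
    let mut out = or;
    for i in 0..N {
        if mask[i] && i < src.len() {
            out[i] = src[i];
        }
    }
    out
}

/// Assumed mask for a length-based load: lanes below `len` are enabled.
pub fn prefix_mask(len: usize) -> [bool; N] {
    std::array::from_fn(|i| i < len)
}

/// With a prefix mask, the per-lane checks collapse into one bounded copy,
/// which is what the manual scalar loop in this issue amounts to.
pub fn prefix_copy_load(src: &[f32]) -> [f32; N] {
    let mut out = [0.0f32; N];
    let n = src.len().min(N);
    out[..n].copy_from_slice(&src[..n]);
    out
}
```

For any input slice, `masked_load_model(s, prefix_mask(s.len()), [0.0; N])` and `prefix_copy_load(s)` return the same array, so under the prefix-mask assumption the general per-lane form carries more checks than the load needs.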

Meta

rustc 1.93.0-nightly (c86564c41 2025-11-27)
binary: rustc
commit-hash: c86564c412a5949088a53b665d8b9a47ec610a39
commit-date: 2025-11-27
host: x86_64-unknown-linux-gnu
release: 1.93.0-nightly
LLVM version: 21.1.5

Labels: C-bug (Category: Bug)
