Improve `load`, `store`, `gather`, `scatter` codegen for targets without masked load/store instructions

On at least the baseline x86_64 `sse2` target, I'd like the generated code to be closer to what the manual scalar implementation produces. As it stands, I end up writing the loops by hand and avoiding the load/store functions for performance.

https://rust.godbolt.org/z/57hsoTPbd

<details><summary>portable_simd</summary>
<p>

```rust
#![feature(portable_simd)]

use std::simd::*;

const N: usize = 8;

#[inline(never)]
pub fn load_or_default_simd(slice: &[f32]) -> Simd<f32, N> {
    Simd::load_or_default(slice)
}

#[inline(never)]
pub fn store_simd(data: Simd<f32, N>, slice: &mut [f32]) {
    data.store_select(slice, Mask::splat(true))
}

#[inline(never)]
pub fn gather_or_default_simd(slice: &[f32], idxs: Simd<usize, N>) -> Simd<f32, N> {
    Simd::gather_or_default(slice, idxs)
}

#[inline(never)]
pub fn scatter_simd(data: Simd<f32, N>, slice: &mut [f32], idxs: Simd<usize, N>) {
    data.scatter(slice, idxs)
}
```

</p>
</details>

<details><summary>scalar</summary>
<p>

```rust
#![feature(portable_simd)]

use std::simd::*;

const N: usize = 8;

#[inline(never)]
pub fn load_or_default_scalar(slice: &[f32]) -> Simd<f32, N> {
    let mut result = [0.0; N];
    for (&s, r) in slice.iter().zip(result.iter_mut()) {
        *r = s;
    }
    Simd::from(result)
}

#[inline(never)]
pub fn store_scalar(data: Simd<f32, N>, slice: &mut [f32]) {
    for (s, &d) in slice.iter_mut().zip(data.as_array().iter()) {
        *s = d;
    }
}

#[inline(never)]
pub fn gather_or_default_scalar(slice: &[f32], idxs: Simd<usize, N>) -> Simd<f32, N> {
    let mut result = [0.0; N];
    for (&i, r) in idxs.as_array().iter().zip(result.iter_mut()) {
        *r = *slice.get(i).unwrap_or(&0.0);
    }
    Simd::from(result)
}

#[inline(never)]
pub fn scatter_scalar(data: Simd<f32, N>, slice: &mut [f32], idxs: Simd<usize, N>) {
    for (&d, &idx) in data.as_array().iter().zip(idxs.as_array().iter()) {
        if let Some(s) = slice.get_mut(idx) {
            *s = d;
        }
    }
}
```

</p>
</details>

I'm mainly concerned about load/store; I don't expect much from gather/scatter.
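
As a rough sketch of the shape that would be preferable for load/store: branch once on whether the slice covers every lane, do a full-width copy in that case, and only fall back to a partial copy for short slices. The example below uses plain `[f32; N]` arrays so that it builds on stable Rust. The names `load_or_default_fast` and `store_fast` are hypothetical and not part of `std::simd`, and whether the full-width path actually becomes plain unaligned vector moves on `sse2` is an assumption, not something measured here.

```rust
const N: usize = 8;

/// Hypothetical helper: read up to `N` leading elements of `slice`,
/// filling any missing lanes with `0.0`.
pub fn load_or_default_fast(slice: &[f32]) -> [f32; N] {
    let mut result = [0.0f32; N];
    if slice.len() >= N {
        // Full-width path: every lane is in bounds, so no mask is needed.
        result.copy_from_slice(&slice[..N]);
    } else {
        // Short path: copy only the lanes that exist; the rest stay 0.0.
        result[..slice.len()].copy_from_slice(slice);
    }
    result
}

/// Hypothetical helper: write as many lanes of `data` as fit into `slice`.
pub fn store_fast(data: [f32; N], slice: &mut [f32]) {
    if slice.len() >= N {
        // Full-width path: all N lanes fit.
        slice[..N].copy_from_slice(&data);
    } else {
        // Short path: write only the lanes that fit.
        let len = slice.len();
        slice.copy_from_slice(&data[..len]);
    }
}
```

With `portable_simd`, the same idea could wrap `Simd::from_slice` / `copy_to_slice` for the full-width path and keep the masked functions for the short path only.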

I'm not sure whether this should be solved on the Rust side or in LLVM, in how it lowers the masked intrinsics that are present in the IR:
- `llvm.masked.load.*`
- `llvm.masked.store.*`
- `llvm.masked.gather.*`
- `llvm.masked.scatter.*`


### Meta

```
rustc 1.93.0-nightly (c86564c41 2025-11-27)
binary: rustc
commit-hash: c86564c412a5949088a53b665d8b9a47ec610a39
commit-date: 2025-11-27
host: x86_64-unknown-linux-gnu
release: 1.93.0-nightly
LLVM version: 21.1.5
```

