Description
I tried this code:
use std::time::Instant;

fn main() {
    let mut scratch_buf = [25; 4096];
    let color = [30; 16];
    let x = 0;
    let width = 256;
    let start = Instant::now();
    for _ in 0..200000 {
        fill_solid(&mut scratch_buf, &color, x, width);
    }
    println!("Ran for {:?}", start.elapsed());
}

pub(crate) fn fill_solid(
    scratch: &mut [u8; 4096],
    color: &[u8; 16],
    x: usize,
    width: usize,
) {
    let target = &mut scratch[x * 16..][..16 * width];
    let dest = target.chunks_exact_mut(16);
    for cb in dest {
        for i in 0..16 {
            cb[i] = color[i] + ((color[i] as u16 * cb[i] as u16) / 255) as u8;
        }
    }
}
I expected to see this happen: When compiling with RUSTFLAGS="-C target-feature=+avx2", I at least expected the code to not be much slower.
Instead, this happened: The code runs 7x slower than when compiled without this target feature.
RUSTFLAGS="-C target-feature=+avx2" cargo run --release
Ran for 226.4182ms
cargo run --release
Ran for 31.5613ms
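To rule out the timing loop itself as a factor, here is a minimal sketch of the same measurement with the inputs passed through `std::hint::black_box`, so the optimizer cannot treat the buffer, color, offset, or width as compile-time constants. The `time_fill` helper is hypothetical and not part of the original reproducer; the numbers above were not produced with it, and whether it changes them has not been verified.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Same blend as in the report: each byte becomes
/// color + (color * dest) / 255, with the product computed in u16.
pub fn fill_solid(scratch: &mut [u8; 4096], color: &[u8; 16], x: usize, width: usize) {
    let target = &mut scratch[x * 16..][..16 * width];
    for cb in target.chunks_exact_mut(16) {
        for i in 0..16 {
            cb[i] = color[i] + ((color[i] as u16 * cb[i] as u16) / 255) as u8;
        }
    }
}

/// Hypothetical helper (an assumption, not from the report): runs
/// `fill_solid` `iters` times on the report's inputs and returns the
/// elapsed time together with the final buffer contents.
pub fn time_fill(iters: u32) -> (Duration, [u8; 4096]) {
    let mut scratch_buf = [25u8; 4096];
    let color = [30u8; 16];
    let start = Instant::now();
    for _ in 0..iters {
        // black_box hides the argument values from the optimizer.
        fill_solid(
            black_box(&mut scratch_buf),
            black_box(&color),
            black_box(0),
            black_box(256),
        );
    }
    (start.elapsed(), scratch_buf)
}

fn main() {
    let (elapsed, buf) = time_fill(200_000);
    // Printing a byte of the result keeps the buffer observably used.
    println!("Ran for {:?} (first byte: {})", elapsed, buf[0]);
}
```

Building this once with and once without RUSTFLAGS="-C target-feature=+avx2" would show whether the gap persists when nothing is constant-folded.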
I can't tell what exactly is going on, but the AVX code definitely looks much more verbose: https://godbolt.org/z/s48ccT1fn
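One way to compare the two code paths inside a single binary, without changing RUSTFLAGS, might be to compile the same loop body twice: once with the default target features and once inside a function marked `#[target_feature(enable = "avx2")]`. The sketch below assumes this per-function attribute yields codegen similar to the global flag, which has not been verified here. The names `fill_solid_impl`, `fill_solid_baseline`, and `fill_solid_avx2` are made up for illustration.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Shared loop body. `#[inline(always)]` lets it be inlined into each
/// wrapper, so it is compiled with that wrapper's target features.
#[inline(always)]
fn fill_solid_impl(scratch: &mut [u8; 4096], color: &[u8; 16], x: usize, width: usize) {
    let target = &mut scratch[x * 16..][..16 * width];
    for cb in target.chunks_exact_mut(16) {
        for i in 0..16 {
            cb[i] = color[i] + ((color[i] as u16 * cb[i] as u16) / 255) as u8;
        }
    }
}

/// Built with the crate's default target features.
pub fn fill_solid_baseline(scratch: &mut [u8; 4096], color: &[u8; 16], x: usize, width: usize) {
    fill_solid_impl(scratch, color, x, width)
}

/// Built with AVX2 enabled for this function only.
/// Safety: the caller must ensure the CPU supports AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
pub unsafe fn fill_solid_avx2(scratch: &mut [u8; 4096], color: &[u8; 16], x: usize, width: usize) {
    fill_solid_impl(scratch, color, x, width)
}

fn main() {
    let color = [30u8; 16];

    let mut buf = [25u8; 4096];
    let start = Instant::now();
    for _ in 0..200_000 {
        fill_solid_baseline(black_box(&mut buf), &color, 0, 256);
    }
    println!("baseline: {:?}", start.elapsed());

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            let mut buf = [25u8; 4096];
            let start = Instant::now();
            for _ in 0..200_000 {
                // SAFETY: AVX2 support was checked at runtime just above.
                unsafe { fill_solid_avx2(black_box(&mut buf), &color, 0, 256) };
            }
            println!("avx2:     {:?}", start.elapsed());
        } else {
            println!("avx2:     not supported on this CPU");
        }
    }
}
```

If the two timings diverge the same way as in the report, the two wrappers also give a convenient pair of symbols to diff in the disassembly.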
Meta
rustc --version --verbose:
rustc 1.84.0 (9fc6b4312 2025-01-07)
binary: rustc
commit-hash: 9fc6b43126469e3858e2fe86cafb4f0fd5068869
commit-date: 2025-01-07
host: x86_64-pc-windows-msvc
release: 1.84.0
LLVM version: 19.1.5
Metadata
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Area: Code generation
Area: Enabling/disabling target features like AVX, Neon, etc.
Category: This is a bug.
Issue: Problems and improvements with respect to performance of generated code.
Relevant to the compiler team, which will review and decide on the PR/issue.