Inefficient code generation with target-feature AVX2 #137335

Open
@LaurenzV

I tried this code:

use std::time::Instant;

fn main() {
  let mut scratch_buf = [25; 4096];
  let color = [30; 16];
  let x = 0;
  let width = 256;

  let start = Instant::now();
  for _ in 0..200000 {
    fill_solid(&mut scratch_buf, &color, x, width);
  }

  println!("Ran for {:?}", start.elapsed());
}

pub(crate) fn fill_solid(
  scratch: &mut [u8; 4096],
  color: &[u8; 16],
  x: usize,
  width: usize,
) {
  let target = &mut scratch[x * 16..][..16 * width];

  let dest = target.chunks_exact_mut(16);

  for cb in dest {
    for i in 0..16 {
      cb[i] = color[i] + ((color[i] as u16 * cb[i] as u16) / 255) as u8;
    }
  }
}

I expected to see this happen: When compiling with RUSTFLAGS="-C target-feature=+avx2", I expected the code to run at least roughly as fast as the default build.

Instead, this happened: The code runs about 7x slower than when compiled without this target feature (226.4 ms vs. 31.6 ms in the runs below).

RUSTFLAGS="-C target-feature=+avx2" cargo run --release
Ran for 226.4182ms

cargo run --release
Ran for 31.5613ms
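
To rule out the optimizer hoisting or eliding part of the timed loop, the measurement can be hardened with `std::hint::black_box`. This is a minimal sketch, not the harness used for the numbers above: the `bench` helper is a hypothetical name, and it returns the buffer so the result stays observable.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Same kernel as in the report above.
pub fn fill_solid(scratch: &mut [u8; 4096], color: &[u8; 16], x: usize, width: usize) {
    let target = &mut scratch[x * 16..][..16 * width];
    for cb in target.chunks_exact_mut(16) {
        for i in 0..16 {
            cb[i] = color[i] + ((color[i] as u16 * cb[i] as u16) / 255) as u8;
        }
    }
}

// Hypothetical helper (not part of the original report): runs the kernel
// `iters` times and returns the elapsed time together with the final buffer.
pub fn bench(iters: usize) -> (Duration, [u8; 4096]) {
    let mut scratch_buf = [25u8; 4096];
    let color = [30u8; 16];

    let start = Instant::now();
    for _ in 0..iters {
        // black_box hides the arguments from the optimizer, so each call
        // has to be performed with values it cannot assume are constant.
        fill_solid(
            black_box(&mut scratch_buf),
            black_box(&color),
            black_box(0),
            black_box(256),
        );
    }
    let elapsed = start.elapsed();

    // Keep the result observable so the stores cannot be discarded.
    (elapsed, black_box(scratch_buf))
}
```

Calling `bench(200000)` once in a default release build and once in a build with `-C target-feature=+avx2` gives two directly comparable timings. If the gap persists with `black_box` in place, it is more likely down to the generated code for `fill_solid` than to the benchmark loop.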

I can't tell what exactly is going on, but the AVX code definitely looks much more verbose: https://godbolt.org/z/s48ccT1fn
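
One way to compare both variants inside a single binary, without changing RUSTFLAGS, is to put the same kernel behind a `#[target_feature(enable = "avx2")]` wrapper. This is only a sketch under assumptions: the names `fill_solid_impl`, `fill_solid_baseline`, `fill_solid_avx2`, and `fill_solid_dispatch` are hypothetical, and per-function target features may not produce exactly the same code as the crate-wide flag.

```rust
// Shared kernel; inline(always) lets each wrapper compile its own copy
// with that wrapper's target features.
#[inline(always)]
fn fill_solid_impl(scratch: &mut [u8; 4096], color: &[u8; 16], x: usize, width: usize) {
    let target = &mut scratch[x * 16..][..16 * width];
    for cb in target.chunks_exact_mut(16) {
        for i in 0..16 {
            cb[i] = color[i] + ((color[i] as u16 * cb[i] as u16) / 255) as u8;
        }
    }
}

// Compiled with the crate's default target features.
pub fn fill_solid_baseline(scratch: &mut [u8; 4096], color: &[u8; 16], x: usize, width: usize) {
    fill_solid_impl(scratch, color, x, width)
}

// Compiled with AVX2 enabled for this function only.
// Safety: the caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
pub unsafe fn fill_solid_avx2(scratch: &mut [u8; 4096], color: &[u8; 16], x: usize, width: usize) {
    fill_solid_impl(scratch, color, x, width)
}

// Picks the AVX2 variant when the CPU reports support for it.
// Returns true if the AVX2 path was taken.
pub fn fill_solid_dispatch(scratch: &mut [u8; 4096], color: &[u8; 16], x: usize, width: usize) -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: AVX2 support was just checked at runtime.
            unsafe { fill_solid_avx2(scratch, color, x, width) };
            return true;
        }
    }
    fill_solid_baseline(scratch, color, x, width);
    false
}
```

Timing `fill_solid_baseline` against `fill_solid_avx2` in the same process removes build-to-build differences from the comparison. If the crate is already built with `-C target-feature=+avx2`, both wrappers are compiled with AVX2 and the comparison shows nothing.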

Meta

rustc --version --verbose:

rustc 1.84.0 (9fc6b4312 2025-01-07)
binary: rustc
commit-hash: 9fc6b43126469e3858e2fe86cafb4f0fd5068869
commit-date: 2025-01-07
host: x86_64-pc-windows-msvc
release: 1.84.0
LLVM version: 19.1.5

Metadata

    Labels

    A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
    A-codegen (Area: Code generation)
    A-target-feature (Area: Enabling/disabling target features like AVX, Neon, etc.)
    C-bug (Category: This is a bug.)
    I-slow (Issue: Problems and improvements with respect to performance of generated code.)
    T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
