I expect these two different functions to compile to the same asm:
const N: usize = 1_000_000;

pub fn foo1() -> Vec<[u32; N]> {
    vec![[0; N]; 3]
}

pub fn foo2() -> Vec<u32> {
    vec![0; N * 3]
}
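But they don't (foo1 calls __rust_alloc followed by memset, while foo2 makes a single __rust_alloc_zeroed call), and this slows down the code: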
foo1:
        push    r14
        push    rbx
        push    rax
        mov     r14, rdi
        mov     edi, 12000000
        mov     esi, 4
        call    qword ptr [rip + __rust_alloc@GOTPCREL]
        test    rax, rax
        je      .LBB0_1
        mov     rbx, rax
        mov     edx, 12000000
        mov     rdi, rax
        xor     esi, esi
        call    qword ptr [rip + memset@GOTPCREL]
        mov     qword ptr [r14], rbx
        vmovddup        xmm0, qword ptr [rip + .LCPI0_0]
        vmovups xmmword ptr [r14 + 8], xmm0
        mov     rax, r14
        add     rsp, 8
        pop     rbx
        pop     r14
        ret
.LBB0_1:
        mov     edi, 12000000
        mov     esi, 4
        call    qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL]
        ud2

foo2:
        push    rbx
        mov     rbx, rdi
        mov     edi, 12000000
        mov     esi, 4
        call    qword ptr [rip + __rust_alloc_zeroed@GOTPCREL]
        test    rax, rax
        je      .LBB1_1
        mov     qword ptr [rbx], rax
        vmovddup        xmm0, qword ptr [rip + .LCPI1_0]
        vmovups xmmword ptr [rbx + 8], xmm0
        mov     rax, rbx
        pop     rbx
        ret
.LBB1_1:
        mov     edi, 12000000
        mov     esi, 4
        call    qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL]
        ud2
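For readers who prefer Rust to asm, here is a rough sketch of the two allocation paths seen in the listings, written against the public std::alloc API. It is not how the standard library implements vec!; the helper names zeroed_via_memset, zeroed_via_alloc_zeroed and all_zero are made up for this illustration. The first helper allocates and then zeroes the memory explicitly, the second asks the allocator for already-zeroed memory, which an allocator can often satisfy more cheaply for large blocks.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, handle_alloc_error, Layout};

const N: usize = 1_000_000;

/// Hypothetical helper: allocate, then zero explicitly
/// (the `__rust_alloc` + `memset` shape of foo1).
fn zeroed_via_memset(layout: Layout) -> *mut u8 {
    unsafe {
        let p = alloc(layout);
        if p.is_null() {
            handle_alloc_error(layout);
        }
        std::ptr::write_bytes(p, 0, layout.size());
        p
    }
}

/// Hypothetical helper: request zeroed memory in one call
/// (the `__rust_alloc_zeroed` shape of foo2).
fn zeroed_via_alloc_zeroed(layout: Layout) -> *mut u8 {
    unsafe {
        let p = alloc_zeroed(layout);
        if p.is_null() {
            handle_alloc_error(layout);
        }
        p
    }
}

/// Hypothetical helper: check that `len` bytes starting at `p` are all zero.
fn all_zero(p: *const u8, len: usize) -> bool {
    unsafe { std::slice::from_raw_parts(p, len).iter().all(|&b| b == 0) }
}

fn main() {
    // Same size and alignment as in the asm: 12_000_000 bytes, align 4.
    let layout = Layout::array::<u32>(N * 3).expect("layout overflow");
    let a = zeroed_via_memset(layout);
    let b = zeroed_via_alloc_zeroed(layout);
    // Both paths end with identical contents; only the work done differs.
    assert!(all_zero(a, layout.size()));
    assert!(all_zero(b, layout.size()));
    unsafe {
        dealloc(a, layout);
        dealloc(b, layout);
    }
}
```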
Compiled with aggressive optimization flags, using:
rustc 1.67.0-nightly (70f8737b2 2022-11-23)
binary: rustc
commit-hash: 70f8737b2f5d3bf7d6b784fad00b663b7ff9feda
commit-date: 2022-11-23
host: x86_64-pc-windows-gnu
release: 1.67.0-nightly
LLVM version: 15.0.4
It's a small difference, but it shows up in my benchmarks. This is worth fixing because Vec is used throughout the Rust ecosystem, so even small Vec improvements pay off.
Currently, as a workaround to reduce the number of heap allocations, you can use something like:
let mut buf = vec![0_u32; N * 3];
let ([x1, x2, x3], []) = buf.as_chunks_mut::<N>() else { panic!() };
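If as_chunks_mut is not available on your toolchain (as far as I know it was nightly-only at the time of the rustc version listed above), a similar split can be sketched with chunks_exact_mut and TryFrom. The helper name three_rows is hypothetical, and this is only one possible way to write it:

```rust
const N: usize = 1_000_000;

/// Hypothetical helper: view one flat buffer of exactly `3 * N` elements
/// as three `&mut [u32; N]` arrays, without allocating or copying.
fn three_rows(buf: &mut [u32]) -> Option<[&mut [u32; N]; 3]> {
    if buf.len() != 3 * N {
        return None;
    }
    let mut chunks = buf.chunks_exact_mut(N);
    // Each chunk is a `&mut [u32]` of length N; the conversion to
    // `&mut [u32; N]` only checks the length, it does not copy.
    let x1: &mut [u32; N] = chunks.next()?.try_into().ok()?;
    let x2: &mut [u32; N] = chunks.next()?.try_into().ok()?;
    let x3: &mut [u32; N] = chunks.next()?.try_into().ok()?;
    Some([x1, x2, x3])
}

fn main() {
    // A single allocation, created through the flat `vec![0; N * 3]` form.
    let mut buf = vec![0_u32; N * 3];
    let [x1, x2, x3] = three_rows(&mut buf).expect("buffer length is 3 * N");
    x1[0] = 1;
    x2[0] = 2;
    x3[N - 1] = 3;
    // The writes land in the underlying flat buffer.
    assert_eq!(buf[0], 1);
    assert_eq!(buf[N], 2);
    assert_eq!(buf[3 * N - 1], 3);
}
```

The three arrays borrow from buf, so buf has to outlive them, which is the main ergonomic cost compared with returning a Vec<[u32; N]>.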
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Area: `[T; N]`
Category: An issue highlighting optimization opportunities or PRs implementing such
Issue: Problems and improvements with respect to performance of generated code.
Relevant to the compiler team, which will review and decide on the PR/issue.