[ER] Sub-optimal allocation of simple zeroed Vec of arrays #104847

Open
@leonardo-m

Description

I expect the two different functions to be compiled to the same asm:

const N: usize = 1_000_000;
pub fn foo1() -> Vec<[u32; N]> {
    vec![[0; N]; 3]
}
pub fn foo2() -> Vec<u32> {
    vec![0; N * 3]
}

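But they aren't: foo1 calls __rust_alloc followed by memset, while foo2 makes a single call to __rust_alloc_zeroed, and this slows down the code: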

foo1:
        push    r14
        push    rbx
        push    rax
        mov     r14, rdi
        mov     edi, 12000000
        mov     esi, 4
        call    qword ptr [rip + __rust_alloc@GOTPCREL]
        test    rax, rax
        je      .LBB0_1
        mov     rbx, rax
        mov     edx, 12000000
        mov     rdi, rax
        xor     esi, esi
        call    qword ptr [rip + memset@GOTPCREL]
        mov     qword ptr [r14], rbx
        vmovddup        xmm0, qword ptr [rip + .LCPI0_0]
        vmovups xmmword ptr [r14 + 8], xmm0
        mov     rax, r14
        add     rsp, 8
        pop     rbx
        pop     r14
        ret
.LBB0_1:
        mov     edi, 12000000
        mov     esi, 4
        call    qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL]
        ud2


foo2:
        push    rbx
        mov     rbx, rdi
        mov     edi, 12000000
        mov     esi, 4
        call    qword ptr [rip + __rust_alloc_zeroed@GOTPCREL]
        test    rax, rax
        je      .LBB1_1
        mov     qword ptr [rbx], rax
        vmovddup        xmm0, qword ptr [rip + .LCPI1_0]
        vmovups xmmword ptr [rbx + 8], xmm0
        mov     rax, rbx
        pop     rbx
        ret
.LBB1_1:
        mov     edi, 12000000
        mov     esi, 4
        call    qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL]
        ud2
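
To make the difference between the two listings easier to see outside of asm, here is a minimal sketch of the two strategies written directly against the std::alloc API. The function names zeroed_via_memset and zeroed_via_alloc_zeroed are hypothetical, and this is only an illustration of "allocate, then zero" versus "ask the allocator for zeroed memory"; it is not a copy of what Vec does internally.

```rust
use std::alloc::{alloc, alloc_zeroed, handle_alloc_error, Layout};

/// Strategy visible in the `foo1` listing: plain allocation, then an explicit zero fill.
/// (Hypothetical helper, for illustration only.)
fn zeroed_via_memset(len: usize) -> Vec<u32> {
    assert!(len > 0, "`alloc` must not be called with a zero-sized layout");
    let layout = Layout::array::<u32>(len).expect("layout overflow");
    // SAFETY: the layout has non-zero size, the pointer is checked for null,
    // all `len` elements are initialised before the Vec is built, and the
    // Vec's capacity matches the layout used for the allocation.
    unsafe {
        let ptr = alloc(layout) as *mut u32;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // `count` is in units of `u32`, so this zeroes `len * 4` bytes.
        std::ptr::write_bytes(ptr, 0, len);
        Vec::from_raw_parts(ptr, len, len)
    }
}

/// Strategy visible in the `foo2` listing: a single call that returns zeroed memory.
/// (Hypothetical helper, for illustration only.)
fn zeroed_via_alloc_zeroed(len: usize) -> Vec<u32> {
    assert!(len > 0, "`alloc_zeroed` must not be called with a zero-sized layout");
    let layout = Layout::array::<u32>(len).expect("layout overflow");
    // SAFETY: same reasoning as above; the allocator hands back zeroed memory,
    // and all-zero bytes are a valid `u32`.
    unsafe {
        let ptr = alloc_zeroed(layout) as *mut u32;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Vec::from_raw_parts(ptr, len, len)
    }
}

fn main() {
    // A small length keeps the demo light; the issue uses 3_000_000 elements.
    let a = zeroed_via_memset(12);
    let b = zeroed_via_alloc_zeroed(12);
    assert_eq!(a, b);
    assert!(a.iter().all(|&x| x == 0));
}
```

Both helpers return the same contents; the second simply gives the allocator the chance to hand back memory it already knows to be zeroed, which is the opportunity foo1 misses.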

Tested with aggressive compilation flags, using:

rustc 1.67.0-nightly (70f8737b2 2022-11-23)
binary: rustc
commit-hash: 70f8737b2f5d3bf7d6b784fad00b663b7ff9feda
commit-date: 2022-11-23
host: x86_64-pc-windows-gnu
release: 1.67.0-nightly
LLVM version: 15.0.4

It's a small difference, but it shows up in my benchmarks. It is worth fixing because Vec is used throughout the Rust ecosystem, so even small Vec improvements pay off.

Currently, as a workaround to reduce the number of heap allocations, you can use something like:

let mut buf = vec![0_u32; N * 3];
let ([x1, x2, x3], []) = buf.as_chunks_mut::<N>() else { panic!() };
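
If as_chunks_mut is not available on your toolchain (my assumption is that it was still feature-gated on the nightly shown above), the same idea can be sketched with chunks_exact_mut and a slice-to-array conversion. The helper name three_rows is hypothetical, and N is made much smaller than in the issue to keep the demo light.

```rust
// Smaller than the issue's 1_000_000, purely to keep the demo light.
const N: usize = 4096;

/// Views one flat buffer of exactly `3 * N` elements as three `[u32; N]` rows.
/// Returns `None` if the buffer has any other length.
/// (Hypothetical helper, for illustration only.)
fn three_rows(buf: &mut [u32]) -> Option<[&mut [u32; N]; 3]> {
    if buf.len() != N * 3 {
        return None;
    }
    // Each chunk has exactly N elements, so the conversions below cannot fail.
    let mut chunks = buf.chunks_exact_mut(N);
    let x1: &mut [u32; N] = chunks.next()?.try_into().ok()?;
    let x2: &mut [u32; N] = chunks.next()?.try_into().ok()?;
    let x3: &mut [u32; N] = chunks.next()?.try_into().ok()?;
    Some([x1, x2, x3])
}

fn main() {
    // A single zeroed allocation, as in `foo2`.
    let mut buf = vec![0_u32; N * 3];
    let [x1, x2, x3] = three_rows(&mut buf).expect("length is N * 3");

    // The three rows are disjoint mutable views into the same allocation.
    x1[0] = 1;
    x2[0] = 2;
    x3[N - 1] = 3;

    assert_eq!(buf[0], 1);
    assert_eq!(buf[N], 2);
    assert_eq!(buf[3 * N - 1], 3);
}
```

The rows borrow from buf, so they cannot outlive it; that is the price of keeping everything in one Vec<u32> instead of a Vec<[u32; N]>.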

Metadata

    Labels

    A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
    A-array (Area: `[T; N]`)
    C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
    I-slow (Issue: Problems and improvements with respect to performance of generated code.)
    T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)