Suboptimal stack zeroing on AVX512 #114274

Open
@pentp

Description

When compiling for AVX512, RyuJIT uses vmovdqu32 to zero 64 bytes at a time, but it does not use 32-byte stores the way it does on AVX2. Instead it falls back straight to 16-byte stores for whatever remains to be zeroed.

Sample repro:

sub      rsp, 168
vxorps   xmm4, xmm4, xmm4
vmovdqu32 zmmword ptr [rsp+0x20], zmm4
vmovdqa  xmmword ptr [rsp+0x60], xmm4 ; this should be vmovdqu ymmword ptr [rsp+0x60], ymm4
vmovdqa  xmmword ptr [rsp+0x70], xmm4 ; this could be omitted then
vmovdqa  xmmword ptr [rsp+0x80], xmm4
...
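
To make the store count concrete, here is a minimal sketch of the size selection being asked for. It is not RyuJIT's actual implementation: the function name `plan_zeroing_stores`, its parameters, and the 112-byte region length are all hypothetical (the listing above is truncated, so the real region size is unknown). It simply covers a region greedily with the widest store that still fits.

```python
def plan_zeroing_stores(start, length, widths=(64, 32, 16)):
    """Greedily cover [start, start + length) with the widest store that fits.

    Hypothetical helper for illustration only; returns (offset, width) pairs.
    """
    if length % min(widths) != 0:
        raise ValueError("length must be a multiple of the smallest store width")
    stores = []
    offset = start
    remaining = length
    # Try the widest store first, then step down to narrower ones.
    for width in sorted(widths, reverse=True):
        while remaining >= width:
            stores.append((offset, width))
            offset += width
            remaining -= width
    return stores


# Assumed example: 112 bytes starting at rsp+0x20.
# Current behaviour: 64-byte stores, then straight to 16-byte stores.
current = plan_zeroing_stores(0x20, 112, widths=(64, 16))
# Requested behaviour: also consider 32-byte stores for the remainder.
proposed = plan_zeroing_stores(0x20, 112, widths=(64, 32, 16))

for name, plan in (("current", current), ("proposed", proposed)):
    print(name, [(hex(off), width) for off, width in plan])
```

Under these assumptions the current scheme needs four stores (zmm at 0x20, xmm at 0x60, 0x70, and 0x80), while the proposed one needs three (zmm at 0x20, ymm at 0x60, xmm at 0x80), which matches the comments in the listing above.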

Metadata

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
avx512: Related to the AVX-512 architecture
optimization
