Suboptimal stack zeroing on AVX512

When compiling for AVX512, RyuJIT uses `vmovdqu32` to zero 64 bytes at a time. For the remaining bytes, however, it does not use 32-byte stores as it does on AVX2; it falls back straight to 16-byte stores.

[Sample repro](https://godbolt.org/z/bTqf8Wcoh):
```asm
sub      rsp, 168
vxorps   xmm4, xmm4, xmm4
vmovdqu32 zmmword ptr [rsp+0x20], zmm4
vmovdqa  xmmword ptr [rsp+0x60], xmm4 ; this should be vmovdqu ymmword ptr [rsp+0x60], ymm4
vmovdqa  xmmword ptr [rsp+0x70], xmm4 ; this could be omitted then
vmovdqa  xmmword ptr [rsp+0x80], xmm4
...
```
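
To illustrate the expected behavior, here is a minimal sketch of a greedy store-width selection: at each step it uses the widest store that still fits in the remaining region (64, then 32, then 16 bytes). This is not RyuJIT's actual code; `ZeroStore` and `planZeroing` are hypothetical names, and the sketch assumes the region size is a multiple of 16.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one zeroing store: offset from the start of
// the region and width in bytes (64 = zmm, 32 = ymm, 16 = xmm).
struct ZeroStore {
    std::size_t offset;
    std::size_t width;
};

// Greedy plan: use the widest store that still fits in the remainder,
// then halve the width and repeat down to 16 bytes.
// Assumption: size is a multiple of 16 and maxWidth is 16, 32 or 64.
std::vector<ZeroStore> planZeroing(std::size_t size, std::size_t maxWidth) {
    assert(size % 16 == 0);
    assert(maxWidth == 16 || maxWidth == 32 || maxWidth == 64);

    std::vector<ZeroStore> plan;
    std::size_t offset = 0;
    for (std::size_t width = maxWidth; width >= 16; width /= 2) {
        while (size - offset >= width) {
            plan.push_back({offset, width});
            offset += width;
        }
    }
    return plan;
}
```

Under these assumptions, `planZeroing(96, 64)` yields one 64-byte store followed by one 32-byte store, instead of one 64-byte store followed by two 16-byte stores.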

Suboptimal stack zeroing on AVX512 #114274
