Recently, @CarolEidt improved the handling of addressing modes for HW intrinsics.
See https://github.com/dotnet/coreclr/issues/19550#issuecomment-476898428
However, some more general issues that affect the performance of generated code remain. Namely, address calculations should not use more than 2 operands. The currently generated code may contain inefficient instructions such as:
vmovupd ymm0, ymmword ptr[rsi]
vsubps ymm0, ymm0, ymmword ptr[rcx+4*rdx+4] // address calculation should be reduced to 2 ops instead of 4
vmovups ymmword ptr[rcx+4], ymm0
vmovups ymm6, ymmword ptr[rcx+4]
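To illustrate the request, here is a minimal C sketch under one possible reading of "2 operands": the full x86-64 addressing form base + scale*index + disp has four components, and computing base + scale*index into a register once leaves only register + disp at each use. The names ea_full and ea_two are hypothetical and are not part of the JIT; the sketch only models the address arithmetic and says nothing about how the JIT should implement the change.

```c
#include <stdint.h>

/* Hypothetical model of the four-component form, e.g. [rcx+4*rdx+4]:
   base + scale*index + disp. */
static uint64_t ea_full(uint64_t base, uint64_t index, uint64_t scale, int32_t disp)
{
    /* The displacement is sign-extended before being added. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}

/* Hypothetical model of a two-component form, e.g. [rax+4], where the
   register already holds the precomputed base + scale*index. */
static uint64_t ea_two(uint64_t reg, int32_t disp)
{
    return reg + (uint64_t)(int64_t)disp;
}
```

Under this assumption, ea_two(base + index * scale, disp) yields the same address as ea_full(base, index, scale, disp), so the base + scale*index part would be computed once and reused by every access that shares it.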
category:cq
theme:addressing-modes
skill-level:expert
cost:large