Skip to content

RyuJIT: should always decompose GT_MOD into a - (a / b) * b for CSE #32615

Open
@EgorBo

Description

@EgorBo

A % B can be transformed into A - ((A / B) * B) and it's always done for ARM because it doesn't have the remainder instruction.
Also for some reason it's done on x64 only when B is a const and a power of two and I suggest we always use what is used for ARM.

It gives VN/CSE more opportunities to find what to optimize, e.g.

int x = a / b;
int y = a % b;

can be decomposed into:

int x = a / b;
int y = a - ((a / b) * b);

then VN/CSE optimizes it to:

int x = a / b;
int y = a - x * b;

So the existing hack in Math.DivRem won't be needed, this pattern when we need both / and % is quite popular, e.g.: MemoryExtensions:Overlaps
or here is a jit-diff:

Top method improvements (percentages):
         -15 (-23.08% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.XmlCharType:SplitSurrogateChar(int,byref,byref)
         -15 (-23.08% of base) : System.Private.Xml.dasm - System.Xml.XmlCharType:SplitSurrogateChar(int,byref,byref)
         -13 (-21.31% of base) : System.Runtime.Extensions.dasm - System.Net.WebUtility:ConvertSmpToUtf16(int,byref,byref)
          -9 (-14.52% of base) : System.Collections.dasm - System.Collections.Generic.BitHelper:MarkBit(int):this
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int32],System.ReadOnlySpan`1[Int32],byref):bool
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Double],System.ReadOnlySpan`1[Double],byref):bool
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Vector`1],System.ReadOnlySpan`1[Vector`1],byref):bool
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int64],System.ReadOnlySpan`1[Int64],byref):bool
         -14 (-10.22% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[__Canon],System.ReadOnlySpan`1[__Canon],byref):bool
          -7 (-9.33% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Double][System.Double]:SetValue(int,double):this
         -11 (-8.80% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int16],System.ReadOnlySpan`1[Int16],byref):bool
          -6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[__Canon][System.__Canon]:SetValue(int,System.__Canon):this
          -6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Int32][System.Int32]:SetValue(int,int):this
          -6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Int64][System.Int64]:SetValue(int,long):this
          -6 (-8.33% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Byte][System.Byte]:SetValue(int,ubyte):this

the only thing, lowering should compose it back to GT_MOD on x64 if CSE didn't find anything (when op2 is not a constant).

I think this optimization should be way easier to implement than the extraction of both values from idiv (#5213)

@dotnet/jit-contrib

category:cq
theme:optimization
skill-level:intermediate
cost:medium
impact:small

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMIoptimization

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions