RyuJIT: should always decompose GT_MOD into `a - (a / b) * b` for CSE #32615

`A % B` can be transformed into `A - ((A / B) * B)`, and this is always done for ARM because it doesn't have a remainder instruction.
On x64, for some reason, it's done only when B is a constant power of two. I suggest we always use what [is used for ARM](https://github.com/dotnet/runtime/blob/a607cffe1243e633f9b5028560adc4e3505f48ac/src/coreclr/src/jit/morph.cpp#L11197-L11224).

This gives VN/CSE more opportunities to find something to optimize, e.g.
```csharp
int x = a / b;
int y = a % b;
```
can be decomposed into:
```csharp
int x = a / b;
int y = a - ((a / b) * b);
```
then VN/CSE optimizes it to:
```csharp
int x = a / b;
int y = a - x * b;
```
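
The rewrite preserves semantics because C# integer division truncates toward zero and `%` takes the sign of the dividend, so `a % b == a - (a / b) * b` whenever `a / b` itself is defined (that is, `b != 0`, and leaving aside `int.MinValue` with `b == -1`, where the division overflows). A minimal sketch that checks this on a few sampled pairs is below; `ModDecomposition` and `RemViaDiv` are hypothetical names used only for illustration and are not part of the JIT or the BCL:

```csharp
using System;

static class ModDecomposition
{
    // Hypothetical helper: computes the remainder the way the proposed
    // decomposition would, i.e. a - (a / b) * b.
    public static int RemViaDiv(int a, int b) => a - (a / b) * b;

    public static void Main()
    {
        // Sampled dividends and divisors, including negative values and the
        // int range limits. Zero and -1 are deliberately not used as divisors.
        int[] dividends = { 17, -17, 0, 5, int.MaxValue, int.MinValue };
        int[] divisors = { 5, -5, 3, 7, 16 };

        foreach (int a in dividends)
        {
            foreach (int b in divisors)
            {
                if (a % b != RemViaDiv(a, b))
                {
                    throw new InvalidOperationException($"mismatch for {a} % {b}");
                }
            }
        }

        Console.WriteLine("a % b == a - (a / b) * b for all sampled pairs");
    }
}
```

This only demonstrates the arithmetic identity at the source level; whether the JIT then shares the `a / b` between the two statements depends on VN/CSE, as described above.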
So the existing hack in [Math.DivRem](https://github.com/dotnet/runtime/blob/a607cffe1243e633f9b5028560adc4e3505f48ac/src/libraries/System.Private.CoreLib/src/System/Math.cs#L192-L194) won't be needed. This pattern, where we need both `/` and `%`, is quite popular, e.g. [MemoryExtensions:Overlaps](https://github.com/dotnet/runtime/blob/a607cffe1243e633f9b5028560adc4e3505f48ac/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs#L1604-L1607).
Here is a jit-diff:
```
Top method improvements (percentages):
         -15 (-23.08% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.XmlCharType:SplitSurrogateChar(int,byref,byref)
         -15 (-23.08% of base) : System.Private.Xml.dasm - System.Xml.XmlCharType:SplitSurrogateChar(int,byref,byref)
         -13 (-21.31% of base) : System.Runtime.Extensions.dasm - System.Net.WebUtility:ConvertSmpToUtf16(int,byref,byref)
          -9 (-14.52% of base) : System.Collections.dasm - System.Collections.Generic.BitHelper:MarkBit(int):this
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int32],System.ReadOnlySpan`1[Int32],byref):bool
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Double],System.ReadOnlySpan`1[Double],byref):bool
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Vector`1],System.ReadOnlySpan`1[Vector`1],byref):bool
         -14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int64],System.ReadOnlySpan`1[Int64],byref):bool
         -14 (-10.22% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[__Canon],System.ReadOnlySpan`1[__Canon],byref):bool
          -7 (-9.33% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Double][System.Double]:SetValue(int,double):this
         -11 (-8.80% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int16],System.ReadOnlySpan`1[Int16],byref):bool
          -6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[__Canon][System.__Canon]:SetValue(int,System.__Canon):this
          -6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Int32][System.Int32]:SetValue(int,int):this
          -6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Int64][System.Int64]:SetValue(int,long):this
          -6 (-8.33% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Byte][System.Byte]:SetValue(int,ubyte):this
```
The only caveat is that lowering should compose it back to `GT_MOD` on x64 if CSE didn't find anything (when op2 is not a constant).

I think this optimization should be much easier to implement than extracting both values from `idiv` (https://github.com/dotnet/runtime/issues/5213).

@dotnet/jit-contrib 

category:cq
theme:optimization
skill-level:intermediate
cost:medium
impact:small
