Description
A % B
can be transformed into A - ((A / B) * B)
and it's always done for ARM because it doesn't have the remainder instruction.
Also for some reason it's done on x64 only when B is a const and a power of two and I suggest we always use what is used for ARM.
It gives VN/CSE more opportunities to find what to optimize, e.g.
int x = a / b;
int y = a % b;
can be decomposed into:
int x = a / b;
int y = a - ((a / b) * b);
then VN/CSE optimizes it to:
int x = a / b;
int y = a - x * b;
So the existing hack in Math.DivRem won't be needed, this pattern when we need both /
and %
is quite popular, e.g.: MemoryExtensions:Overlaps
or here is a jit-diff:
Top method improvements (percentages):
-15 (-23.08% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.XmlCharType:SplitSurrogateChar(int,byref,byref)
-15 (-23.08% of base) : System.Private.Xml.dasm - System.Xml.XmlCharType:SplitSurrogateChar(int,byref,byref)
-13 (-21.31% of base) : System.Runtime.Extensions.dasm - System.Net.WebUtility:ConvertSmpToUtf16(int,byref,byref)
-9 (-14.52% of base) : System.Collections.dasm - System.Collections.Generic.BitHelper:MarkBit(int):this
-14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int32],System.ReadOnlySpan`1[Int32],byref):bool
-14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Double],System.ReadOnlySpan`1[Double],byref):bool
-14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Vector`1],System.ReadOnlySpan`1[Vector`1],byref):bool
-14 (-10.29% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int64],System.ReadOnlySpan`1[Int64],byref):bool
-14 (-10.22% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[__Canon],System.ReadOnlySpan`1[__Canon],byref):bool
-7 (-9.33% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Double][System.Double]:SetValue(int,double):this
-11 (-8.80% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps(System.ReadOnlySpan`1[Int16],System.ReadOnlySpan`1[Int16],byref):bool
-6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[__Canon][System.__Canon]:SetValue(int,System.__Canon):this
-6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Int32][System.Int32]:SetValue(int,int):this
-6 (-8.45% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Int64][System.Int64]:SetValue(int,long):this
-6 (-8.33% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.CompressedSparseTensor`1[Byte][System.Byte]:SetValue(int,ubyte):this
the only thing, lowering should compose it back to GT_MOD
on x64 if CSE didn't find anything (when op2 is not a constant).
I think this optimization should be way easier to implement than the extraction of both values from idiv
(#5213)
@dotnet/jit-contrib
category:cq
theme:optimization
skill-level:intermediate
cost:medium
impact:small