Arm64 SVE: Better optimise zero/allbits vectors #115566
base: main
Conversation
Fixes dotnet#114443

* IsVectorZero() should allow for all zero vectors and false masks that have been converted to vectors.
* IsVectorAllBitsSet() should allow for all bits set vectors and true masks that have been converted to vectors.
* IsMaskZero() should allow for false masks and all zero vectors that have been converted to masks.
* IsMaskAllBitsSet() should allow for true masks and all bits set vectors that have been converted to masks.

In addition:

* Fix up all the errors caused by these changes.
* Add a bunch of asmcheck tests.
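Roughly, the gentree.h change the first two bullets describe has this shape (a sketch only; the predicates for the existing constant-vector path, IsCnsVec() and AsVecCon()->IsAllBitsSet(), are assumptions about the surrounding code, while OperIsConvertMaskToVector() and OperIsTrueMask() appear in the diff quoted below):

```cpp
// Sketch: IsVectorAllBitsSet() extended so that a true mask wrapped in a
// ConvertMaskToVector node is also treated as an all-bits-set vector.
bool GenTree::IsVectorAllBitsSet() const
{
    // Existing path: an actual all-bits-set vector constant.
    if (IsCnsVec() && AsVecCon()->IsAllBitsSet())
    {
        return true;
    }

#if defined(TARGET_ARM64)
    // Can also be an all true mask that has been converted to a vector.
    if (OperIsConvertMaskToVector() && AsHWIntrinsic()->Op(1)->OperIsTrueMask())
    {
        return true;
    }
#endif

    return false;
}
```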
src/coreclr/jit/lowerarmarch.cpp (outdated)
{
    // When we are merging with zero, we can specialize
    // and avoid instantiating the vector constant.
    // Do this only if op1 was AllTrueMask
    MakeSrcContained(node, op3);
    if (op3->OperIsConvertMaskToVector())
Wonder if we can go as far as putting the OperIsConvertMaskToVector() check inside MakeSrcContained() (and also one for OperIsConvertVectorToMask()).
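Something like the following sketch (illustrative only; the real MakeSrcContained() contains more logic and its exact signature is an assumption here):

```cpp
// Sketch of the suggestion: let MakeSrcContained() itself recognise
// mask<->vector conversion wrappers instead of every caller checking first.
// The existing containment logic is elided; this is not the actual
// dotnet/runtime implementation.
void Lowering::MakeSrcContained(GenTree* parentNode, GenTree* childNode)
{
#if defined(TARGET_ARM64)
    if (childNode->OperIsConvertMaskToVector() || childNode->OperIsConvertVectorToMask())
    {
        // Contain (or look through) the conversion so the consumer can use
        // the underlying mask/vector operand directly.
    }
#endif

    // ... existing containment checks and childNode->SetContained() ...
}
```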
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Currently there are some issues around non-faulting LoadVectors, and I need to check I've not created any code size regressions.
src/coreclr/jit/gentree.h (outdated)

#if defined(TARGET_ARM64)
// Can also be an all true mask that has been converted to a vector.
if (OperIsConvertMaskToVector() && AsHWIntrinsic()->Op(1)->OperIsTrueMask())
This doesn't feel correct. We have CnsVec and we have CnsMsk. A ConvertMaskToVector(CnsMsk) should just be folded to CnsVec instead; and vice versa.
Right now, we're essentially treating these two differently:

Sve.ConditionalSelect(Sve.CreateTrueMaskInt32(), a, b);
Sve.ConditionalSelect(Vector<int>.AllBitsSet, a, b);

where the second will be imported with a ConvertVectorToMask() on arg1. Are you suggesting the AllBitsSet and convert should be folded into a CreateTrueMaskInt32()? When do you suggest that should be done?
Sve.CreateTrueMaskInt32() should rather likely be imported directly as GT_CNS_VEC instead; that way it has the same normalized representation and can participate in various IsConstant checks, constant folding, and other optimizations without special handling.

We have both GT_CNS_VEC for TYP_SIMD constants and GT_CNS_MSK for TYP_MASK constants. We already have general purpose handling that recognizes ConvertMaskToVector(GT_CNS_MSK) and ConvertVectorToMask(GT_CNS_VEC) here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/gentree.cpp#L32269-L32324

So provided you're importing it as a constant (because that is what it creates), everything should be fine. You can then optimize codegen for GT_CNS_MSK to generate PTRUE/PFALSE in the scenarios that need it, just as we optimize GT_CNS_VEC to special-case certain constants (i.e. using mvni for AllBitsSet and movi for Zero): https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/codegenarm64.cpp#L2268-L2340
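As a standalone illustration of why that folding is sound (this is not JIT code, just a model of the lane semantics):

```cpp
// Standalone illustration of the constant folding being described: a constant
// mask expands lane-by-lane into an all-bits/zero vector constant, and that
// vector folds back to the same mask.
#include <cstdint>
#include <cstdio>

int main()
{
    // A hypothetical 4-lane mask constant: lanes 0 and 2 are active.
    bool mask[4] = {true, false, true, false};

    // ConvertMaskToVector(GT_CNS_MSK) -> GT_CNS_VEC:
    // each active lane becomes AllBitsSet, each inactive lane becomes Zero.
    uint32_t vec[4];
    for (int i = 0; i < 4; i++)
    {
        vec[i] = mask[i] ? 0xFFFFFFFFu : 0u;
    }

    // ConvertVectorToMask(GT_CNS_VEC) -> GT_CNS_MSK:
    // here a lane is treated as active if it is nonzero; the exact lane
    // predicate used by the JIT may differ, but for all-bits/zero lanes the
    // round trip is exact.
    bool roundTripped[4];
    for (int i = 0; i < 4; i++)
    {
        roundTripped[i] = (vec[i] != 0);
    }

    for (int i = 0; i < 4; i++)
    {
        printf("lane %d: vec=0x%08X active=%d\n", i, (unsigned)vec[i], (int)roundTripped[i]);
    }
    return 0;
}
```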
> Sve.CreateTrueMaskInt32() should rather likely be imported directly as GT_CNS_VEC instead; that way it has the same normalized representation and can participate in various IsConstant checks, constant folding, and other optimizations without special handling.

Did you mean GT_CNS_MSK here? (otherwise it would get imported as GT_CNS_VEC plus a ConvertVectorToMask)
> Did you mean GT_CNS_MSK here?
No.
All SVE.CreateTrueMaskInt32-like intrinsics map to a managed signature that returns Vector<T> and so are imported as nodes producing a TYP_SIMD. This is what the HW_Flag_ReturnsPerElementMask flag ends up doing. Some other intrinsics then take a parameter as a mask, which is what HW_Flag_ExplicitMaskedOperation is for. This causes that parameter to be wrapped in ConvertVectorToMask, as all nodes at this point in time are TYP_SIMD due to the managed signature taking Vector<T>.

This means that even though the underlying instruction for CreateTrueMaskInt32 may produce a mask, the IR we import today is (rough pseudo-code):

GT_HWINTRINSIC ConvertMaskToVector TYP_SIMD
+----- GT_HWINTRINSIC CreateTrueMaskInt32 TYP_MASK

This can be simplified to just directly producing a GT_CNS_VEC that is AllBitsSet.

The existing constant folding support for ConvertVectorToMask(GT_CNS_VEC) is then what allows this to later become a GT_CNS_MSK if something actually needs to consume it as a mask (like NI_Sve_TestFirstTrue). The inverse is true as well for anything that produces a constant mask: if the actual consumer wants a vector, it is handled by ConvertMaskToVector(GT_CNS_MSK).

There is likewise folding for ConvertVectorToMask(ConvertMaskToVector(x)) and ConvertMaskToVector(ConvertVectorToMask(x)) to just consume x directly, which ensures that non-constants can also be directly consumed without unnecessary conversions.
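In tree terms, those round-trip folds amount to something like the sketch below (the helper name and the Op(1) operand positions are assumptions for illustration, not the actual dotnet/runtime code; on arm64 the conversion intrinsics may carry additional operands):

```cpp
// Sketch only: remove redundant mask<->vector conversions so the original
// node is consumed directly.
GenTree* TryFoldRedundantMaskConversion(GenTree* node)
{
    // ConvertVectorToMask(ConvertMaskToVector(x)) => x
    if (node->OperIsConvertVectorToMask())
    {
        GenTree* op = node->AsHWIntrinsic()->Op(1);
        if (op->OperIsConvertMaskToVector())
        {
            return op->AsHWIntrinsic()->Op(1); // consume the TYP_MASK node directly
        }
    }

    // ConvertMaskToVector(ConvertVectorToMask(x)) => x
    if (node->OperIsConvertMaskToVector())
    {
        GenTree* op = node->AsHWIntrinsic()->Op(1);
        if (op->OperIsConvertVectorToMask())
        {
            return op->AsHWIntrinsic()->Op(1); // consume the TYP_SIMD node directly
        }
    }

    return node;
}
```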
This split between the managed signature (which takes Vector<T> and therefore expects TYP_SIMD inputs/returns) and the underlying native instruction (which expects TYP_MASK inputs/returns) is why on x64 we have both NI_AVX512_* and NI_EVEX_* definitions. For example, we define both:
HARDWARE_INTRINSIC(AVX512F, BlendVariable, 64, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_InvalidNodeId)
HARDWARE_INTRINSIC(EVEX, BlendVariableMask, -1, 3, {INS_vpblendmb, INS_vpblendmb, INS_vpblendmw, INS_vpblendmw, INS_vpblendmd, INS_vpblendmd, INS_vpblendmq, INS_vpblendmq, INS_vblendmps, INS_vblendmpd}, HW_Category_SimpleSIMD, HW_Flag_EmbBroadcastCompatible)
The former maps to the managed signature and is marked InvalidNodeId specifically because it doesn't map to the underlying instruction and so isn't representative of what we want to track in the actual IR, only of what we want to map from the managed API surface as exposed via IL.

The latter maps to the instruction and so is what we want to actually track for the IR. It directly takes/returns the TYP_MASK nodes, which are only available to the JIT. The relevant folding, value numbering, and other support then ensures everything works cleanly and is optimized to the efficient codegen we expect.
> All SVE.CreateTrueMaskInt32-like intrinsics map to a managed signature that returns Vector<T> and so are imported as nodes producing a TYP_SIMD.

Yes, understood now. Agreed this seems to be the better way to do it, but I suspect it is going to be tricky to get this working in all the existing code we've already got.
> but I suspect it is going to be tricky to get this working in all the existing code we've already got.
Actually, this seems straightforward. Seems to work fine for zero mask. Asmdiff is only showing reductions. Still need to do true mask and fix up all the tests.