[JIT, x64, Tier1+PGO] Wrong opcode emitted (jb instead of jl) after test reg,reg on sign-extended int32, in a chained || comparison #128895

@nukeme1

Description

TL;DR

On .NET 10.0.8 x64, in a Tier-1 + Synthesized-PGO compilation of a large method
(~3.9 KB, 113 PGO-inlined callees), RyuJIT emits test rcx, rcx; jb SHORT L
where the correct opcode is jl SHORT L. The value in rcx is a sign-extended
int32 (movsxd rcx, ecx) being compared against zero as a signed long. The
jb (unsigned-LT) branch is provably never taken after test r,r
(CF is cleared by test), so the fall-through always executes, producing a
deterministic wrong result.

Setting DOTNET_TieredPGO=0 (or "System.Runtime.TieredPGO": false in
runtimeconfig.json) is a working mitigation.

Environment

| Item | Value |
| --- | --- |
| Runtime version | 10.0.8 |
| Process arch | x64 |
| OS | Windows 10/11 (also affects Server; reproduced locally on Win 11 22631) |
| Tiering / PGO | Tier1 with Synthesized PGO (also Dynamic PGO) |
| JIT name in disasm header | BLENDED_CODE for generic X64 + VEX + EVEX on Windows |
| Method category | Large pre-codegen'd calc method translated from a tax-calculation DSL |
| Method size | IL size = 179, code size = 3869 bytes |
| Inlining context | 113 inlinees with PGO data; 229 single-block inlinees; 24 inlinees without PGO data |
| fgCalledCount | 32 |
| Reproducer status | Production binary reliably reproduces |

Affected method

After source-to-C# translation from the upstream tax DSL, the method is
plain straight-line C# that performs four IndexOfAny lookups, wraps each
result in a pooled TaxNumericCell.fValue (a long), and OR-chains the
Mark.Compare(0) >= 0 predicates:

public TaxBooleanCell local_ValidateQuotationMark(TaxStringCell InQuotationMark)
{
    int _BP = _B.ReserveItems(3);
    TaxBooleanCell __functionResult;
    TaxNumericCell Mark1, Mark2, Mark3, Mark4;

    __functionResult = BCell(false);

    Mark1 = ICell(FindChar(InQuotationMark.Value, "«"));
    Mark2 = ICell(FindChar(InQuotationMark.Value, "»"));
    Mark3 = ICell(FindChar(InQuotationMark.Value, "\u201C"));
    Mark4 = ICell(FindChar(InQuotationMark.Value, "\u201D"));

    if ((Mark1.Compare(0) >= 0)
     || (Mark2.Compare(0) >= 0)
     || (Mark3.Compare(0) >= 0)
     || (Mark4.Compare(0) >= 0))
    {
        __functionResult = BCell(true);
    }

    _B.ReleaseItems(3);
    return __functionResult;
}

Helpers (all inlined by Tier1+PGO):

// ICell stores a 1-based "found" index or -1 into a pooled numeric cell.
private TaxNumericCell ICell(int idx)
{
    int slot = _N.ReserveItems(1);
    TaxNumericCell c = (TaxNumericCell)_N[slot];
    c.fValue = idx >= 0 ? idx + 1 : -1;   // <-- key: int-domain cmovl, then store to long field
    return c;
}

private static int FindChar(string s, string needle) =>
    s.IndexOfAny(needle.ToCharArray(), 0, s.Length);

// Compare is virtual; PGO-guarded devirt selects the leaf TaxIntegerCell.
public override int Compare(long other)   // on TaxIntegerCell
{
    long delta = fValue - other;
    if (delta < 0) return -1;
    if (delta > 0) return 1;
    return 0;
}

After the OR-chain rewrite that the JIT applies to Compare(0) >= 0,
each predicate collapses to fValue >= 0 (signed long).
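
To make that collapse concrete, here is a minimal sketch, assuming the Compare and ICell bodies shown above. Compare and CellValue below are standalone stand-ins written for illustration (the real methods are instance members on the cell types), and the sample indices are arbitrary.

```csharp
using System;

// Stand-in for TaxIntegerCell.Compare, with fValue passed explicitly
// (illustrative only; the real method is an instance member).
static int Compare(long fValue, long other)
{
    long delta = fValue - other;
    if (delta < 0) return -1;
    if (delta > 0) return 1;
    return 0;
}

// Stand-in for the value ICell stores: 1-based index when found, -1 otherwise.
static long CellValue(int idx) => idx >= 0 ? idx + 1 : -1;

// Sample IndexOfAny results: not found, found at 0, found at 5.
foreach (int idx in new[] { -1, 0, 5 })
{
    long fValue = CellValue(idx);
    bool viaCompare = Compare(fValue, 0) >= 0;   // predicate as written in the source
    bool direct = fValue >= 0;                   // predicate after the collapse
    Console.WriteLine($"idx={idx}: fValue={fValue}, viaCompare={viaCompare}, direct={direct}");
}
```

For every input the two predicates agree, which is why a single signed comparison of fValue against zero is a valid lowering.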

Buggy disassembly (verbatim excerpt from JitDisasm)

The full Tier1+PGO listing covers 3869 bytes of code. The relevant region is
IG82 (Mark4 computation: inlined FindChar → IndexOfAny, plus ICell),
followed by IG83–IG87 (Mark1/2/3 from stack spills, all using the
correct signed branch), then IG88 with the wrong opcode for Mark4:

G_M000_IG82:
       mov      r9d, dword ptr [rsi+0x08]
       mov      rcx, rsi
       xor      r8d, r8d
       call     [System.String:IndexOfAny(char[],int,int):int:this]
       lea      ecx, [rax+0x01]              ; ecx = rax + 1
       mov      edx, -1
       test     eax, eax
       cmovl    ecx, edx                     ; ecx = (rax<0) ? -1 : rax+1
       movsxd   rcx, ecx                     ; rcx = sign-extend(ecx)  -- Mark4.fValue
       cmp      qword ptr [rsp+0xF0], 0      ; Mark1.fValue (stack spill)
       je       SHORT G_M000_IG89

G_M000_IG83:
       cmp      qword ptr [rsp+0xF0], 0
       jg       SHORT G_M000_IG89            ; correct: jg (signed)

G_M000_IG84:
       mov      rax, qword ptr [rsp+0xB0]    ; Mark2.fValue
       test     rax, rax
       je       SHORT G_M000_IG89

G_M000_IG85:
       test     rax, rax
       jg       SHORT G_M000_IG89            ; correct: jg (signed)

G_M000_IG86:
       mov      rax, qword ptr [rsp+0x70]    ; Mark3.fValue
       test     rax, rax
       je       SHORT G_M000_IG89

G_M000_IG87:
       test     rax, rax
       jg       SHORT G_M000_IG89            ; correct: jg (signed)

G_M000_IG88:
       test     rcx, rcx                     ; Mark4 still live in rcx from IG82
       jb       SHORT G_M000_IG90            ; *** BUG: jb (unsigned-LT) ***
                                             ; expected: jl (signed-LT) or an equivalent signed test

G_M000_IG89:
       mov      rcx, 0xD1FFAB1E
       call     CORINFO_HELP_NEWSFAST        ; allocate TaxBooleanCell(true)
       ...
       call     [Cch.Data.TaxBooleanCell:Assign(bool):this]   ; result = true

G_M000_IG90:
       cmp      gword ptr [rbx+0x30], 0
       ...

Why this is wrong

  • test rcx, rcx clears CF unconditionally (Intel SDM, vol. 2, TEST).
    Therefore jb (which branches on CF=1) is never taken.
  • The predicate being lowered is Mark4.fValue >= 0. The signed-LT branch
    (jl) over the "set result = true" block is the standard lowering. The JIT
    picked the unsigned-LT opcode instead.
  • Consequence: whenever Mark1, Mark2, Mark3 all evaluate to false
    (the overwhelmingly hot path — strings without quotation marks), control
    falls through into IG89 regardless of Mark4's actual value, and
    result = true is unconditionally assigned.
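
The flag argument can be checked with a small model. The following C# sketch is illustrative only: the helper names are hypothetical (they come from neither the product nor the JIT) and simply encode the flag results of test r,r described above together with the branch conditions of jb (CF = 1) and jl (SF != OF).

```csharp
using System;

// Flags after `test r, r` on a 64-bit register: CF and OF are always cleared,
// SF mirrors the sign bit of r. (Hypothetical helper names, for illustration.)
static bool CarryAfterTest(long r) => false;
static bool OverflowAfterTest(long r) => false;
static bool SignAfterTest(long r) => r < 0;

// jb branches when CF == 1; jl branches when SF != OF.
static bool JbTaken(long r) => CarryAfterTest(r);
static bool JlTaken(long r) => SignAfterTest(r) != OverflowAfterTest(r);

// Sample fValue contents: -1 (not found) and two 1-based "found" indices.
foreach (long fValue in new long[] { -1, 1, 7 })
{
    Console.WriteLine($"fValue={fValue}: jb taken={JbTaken(fValue)}, jl taken={JlTaken(fValue)}");
}
```

Under this model jb is never taken for any input, while jl is taken exactly when fValue is negative, which is the behavior the Mark4 check needs in order to skip the "set result = true" block.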

Why this only affects Mark4 (and not Mark1/2/3)

Mark1/2/3 went through a stack spill (mov [rsp+offset], rax upstream;
mov rax, [rsp+offset]; test rax, rax; jg). After the memory round-trip
the JIT correctly emits the signed branch.

Mark4 stays register-resident in rcx from the movsxd rcx, ecx to the test.
Our hypothesis is that when the JIT decides the operand type at the
test r,r; jcc site, it consults a stale per-tree type tag that says
"this value is a sign-extended int32 whose sign was tested" — and from that
incorrectly concludes jb (the unsigned form). After a store/load the tag
is reset and jl is emitted correctly.

Impact

This affects a Canadian corporate tax (T2) calculation product shipping on
.NET 10. The method involved is a hot validator called once per quotation-mark
text field per return; the wrong result manifests as spurious validation
errors on every return processed after the method tiers up to Tier1+PGO.
We are shipping the TieredPGO=0 mitigation in our runtimeconfig.json as
a stopgap.

Categorization hints

  • Area: area-CodeGen-coreclr
  • Component: opcode selection / branch lowering for GT_LT / GT_GE over
    sign-extended int32 → long values in PGO-driven Tier1 codegen.

Reproduction Steps

What we have today

  • The full assembly listing (DOTNET_JitDisasm=local_ValidateQuotationMark,
    DOTNET_JitDisasmSummary=1) covers 3869 bytes of code and contains the
    buggy IG88 block shown above, every run, on .NET 10.0.8 x64 with PGO
    enabled.
  • Functional repro: we were unable to create a reliable minimal app that
    reproduces the issue. In the production binary, the method returns true
    for inputs without quotation marks (wrong; expected false).
  • DOTNET_TieredPGO=0 makes the bug disappear (only the Tier1-without-PGO
    codegen is produced, which uses jl).
  • DOTNET_JitNoInline=1 also makes the bug disappear — confirming the
    miscompilation is inliner-driven: the wrong opcode only emerges once
    the JIT has inlined ICell + FindChar + Compare into
    local_ValidateQuotationMark, producing the cmovl + movsxd
    register-resident-through-OR-chain sequence shown at IG82–IG88. With
    inlining disabled, each helper is a real call, the value round-trips
    through memory, and the correct signed branch is emitted.

Minimal repro status — help wanted

We have attempted a minimal synthetic harness that mirrors the production
shape (4-level abstract Cell hierarchy, pooled allocator, PGO-driven
guarded devirt of Compare, 5M-call warmup, identical OR-chain), but
have not yet been able to trigger the bad codegen in isolation. In the
synthetic harness, the JIT spills Mark4 to the stack just like the other
three marks, and emits jg correctly.

It appears the production trigger requires the full inlining context
(113 inlinees, ~3.9 KB body) — specifically, enough register pressure for
the JIT to keep Mark4 register-resident from movsxd through the entire
OR-chain, and enough non-escaping cell allocations for the JIT to elide
the fValue store.

We are happy to:

  • Provide the complete production disassembly of the method
    (~28 KB text excerpt from DOTNET_JitDisasm).
  • Provide the complete jitDump (with DOTNET_JitDump=local_ValidateQuotationMark).
  • Run any specific DOTNET_* instrumentation env vars you'd like and ship
    the output back.

If the team needs a directly buildable repro, we can investigate whether the
proprietary calc binary can be shared under NDA, or whether a redistributable
extract of the generated Utils.cs file is sufficient.

Expected behavior

test rcx, rcx; jl SHORT G_M000_IG90 (or any signed-LT equivalent) at IG88,
matching the codegen produced for Mark1/2/3 at IG83/IG85/IG87.

Actual behavior

test rcx, rcx; jb SHORT G_M000_IG90 — opcode is unsigned-LT, branch is
provably dead, fall-through unconditionally executes the
__functionResult = BCell(true) block.

Regression?

No response

Known Workarounds

Workaround (confirmed)

Either:

set DOTNET_TieredPGO=0

or in runtimeconfig.json:

{
  "runtimeOptions": {
    "configProperties": {
      "System.Runtime.TieredPGO": false
    }
  }
}

Disabling tiering altogether (`DOTNET_TieredCompilation=0`) also works.
Static PGO (PGO-instrumented IL with `PgoEnabled=true`) was not tested.

DOTNET_JitNoInline=1 also makes the issue disappear.

Pending a fix, we elected to disable TieredPGO via the runtimeconfig.template.json included with our applications.
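
For deployments that rely on this mitigation, a startup check along the following lines may help catch a configuration that was lost during packaging. This is a sketch under stated assumptions: TieredPgoDisabled is a hypothetical helper, the check assumes the runtimeconfig.json property is visible to managed code through AppContext.TryGetSwitch, and it treats only the literal value "0" of the environment variable as "disabled". It reports what the process can see and does not prove what the JIT actually did.

```csharp
using System;

// Hypothetical helper: true when either form of the mitigation is visible.
// envValue     - raw value of DOTNET_TieredPGO (null when unset)
// switchFound  - whether System.Runtime.TieredPGO was found as an AppContext switch
// switchValue  - the value of that switch when found
static bool TieredPgoDisabled(string envValue, bool switchFound, bool switchValue)
{
    bool viaEnv = envValue == "0";                 // assumption: only "0" counts as disabled
    bool viaConfig = switchFound && !switchValue;  // "System.Runtime.TieredPGO": false
    return viaEnv || viaConfig;
}

var env = Environment.GetEnvironmentVariable("DOTNET_TieredPGO");
bool found = AppContext.TryGetSwitch("System.Runtime.TieredPGO", out bool enabled);

Console.WriteLine(TieredPgoDisabled(env, found, enabled)
    ? "TieredPGO mitigation detected"
    : "TieredPGO mitigation NOT detected");
```

Keeping the decision in a pure function makes it easy to unit-test the four combinations without touching the real process environment.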

### Configuration

_No response_

### Other information

The attached zip file contains:

  • disasm-tier1-pgo.txt: output from an optimized run that exhibited the issue
  • disasm-tier1-no-pgo.txt: output from a run without PGO (pgo=0), for comparison
  • dotnet-info 1.txt: the installed .NET environment information
  • T2Txp.runtimeconfig.json: the runtimeconfig of the application
  • Utils.cs-excerpt.txt: an excerpt of the C# function that triggered the issue
  • jitdump-raw 1.txt: a raw jitDump with PGO enabled

[Evidence.zip](https://github.com/user-attachments/files/28509422/Evidence.zip)
