Skip to content

[MachineVerifier] Fix LiveInterval verification of unwinding instructions #131565

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Mar 17, 2025

This effectively reverts 61d22f2.

61d22f2 claims that when the predecessor unwinds to an EH pad, we should take the live-out values of not the predecessor's terminator but the last call. But if the call has a possibility to unwind to an EH pad, it means it was an invoke before and it should either be the terminating instruction of a BB or be right before a branch, which is the terminator. If a call is in a middle of a BB, that means the call was not an invoke before and it unwinds to the caller if it throws.

This can fail in the case when there are unwinding instructions that are not calls, such as CLEANUPRET or other target-specific throwing instructions (e.g. Wasm's throw/rethrow/throw_ref). For example, as in the test case attached,

bb.2 (landing-pad):
  ...
  CALL @foo                           ;; We shouldn't take live-outs from here
  %0 = CONST_I32 0
  CLEANUPRET %bb.2 (unwinds to %bb3)  ;; We should take live-outs from here

bb.3 (landing-pad):
  CALL @bar, %0
  ...

When examining the live range of bb.3's %0, we should check the live-outs at CLEANUPRET, and not CALL @foo. But because of this checking routine this PR deletes, this ends up passing CLEANUPRET and %0 = CONST_I32 0 and search for %0's liveout at CALL @foo, which is incorrect and results in a validation error.

SlotIndex PEnd = LiveInts->getMBBEndIdx(Pred);
// Predecessor of landing pad live-out on last call.
if (MFI->isEHPad()) {
for (const MachineInstr &MI : llvm::reverse(*Pred)) {
if (MI.isCall()) {
PEnd = Indexes->getInstructionIndex(MI).getBoundaryIndex();
break;
}
}
}

Also I'm not sure if I understand the description about llvm.experimental.gc.statepoint intrinsic given in 61d22f2. There doesn't seem to be any special thing about that intrinsic; if it can unwind to an EH pad, it would be at the end of a BB. If it unwinds to the caller, it shouldn't be where we take the live-outs from. Both the test case statepoint-invoke-ra1.ll given in that commit, and also https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/X86/statepoint-invoke-ra.mir that replaced it in 2e1150d, seem to pass even with this PR (because it was an invoke and at the end of a BB anyway).


This also deletes windows-seh-EHa-RegisterLiveness.ll, a test case added in #76921. It looks that test case was added to track a MachineVerifier bug, which I guess may be the same thing this PR is trying to fix. After this PR, that test unexpectedly succeeds, which I think means we can delete it. (There is also another PR (#76933) that tried to fix this bug in another way, which still has not landed.)


This bug is discovered from the reproducer in #126916, even though this is not the bug reported there.

This effectively reverts
llvm@61d22f2.

llvm@61d22f2
claims that when the predecessor unwinds to an EH pad, we should take
the live-out values of not the predecessor's terminator but the last
call. But if the call has a possibility to unwind to an EH pad, it means
it was an `invoke` before and it should either be the terminating
instruction of a BB or be right before a branch, which is the
terminator. If a call is in a middle of a BB, that means the call was
not an `invoke` before and it unwinds to the caller if it throws.

This can fail in the case when there are unwinding instructions that are
not calls, such as `CLEANUPRET` or other target-specific throwing
instructions (e.g. Wasm's `throw`/`rethrow`/`throw_ref`). For example,
as in the test case attached,
```mir
bb.2 (landing-pad):
  ...
  CALL @foo
  %0 = CONST_I32 0
  CLEANUPRET %bb.3

bb.3 (landing-pad):
  call @bar, %0
  ...
```
When examining the live range of `bb.3`'s `%0`, we should check the
live-outs at `CLEANUPRET`, and not `CALL @foo`. But because of this
checking routine this PR deletes, this ends up passing `CLEANUPRET` and
`%0 = CONST_I32 0` and search for `%0`'s liveout at `CALL @foo`, which
is incorrect and results in a validation error.
```cpp
      // Predecessor of landing pad live-out on last call.
      if (MFI->isEHPad()) {
        for (const MachineInstr &MI : llvm::reverse(*Pred)) {
          if (MI.isCall()) {
            PEnd = Indexes->getInstructionIndex(MI).getBoundaryIndex();
            break;
          }
        }
      }
```

Also I'm not sure if I understand the description about
`llvm.experimental.gc.statepoint` intrinsic given in
llvm@61d22f2.
There doesn't seem to be any special thing about that intrinsic; if it
can unwind to an EH pad, it would be at the end of a BB. If it unwinds
to the caller, it shouldn't be where we take the live-outs from. Both
the test case `statepoint-invoke-ra1.ll` given in that commit, and also
https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/X86/statepoint-invoke-ra.mir
that replaced it in
llvm@2e1150d,
seem to pass even with this PR (because it was an `invoke` and at the
end of a BB anyway).

---

This also deletes `windows-seh-EHa-RegisterLiveness.ll `, a test case
added in llvm#76921. It looks that test case was added to track a
MachineVerifier bug, which I guess may be the same thing this PR is
trying to fix. After this PR, that test unexpectedly succeeds, which I
think means we can delete it. (There is also another PR (llvm#76933) that
tried to fix this bug in another way, which still has not landed.)

---

This bug is discovered from the reproducer in llvm#126916, even though this
is not the bug reported there.
@aheejin aheejin requested review from rnk and HaohaiWen March 17, 2025 05:51
@aheejin
Copy link
Member Author

aheejin commented Mar 17, 2025

I'd like to cc and ask the author of 61d22f2 but he doesn't seem to be in Github.

@llvmbot
Copy link
Member

llvmbot commented Mar 17, 2025

@llvm/pr-subscribers-backend-webassembly

@llvm/pr-subscribers-backend-x86

Author: Heejin Ahn (aheejin)

Changes

This effectively reverts 61d22f2.

61d22f2 claims that when the predecessor unwinds to an EH pad, we should take the live-out values of not the predecessor's terminator but the last call. But if the call has a possibility to unwind to an EH pad, it means it was an invoke before and it should either be the terminating instruction of a BB or be right before a branch, which is the terminator. If a call is in a middle of a BB, that means the call was not an invoke before and it unwinds to the caller if it throws.

This can fail in the case when there are unwinding instructions that are not calls, such as CLEANUPRET or other target-specific throwing instructions (e.g. Wasm's throw/rethrow/throw_ref). For example, as in the test case attached,

bb.2 (landing-pad):
  ...
  CALL @<!-- -->foo
  %0 = CONST_I32 0
  CLEANUPRET %bb.3

bb.3 (landing-pad):
  call @<!-- -->bar, %0
  ...

When examining the live range of bb.3's %0, we should check the live-outs at CLEANUPRET, and not CALL @<!-- -->foo. But because of this checking routine this PR deletes, this ends up passing CLEANUPRET and %0 = CONST_I32 0 and search for %0's liveout at CALL @<!-- -->foo, which is incorrect and results in a validation error.

      // Predecessor of landing pad live-out on last call.
      if (MFI-&gt;isEHPad()) {
        for (const MachineInstr &amp;MI : llvm::reverse(*Pred)) {
          if (MI.isCall()) {
            PEnd = Indexes-&gt;getInstructionIndex(MI).getBoundaryIndex();
            break;
          }
        }
      }

Also I'm not sure if I understand the description about llvm.experimental.gc.statepoint intrinsic given in 61d22f2. There doesn't seem to be any special thing about that intrinsic; if it can unwind to an EH pad, it would be at the end of a BB. If it unwinds to the caller, it shouldn't be where we take the live-outs from. Both the test case statepoint-invoke-ra1.ll given in that commit, and also https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/X86/statepoint-invoke-ra.mir that replaced it in 2e1150d, seem to pass even with this PR (because it was an invoke and at the end of a BB anyway).


This also deletes windows-seh-EHa-RegisterLiveness.ll, a test case added in #76921. It looks that test case was added to track a MachineVerifier bug, which I guess may be the same thing this PR is trying to fix. After this PR, that test unexpectedly succeeds, which I think means we can delete it. (There is also another PR (#76933) that tried to fix this bug in another way, which still has not landed.)


This bug is discovered from the reproducer in #126916, even though this is not the bug reported there.


Full diff: https://github.com/llvm/llvm-project/pull/131565.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineVerifier.cpp (-9)
  • (added) llvm/test/CodeGen/WebAssembly/cleanupret-live-intervals.mir (+48)
  • (removed) llvm/test/CodeGen/X86/windows-seh-EHa-RegisterLiveness.ll (-66)
diff --git a/llvm/lib/CodeGen/MachineVerifier.cpp b/llvm/lib/CodeGen/MachineVerifier.cpp
index 87d3033038414..fa05144b152c0 100644
--- a/llvm/lib/CodeGen/MachineVerifier.cpp
+++ b/llvm/lib/CodeGen/MachineVerifier.cpp
@@ -3837,15 +3837,6 @@ void MachineVerifier::verifyLiveRangeSegment(const LiveRange &LR,
     // Check that VNI is live-out of all predecessors.
     for (const MachineBasicBlock *Pred : MFI->predecessors()) {
       SlotIndex PEnd = LiveInts->getMBBEndIdx(Pred);
-      // Predecessor of landing pad live-out on last call.
-      if (MFI->isEHPad()) {
-        for (const MachineInstr &MI : llvm::reverse(*Pred)) {
-          if (MI.isCall()) {
-            PEnd = Indexes->getInstructionIndex(MI).getBoundaryIndex();
-            break;
-          }
-        }
-      }
       const VNInfo *PVNI = LR.getVNInfoBefore(PEnd);
 
       // All predecessors must have a live-out value. However for a phi
diff --git a/llvm/test/CodeGen/WebAssembly/cleanupret-live-intervals.mir b/llvm/test/CodeGen/WebAssembly/cleanupret-live-intervals.mir
new file mode 100644
index 0000000000000..197340a4773e9
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/cleanupret-live-intervals.mir
@@ -0,0 +1,48 @@
+# RUN: llc -mattr=+exception-handling -wasm-enable-eh -exception-model=wasm -verify-machineinstrs -run-pass=liveintervals %s
+
+# This is a regression test for a bug where MachineVerifier compute live-outs of
+# a BB not from the end of a BB but the last call instruction, which is not
+# correct when there is another unwinding instruction after the call.
+
+--- |
+  target triple = "wasm32-unknown-unknown"
+
+  declare void @foo()
+  declare void @bar(i32)
+  define void @cleanupret_live_range_test() {
+    ret void
+  }
+...
+---
+name: cleanupret_live_range_test
+liveins:
+  - { reg: '$arguments' }
+tracksRegLiveness: true
+body: |
+  bb.0:
+    successors: %bb.1, %bb.2
+    EH_LABEL <mcsymbol .Ltmp2>
+    CALL @foo, implicit-def dead $arguments, implicit $sp32, implicit $sp64
+    EH_LABEL <mcsymbol .Ltmp3>
+    BR %bb.1, implicit-def dead $arguments
+
+  bb.1:
+  ; predecessors: %bb.0
+    UNREACHABLE implicit-def dead $arguments
+
+  bb.2 (landing-pad):
+  ; predecessors: %bb.0
+    successors: %bb.3
+    EH_LABEL <mcsymbol .Ltmp4>
+    CALL @foo, implicit-def dead $arguments, implicit $sp32, implicit $sp64
+    ; Because bb.3's use of %0's def is here, the live-outs from the predecessor
+    ; bb.2 should be from the terminator (CLEANUPRET) and not the 'CALL @foo'
+    ; above.
+    %0:i32 = CONST_I32 0, implicit-def dead $arguments
+    CLEANUPRET %bb.2, implicit-def dead $arguments
+
+  bb.3 (landing-pad):
+  ; predecessors: %bb.2
+    EH_LABEL <mcsymbol .Ltmp5>
+    CALL @bar, %0:i32, implicit-def dead $arguments, implicit $sp32, implicit $sp64
+    CLEANUPRET %bb.3, implicit-def dead $arguments
diff --git a/llvm/test/CodeGen/X86/windows-seh-EHa-RegisterLiveness.ll b/llvm/test/CodeGen/X86/windows-seh-EHa-RegisterLiveness.ll
deleted file mode 100644
index ff07f4ddf0054..0000000000000
--- a/llvm/test/CodeGen/X86/windows-seh-EHa-RegisterLiveness.ll
+++ /dev/null
@@ -1,66 +0,0 @@
-; XFAIL: *
-; RUN: llc --verify-machineinstrs < %s
-source_filename = "test.cpp"
-target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
-target triple = "x86_64-pc-windows-msvc19.12.0"
-
-$"?test@Test@@Plugin@@Host@@@Z" = comdat any
-
-declare i32 @__CxxFrameHandler3(...)
-
-; Function Attrs: nounwind memory(none)
-declare void @llvm.seh.scope.begin() #1
-
-; Function Attrs: nobuiltin allocsize(0)
-declare ptr @"??2@Test@Z"(i64) #1
-
-; Function Attrs: nounwind memory(none)
-declare void @llvm.seh.scope.end() #0
-
-; Function Attrs: nobuiltin nounwind
-declare void @"??3@YAXPEAX@Z"(ptr) #2
-
-; Function Attrs: mustprogress uwtable
-define ptr @"?test@Test@@Plugin@@Host@@@Z"(ptr %this, ptr %host) #3 comdat align 2 personality ptr @__CxxFrameHandler3 {
-entry:
-  %host.addr = alloca ptr, align 8
-  %this.addr = alloca ptr, align 8
-  store ptr %host, ptr %host.addr, align 8
-  store ptr %this, ptr %this.addr, align 8
-  %this1 = load ptr, ptr %this.addr, align 8
-  %call = call noalias ptr @"??2@Test@Z"(i64 152) #5
-  invoke void @llvm.seh.scope.begin()
-          to label %invoke.cont unwind label %ehcleanup
-
-invoke.cont:                                      ; preds = %entry
-  %call3 = invoke ptr @"??Test@?A0x2749C4FD@@QEAA@Test@Test@@@Z"(ptr %call, ptr %this1)
-          to label %invoke.cont2 unwind label %ehcleanup
-
-invoke.cont2:                                     ; preds = %invoke.cont
-  invoke void @llvm.seh.scope.end()
-          to label %invoke.cont4 unwind label %ehcleanup
-
-invoke.cont4:                                     ; preds = %invoke.cont2
-  ret ptr %call
-
-ehcleanup:                                        ; preds = %invoke.cont2, %invoke.cont, %entry
-  %0 = cleanuppad within none []
-  call void @"??3@YAXPEAX@Z"(ptr %call) #6 [ "funclet"(token %0) ]
-  cleanupret from %0 unwind to caller
-}
-
-; Function Attrs: uwtable
-declare hidden ptr @"??Test@?A0x2749C4FD@@QEAA@Test@Test@@@Z"(ptr, ptr) #4 align 2
-
-attributes #0 = { nounwind memory(none) }
-attributes #1 = { nobuiltin allocsize(0) "target-cpu"="x86-64" "target-features"="+cmov,+crc32,+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" "tune-cpu"="generic" }
-attributes #2 = { nobuiltin nounwind "target-cpu"="x86-64" "target-features"="+cmov,+crc32,+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" "tune-cpu"="generic" }
-attributes #3 = { mustprogress uwtable "target-cpu"="x86-64" "target-features"="+cmov,+crc32,+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" "tune-cpu"="generic" }
-attributes #4 = { uwtable "target-cpu"="x86-64" "target-features"="+cmov,+crc32,+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" "tune-cpu"="generic" }
-attributes #5 = { builtin allocsize(0) }
-attributes #6 = { builtin nounwind }
-
-!llvm.module.flags = !{!1, !2}
-
-!1 = !{i32 2, !"eh-asynch", i32 1}
-!2 = !{i32 7, !"uwtable", i32 2}

@aheejin aheejin changed the title [WebAssembly] Fix LiveInterval verification of throwing instructions [MachineVerifier] Fix LiveInterval verification of throwing instructions Mar 17, 2025
@aheejin aheejin changed the title [MachineVerifier] Fix LiveInterval verification of throwing instructions [MachineVerifier] Fix LiveInterval verification of unwinding instructions Mar 17, 2025
@aheejin
Copy link
Member Author

aheejin commented Mar 20, 2025

Gentle ping 😀

@aheejin
Copy link
Member Author

aheejin commented Mar 26, 2025

@rnk Can you possibly take a look?

@aheejin
Copy link
Member Author

aheejin commented Apr 1, 2025

Ping 🥲

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a valuable verifier check for standard, non-async-EH functions. Can we preserve the some of the value here by checking specifically for non-scoped EH personality functions?

@aheejin
Copy link
Member Author

aheejin commented Apr 3, 2025

What cases does this check cover? I thought the below was the case; please let me know if this is not correct.

But if the call has a possibility to unwind to an EH pad, it means it was an invoke before and it should either be the terminating instruction of a BB or be right before a branch, which is the terminator. If a call is in a middle of a BB, that means the call was not an invoke before and it unwinds to the caller if it throws.

@rnk
Copy link
Collaborator

rnk commented Apr 3, 2025

Well, this verifier check should fire on an analogous, non-wasm test case, where a virtual register is defined after a potentially throwing call in an Itanium landingpad predecessor. In such a case, the value will be computed and the register value defined after the call, but if the call throws and we try to read the value from a landingpad, bad things will happen.

Our MIR representation is somewhat deficient because we don't have an equivalent of the LLVM IR invoke instruction, and this verifier check is making sure that CodeGen passes don't try to insert PHI-elimination code (and eventually possibly register fills) between the call, the terminating jump, and the landingpad.

I agree we need to relax the check for wasm and Windows asynch EH, because, as your test case points out, we no longer have the invariant that all landingpad predecessor blocks end in a potentially throwing call. They may end in seh.try.end (non-call label) or CLEANUPRET or some other runtime-mediated control transfer instruction.

@aheejin
Copy link
Member Author

aheejin commented Apr 7, 2025

Well, this verifier check should fire on an analogous, non-wasm test case, where a virtual register is defined after a potentially throwing call in an Itanium landingpad predecessor. In such a case, the value will be computed and the register value defined after the call, but if the call throws and we try to read the value from a landingpad, bad things will happen.

Our MIR representation is somewhat deficient because we don't have an equivalent of the LLVM IR invoke instruction, and this verifier check is making sure that CodeGen passes don't try to insert PHI-elimination code (and eventually possibly register fills) between the call, the terminating jump, and the landingpad.

In theory this can happen in Wasm too. It would be nice if we can query something like MI.canThrow() can replace MI.isCall() with that. Do you know why there is no Throws or something in this list of MachineInstr properties?

enum Flag {
PreISelOpcode = 0,
Variadic,
HasOptionalDef,
Pseudo,
Meta,
Return,
EHScopeReturn,
Call,
Barrier,
Terminator,
Branch,
IndirectBranch,
Compare,
MoveImm,
MoveReg,
Bitcast,
Select,
DelaySlot,
FoldableAsLoad,
MayLoad,
MayStore,
MayRaiseFPException,
Predicable,
NotDuplicable,
UnmodeledSideEffects,
Commutable,
ConvertibleTo3Addr,
UsesCustomInserter,
HasPostISelHook,
Rematerializable,
CheapAsAMove,
ExtraSrcRegAllocReq,
ExtraDefRegAllocReq,
RegSequence,
ExtractSubreg,
InsertSubreg,
Convergent,
Add,
Trap,
VariadicOpsAreDefs,
Authenticated,
};

It would be nice if we can add one, but this probably is more work than it's worth, given that I don't have info on the complete set of all target instruction that can throw.

I agree we need to relax the check for wasm and Windows asynch EH, because, as your test case points out, we no longer have the invariant that all landingpad predecessor blocks end in a potentially throwing call. They may end in seh.try.end (non-call label) or CLEANUPRET or some other runtime-mediated control transfer instruction.

If we can't do MI.canThrow() or something, we can probably not run the check here in case of Wasm/WinEH, even though this feels like a hacky solution, because we can have assignments (which may have been generated by PHIs, as you suggested) after a call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants