Description
.clif Test Case
The simplest case I've seen came from cranelift-fuzzgen in #4875, where the relevant CLIF produced by cranelift-frontend included these two lines:
v257 = rotl v255, v256 ; v256 = 0
v255 -> v257
Steps to Reproduce
I believe the simplest way to trigger this bug is:
- Use cranelift-frontend to construct SSA from non-SSA
- Create a non-entry block and do the remaining steps in that block
- Declare a new variable
- Add an instruction which uses the value returned by use_var on that variable
- Define the variable using the result of the instruction
- End the block with a branch to itself
- Seal the block
It also happens if there's a cycle of multiple blocks that each have a single predecessor.
I think it also can happen if there are multiple predecessors for some blocks in the cycle, as long as the only definition which reaches the instruction comes from the same instruction.
We've also seen it happen if there is a dependency cycle across multiple instructions, where each instruction defines a variable used by the next instruction in the cycle. It's not limited to the case where an instruction depends directly on its own results.
Expected Results
CLIF in SSA form.
Actual Results
SSABuilder::finish_predecessors_lookup sees that the variable does have a definition in the cycle, so it doesn't insert a constant zero. It also sees that the variable doesn't have any other definitions reaching the use, so it deletes the phi node and changes the original use_var result to an alias for the value given to def_var. That leads to a cycle through value aliases, and the resulting CLIF is not in SSA form.
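To make the cycle concrete, here is a minimal sketch in plain Rust that models the two CLIF lines from the test case and checks for a value that depends on itself. It is only an illustration: the `Def` enum, `has_dependency_cycle`, and `issue_example` are hypothetical names invented for this sketch, not Cranelift's data model or API. The model also has no block parameters, so any cycle it finds corresponds to CLIF that is not valid SSA.

```rust
use std::collections::HashMap;

/// Toy stand-in for how a value is defined (hypothetical, not Cranelift's types).
#[derive(Debug, Clone)]
enum Def {
    /// `vN -> vM`: the value is an alias for another value.
    Alias(u32),
    /// The value is an instruction result; the vector lists its operands.
    Inst(Vec<u32>),
}

#[derive(Clone, Copy)]
enum Mark {
    InProgress,
    Done,
}

/// Depth-first search over "depends on" edges. Returns true if the search
/// from `v` reaches a value that is still on the current search path.
fn reaches_cycle(v: u32, defs: &HashMap<u32, Def>, marks: &mut HashMap<u32, Mark>) -> bool {
    match marks.get(&v) {
        Some(Mark::InProgress) => return true,
        Some(Mark::Done) => return false,
        None => {}
    }
    marks.insert(v, Mark::InProgress);
    let deps: Vec<u32> = match defs.get(&v) {
        Some(Def::Alias(target)) => vec![*target],
        Some(Def::Inst(args)) => args.clone(),
        None => Vec::new(), // values unknown to the model have no dependencies
    };
    for d in deps {
        if reaches_cycle(d, defs, marks) {
            return true;
        }
    }
    marks.insert(v, Mark::Done);
    false
}

/// True if any value depends on itself through aliases and operands.
fn has_dependency_cycle(defs: &HashMap<u32, Def>) -> bool {
    let mut marks = HashMap::new();
    defs.keys().any(|&v| reaches_cycle(v, defs, &mut marks))
}

/// The values from the test case above, in this toy encoding.
fn issue_example() -> HashMap<u32, Def> {
    let mut defs = HashMap::new();
    defs.insert(256, Def::Inst(vec![])); // v256 = 0 (a constant, no operands)
    defs.insert(257, Def::Inst(vec![255, 256])); // v257 = rotl v255, v256
    defs.insert(255, Def::Alias(257)); // v255 -> v257
    defs
}
```

Calling `has_dependency_cycle(&issue_example())` reports a cycle, because v257 uses v255 and v255 is an alias for v257. A check along these lines is one possible way a verifier could flag the problem, but that is a suggestion rather than a description of what the Cranelift validator does.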
I believe the Cranelift validator has run on the cases we've seen, without detecting this issue.
In #5020 this cycle was detected by the simple_preopt pass. There was a series of shifts and rotates where the shift amount was 0 in each case, so the pass tried to replace each result with an alias to the instruction's first argument. However, once it had replaced all of them, the alias pointed to itself.
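The following sketch shows, under simplifying assumptions, how replacing zero-amount shifts with aliases can close a loop when a bad alias already exists. The alias table is a plain `HashMap`, and `alias_zero_shifts` and `resolve_alias` are hypothetical helpers written for this illustration; they are not the simple_preopt implementation.

```rust
use std::collections::HashMap;

/// Records `result -> first_arg` for each shift or rotate whose amount is zero.
/// Each pair is (result value, first argument).
fn alias_zero_shifts(zero_shifts: &[(u32, u32)], aliases: &mut HashMap<u32, u32>) {
    for &(result, first_arg) in zero_shifts {
        aliases.insert(result, first_arg);
    }
}

/// Follows alias links starting at `v`. Returns the final non-alias value,
/// or `None` if the chain loops back on itself.
fn resolve_alias(v: u32, aliases: &HashMap<u32, u32>) -> Option<u32> {
    let mut current = v;
    // A loop-free chain visits each alias at most once, so
    // `aliases.len() + 1` lookups are always enough to reach its end.
    for _ in 0..=aliases.len() {
        match aliases.get(&current) {
            Some(&next) => current = next,
            None => return Some(current),
        }
    }
    None
}
```

For example, with the pairs `(2, 1)` and `(3, 2)` (v2 and v3 are zero-amount shifts of v1 and v2), resolving v3 in an otherwise empty table yields v1. If the table already contains the alias v1 -> v3, as the SSA builder bug would produce, the same rewrite leaves every value in a loop and `resolve_alias` returns `None`.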
The other way we've detected this bug is by printing the generated CLIF to a file and then failing to parse that file again. If no instruction in the cycle constrains the variable's type because they're all polymorphic (as e.g. rotr and ishl are), the parser fails with a message like "type variable required for polymorphic opcode, e.g. 'rotl.i32'; can't infer from v255 which is not yet defined".
Versions and Environment
Cranelift version or commit: on main for at least the last month, as observed in these issues:
- cranelift-fuzzgen: Difference in interpreter and x64 execution #4875 (comment)
- fuzzgen: Generate compiler flags #5020
Operating system: observed on Linux (and Windows I think?)
Architecture: observed on aarch64 and x64
Extra Info
I'm not sure how to fix this yet, but I want to thank @afonso360 for reminding me to dig into why it's happening. @cfallin, any thoughts?