Description
Feature
An ISLE pattern like (iadd (pattern_a) (pattern_b))
should also match if (iadd (pattern_b) (pattern_a))
would match, because iadd
is a commutative operation.
Benefit
We currently have many rules, both in mid-end optimization and in backend lowering, where we have to write out all possible rules that are equivalent up to commutativity. If there are N commutative patterns in a rule's left-hand side, that requires writing up to 2^N rules. This is tedious and error-prone.
It's worthwhile having ISLE generate code for each distinct form of these rules because then the ISLE compiler can build an efficient pattern-matching tree that evaluates each part of the pattern as few times as possible.
In the mid-end optimization rules, we've considered rewriting commutative instructions to their flipped equivalents, so that all commutative alternatives are available in the same e-class (see #5546). But it's cheaper to have more ISLE rules than to double the number of instructions in e-classes, the GVN map, and the dataflow graph. And this doesn't help backends because the egraph cost function can pick either version, so there's no guarantee the backend will see the operands in the order it wants (see #5709).
If we can declare once that a term like iadd
is commutative and have ISLE generate all the equivalent rules, then we can get the performance advantages of an optimized pattern-matching tree while writing a minimum set of rules.
Challenges
These commutative alternative rules often overlap with each other. If the sub-patterns overlap each other, then swapping them will produce a rule which overlaps with the original. The simplest example is that (iadd x y)
overlaps with (iadd y x)
since x
and y
are wildcard patterns which only bind variables.
In some cases, swapping the operands produces exactly the same rule. For example, (iadd x x)
only matches if both operands are the same SSA value or e-class, and matches exactly the same inputs when flipped. In these cases, generating a duplicate lowering rule will lead to either an error during overlap checking or an assertion failure during code generation, depending on where we expand the rule. Generating a duplicate egraph rule will produce correct results but do more work than necessary.
In other cases, swapping the operands matches the same inputs when flipped but the right-hand side could evaluate to different results depending on which version of the rule is selected. A rule which lowers (iadd x y)
to (x64_add x y)
would, when flipped, lower to (x64_add y x)
instead. The result should be equivalent but is not the same machine instruction. We need to ensure that we choose one of these alternatives deterministically.
Implementation
We've discussed several alternative implementation plans, and I'm probably missing some. None have yet emerged as the "right" answer.
One idea is a new flag when declaring a term. (decl commutative iadd (Value Value) Inst)
would mean that when the term is used as an extractor, then implicitly two versions of the enclosing rule are generated, with the subpatterns swapped.
Another approach is to add some kind of support for or-patterns to the ISLE language. For example, if we allow a term to be defined by multiple internal extractors, then we might create a helper term for matching 2-element value arrays in commutative instructions:
(decl value_array_commutative (Value Value) ValueArray2)
(extractor (value_array_commutative x y) (value_array_2 x y))
(extractor (value_array_commutative y x) (value_array_2 x y))
An alternative syntax is to add a special (or ...)
pattern combinator, as proposed by @Kmeakin in https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/Commutative.20rewrites/near/344151148:
(decl bor (Type Value Value) Value)
(extractor
(bor ty x y)
(or
(inst_data ty (InstructionData.Binary (Opcode.Bor) (value_array_2 x y)))
(inst_data ty (InstructionData.Binary (Opcode.Bor) (value_array_2 y x))))
)
I prefer these versions over the commutative
flag because it allows us to be explicit about how to match the alternatives. For example, we could write explicit alternatives for icmp
which reverse the condition code if matching the operands flipped (hand-waving that intcc_reverse
is currently a constructor, but would need to be an extractor):
(decl icmp (IntCC Value Value) Inst)
(extractor
(icmp Cond x y)
(inst_data (InstructionData.IntCompare (Opcode.Icmp) (value_array_2 x y) Cond))
)
(extractor
(icmp Cond x y)
(inst_data (InstructionData.IntCompare (Opcode.Icmp) (value_array_2 y x) (intcc_reverse Cond)))
)
We might add priorities to internal extractors to declare how they should interact with overlap checking. But that leaves priorities ambiguous in a pattern like (isub (iadd a b) (iadd c d))
: swapping a
with b
would apparently have the same priority as swapping c
with d
.
It might also be useful to declare some constructors as commutative: If we've produced two versions of a rule with left-hand patterns swapped due to commutativity, and find that they match the same inputs and produce the same right-hand side up to constructor commutativity, then we don't need the duplicate rule. Currently we can just avoid writing such rules but if all uses of iadd
match an or-pattern, then ISLE needs to figure out how to avoid unnecessary work.
As an extension, it would be nice if ISLE could emit Rust match
or-patterns to group match arms which bind the same variables in different orders or from different enum variants. I suspect that optimization would never be relevant for the commutative-operator use case though: if we can tell the right-hand side is the same, I think that always means that the left-hand patterns are equivalent and the rules are just duplicates. I have other uses in mind for or-patterns which might benefit from this optimization though.
A bunch of people have expressed interest in this topic: @cfallin @fitzgen @elliottt @afonso360 @alexcrichton