Cranelift: Add a vcode peephole pass to the mach backend framework

(Fleshing out and jotting down for posterity an idea I floated and we collectively refined at a recent Cranelift meeting.)

This peephole pass could run after regalloc, just before encoding instructions. It wouldn't need to be a whole new traversal over the IR, it could run as we walk the vcode for encoding, essentially implemented as an iterator combinator.

This peephole pass would *not* be DFG-based. It would be a classic sliding-window-of-instructions peephole pass. This both allows us to avoid constructing a more-complete DFG for vcode like what we have for clif, which could be expensive, and it allows us to benefit from opportunities that arise only because of instruction scheduling of two different sequences that are otherwise disconnected in the DFG.

Put another way: if you want DFG-y rewrites, use the mid-end for target-agnostic rules and the clif-to-vcode instruction selection lowering for target-specific rules. We have good solutions for that kind of thing. If however you want non-DFG-y rewrites, we currently have nothing and this issue is proposing we add something for the target-specific dimension, which is where I suspect most of the opportunity exists.

Implementation idea brain dump:

* Treat finding matches as a a string-matching algorithm.
  * For simplicity, and to cut down on the size of tables, ignore operands and only consider opcodes.
  * So we treat the vcode as a string where the alphabet is its opcodes
* Take all the left-hand sides of our rewrites and treat them as substrings in that same alphabet
* Do an online version of aho-corasick or similar to find all substring matches in the vcode, without needing to repeat work for each index in the vcode or for each rewrite rule
  * On each substring match
    * run additional predicates to capture preconditions in the associated pattern(s)'s left-hand side that are not accounted for in the string match (i.e. that one of the operands is zero)
    * if those additional predicates are satisfied, replace the matched sequence with the rule's right-hand side
* As far as ISLE integration goes:
  * I don't think we could do this with just custom external multi extractors, since we need to process all the rules to get the substring patterns to create the aho-corasick automaton
  * This means we would need to add a new kind of mode/backend to ISLE or something?
  * We could, I guess, just accept that we are doing redundant work at each index in the vcode "string" and not do the full aho-corasick thing.
    * We would still get the trie-y bits via `islec`
    * But we wouldn't be able to avoid repeatedly looking at the same instructions when we call into ISLE at `vcode[i]` that we've already looked at when we called into ISLE from `vcode[i - 1]`.
    * pretty sure we could implement this less-ambitious version with ISLE today
    * but this is also way less cool

Between integrating with an existing traversal over the vcode, not constructing a DFG, and the string-matching sketch just above, this new peephole pass should impose extremely little overhead on compile times.

A few concrete example peephole optimizations we could do, I'm sure we can think of more and will do so if we put this infra in place:

* turn adjacent pairs of loads/stores into `ldp`/`stp` on aarch64
  * similarly: turn a pair of adjacent u32 loads into an unaligned u64 load and split on targets where the cost of an unaligned u64 load is less than the cost of two u32 loads (I think x64, at least)
* sink spills/reloads introduced by regalloc into the actual `add`s or whatever that produces/consumes the value on x64
* on x64 (and others?) avoid unnecessary `cmp` and `test` instructions when we know that the last instruction in the stream happened to set identical flags (eg `and rax, r10; test rax, r10` or `sub rdi, rsi; cmp rdi, rsi`)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cranelift: Add a vcode peephole pass to the mach backend framework #8520

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Cranelift: Add a vcode peephole pass to the mach backend framework #8520

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions