Description
Consider the following code, which matches on a 64-element array:
type T = [u8; 64];
const X: T = [0x06, 0x04, 0x88, 0x37, 0x89, 0x52, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
pub fn f(x: &T) -> bool {
    matches!(x, &X)
}
When built with rustc --crate-type=lib t.rs --emit=llvm-ir, this emits 64 basic blocks, each doing a comparison. At -O, the optimizer is able to reduce it to 64 single-byte loads, which are OR'd together. Finally, at -C opt-level=3, the optimizer is able to put the whole thing back together into a single icmp ne <64 x i8>.
The large number of basic blocks is already visible in the MIR, which leads me to believe that all of the optimizations here are being done by LLVM.
Generating this many basic blocks is inefficient and puts a lot of unnecessary pressure on LLVM to optimize them away. It would be much better if either an integer comparison or a call to memcmp were emitted, since LLVM could then simply lower it however it wished. This is how normal, non-match comparisons of arrays of primitives are handled (see https://github.com/rust-lang/rust/blob/master/library/core/src/array/equality.rs#L146-L152 and https://github.com/rust-lang/rust/blob/master/compiler/rustc_codegen_llvm/src/intrinsic.rs#L298-L336).
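For comparison, here is a minimal sketch of the two forms side by side. The names f_match and f_eq are made up for illustration, and the const is built with index assignments only to keep the listing short (it has the same 64 bytes as X above). Going by the links above, the == form should take the array-equality path rather than a per-element match, but I have not re-checked the emitted IR for this exact snippet.

```rust
type T = [u8; 64];

// Same contents as the 64-byte literal above: six leading bytes, then zeros.
const X: T = {
    let mut a = [0u8; 64];
    a[0] = 0x06;
    a[1] = 0x04;
    a[2] = 0x88;
    a[3] = 0x37;
    a[4] = 0x89;
    a[5] = 0x52;
    a
};

// Pattern-match form from the report: the one that expands into one
// comparison per array element.
pub fn f_match(x: &T) -> bool {
    matches!(x, &X)
}

// Plain equality form (hypothetical alternative): uses the PartialEq impl
// for arrays instead of a constant pattern.
pub fn f_eq(x: &T) -> bool {
    *x == X
}
```

Both functions return the same result for every input, so the difference is only in what rustc hands to LLVM. Building each with --emit=llvm-ir at the optimization levels mentioned above is a straightforward way to compare them.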
This whole thing is extracted and minimized from https://github.com/alex/rust-asn1/blob/main/src/object_identifier.rs#L5-L24, which is used in many match statements in https://github.com/pyca/cryptography/tree/main/src/rust