LLVM's `cttz.v8i8` intrinsic is broken on AArch64 machines: #191

Our current workaround just applies `u8::trailing_zeros` to each lane. With 8 lanes, that can be quite slow.
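
For reference, a minimal sketch of what the lane-by-lane workaround amounts to. The `[u8; 8]` array is only a stand-in for the 8-lane vector type, and `cttz_lanewise` is a hypothetical name, not the function in the crate:

```rust
/// Hypothetical stand-in for the current workaround: count trailing zeros
/// in each of the 8 lanes, one scalar call per lane.
pub fn cttz_lanewise(v: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (o, x) in out.iter_mut().zip(v.iter()) {
        // `u8::trailing_zeros` returns a u32 in 0..=8 (8 when the lane is 0),
        // so the cast back to u8 cannot truncate.
        *o = x.trailing_zeros() as u8;
    }
    out
}
```

Each lane is handled by a separate scalar operation, which is where the cost comes from.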
The workaround could be optimized by adapting LLVM's algorithm to Rust's AArch64 SIMD intrinsics. Some of those intrinsics may be missing, in which case we would have to implement them as well: rust-lang/stdarch#40.
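
As a rough illustration of the kind of algorithm that could be adapted, here is a sketch of one well-known identity, `cttz(x) = popcount(!x & (x - 1))`. That this matches what LLVM emits for this intrinsic is an assumption, and `cttz_popcount` is a hypothetical name. The sketch is written as portable scalar Rust over a `[u8; 8]` stand-in so it runs anywhere; in a real implementation each step (per-lane subtract, NOT, AND, population count) would become a single vector operation, provided the corresponding intrinsic is available:

```rust
/// Hypothetical sketch: trailing-zero count per lane via
/// popcount(!x & (x - 1)). Assumed, not confirmed, to mirror LLVM's expansion.
pub fn cttz_popcount(v: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (o, &x) in out.iter_mut().zip(v.iter()) {
        // `x - 1` flips the lowest set bit and all zeros below it;
        // AND-ing with `!x` keeps only those former trailing zeros as ones.
        // For x == 0 the subtraction wraps to 0xFF, giving a mask of 0xFF.
        let mask = !x & x.wrapping_sub(1);
        // The number of ones in the mask is the trailing-zero count (0..=8).
        *o = mask.count_ones() as u8;
    }
    out
}
```

The identity handles a zero lane without a special case: the mask is all ones, so the result is 8, which agrees with `u8::trailing_zeros`.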