Optimize u8x8::trailing_zeros for AArch64 #193

Open
@TheIronBorn

Description

LLVM's cttz.v8i8 intrinsic is broken on AArch64 machines: #191

Our current workaround applies u8::trailing_zeros to each lane separately. With 8 lanes, that can be quite slow.

It could be optimized by adapting LLVM's algorithm to Rust's AArch64 SIMD intrinsics. Some of the required intrinsics may be missing, in which case we would have to implement those as well: rust-lang/stdarch#40.
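
As a rough sketch of what such an adaptation might look like (not a definitive implementation): the issue does not spell out the algorithm, so this assumes the popcount identity `cttz(x) == ctpop(!x & (x - 1))`, which is one common way to expand a count-trailing-zeros operation. The function names are hypothetical and plain `[u8; 8]` arrays stand in for `u8x8`. The NEON variant uses only `core::arch::aarch64` intrinsics (`vld1_u8`, `vmvn_u8`, `vsub_u8`, `vdup_n_u8`, `vand_u8`, `vcnt_u8`, `vst1_u8`) and is compiled only on AArch64 targets.

```rust
/// What the current workaround computes: u8::trailing_zeros on each lane.
fn trailing_zeros_scalar(v: [u8; 8]) -> [u8; 8] {
    v.map(|x| x.trailing_zeros() as u8)
}

/// Portable model of the assumed identity: `!x & (x - 1)` keeps exactly the
/// bits below the lowest set bit, so counting them gives the trailing zeros.
/// For x == 0 the mask is 0xFF, giving 8, which matches u8::trailing_zeros(0).
fn trailing_zeros_popcount(v: [u8; 8]) -> [u8; 8] {
    v.map(|x| (!x & x.wrapping_sub(1)).count_ones() as u8)
}

/// The same identity with NEON intrinsics (hypothetical helper, AArch64 only).
#[cfg(target_arch = "aarch64")]
fn trailing_zeros_neon(v: [u8; 8]) -> [u8; 8] {
    use core::arch::aarch64::*;
    let mut out = [0u8; 8];
    // SAFETY: `v` and `out` are both exactly 8 bytes, which is what the
    // 64-bit load and store touch.
    unsafe {
        let x = vld1_u8(v.as_ptr());
        // Mask of the bits below the lowest set bit in each lane.
        let below = vand_u8(vmvn_u8(x), vsub_u8(x, vdup_n_u8(1)));
        // Per-lane population count of that mask.
        vst1_u8(out.as_mut_ptr(), vcnt_u8(below));
    }
    out
}
```

All lanes are processed with one subtract, one NOT, one AND, and one population count, rather than eight scalar calls. Whether this matches what LLVM would emit once #191 is fixed is not something this sketch claims.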

Metadata

Assignees

No one assigned

    Labels

    A-AArch64 (ARM 64-bit architecture)
    Blocked-LLVM (Bugs blocked on bugfixes in LLVM)
    Performance (Something isn't fast)
