Investigate performance of integer comparisons for mask reductions #157

Open
@gnzlbg

Description

Currently, the all / any mask vector reductions are implemented as follows (a sketch of the sse2 case appears after this list):

  • x86/x86_64:
    • mmx: using _mm_movemask_pi8
    • sse2: using _mm_movemask_epi8
    • avx: using _mm256_testc_si256, _mm256_testz_si256
  • arm/aarch64:
    • neon: using vpmin, vpmax
  • other architectures call the llvm.experimental.vector.reduce.{and, or} intrinsics.
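For reference, a minimal sketch of what the current sse2 strategy looks like for a 16-lane 8-bit mask. The function names `all_m8x16` / `any_m8x16` are illustrative, not the crate's actual API:

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::{__m128i, _mm_movemask_epi8};

// _mm_movemask_epi8 packs the sign bit of each of the 16 byte lanes into
// the low 16 bits of an i32, so the reductions become scalar compares.
#[cfg(target_arch = "x86_64")]
unsafe fn all_m8x16(mask: __m128i) -> bool {
    _mm_movemask_epi8(mask) == 0xFFFF // every lane's high bit is set
}

#[cfg(target_arch = "x86_64")]
unsafe fn any_m8x16(mask: __m128i) -> bool {
    _mm_movemask_epi8(mask) != 0 // at least one lane's high bit is set
}
```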

It might be much better to simply cast all vectors <= 128 bits wide to the corresponding integer type and do a != 0 comparison for the any reduction, and a == iX::max_value() comparison for the all reduction.
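A minimal sketch of that idea for a 128-bit mask, modeling the mask as its raw bytes and assuming each lane is either all-zeros or all-ones; the function names are hypothetical:

```rust
fn any_128(mask: [u8; 16]) -> bool {
    // Reinterpret the 16 mask bytes as a single u128: any set lane
    // means at least one set bit in the integer.
    u128::from_ne_bytes(mask) != 0
}

fn all_128(mask: [u8; 16]) -> bool {
    // All lanes set means every bit of the integer is set.
    u128::from_ne_bytes(mask) == u128::max_value()
}
```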

For 256-bit and 512-bit wide vectors we could cast them to a [u128; N] array and perform the integer comparisons against each element.
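For example, a 256-bit mask viewed as [u128; 2], under the same all-zeros/all-ones lane assumption (names again hypothetical):

```rust
fn any_256(mask: [u128; 2]) -> bool {
    // Any set bit in either half means some lane was set.
    mask.iter().any(|&half| half != 0)
}

fn all_256(mask: [u128; 2]) -> bool {
    // Both halves must be entirely ones.
    mask.iter().all(|&half| half == u128::max_value())
}
```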

Labels

A-AArch64 (ARM 64-bit architecture), A-arm (ARM 32-bit architecture), A-x86_64 (x86_64 architecture), Enhancement (New feature or request), P-low, Performance (Something isn't fast)
