Description
Currently, the `all` / `any` mask vector reductions are implemented as:

- `x86`/`x86_64`:
  - `mmx`: using `_mm_movemask_pi8`
  - `sse2`: using `_mm_movemask_epi8`
  - `avx`: using `_mm256_testc_si256`, `_mm256_testz_si256`
- `arm`/`aarch64`:
  - `neon`: using `vpmin`, `vpmax`
- other architectures call the `llvm.experimental.vector.reduce.{and, or}` intrinsics.

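It might be much better to just cast all vectors <= 128-bit wide to their corresponding integer and do a `!= 0` for the `any` reduction, and a `== uX::max_value()` (all bits set) for the `all` reduction. The comparison has to use the unsigned type, because `iX::max_value()` leaves the sign bit clear.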
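
A minimal sketch of that idea for a 128-bit mask, assuming each lane is either all zeros or all ones. `Mask8x16` is a hypothetical stand-in type, not the crate's actual mask vector, and the cast is done with `u128::from_ne_bytes` rather than a transmute. The `all` check compares against the unsigned all-ones value, since a signed maximum would leave the sign bit clear:

```rust
/// Hypothetical stand-in for a 128-bit mask vector with 16 lanes of 8 bits.
/// Assumption: every lane is either 0x00 (false) or 0xFF (true).
#[derive(Clone, Copy)]
#[repr(C, align(16))]
pub struct Mask8x16(pub [u8; 16]);

impl Mask8x16 {
    /// Reinterpret the 16 lane bytes as a single 128-bit integer.
    #[inline]
    fn to_bits(self) -> u128 {
        u128::from_ne_bytes(self.0)
    }

    /// `any`: true if at least one lane is set, i.e. some bit is non-zero.
    #[inline]
    pub fn any(self) -> bool {
        self.to_bits() != 0
    }

    /// `all`: true if every lane is set, i.e. every bit is one.
    #[inline]
    pub fn all(self) -> bool {
        self.to_bits() == u128::MAX
    }
}
```

Whether this beats the `movemask`-based versions would have to be checked in the generated assembly for each target; the sketch only shows that both reductions reduce to one integer comparison.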
For 256-bit and 512-bit wide vectors we could cast them into an `[u128; N]` array and perform the integer comparisons against that.
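
For the wider case, a rough sketch with a 256-bit mask viewed as `[u128; 2]`. `Mask8x32` is again a hypothetical stand-in type, and combining the two words with `|` and `&` before the comparison is one possible choice; comparing each word separately would work as well:

```rust
/// Hypothetical stand-in for a 256-bit mask vector with 32 lanes of 8 bits.
/// Assumption: every lane is either 0x00 (false) or 0xFF (true).
#[derive(Clone, Copy)]
#[repr(C, align(32))]
pub struct Mask8x32(pub [u8; 32]);

impl Mask8x32 {
    /// View the 32 lane bytes as two 128-bit words.
    #[inline]
    fn to_words(self) -> [u128; 2] {
        let mut lo = [0u8; 16];
        let mut hi = [0u8; 16];
        lo.copy_from_slice(&self.0[..16]);
        hi.copy_from_slice(&self.0[16..]);
        [u128::from_ne_bytes(lo), u128::from_ne_bytes(hi)]
    }

    /// `any`: OR the words together; a non-zero result means some lane is set.
    #[inline]
    pub fn any(self) -> bool {
        let [a, b] = self.to_words();
        (a | b) != 0
    }

    /// `all`: AND the words together; all ones means every lane is set.
    #[inline]
    pub fn all(self) -> bool {
        let [a, b] = self.to_words();
        (a & b) == u128::MAX
    }
}
```

A 512-bit mask would follow the same pattern with `N = 4`.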