AVX-512 and some AArch64 microarchitectures provide instructions for computing `count_ones` and `trailing/leading_zeros`. We should add `assert_instr` tests to verify that the compiler emits those instructions when the corresponding target features are enabled.
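A rough sketch of what such tests could cover, assuming stdarch-style `#[cfg_attr(test, assert_instr(...))]` attributes. The wrapper names (`count_ones_x4` and friends) are hypothetical, and the mnemonics in the comments (`vpopcntd`, `vplzcntd`, `cnt`, `clz`) are the ones I would expect to see, not something verified against the current codegen. The attributes are left as comments so the snippet compiles without the test harness:

```rust
// Hypothetical per-lane wrappers; names and lane count are illustrative only.
// `#[inline(never)]` keeps each function as a separate symbol so that a
// disassembly-based check has something to inspect.

// Expected (assumption): `vpopcntd` with AVX512VPOPCNTDQ, `cnt` on AArch64 NEON.
// #[cfg_attr(test, assert_instr(vpopcntd))]
#[inline(never)]
pub fn count_ones_x4(v: [u32; 4]) -> [u32; 4] {
    v.map(u32::count_ones)
}

// Expected (assumption): `vplzcntd` with AVX512CD, `clz` on AArch64 NEON.
// #[cfg_attr(test, assert_instr(vplzcntd))]
#[inline(never)]
pub fn leading_zeros_x4(v: [u32; 4]) -> [u32; 4] {
    v.map(u32::leading_zeros)
}

// No mnemonic is suggested here; which sequence the compiler should emit
// for trailing zeros is part of what the tests would need to pin down.
#[inline(never)]
pub fn trailing_zeros_x4(v: [u32; 4]) -> [u32; 4] {
    v.map(u32::trailing_zeros)
}
```

Whatever instruction is selected, the results must match the scalar definitions, so value checks like `count_ones_x4([0, 1, 0xFF, u32::MAX]) == [0, 1, 8, 32]` can sit next to the instruction assertions.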