Commit 21e1d56

Avoid bit errors in simdEqualCoeffComputer_neon
The existing unit tests and Neon implementation assume that equalCoeffComputer is only called with 10-bit data; however, running real encodings shows that values up to 13 bits are not uncommon. The difference shows up in real encodings, where the resulting video is not bit-exact compared to a scalar or x86-encoded reference when building with -DVVENC_FPP_CONTRACT_OFF=On.

This commit revises the existing Neon implementation to be more conservative about the range of input values and adjusts the unit tests to use 13-bit input values.

It also simplifies a few things that make no meaningful difference to overall performance:

* The inner-loop logic that reduced the number of 32- to 64-bit widening steps is now much less relevant, so it has been removed.
* A new vmlal_s32_x2 helper function simplifies the code where two instructions are needed to multiply and widen to 64 bits.
* cx/cy are no longer halved, since the bit-width of the intermediate sums is no longer a concern. This also simplifies the final shifts.

This slows down the Neon implementation of equalCoeffComputer by about 20%; however, it is still ~3x the speed of the SIMDe-translated x86 implementation when running on an AArch64 Neoverse V2 machine with LLVM 20.

Change-Id: I5d323ed2ddfc9b4809e98c507a6446d5588c5d1e
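A minimal sketch of why 13-bit inputs break a scheme that sums several products in 32 bits before widening, plus a scalar model of what a two-instruction multiply-and-widen helper computes. It assumes each input's magnitude fits in 10 or 13 bits and that products are accumulated per lane; `maxSafeInt32Terms`, `Acc64x4`, `mlalS32x2Model`, `worstCase13Bit` and `mlalS32x2Neon` are hypothetical names for illustration, and the real `vmlal_s32_x2` signature in vvenc may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the vvenc sources): how many products of two
// values with at most `bits` magnitude bits can be summed in an int32_t
// without risking overflow.
constexpr int64_t maxSafeInt32Terms(int bits) {
  const int64_t maxMag = (int64_t{1} << bits) - 1;  // largest |x|
  const int64_t maxProduct = maxMag * maxMag;       // worst-case |x * y|
  return std::numeric_limits<int32_t>::max() / maxProduct;
}

// Hypothetical scalar model of a vmlal_s32_x2-style helper: multiply four
// int32 lane pairs, widen each product to 64 bits and add it to four int64
// accumulator lanes. On AArch64 this takes two instructions (SMLAL for lanes
// 0-1, SMLAL2 for lanes 2-3) because a 128-bit register holds two int64s.
struct Acc64x4 {
  int64_t lane[4];
};

inline void mlalS32x2Model(Acc64x4& acc, const int32_t (&a)[4], const int32_t (&b)[4]) {
  for (int i = 0; i < 4; ++i) {
    acc.lane[i] += static_cast<int64_t>(a[i]) * b[i];  // widen before multiplying
  }
}

// Accumulates `n` worst-case 13-bit products per lane with the model above.
inline Acc64x4 worstCase13Bit(int n) {
  assert(n >= 0);
  Acc64x4 acc{};
  const int32_t v[4] = {8191, -8191, 8191, -8191};  // |v| = 2^13 - 1
  for (int k = 0; k < n; ++k) {
    mlalS32x2Model(acc, v, v);  // each lane gains 8191 * 8191 = 67092481
  }
  return acc;
}

#if defined(__aarch64__)
#include <arm_neon.h>
// The same operation with real Neon intrinsics (AArch64 only).
inline void mlalS32x2Neon(int64x2_t acc[2], int32x4_t a, int32x4_t b) {
  acc[0] = vmlal_s32(acc[0], vget_low_s32(a), vget_low_s32(b));  // SMLAL: lanes 0-1
  acc[1] = vmlal_high_s32(acc[1], a, b);                          // SMLAL2: lanes 2-3
}
#endif
```

Under these assumptions an int32 sum can absorb 2052 worst-case products of 10-bit magnitudes but only 32 of 13-bit magnitudes, and 64 such products already reach 4293918784, about twice INT32_MAX. That is consistent with the commit's choice to widen to 64 bits on every step instead of batching 32-bit partial sums. If "13 bits" means a signed range of [-4096, 4095] instead, the 32-bit headroom is 127 products rather than 32, which is still far below the 10-bit case.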
1 parent d67a7f5 commit 21e1d56

2 files changed, 258 insertions(+), 371 deletions(-)