Commit 21e1d56


Avoid bit errors in simdEqualCoeffComputer_neon

The existing unit tests and Neon implementation assume that the equalCoeffComputer implementation is only called with 10-bit data. However, running real encodings shows that values up to 13 bits are not uncommon. The difference is visible in real encodings, where the resulting video is not bit-exact compared to a scalar or x86-encoded reference when building with -DVVENC_FPP_CONTRACT_OFF=On.

This commit revises the existing Neon implementation to be more conservative about the ranges of input values and adjusts the unit tests to test 13 bits of input.

Also simplify a couple of things that don't make a meaningful difference to overall performance:

* The inner-loop logic to reduce the number of 32 to 64-bit widening steps is now much less relevant, so it is removed.
* Introduce a new vmlal_s32_x2 helper function to simplify code where we need two instructions to multiply and widen to 64 bits.
* Stop halving cx/cy since we no longer need to worry about the bit-width of the intermediate sums. This also simplifies the final shifts.

This slows down the Neon implementation of equalCoeffComputer by about 20%. However, it is still ~3x the speed of the SIMDe-translated x86 implementation when running on an AArch64 Neoverse V2 machine with LLVM 20.

Change-Id: I5d323ed2ddfc9b4809e98c507a6446d5588c5d1e
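The bit-width concern described above can be illustrated with a small portable sketch. It is not the code from this commit: `widening_mla_x4` is a hypothetical scalar model of what a widening multiply-accumulate helper such as vmlal_s32_x2 might compute (the real signature is not shown here), and the term count of 4096 used in the usage note is an arbitrary assumption chosen only to show how the headroom changes between 10-bit and 13-bit inputs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Four 64-bit accumulator lanes and four 32-bit input lanes (illustrative types).
using Acc64x4 = std::array<int64_t, 4>;
using Vec32x4 = std::array<int32_t, 4>;

// Hypothetical scalar model of a widening multiply-accumulate:
// each 32-bit lane pair is widened to 64 bits before multiplying,
// so neither the product nor the running sum is truncated to 32 bits.
inline Acc64x4 widening_mla_x4(Acc64x4 acc, const Vec32x4& a, const Vec32x4& b)
{
  for (std::size_t i = 0; i < 4; ++i)
  {
    acc[i] += static_cast<int64_t>(a[i]) * static_cast<int64_t>(b[i]);
  }
  return acc;
}

// Worst-case magnitude of a sum of `terms` products of two signed
// `bits`-bit values (largest magnitude is 2^(bits-1)).
inline int64_t worst_case_sum(int bits, int64_t terms)
{
  const int64_t maxMag = int64_t{1} << (bits - 1);
  return maxMag * maxMag * terms;
}

// True if `v` is representable in a signed 32-bit integer.
inline bool fits_in_int32(int64_t v)
{
  return v >= std::numeric_limits<int32_t>::min() &&
         v <= std::numeric_limits<int32_t>::max();
}
```

Under these assumptions, `fits_in_int32(worst_case_sum(10, 4096))` holds while `fits_in_int32(worst_case_sum(13, 4096))` does not, which is the kind of headroom loss that makes accumulating in 64 bits the conservative choice.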

1 parent d67a7f5, commit 21e1d56

2 files changed: +258 -371 lines changed

source/Lib/CommonLib/arm/neon
- AffineGradientSearch_neon.cpp
test/vvenc_unit_test
- vvenc_unit_test.cpp

