Commit 21e1d56
Avoid bit errors in simdEqualCoeffComputer_neon
The existing unit tests and Neon implementation assume that the
equalCoeffComputer implementation is only called with 10-bit data.
However, running real encodings shows that values up to 13 bits are not
uncommon. As a result, the encoded video is not bit-exact compared to a
scalar or x86-encoded reference when building with
-DVVENC_FPP_CONTRACT_OFF=On.
This commit revises the existing Neon implementation to be more
conservative about the ranges of input values and adjusts the unit tests
to test 13 bits of input.
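
To illustrate why the input range matters, here is a rough headroom estimate. It is a sketch under stated assumptions, not a description of the actual kernel: it assumes "N bits" means a signed N-bit value, and that the kernel sums products of two such values in a signed 32-bit accumulator. The helper names maxMagnitude and safeTermsInt32 are hypothetical.

```cpp
#include <cstdint>

// Largest magnitude of a signed value with the given bit width,
// e.g. 4096 for 13 bits (range -4096..4095). Assumption: signed inputs.
constexpr int64_t maxMagnitude(int bits) {
  return int64_t{1} << (bits - 1);
}

// Number of worst-case products of two `bits`-wide signed values that can
// be summed in a signed 32-bit accumulator without overflowing.
constexpr int64_t safeTermsInt32(int bits) {
  return INT32_MAX / (maxMagnitude(bits) * maxMagnitude(bits));
}

// 10-bit inputs: each product is at most 2^18, so 8191 terms fit.
static_assert(safeTermsInt32(10) == 8191, "10-bit headroom");
// 13-bit inputs: each product is at most 2^24, so only 127 terms fit.
static_assert(safeTermsInt32(13) == 127, "13-bit headroom");
```

Under these assumptions, moving from 10-bit to 13-bit inputs cuts the 32-bit headroom by a factor of 64, which is why a more conservative implementation widens to 64 bits earlier.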
Also simplify a couple of things that don't make a meaningful difference
to overall performance:
* The inner-loop logic to reduce the number of 32- to 64-bit widening
steps is now much less relevant, so it has been removed.
* Introduce a new vmlal_s32_x2 helper function to simplify code where we
need two instructions to multiply and widen to 64 bits.
* Stop halving cx/cy since we no longer need to worry about the
bit-width of the intermediate sums. This also simplifies the final
shifts.
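
The "two instructions to multiply and widen to 64 bits" mentioned in the list above could look roughly like the following. The real signature of vmlal_s32_x2 is not shown in this commit message, so the function vmlal_s32_x2_sketch, its parameter types, and the wrappers mlal4, mlal4_ref and checkMlal4 are all assumptions made for illustration. The Neon path uses the standard vmlal_s32 and vmlal_high_s32 intrinsics; a scalar fallback keeps the example buildable on other targets.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>

// Hypothetical helper: acc += widen(a * b) for all four 32-bit lanes,
// keeping the results in two 64-bit vectors (lanes 0-1 and lanes 2-3).
static inline int64x2x2_t vmlal_s32_x2_sketch(int64x2x2_t acc, int32x4_t a,
                                              int32x4_t b) {
  acc.val[0] = vmlal_s32(acc.val[0], vget_low_s32(a), vget_low_s32(b));
  acc.val[1] = vmlal_high_s32(acc.val[1], a, b);
  return acc;
}
#endif

// Portable scalar reference: acc[i] += (int64_t)a[i] * b[i] for four lanes.
static inline void mlal4_ref(int64_t acc[4], const int32_t a[4],
                             const int32_t b[4]) {
  for (int i = 0; i < 4; ++i) {
    acc[i] += static_cast<int64_t>(a[i]) * b[i];
  }
}

// Uses the Neon sketch on AArch64, otherwise the scalar reference.
static inline void mlal4(int64_t acc[4], const int32_t a[4],
                         const int32_t b[4]) {
#if defined(__aarch64__)
  int64x2x2_t v;
  v.val[0] = vld1q_s64(acc);
  v.val[1] = vld1q_s64(acc + 2);
  v = vmlal_s32_x2_sketch(v, vld1q_s32(a), vld1q_s32(b));
  vst1q_s64(acc, v.val[0]);
  vst1q_s64(acc + 2, v.val[1]);
#else
  mlal4_ref(acc, a, b);
#endif
}

// Small self-check: products that exceed 32 bits must survive intact.
static inline void checkMlal4() {
  int64_t acc[4] = {0, 0, 0, 0};
  const int32_t a[4] = {100000, -100000, 1, 0};
  const int32_t b[4] = {100000, 100000, 1, 0};
  mlal4(acc, a, b);
  assert(acc[0] == 10000000000LL && acc[1] == -10000000000LL);
  assert(acc[2] == 1 && acc[3] == 0);
}
```

Because every product is widened before it is accumulated, the intermediate sums no longer depend on the inputs staying within a narrow range, at the cost of processing only two lanes per instruction.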
This slows down the Neon implementation of equalCoeffComputer by about
20%, but it is still ~3x the speed of the SIMDe-translated x86
implementation when running on an AArch64 Neoverse V2 machine with LLVM
20.
Change-Id: I5d323ed2ddfc9b4809e98c507a6446d5588c5d1e
1 parent d67a7f5 commit 21e1d56
2 files changed (+258, -371 lines):
- source/Lib/CommonLib/arm/neon
- test/vvenc_unit_test