Commit 2f25b8a
Refactor AArch64 Interpolation Filter 16x16 implementation (#431)
* Move InterpolationFilter{ARM.h => _neon.cpp}
Since this header is only used in one place and would not share any code
with an eventual SVE implementation, simply move it to a .cpp file
similar to MCTF.cpp.
* Refactor simdFilter16xX_N8_neon
The use of the vsrcv temporary array rather than simple local variables
meant that LLVM emitted an unnecessary number of load/store instructions
in the inner loops. Refactoring this to make the dependency between loop
iterations more explicit allows for much nicer generated code.
Running a video encoding job on a Neoverse V2 machine using the
--preset=fast setting shows a ~1.8% improvement in reported FPS.
1 parent 7acfaba commit 2f25b8a
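The simdFilter16xX_N8_neon refactor described in the commit message can be illustrated with a small scalar sketch. This is not the code from the commit: the function names, the coefficient values, and the single-column scalar layout are all assumptions made for illustration, and the real routine works on NEON vectors. Variant A keeps the 8-sample sliding window in a temporary array that is shifted every iteration; variant B holds the same window in named locals, so the value carried from one iteration to the next is explicit. Whether a given compiler really emits extra loads and stores for variant A depends on the compiler and the surrounding code.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Illustrative 8-tap coefficients (an assumption, not taken from the commit).
constexpr std::array<int, 8> kCoeff = { -1, 4, -11, 40, 40, -11, 4, -1 };

// Variant A: the sliding window lives in a temporary array that is
// shifted by one element at the end of every iteration.
std::vector<int> filterWithArray( const std::vector<int>& src )
{
  std::vector<int> dst;
  if( src.size() < 8 ) return dst;

  int window[8];
  for( int k = 0; k < 7; k++ ) window[k] = src[k]; // preload 7 samples

  for( std::size_t i = 7; i < src.size(); i++ )
  {
    window[7] = src[i]; // newest sample completes the window
    int sum = 0;
    for( int k = 0; k < 8; k++ ) sum += kCoeff[k] * window[k];
    dst.push_back( sum );
    for( int k = 0; k < 7; k++ ) window[k] = window[k + 1]; // shift window
  }
  return dst;
}

// Variant B: the same window held in named locals, so the dependency
// between consecutive iterations is spelled out as plain assignments.
std::vector<int> filterWithLocals( const std::vector<int>& src )
{
  std::vector<int> dst;
  if( src.size() < 8 ) return dst;

  int s0 = src[0], s1 = src[1], s2 = src[2], s3 = src[3];
  int s4 = src[4], s5 = src[5], s6 = src[6];

  for( std::size_t i = 7; i < src.size(); i++ )
  {
    const int s7 = src[i]; // only one new sample is read per iteration
    const int sum = kCoeff[0] * s0 + kCoeff[1] * s1 + kCoeff[2] * s2 + kCoeff[3] * s3
                  + kCoeff[4] * s4 + kCoeff[5] * s5 + kCoeff[6] * s6 + kCoeff[7] * s7;
    dst.push_back( sum );
    // Rotate the window: each local takes the value of its successor.
    s0 = s1; s1 = s2; s2 = s3; s3 = s4; s4 = s5; s5 = s6; s6 = s7;
  }
  return dst;
}
```

Both functions compute the same result for any input, which makes it possible to compare the generated assembly of the two forms side by side when checking a refactor of this kind.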
2 files changed, +287 -300 lines changed
File tree:
- source/Lib/CommonLib/arm
  - neon