I am getting very good performance when using this library for SHA1 with 16 streams, but I am not sure whether it is the best choice when dealing with a smaller number of buffers. If possible, I want to use SSE4.2/AVX2 for some low-priority computations that don't need the full 16-stream width, and leave the AVX512 hardware available for higher-priority workloads.
The code in multibinary.asm seems to be selecting the instruction set based on the CPU capabilities.
Is using AVX512 for 4 buffers overkill or wasteful?
I am thinking of running my own benchmarks for different widths and implementing my own selection logic, but the API in sha1_mb.h doesn't expose a way to choose the instruction set. What is the best approach to move forward?