This draws on Accelerate, but it is a C++-first API that uses Clang vector extensions for SSE/AVX/NEON, with an emphasis on arm64 and AVX2. All alignment is 16 bytes. Just thought you might add this to your list of libs, and I'll return the favor. I'd also love to know whether simde has transcendental/trig calls.
https://github.com/alecazam/kram/tree/main/libkram/vectormath
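For anyone unfamiliar with compiler vector extensions, here is a minimal sketch of the general technique. It is not taken from the linked library: `vec4f`, `madd`, `hsum`, and `self_check` are hypothetical names, and it uses the GCC-compatible `vector_size` attribute (which Clang also accepts) rather than the Clang-specific `ext_vector_type`.

```cpp
#include <cassert>

// Hypothetical type name for illustration; not the linked library's API.
// vector_size(16) packs four 32-bit floats into one 16-byte vector.
typedef float vec4f __attribute__((vector_size(16)));

static_assert(sizeof(vec4f) == 16, "four floats should occupy 16 bytes");

// Element-wise multiply-add written with ordinary operators; the compiler
// selects SSE/AVX or NEON instructions for the target being built.
inline vec4f madd(vec4f a, vec4f b, vec4f c) {
    return a * b + c;
}

// Horizontal sum using per-lane subscripting.
inline float hsum(vec4f v) {
    return v[0] + v[1] + v[2] + v[3];
}

// Small self-check; all values here are exactly representable in float.
inline void self_check() {
    vec4f a = {1.0f, 2.0f, 3.0f, 4.0f};
    vec4f b = {2.0f, 2.0f, 2.0f, 2.0f};
    vec4f c = {0.5f, 0.5f, 0.5f, 0.5f};
    vec4f r = madd(a, b, c);  // lanes: 2.5, 4.5, 6.5, 8.5
    assert(r[0] == 2.5f && r[3] == 8.5f);
    assert(hsum(r) == 22.0f);
}
```

The same source builds for x86-64 and arm64 without per-ISA intrinsics, which is the main appeal of this style.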