I am trying to build for aarch64 using `make CFLAGS="-mcpu=neoverse-n1 -O3" CXXFLAGS="-mcpu=neoverse-n1 -O3"`. The build succeeds, but performance on the target system is much worse. The resulting .so file is also only ~1MB instead of ~5MB in my case.