Open
Because of different alignment requirements and VEX encoding/VZEROUPPER woes, and because it would be so low-level and pervasive (we cannot gate every operation on R3Element behind a CPUID test), we need a separate DLL for that, with an intermediate layer that chooses which DLL to use based on CPUID.
We could generate the intermediate layer from journal.proto, and have it use LoadLibrary/GetProcAddress etc. to bind the entry points of the chosen DLL.
This blocks FMA usage on Linux & macOS, see #3010 (comment).