Description
Is your feature request related to a problem? Please describe.
To get kernel performance matching clang, we have had to add fast-math flags such as contract (which clang and nvcc enable by default). Currently, we do this with an ugly hack; see for example
Lines 21 to 57 in bb37b50
Describe the solution you'd like
I would like a macro like @fastmath that gives fine-grained control over which fast-math flags are applied.
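To make the request concrete, here is a minimal sketch of what the surface of such a macro could look like for a single flag. All names here (`contract_add`, `contract_mul`, `contract_rewrite`, `@contract`) are hypothetical and are not part of any existing package; the sketch assumes that emitting `fadd contract` / `fmul contract` through `Base.llvmcall` is an acceptable way to attach the flag. The flag only permits LLVM to fuse the multiply and add; it does not guarantee it.

```julia
# Hypothetical helpers: scalar ops that carry only LLVM's `contract` flag.
for (T, llvmT) in ((Float32, "float"), (Float64, "double"))
    for (name, op) in ((:contract_add, "fadd"), (:contract_mul, "fmul"))
        ir = """
            %r = $op contract $llvmT %0, %1
            ret $llvmT %r
            """
        @eval begin
            @inline $name(x::$T, y::$T) = Base.llvmcall($ir, $T, Tuple{$T, $T}, x, y)
        end
    end
end

# Fallbacks so integers or mixed argument types keep their ordinary semantics.
contract_add(x, y) = x + y
contract_mul(x, y) = x * y

# `a + b + c` parses as +(a, b, c), so fold n-ary calls left to right.
contract_add(x, y, z...) = contract_add(contract_add(x, y), z...)
contract_mul(x, y, z...) = contract_mul(contract_mul(x, y), z...)

# Walk the expression and swap binary/n-ary `+` and `*` calls for the helpers.
contract_rewrite(ex) = ex
function contract_rewrite(ex::Expr)
    args = map(contract_rewrite, ex.args)
    if ex.head === :call && length(args) >= 3
        if args[1] === :+
            return Expr(:call, GlobalRef(@__MODULE__, :contract_add), args[2:end]...)
        elseif args[1] === :*
            return Expr(:call, GlobalRef(@__MODULE__, :contract_mul), args[2:end]...)
        end
    end
    return Expr(ex.head, args...)
end

# Hypothetical macro: like @fastmath, but enabling only `contract`.
macro contract(ex)
    return esc(contract_rewrite(ex))
end

# Usage: the multiply and add may now be fused into a single fma.
fma_like(a, b, c) = @contract a * b + c
```

A syntactic rewrite like this only touches operations written literally inside the annotated expression; arithmetic inside called functions is left unchanged, which is the limitation of the macro approach mentioned under the alternatives.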
Describe alternatives you've considered
KernelAbstractions used to do this with https://github.com/JuliaLabs/Cassette.jl, and other people use macros (although this opens up fewer optimization opportunities and is thus not desired) https://github.com/JuliaLabs/Cassette.jl. I don't know whether https://github.com/JuliaDebug/CassetteOverlay.jl can be used with kernels, but it might be a possible way to implement this.
It would be nice if this functionality were eventually added to Base Julia.