Fine-grained fast-math flags

**Is your feature request related to a problem? Please describe.**

To get kernel performance matching `clang`, we have had to add [fast-math flags](https://llvm.org/docs/LangRef.html#fast-math-flags) such as `contract` (which `clang` and `nvcc` enable by default). Currently, we do this with an ugly hack; see for example https://github.com/JuliaGPU/CUDA.jl/blob/bb37b50006295833d5396d1c7b330eec55b408e4/perf/volumerhs.jl#L21-L57

**Describe the solution you'd like**

I would like a macro like `@fastmath` that gives fine-grained control over which fast-math flags are applied.
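
To make the request concrete, here is a minimal sketch of what such a macro could look like for the single flag `contract`. It follows the same idea as `Base.@fastmath`, which rewrites arithmetic calls into flagged variants. The names `@contract`, `add_contract`, `sub_contract`, `mul_contract`, `CONTRACT_OPS`, and `rewrite_contract` are hypothetical and exist only for this illustration; they are not part of Base, CUDA.jl, or KernelAbstractions.

```julia
# Hypothetical sketch: none of these names exist in Base or CUDA.jl.

# Generic fallbacks for argument types without a flagged implementation.
add_contract(a, b) = a + b
sub_contract(a, b) = a - b
mul_contract(a, b) = a * b

# Flagged versions for Float32/Float64, emitted as LLVM IR with `contract`.
for (name, op) in ((:add_contract, "fadd"),
                   (:sub_contract, "fsub"),
                   (:mul_contract, "fmul"))
    for (T, llvmT) in ((Float32, "float"), (Float64, "double"))
        ir = """
            %x = $op contract $llvmT %0, %1
            ret $llvmT %x
            """
        @eval @inline $name(a::$T, b::$T) =
            Base.llvmcall($ir, $T, Tuple{$T, $T}, a, b)
    end
end

# Map from the operator as written to its flagged replacement.
const CONTRACT_OPS = Dict(:+ => :add_contract,
                          :- => :sub_contract,
                          :* => :mul_contract)

# Recursively rewrite calls to +, -, * that have two or more operands.
rewrite_contract(x) = x
function rewrite_contract(ex::Expr)
    args = map(rewrite_contract, ex.args)
    if ex.head === :call && length(args) >= 3 && haskey(CONTRACT_OPS, args[1])
        f = GlobalRef(@__MODULE__, CONTRACT_OPS[args[1]])
        # `a + b + c` parses as one n-ary call, so fold it into binary calls.
        return foldl((l, r) -> Expr(:call, f, l, r), args[2:end])
    end
    return Expr(ex.head, args...)
end

macro contract(ex)
    esc(rewrite_contract(ex))
end

# Usage: the multiply and add below both carry the `contract` flag,
# which permits (but does not force) LLVM to fuse them into an FMA.
muladd_contract(a, b, c) = @contract a * b + c
```

A macro like this only rewrites operations that appear literally in the annotated expression; arithmetic inside functions called from that expression is left untouched, which is one reason a compiler-level mechanism may be preferable.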

**Describe alternatives you've considered**

KernelAbstractions used to do this with https://github.com/JuliaLabs/Cassette.jl, and other people use macros (although macros open up fewer optimization opportunities and are therefore less desirable). I don't know if https://github.com/JuliaDebug/CassetteOverlay.jl can be used with kernels, but it might be a possible way to implement this.

It would be nice if this functionality were eventually added to Base Julia.


	# HACK: module-local versions of core arithmetic; needed to get FMA
	for (jlf, f) in zip((:+, :*, :-), (:add, :mul, :sub))
	    for (T, llvmT) in ((:Float32, "float"), (:Float64, "double"))
	        ir = """
	            %x = f$f contract nsz $llvmT %0, %1
	            ret $llvmT %x
	        """
	        @eval begin
	            # the @pure is necessary so that we can constant propagate.
	            @inline Base.@pure function $jlf(a::$T, b::$T)
	                Base.llvmcall($ir, $T, Tuple{$T, $T}, a, b)
	            end
	        end
	    end
	    @eval function $jlf(args...)
	        Base.$jlf(args...)
	    end
	end

	let (jlf, f) = (:div_arcp, :div)
	    for (T, llvmT) in ((:Float32, "float"), (:Float64, "double"))
	        ir = """
	            %x = f$f fast $llvmT %0, %1
	            ret $llvmT %x
	        """
	        @eval begin
	            # the @pure is necessary so that we can constant propagate.
	            @inline Base.@pure function $jlf(a::$T, b::$T)
	                Base.llvmcall($ir, $T, Tuple{$T, $T}, a, b)
	            end
	        end
	    end
	    @eval function $jlf(args...)
	        Base.$jlf(args...)
	    end
	end
	rcp(x) = div_arcp(one(x), x) # still leads to rcp.rn which is also a function call
Fine-grained fast-math flags #1991
