Does anyone know the correct way to define fma for BFloat16? Based on my understanding of the double-rounding theorems, I think it should be correct to simply cast to Float32 and back:
@inline Base.fma(x::BFloat16, y::BFloat16, z::BFloat16) =
    BFloat16(fma(Float32(x), Float32(y), Float32(z)))
But I've observed that this sometimes returns different results from BFloat16(Float32(x) * Float32(y) + Float32(z)), which puzzles me, because I think the two ought to be equivalent in round-to-nearest-even, provided that Float32 has at least 2p + 2 bits of precision, where p is the precision of BFloat16. That condition holds: Float32 has 24 significand bits and BFloat16 has 8, so 24 ≥ 2·8 + 2 = 18.
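For anyone who wants to reproduce a mismatch without installing a package, here is a minimal sketch in plain Float32. It assumes BFloat16 is the upper 16 bits of a Float32 (same 8-bit exponent, 8 significand bits), so any Float32 whose low 16 bits are zero converts exactly. The helper `is_bf16_representable` is a hypothetical name used only for illustration. The inputs are chosen so that the product overflows Float32 while the exact value of `x*y + z` is still finite. This is one way the two expressions can diverge, and it may not be the only one or the one observed above.

```julia
# Hypothetical helper: true when a Float32 has its low 16 bits clear,
# i.e. it would convert to BFloat16 and back without any rounding.
is_bf16_representable(v::Float32) = reinterpret(UInt32, v) & 0x0000ffff == 0

x = 2f0^64        # power of two, exactly representable in BFloat16
y = 2f0^64
z = -(2f0^127)

# Fused: x*y + z is evaluated exactly (2^128 - 2^127 = 2^127), then rounded once.
fused = fma(x, y, z)

# Unfused: x*y is rounded to Float32 first and overflows to Inf32,
# so the subsequent addition cannot recover the finite result.
unfused = x * y + z

println("inputs representable: ", all(is_bf16_representable, (x, y, z)))
println("fused   = ", fused)
println("unfused = ", unfused)
```

Because BFloat16 and Float32 share an exponent range, the intermediate product of two BFloat16 values is not guaranteed to fit in Float32 even though its significand does (8 + 8 = 16 bits ≤ 24), so the precision condition alone does not rule out a difference in a case like this one.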