Open
Description
It turns out the post-hoc rounding we are doing is not good enough for strict determinism.
There is a way to implement exactly deterministic summation in parallel on GPUs, but it is somewhat complicated. The key idea is to round the floating-point summands into fixed-point superaccumulator bins, perform a fixed-point summation (using integer atomics), and then round back into floating point. Because integer addition is associative, the result does not depend on the order in which threads accumulate. This can handle the whole range of floating-point numbers with exact reproducibility, but reduces precision compared to using atomic floating-point summation. (The precision is determined by the number of superaccumulator bins.)
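To make the idea concrete, here is a minimal CPU-side sketch in Python, not the implementation proposed for this project. The names `BIN_WIDTH`, `FRAC_BITS`, and `deterministic_sum` are hypothetical, and the bin layout (one bin per 16 binary exponents, 24 fractional bits each) is an assumption chosen for illustration. Python's unbounded integers stand in for the 64-bit integer atomics a GPU kernel would use, and the inputs are assumed to be finite.

```python
import math

BIN_WIDTH = 16  # assumed: each bin covers 16 consecutive binary exponents
FRAC_BITS = 24  # assumed: fractional bits kept below the bottom of each bin


def deterministic_sum(values):
    """Order-independent sum via fixed-point bins (illustrative sketch)."""
    bins = {}  # bin index -> integer accumulator
    for x in values:
        if x == 0.0:
            continue
        # x = m * 2**e with 0.5 <= |m| < 1; the exponent selects the bin.
        _, e = math.frexp(x)
        b = e // BIN_WIDTH
        # Scale into the bin's fixed-point grid and round to an integer.
        # This rounding depends only on x, never on the summation order.
        q = round(math.ldexp(x, FRAC_BITS - b * BIN_WIDTH))
        # Integer addition is associative; on a GPU this step would be
        # an integer atomic add into the bin.
        bins[b] = bins.get(b, 0) + q
    # Convert back to floating point, combining bins in a fixed order.
    total = 0.0
    for b in sorted(bins):
        total += math.ldexp(float(bins[b]), b * BIN_WIDTH - FRAC_BITS)
    return total
```

With these assumed parameters each summand is scaled to a magnitude below 2^39 before rounding, which would leave roughly 24 bits of headroom in a signed 64-bit accumulator. Each summand carries a relative rounding error of at most about 2^-24, so choosing the bin parameters is where the precision trade-off shows up.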
References: