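1a. To get GCC to behave like -fp-model=strict, use -ffp-contract=off, which stops it from fusing multiplies and adds into FMA instructions.
1b. To get the same behavior from CUDA, add --fmad=false to cuda_args in nvcc_wrapper; this likewise disables contracting multiplies and adds into FMAs.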
- In addition, team reductions and scans will still produce diffs. HOMMEXX serializes these with a wrapper. However, they account for too much of the code in P3, so we can't take that route without losing a lot of testing of important parallelism.

Thus, for GPU tests, we should do non-BFB testing with an appropriate tolerance. We can augment the tolerance specification in run_and_cmp with something like:

    // GPU reductions and scans are not BFB with the reference, so allow a small tolerance.
    if (OnGpu<Kokkos::DefaultExecutionSpace>::value)
      tol = 10*std::numeric_limits<Real>::epsilon();

We can still use 1a and 1b to reduce the diffs to just those coming from reductions and scans.