Open
Description
simpleTensorCoreGEMM reports errors in its output (beyond the additive tolerance of 1e-5 and the multiplicative tolerance of 1.01) when compiled with CUDA 10 for a Turing GPU (arch=sm_70, RTX 2080 Ti).
I did not modify any data types for the run, and both the WMMA-based explicit GEMM implementation and the cublasGemmEx call use the Tensor Cores.
What might be causing errors beyond the specified tolerance limits?
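To make the tolerance question concrete, here is a minimal host-side sketch of how such a check could behave. It is not the sample's actual code: `exceeds_tolerance` and `ulp` are hypothetical helpers, and the way the two tolerances are combined (a mismatch is flagged if either one is exceeded) is an assumption. It only illustrates that for FP32 outputs around 100, adjacent representable values are already about 7.6e-6 apart, so two results that differ by only two units in the last place would exceed an additive tolerance of 1e-5.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (assumption, not the sample's code): flags a mismatch
// when either the ratio of the two values or their absolute difference
// exceeds its tolerance. Assumes both values are positive and non-zero.
bool exceeds_tolerance(float v1, float v2,
                       float abs_tol = 1e-5f, float rel_tol = 1.01f) {
    assert(abs_tol >= 0.0f && rel_tol >= 1.0f);
    return v1 / v2 > rel_tol || v2 / v1 > rel_tol ||
           std::fabs(v1 - v2) > abs_tol;
}

// Spacing between v and the next representable float above it (one ULP).
float ulp(float v) {
    return std::nextafter(v, INFINITY) - v;
}
```

For example, under these assumptions `exceeds_tolerance(100.0f, 100.0f + 2 * ulp(100.0f))` reports a mismatch through the additive check alone, while the same two-ULP gap around 1.0f passes. Whether this is what happens in the sample depends on the magnitude of the actual outputs, which is not stated here.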