The backward pass of a GEMM is two GEMMs, but I wonder whether I need to take any special care with the range of the gradients?
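
To make the "two GEMMs" concrete, here is a minimal plain-Python sketch, assuming the forward pass is `Y = X @ W`. The names `matmul`, `transpose`, `gemm_backward`, and `max_abs` are hypothetical helpers written for illustration, not part of any library, and the inputs are made-up values. It only prints the magnitudes involved; it does not settle whether special range handling is needed.

```python
# Sketch only: plain-Python GEMM with hypothetical helper names.

def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def gemm_backward(x, w, dy):
    """Backward of Y = X @ W: one GEMM per input."""
    dx = matmul(dy, transpose(w))  # dX = dY @ W^T, same shape as X
    dw = matmul(transpose(x), dy)  # dW = X^T @ dY, same shape as W
    return dx, dw

def max_abs(a):
    """Largest magnitude in a matrix, as a crude view of its value range."""
    return max(abs(v) for row in a for v in row)

# Made-up example data: X is 3 x 2, W is 2 x 4.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [[0.5, -1.0, 2.0, 0.0], [1.5, 0.25, -0.5, 1.0]]
# Upstream gradient dY (3 x 4), deliberately small to show the scale gap.
dy = [[1e-3] * 4 for _ in range(3)]

dx, dw = gemm_backward(x, w, dy)
print("max|dY| =", max_abs(dy))
print("max|dX| =", max_abs(dx))
print("max|dW| =", max_abs(dw))
```

In this toy case the gradients sit a couple of orders of magnitude below the activations and weights, which is the kind of gap the question is about.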