Hello,
Running matmul.py (https://github.com/spcl/dace/tree/main/samples/optimization) with the FP32 data type on an H100 GPU produces the performance result below. The GEMM performance seems very low.
Do users need additional settings to run the script and evaluate GEMM performance properly? Thank you for your instructions.
Matrix multiplication 8192x8192x8192 (version: optimize_gpu)
1.970089 iter/s, 101.518 sec, 2.2 TFLOPS
Difference: 2.9815700486324204e-07
The related code is:

for _ in range(num_iter):  # num_iter = 200
    sdfg(A=A, B=B, C=C, M=m, N=n, K=k)
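For reference, the reported TFLOPS figure can be sanity-checked from the iteration rate alone: a single M x N x K GEMM performs 2*M*N*K floating-point operations (one multiply and one add per accumulated term). This is a minimal sketch using the numbers from the output above, not part of the sample script:

```python
# Sanity-check the reported throughput from the benchmark output above.
# A single M x N x K GEMM performs 2*M*N*K floating-point operations
# (one multiply and one add per accumulated term).
M = N = K = 8192
iters_per_sec = 1.970089  # iter/s from the benchmark output

flops_per_gemm = 2 * M * N * K
tflops = flops_per_gemm * iters_per_sec / 1e12
print(f"{tflops:.1f} TFLOPS")  # consistent with the reported 2.2 TFLOPS
```

So the 2.2 TFLOPS number follows directly from the measured iteration rate; the question is why the generated GPU code only reaches that rate in the first place.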