Hello,
Running matmul.py (https://github.com/spcl/dace/tree/main/samples/optimization) with the FP32 data type on an H100 GPU produces the performance result below. The GEMM performance seems very low.
Do users need additional settings to run the script and evaluate GEMM performance properly? Thank you for your instructions.
Matrix multiplication 8192x8192x8192 (version: optimize_gpu)
1.970089 iter/s, 101.518 sec, 2.2 TFLOPS
Difference: 2.9815700486324204e-07
The related code is:

for _ in range(num_iter):  # num_iter = 200
    sdfg(A=A, B=B, C=C, M=m, N=n, K=k)
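For reference, the reported TFLOPS figure can be sanity-checked from the iteration rate alone: a single M x N x K GEMM performs 2*M*N*K floating-point operations (one multiply and one add per accumulated term). This is a minimal sketch using the numbers from the output above, not part of the sample script:

```python
# Sanity-check the reported throughput from the benchmark output above.
# A single M x N x K GEMM performs 2*M*N*K floating-point operations
# (one multiply and one add per accumulated term).
M = N = K = 8192
iters_per_sec = 1.970089  # iter/s from the benchmark output

flops_per_gemm = 2 * M * N * K
tflops = flops_per_gemm * iters_per_sec / 1e12
print(f"{tflops:.1f} TFLOPS")  # consistent with the reported 2.2 TFLOPS
```

So the 2.2 TFLOPS number follows directly from the measured iteration rate; the question is why the generated GPU code only reaches that rate in the first place.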