
performance evaluation using matmul.py #2130

@zjin-lcf


Hello

Running matmul.py (https://github.com/spcl/dace/tree/main/samples/optimization) with FP32 data types on an H100 GPU gives the performance result below. The GEMM throughput seems very low.
Do users need additional settings to evaluate GEMM performance properly with this script? Thank you for your guidance.

    Matrix multiplication 8192x8192x8192 (version: optimize_gpu)
    1.970089 iter/s, 101.518 sec, 2.2 TFLOPS
    Difference: 2.9815700486324204e-07
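
As a sanity check on the measurement itself, the printed TFLOPS figure follows directly from the iteration rate, assuming the usual 2·M·N·K floating-point operations per GEMM:

    # Consistency check for the numbers above (assumes 2*M*N*K FLOP per GEMM).
    M = N = K = 8192
    flop_per_iter = 2 * M * N * K                 # ~1.10e12 FLOP
    iters_per_sec = 1.970089                      # measured rate from the output above
    print(flop_per_iter * iters_per_sec / 1e12)   # ~2.17 TFLOPS, matching the 2.2 reported

So the report is internally consistent; the question is why the achieved throughput is this low on an H100.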

The relevant code in the script is:

    for _ in range(num_iter):   # num_iter = 200
        sdfg(A=A, B=B, C=C, M=m, N=n, K=k)

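For reference, one way to see what FP32 GEMM throughput the same GPU reaches through cuBLAS is a minimal sketch like the one below (this assumes CuPy is installed; it is not part of the DaCe sample):

    # Hypothetical cuBLAS reference measurement via CuPy, not part of matmul.py.
    import time
    import cupy as cp

    m = n = k = 8192
    A = cp.random.rand(m, k, dtype=cp.float32)
    B = cp.random.rand(k, n, dtype=cp.float32)

    cp.matmul(A, B)                    # warm-up (handle creation, kernel selection)
    cp.cuda.Device().synchronize()

    num_iter = 20
    start = time.time()
    for _ in range(num_iter):
        C = cp.matmul(A, B)            # dispatches to cuBLAS SGEMM
    cp.cuda.Device().synchronize()     # wait for all kernels before stopping the clock
    elapsed = time.time() - start

    print(f"{2 * m * n * k * num_iter / elapsed / 1e12:.1f} TFLOPS")

Comparing that number against the DaCe output would show whether the gap comes from the generated kernel or from how the run is timed.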