Closed
Description
Nice tool, thanks for making it public!
Just ran it on my 8x H100 system using your container ghcr.io/huggingface/gpu-fryer:latest and got results similar to yours, around 51,500 GFLOP/s. Both the performance number and nvidia-smi dmon --gpm-metrics 5 indicate that tensor cores were not used at all in that run. Since your README reports ~51 TFLOP/s, I suspect tensor cores are not being used in your runs either.
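To show why the throughput number alone points away from tensor cores, here is a small sanity check. It is a sketch, not part of gpu-fryer. The helper names are hypothetical, and the peak figures are assumptions: approximate dense H100 SXM numbers from NVIDIA's published spec sheet, which should be checked against your exact SKU. It also assumes the reported ~51.5 TFLOP/s is a per-GPU rate. A GEMM of shapes (m x k) @ (k x n) performs m*n*k multiply-adds, counted as 2*m*n*k FLOPs.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOP/s for one (m x k) @ (k x n) matmul that took `seconds`.

    Each of the m*n*k multiply-adds is counted as 2 FLOPs.
    """
    return 2 * m * n * k / seconds / 1e12


def fraction_of_peak(measured_tflops: float, peak_tflops: float) -> float:
    """Share of a given peak rate reached by a measured rate."""
    return measured_tflops / peak_tflops


# Assumed approximate dense peaks for an H100 SXM (from NVIDIA's spec sheet;
# verify for your SKU). These are illustrative inputs, not measurements.
PEAKS_TFLOPS = {
    "FP32 (CUDA cores)": 67.0,
    "TF32 (tensor cores, dense)": 495.0,
}

measured = 51.5  # per-GPU rate from the run above, in TFLOP/s (assumed per GPU)
for name, peak in PEAKS_TFLOPS.items():
    print(f"{name}: {fraction_of_peak(measured, peak):.0%} of peak")
```

Under these assumptions the measured rate is a large share of the FP32 CUDA-core peak but only about a tenth of the TF32 tensor-core peak. That pattern is consistent with plain FP32 math rather than tensor-core execution.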
This also means the GPUs are not running at maximum power: my H100s all stayed in the 580-650 W range throughout the run, out of a 700 W maximum. I suggest making sure tensor cores are used, not for the sake of a better performance number, but to ensure the GPUs are tested at maximum power, where more issues may surface.