This repository contains code to reproduce performance numbers for the Intel Gaudi.
Specifically, code is available to measure the throughput for matrix multiplication (BF16 and FP8) and the prefill stage of Llama models.
We also provide code to reproduce throughput numbers on NVIDIA GPUs such as the A100 and the H100. However, setting up the necessary development environments is left to the user.
Visit https://github.com/NAVER-INTEL-Co-Lab/gaudi-cresset for detailed setup instructions.
- Run `make env` to create a `.env` file. This need only be done once per directory.
- Run `make build` to build the Docker image and start the container. Run this command when you wish to rebuild the Docker image.
- Run `make exec` to enter an existing Docker container.
For instructions on matrix multiplication throughput measurements, visit the matmul directory. Commands are described in their respective files.
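As a rough guide to interpreting matmul results, the sketch below converts a measured wall-clock time into achieved TFLOP/s using the standard count of 2·M·N·K floating-point operations for an (M×K) by (K×N) product. The function name `matmul_tflops` and the example numbers are hypothetical and are not taken from the scripts in the matmul directory, which may time and report results differently.

```python
def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOP/s for one (m x k) @ (k x n) matmul that took `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Each of the m*n outputs needs k multiplies and k adds.
    flops = 2 * m * n * k
    return flops / seconds / 1e12


if __name__ == "__main__":
    # Hypothetical measurement: an 8192^3 matmul averaging 2.5 ms per call.
    print(f"{matmul_tflops(8192, 8192, 8192, 2.5e-3):.1f} TFLOP/s")
```

When timing on an accelerator, the elapsed time should be taken only after the device has been synchronized, and averaged over several iterations following a warm-up.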
To measure prefill throughput for Llama models, visit the prefill directory.
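Prefill throughput is commonly reported as prompt tokens processed per second. The sketch below shows that arithmetic under the assumption that every sequence in the batch has the same length. The helper `prefill_tokens_per_second` and the example values are hypothetical and do not reflect the interface of the code in the prefill directory.

```python
def prefill_tokens_per_second(batch_size: int, seq_len: int, seconds: float) -> float:
    """Prompt tokens processed per second for one prefill forward pass."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Assumes all sequences in the batch are exactly seq_len tokens long.
    return batch_size * seq_len / seconds


if __name__ == "__main__":
    # Hypothetical measurement: batch of 8 prompts, 2048 tokens each, 0.5 s.
    print(f"{prefill_tokens_per_second(8, 2048, 0.5):.0f} tokens/s")
```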
Code to measure single-node training throughput is available in the train directory.