
Commit f8e35c9

Refactor: Organize training configs by GPU architecture (MI300 / MI355) (#240)
1 parent 918e97d commit f8e35c9

File tree

61 files changed: +1986, -54 lines


README.md

Lines changed: 1 addition & 1 deletion

@@ -56,7 +56,7 @@ Primus leverages AMD’s ROCm Docker images to provide a consistent, ready-to-run

 ```bash
 cd Primus && pip install -r requirements.txt
-EXP=examples/megatron/configs/llama2_7B-pretrain.yaml bash ./examples/run_local_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml bash ./examples/run_local_pretrain.sh

 ```
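The same pattern repeats across every entry point below: only the architecture segment of the config path changes. A minimal shell sketch of that idea; `GPU_ARCH` is a hypothetical local variable for illustration, not one Primus reads:

```bash
# Hypothetical convenience: pick the architecture directory once, reuse it everywhere.
# GPU_ARCH is a local shell variable only; Primus itself reads EXP.
GPU_ARCH=${GPU_ARCH:-MI300X}   # the commit title also mentions MI355
EXP="examples/megatron/configs/${GPU_ARCH}/llama2_7B-pretrain.yaml" \
  bash ./examples/run_local_pretrain.sh
```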

benchmark/megatron/checkpoint/README.md

Lines changed: 1 addition & 1 deletion

@@ -43,7 +43,7 @@ example:
 ```
 export DATA_PATH=/PATH/TO/DATA
 python3 benchmark/megatron/checkpoint/ckpt_launch.py \
-    --yaml-config-path examples/megatron/configs/mixtral_8x7B_v0.1-pretrain.yaml \
+    --yaml-config-path examples/megatron/configs/MI300X/mixtral_8x7B_v0.1-pretrain.yaml \
     --nnodes 1
 ```
 If you need to benchmark multiple different models, parallel strategies, and checkpoint modes,

benchmark/megatron/model/benchmark_model.py

Lines changed: 2 additions & 2 deletions

@@ -9,7 +9,7 @@
 import subprocess
 from pathlib import Path

-CONFIG_DIR = Path("examples/megatron/configs")
+CONFIG_DIR = Path("examples/megatron/configs/MI300X")


 def find_all_model_configs():
@@ -45,7 +45,7 @@ def main():
         type=str,
         help=(
             "Specify a model name (without -pretrain.yaml). "
-            "For example, for config 'examples/megatron/configs/llama2_7B-pretrain.yaml', use: --model llama2_7B"
+            "For example, for config 'examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml', use: --model llama2_7B"
         ),
     )
     parser.add_argument(
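With `CONFIG_DIR` now pointing at the MI300X subtree, `find_all_model_configs()` only discovers configs for that architecture. A quick shell check of what the script will see, assuming the configs sit directly under that directory:

```bash
# List the model names benchmark_model.py can resolve after this change
# (each *-pretrain.yaml under the MI300X config directory, suffix stripped).
find examples/megatron/configs/MI300X -maxdepth 1 -name '*-pretrain.yaml' \
  -exec basename {} -pretrain.yaml \;
```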

examples/README.md

Lines changed: 49 additions & 37 deletions

@@ -7,16 +7,28 @@ It supports both **single-node** and **multi-node** training, and includes optional

 ## 📚 Table of Contents

-- [⚙️ Supported Backends](#️-supported-backends)
-- [🖥️ Single Node Training](#️-single-node-training)
-  - [Setup Docker](#setup-docker)
-  - [Setup Primus](#setup-primus)
-  - [Run Pretraining](#run-pretraining)
-- [🌐 Multi-node Training](#-multi-node-training)
-- [🚀 HipBLASLt Auto Tuning (Optional)](#-hipblaslt-auto-tuning-optional)
-- [✅ Supported Models](#-supported-models)
-- [🏃‍♂️ How to Run a Supported Model](#️-how-to-run-a-supported-model)
-- [☸️ Kubernetes Training Management](#️-kubernetes-training-management-run_k8s_pretrainsh)
+- [🧠 Pretraining with Primus](#-pretraining-with-primus)
+  - [📚 Table of Contents](#-table-of-contents)
+  - [⚙️ Supported Backends](#️-supported-backends)
+  - [🖥️ Single Node Training](#️-single-node-training)
+    - [Setup Docker](#setup-docker)
+    - [Setup Primus](#setup-primus)
+    - [Run Pretraining](#run-pretraining)
+      - [🚀 Quick Start Mode](#-quick-start-mode)
+      - [🧑‍🔧 Interactive Mode](#-interactive-mode)
+  - [🌐 Multi-node Training](#-multi-node-training)
+  - [🔧 HipblasLT Auto Tuning](#-hipblaslt-auto-tuning)
+    - [Stage 1: Dump GEMM Shape](#stage-1-dump-gemm-shape)
+    - [Stage 2: Tune GEMM Kernel](#stage-2-tune-gemm-kernel)
+    - [Stage 3: Train with Tuned Kernel](#stage-3-train-with-tuned-kernel)
+  - [✅ Supported Models](#-supported-models)
+  - [🏃‍♂️ How to Run a Supported Model](#️-how-to-run-a-supported-model)
+  - [☸️ Kubernetes Training Management (`run_k8s_pretrain.sh`)](#️-kubernetes-training-management-run_k8s_pretrainsh)
+    - [Requirements](#requirements)
+    - [Usage](#usage)
+    - [⚙️ Commands](#️-commands)
+    - [⚙️ Create Command Options](#️-create-command-options)
+    - [Example](#example)

 ---

@@ -75,10 +87,10 @@ You do not need to enter the Docker container. Just set the config and run.

 ```bash
 # Example for megatron llama3.1_8B
-EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh

 # examples for torchtitan llama3.1_8B
-EXP=examples/torchtitan/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
+EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
 ```

 ---
@@ -99,10 +111,10 @@ docker exec -it dev_primus bash
 cd Primus && pip install -r requirements.txt

 # Example for megatron llama3.1_8B
-EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_pretrain.sh

 # examples for torchtitan llama3.1_8B
-EXP=examples/torchtitan/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_pretrain.sh
+EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_pretrain.sh

 ```
@@ -118,10 +130,10 @@ export DOCKER_IMAGE="docker.io/rocm/primus:v25.9_gfx942"
 export NNODES=8

 # Example for megatron llama3.1_8B
-EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh

 # examples for torchtitan llama3.1_8b
-EXP=examples/torchtitan/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
+EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
 ```

 ## 🔧 HipblasLT Auto Tuning
@@ -144,7 +156,7 @@ It is recommended to reduce `train_iters` for faster shape generation.
 # ./output/tune_hipblaslt/${PRIMUS_MODEL}/gemm_shape

 export PRIMUS_HIPBLASLT_TUNING_STAGE=1
-export EXP=examples/megatron/configs/llama2_7B-pretrain.yaml
+export EXP=examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml
 NNODES=1 bash ./examples/run_slurm_pretrain.sh
 ```
@@ -161,7 +173,7 @@ It typically takes 10–30 minutes depending on model size and shape complexity.
 # ./output/tune_hipblaslt/${PRIMUS_MODEL}/gemm_tune/tune_hipblas_gemm_results.txt

 export PRIMUS_HIPBLASLT_TUNING_STAGE=2
-export EXP=examples/megatron/configs/llama2_7B-pretrain.yaml
+export EXP=examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml
 NNODES=1 bash ./examples/run_slurm_pretrain.sh
 ```

@@ -173,7 +185,7 @@ In this final stage, the tuned kernel is loaded for efficient training:

 ```bash
 export PRIMUS_HIPBLASLT_TUNING_STAGE=3
-export EXP=examples/megatron/configs/llama2_7B-pretrain.yaml
+export EXP=examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml
 NNODES=1 bash ./examples/run_slurm_pretrain.sh
 ```
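The three stages above differ only in `PRIMUS_HIPBLASLT_TUNING_STAGE`, so they can be chained for one model; a minimal sketch using the MI300X path from these hunks:

```bash
# Dump GEMM shapes (1), tune kernels (2), then train with the tuned kernels (3).
export EXP=examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml
for stage in 1 2 3; do
    PRIMUS_HIPBLASLT_TUNING_STAGE=$stage NNODES=1 bash ./examples/run_slurm_pretrain.sh
done
```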

@@ -183,18 +195,18 @@ The following models are supported out of the box via provided configuration files

 | Model | Huggingface Config | Megatron Config | TorchTitan Config |
 | ---------------- | ------------------ | --------------- | ----------------- |
-| llama2_7B | [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | [llama2_7B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/config/llama2_7B-pretrain.yaml) | |
-| llama2_70B | [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) | [llama2_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama2_70B-pretrain.yaml) | |
-| llama3_8B | [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | [llama3_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama3_8B-pretrain.yaml) | |
-| llama3_70B | [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | [llama3_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama3_70B-pretrain.yaml) | |
-| llama3.1_8B | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [llama3.1_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama3.1_8B-pretrain.yaml) | [llama3.1_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/llama3.1_8B-pretrain.yaml)|
-| llama3.1_70B | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) | [llama3.1_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama3.1_70B-pretrain.yaml) | [llama3.1_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/llama3.1_70B-pretrain.yaml)|
-| llama3.1_405B | [meta-llama/Llama-3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | [llama3.1_405B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama3.1_405B-pretrain.yaml) | [llama3.1_405B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/llama3.1_405B-pretrain.yaml)|
-| deepseek_v2_lite | [deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite) | [deepseek_v2_lite-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/deepseek_v2_lite-pretrain.yaml) | |
-| deepseek_v2 | [deepseek-ai/DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2) | [deepseek_v2-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/deepseek_v2-pretrain.yaml) | |
-| deepseek_v3 | [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) | [deepseek_v3-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/deepseek_v3-pretrain.yaml) | |
-| Mixtral-8x7B-v0.1 | [mistralai/Mixtral-8x7B-v0.1 ](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | [mixtral_8x7B_v0.1-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/mixtral_8x7B_v0.1-pretrain.yaml) | |
-| Mixtral-8x22B-v0.1 | [mistralai/Mixtral-8x22B-v0.1 ](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1) | [mixtral_8x22B_v0.1-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/mixtral_8x22B_v0.1-pretrain.yaml) | |
+| llama2_7B | [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | [llama2_7B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml) | |
+| llama2_70B | [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) | [llama2_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama2_70B-pretrain.yaml) | |
+| llama3_8B | [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | [llama3_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama3_8B-pretrain.yaml) | |
+| llama3_70B | [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | [llama3_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama3_70B-pretrain.yaml) | |
+| llama3.1_8B | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [llama3.1_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml) | [llama3.1_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml)|
+| llama3.1_70B | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) | [llama3.1_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama3.1_70B-pretrain.yaml) | [llama3.1_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/MI300X/llama3.1_70B-pretrain.yaml)|
+| llama3.1_405B | [meta-llama/Llama-3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | [llama3.1_405B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama3.1_405B-pretrain.yaml) | [llama3.1_405B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/MI300X/llama3.1_405B-pretrain.yaml)|
+| deepseek_v2_lite | [deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite) | [deepseek_v2_lite-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/deepseek_v2_lite-pretrain.yaml) | |
+| deepseek_v2 | [deepseek-ai/DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2) | [deepseek_v2-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/deepseek_v2-pretrain.yaml) | |
+| deepseek_v3 | [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) | [deepseek_v3-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/deepseek_v3-pretrain.yaml) | |
+| Mixtral-8x7B-v0.1 | [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | [mixtral_8x7B_v0.1-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/mixtral_8x7B_v0.1-pretrain.yaml) | |
+| Mixtral-8x22B-v0.1 | [mistralai/Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1) | [mixtral_8x22B_v0.1-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/mixtral_8x22B_v0.1-pretrain.yaml) | |

 ---
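Since every Megatron config in the table now lives under `examples/megatron/configs/MI300X/`, a fail-fast existence check before launching is cheap. A sketch, with `deepseek_v3` as an arbitrary pick from the table:

```bash
# Verify the per-architecture config exists before handing it to the launcher.
EXP=examples/megatron/configs/MI300X/deepseek_v3-pretrain.yaml
[ -f "$EXP" ] || { echo "missing config: $EXP" >&2; exit 1; }
EXP=$EXP bash ./examples/run_local_pretrain.sh
```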

@@ -203,15 +215,15 @@ The following models are supported out of the box via provided configuration files
 Use the following command pattern to start training with a selected model configuration:

 ```bash
-EXP=examples/megatron/configs/<model_config> bash ./examples/run_local_pretrain.sh
+EXP=examples/megatron/configs/MI300X/<model_config> bash ./examples/run_local_pretrain.sh
 ```

 For example, to run the llama3.1_8B model quickly:

 ```bash
-EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh

-EXP=examples/torchtitan/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
+EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
 ```

@@ -221,10 +233,10 @@ For multi-node training via SLURM, use:
 export NNODES=8

 #run megatron
-EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh

 # run torchtitan
-EXP=examples/torchtitan/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
+EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
 ```

 ## ☸️ Kubernetes Training Management (`run_k8s_pretrain.sh`)
@@ -285,7 +297,7 @@ Create a training workload with 2 replicas and custom config:

 ```bash
 bash examples/run_k8s_pretrain.sh --url http://api.example.com create --replica 2 --cpu 96 --gpu 4 \
-    --exp examples/megatron/configs/llama2_7B-pretrain.yaml --data_path /mnt/data/train \
+    --exp examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml --data_path /mnt/data/train \
     --image docker.io/custom/image:latest --hf_token myhf_token --workspace team-dev

 #result:
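The commit title covers both MI300 and MI355. Assuming the new layout adds a sibling `MI355X` directory (not visible in these hunks), targeting the other architecture changes only the `--exp` path:

```bash
# Hypothetical MI355 variant of the create call above; only the
# architecture segment of the config path differs.
bash examples/run_k8s_pretrain.sh --url http://api.example.com create --replica 2 --cpu 96 --gpu 4 \
    --exp examples/megatron/configs/MI355X/llama2_7B-pretrain.yaml --data_path /mnt/data/train \
    --image docker.io/custom/image:latest --hf_token myhf_token --workspace team-dev
```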
