@@ -7,16 +7,28 @@ It supports both **single-node** and **multi-node** training, and includes optio
 
 ## 📚 Table of Contents
 
-- [⚙️ Supported Backends](#️-supported-backends)
-- [🖥️ Single Node Training](#️-single-node-training)
-  - [Setup Docker](#setup-docker)
-  - [Setup Primus](#setup-primus)
-  - [Run Pretraining](#run-pretraining)
-- [🌐 Multi-node Training](#-multi-node-training)
-- [🚀 HipBLASLt Auto Tuning (Optional)](#-hipblaslt-auto-tuning-optional)
-- [✅ Supported Models](#-supported-models)
-- [🏃‍♂️ How to Run a Supported Model](#️-how-to-run-a-supported-model)
-- [☸️ Kubernetes Training Management](#️-kubernetes-training-management-run_k8s_pretrainsh)
+- [🧠 Pretraining with Primus](#-pretraining-with-primus)
+  - [📚 Table of Contents](#-table-of-contents)
+  - [⚙️ Supported Backends](#️-supported-backends)
+  - [🖥️ Single Node Training](#️-single-node-training)
+    - [Setup Docker](#setup-docker)
+    - [Setup Primus](#setup-primus)
+    - [Run Pretraining](#run-pretraining)
+      - [🚀 Quick Start Mode](#-quick-start-mode)
+      - [🧑‍🔧 Interactive Mode](#-interactive-mode)
+  - [🌐 Multi-node Training](#-multi-node-training)
+  - [🔧 hipBLASLt Auto Tuning](#-hipblaslt-auto-tuning)
+    - [Stage 1: Dump GEMM Shape](#stage-1-dump-gemm-shape)
+    - [Stage 2: Tune GEMM Kernel](#stage-2-tune-gemm-kernel)
+    - [Stage 3: Train with Tuned Kernel](#stage-3-train-with-tuned-kernel)
+  - [✅ Supported Models](#-supported-models)
+  - [🏃‍♂️ How to Run a Supported Model](#️-how-to-run-a-supported-model)
+  - [☸️ Kubernetes Training Management (`run_k8s_pretrain.sh`)](#️-kubernetes-training-management-run_k8s_pretrainsh)
+    - [Requirements](#requirements)
+    - [Usage](#usage)
+    - [⚙️ Commands](#️-commands)
+    - [⚙️ Create Command Options](#️-create-command-options)
+      - [Example](#example)
 
 ---
 
@@ -75,10 +87,10 @@ You do not need to enter the Docker container. Just set the config and run.
 
 ```bash
 # Example for megatron llama3.1_8B
-EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
 
 # examples for torchtitan llama3.1_8B
-EXP=examples/torchtitan/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
+EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
 ```
 
 ---
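A small guard can catch a mistyped config path before the container spins up. This is a minimal sketch, assuming only the `EXP` variable and `run_local_pretrain.sh` entry point shown above; `CONFIG` is a hypothetical helper variable, not a Primus option:

```bash
# Minimal sketch: fail fast on a bad config path before launching quick-start.
# CONFIG is a hypothetical helper variable; only EXP is read by the script.
CONFIG=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml
if [[ -f "$CONFIG" ]]; then
  EXP="$CONFIG" bash ./examples/run_local_pretrain.sh
else
  echo "Config not found: $CONFIG" >&2
  exit 1
fi
```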
@@ -99,10 +111,10 @@ docker exec -it dev_primus bash
 cd Primus && pip install -r requirements.txt
 
 # Example for megatron llama3.1_8B
-EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_pretrain.sh
 
 # examples for torchtitan llama3.1_8B
-EXP=examples/torchtitan/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_pretrain.sh
+EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_pretrain.sh
 
 ```
 
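Once the `dev_primus` container from the interactive steps is running, training can also be relaunched without re-attaching a shell. A sketch assuming the container name and working directory from the commands above:

```bash
# Sketch: rerun pretraining in the existing dev_primus container without
# opening an interactive shell first; paths match the interactive steps above.
docker exec dev_primus bash -lc \
  'cd Primus && EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_pretrain.sh'
```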
@@ -118,10 +130,10 @@ export DOCKER_IMAGE="docker.io/rocm/primus:v25.9_gfx942"
 export NNODES=8
 
 # Example for megatron llama3.1_8B
-EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
 
 # examples for torchtitan llama3.1_8b
-EXP=examples/torchtitan/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
+EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
 ```
 
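To sweep several models on the same allocation, the launch line can be looped. A minimal sketch, assuming `EXP` and `NNODES` behave exactly as in the example above; the config names come from the Supported Models table below:

```bash
# Sketch: queue several pretraining runs back-to-back on one SLURM allocation.
export NNODES=8
for cfg in llama3.1_8B llama3.1_70B; do
  EXP=examples/megatron/configs/MI300X/${cfg}-pretrain.yaml \
    bash ./examples/run_slurm_pretrain.sh
done
```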
 ## 🔧 hipBLASLt Auto Tuning
@@ -144,7 +156,7 @@ It is recommended to reduce `train_iters` for faster shape generation.
 # ./output/tune_hipblaslt/${PRIMUS_MODEL}/gemm_shape
 
 export PRIMUS_HIPBLASLT_TUNING_STAGE=1
-export EXP=examples/megatron/configs/llama2_7B-pretrain.yaml
+export EXP=examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml
 NNODES=1 bash ./examples/run_slurm_pretrain.sh
 ```
 
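Before moving on, it is worth confirming that shapes were actually dumped. This sketch relies only on the output path noted above; `PRIMUS_MODEL` is assumed to be resolved by the training scripts:

```bash
# Sketch: verify stage 1 wrote GEMM shape dumps before starting stage 2.
ls -lh "./output/tune_hipblaslt/${PRIMUS_MODEL}/gemm_shape" \
  || echo "No GEMM shapes found; re-run stage 1." >&2
```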
@@ -161,7 +173,7 @@ It typically takes 10–30 minutes depending on model size and shape complexity.
 # ./output/tune_hipblaslt/${PRIMUS_MODEL}/gemm_tune/tune_hipblas_gemm_results.txt
 
 export PRIMUS_HIPBLASLT_TUNING_STAGE=2
-export EXP=examples/megatron/configs/llama2_7B-pretrain.yaml
+export EXP=examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml
 NNODES=1 bash ./examples/run_slurm_pretrain.sh
 ```
 
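A quick look at the results file confirms tuning completed. A sketch using only the results path noted above:

```bash
# Sketch: spot-check the tuned GEMM results emitted by stage 2.
RESULTS="./output/tune_hipblaslt/${PRIMUS_MODEL}/gemm_tune/tune_hipblas_gemm_results.txt"
if [[ -s "$RESULTS" ]]; then
  head "$RESULTS"
else
  echo "Tuning results missing or empty: $RESULTS" >&2
fi
```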
@@ -173,7 +185,7 @@ In this final stage, the tuned kernel is loaded for efficient training:
 
 ```bash
 export PRIMUS_HIPBLASLT_TUNING_STAGE=3
-export EXP=examples/megatron/configs/llama2_7B-pretrain.yaml
+export EXP=examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml
 NNODES=1 bash ./examples/run_slurm_pretrain.sh
 ```
 
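Since the three stages differ only in `PRIMUS_HIPBLASLT_TUNING_STAGE`, they can be chained in one loop. A minimal sketch, assuming the same per-stage semantics documented above:

```bash
# Sketch: run dump -> tune -> train as one sequence for a single config.
set -e  # stop if any stage fails
export EXP=examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml
for stage in 1 2 3; do
  PRIMUS_HIPBLASLT_TUNING_STAGE=$stage NNODES=1 \
    bash ./examples/run_slurm_pretrain.sh
done
```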
@@ -183,18 +195,18 @@ The following models are supported out of the box via provided configuration fil
 
 | Model | Huggingface Config | Megatron Config | TorchTitan Config |
 | ---------------- | ------------------ | --------------- | ----------------- |
-| llama2_7B | [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | [llama2_7B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/config/llama2_7B-pretrain.yaml) | |
-| llama2_70B | [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) | [llama2_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama2_70B-pretrain.yaml) | |
-| llama3_8B | [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | [llama3_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama3_8B-pretrain.yaml) | |
-| llama3_70B | [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | [llama3_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama3_70B-pretrain.yaml) | |
-| llama3.1_8B | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [llama3.1_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama3.1_8B-pretrain.yaml) | [llama3.1_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/llama3.1_8B-pretrain.yaml) |
-| llama3.1_70B | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) | [llama3.1_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama3.1_70B-pretrain.yaml) | [llama3.1_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/llama3.1_70B-pretrain.yaml) |
-| llama3.1_405B | [meta-llama/Llama-3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | [llama3.1_405B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/llama3.1_405B-pretrain.yaml) | [llama3.1_405B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/llama3.1_405B-pretrain.yaml) |
-| deepseek_v2_lite | [deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite) | [deepseek_v2_lite-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/deepseek_v2_lite-pretrain.yaml) | |
-| deepseek_v2 | [deepseek-ai/DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2) | [deepseek_v2-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/deepseek_v2-pretrain.yaml) | |
-| deepseek_v3 | [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) | [deepseek_v3-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/deepseek_v3-pretrain.yaml) | |
-| Mixtral-8x7B-v0.1 | [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | [mixtral_8x7B_v0.1-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/mixtral_8x7B_v0.1-pretrain.yaml) | |
-| Mixtral-8x22B-v0.1 | [mistralai/Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1) | [mixtral_8x22B_v0.1-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/mixtral_8x22B_v0.1-pretrain.yaml) | |
+| llama2_7B | [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | [llama2_7B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml) | |
+| llama2_70B | [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) | [llama2_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama2_70B-pretrain.yaml) | |
+| llama3_8B | [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | [llama3_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama3_8B-pretrain.yaml) | |
+| llama3_70B | [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | [llama3_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama3_70B-pretrain.yaml) | |
+| llama3.1_8B | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [llama3.1_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml) | [llama3.1_8B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml) |
+| llama3.1_70B | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) | [llama3.1_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama3.1_70B-pretrain.yaml) | [llama3.1_70B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/MI300X/llama3.1_70B-pretrain.yaml) |
+| llama3.1_405B | [meta-llama/Llama-3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | [llama3.1_405B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/llama3.1_405B-pretrain.yaml) | [llama3.1_405B-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/torchtitan/configs/MI300X/llama3.1_405B-pretrain.yaml) |
+| deepseek_v2_lite | [deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite) | [deepseek_v2_lite-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/deepseek_v2_lite-pretrain.yaml) | |
+| deepseek_v2 | [deepseek-ai/DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2) | [deepseek_v2-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/deepseek_v2-pretrain.yaml) | |
+| deepseek_v3 | [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) | [deepseek_v3-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/deepseek_v3-pretrain.yaml) | |
+| Mixtral-8x7B-v0.1 | [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | [mixtral_8x7B_v0.1-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/mixtral_8x7B_v0.1-pretrain.yaml) | |
+| Mixtral-8x22B-v0.1 | [mistralai/Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1) | [mixtral_8x22B_v0.1-pretrain.yaml](https://github.com/AMD-AIG-AIMA/Primus/blob/main/examples/megatron/configs/MI300X/mixtral_8x22B_v0.1-pretrain.yaml) | |
 
 ---
 
@@ -203,15 +215,15 @@ The following models are supported out of the box via provided configuration fil
 Use the following command pattern to start training with a selected model configuration:
 
 ```bash
-EXP=examples/megatron/configs/<model_config> bash ./examples/run_local_pretrain.sh
+EXP=examples/megatron/configs/MI300X/<model_config> bash ./examples/run_local_pretrain.sh
 ```
 
 For example, to run the llama3.1_8B model quickly:
 
 ```bash
-EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
 
-EXP=examples/torchtitan/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
+EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
 ```
 
 
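Because every supported config follows the `<model>-pretrain.yaml` naming scheme, the model name can be held in a shell variable. A sketch; `MODEL` below is a hypothetical helper, not a Primus option:

```bash
# Sketch: parameterize the model name instead of hard-coding the YAML path.
MODEL=llama3_8B  # hypothetical helper variable; only EXP is read by the script
EXP=examples/megatron/configs/MI300X/${MODEL}-pretrain.yaml \
  bash ./examples/run_local_pretrain.sh
```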
@@ -221,10 +233,10 @@ For multi-node training via SLURM, use:
 export NNODES=8
 
 # run megatron
-EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
+EXP=examples/megatron/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
 
 # run torchtitan
-EXP=examples/torchtitan/configs/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
+EXP=examples/torchtitan/configs/MI300X/llama3.1_8B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh
 ```
 
 ## ☸️ Kubernetes Training Management (`run_k8s_pretrain.sh`)
@@ -285,7 +297,7 @@ Create a training workload with 2 replicas and custom config:
 
 ```bash
 bash examples/run_k8s_pretrain.sh --url http://api.example.com create --replica 2 --cpu 96 --gpu 4 \
-  --exp examples/megatron/configs/llama2_7B-pretrain.yaml --data_path /mnt/data/train \
+  --exp examples/megatron/configs/MI300X/llama2_7B-pretrain.yaml --data_path /mnt/data/train \
   --image docker.io/custom/image:latest --hf_token myhf_token --workspace team-dev
 
 # result: