Commit 0333eed

fix moe freq of dpsk v2
1 parent 1bdf247 commit 0333eed

5 files changed: 14 additions, 5 deletions

README.md

Lines changed: 2 additions & 5 deletions
@@ -49,9 +49,6 @@ pre-commit install
 ## Examples
 ```bash
 cd workspace/Primus
-# deepseek pretrain (default use deepseek_v2_lite model)
-./examples/deepseek/run_pretrain.sh
+# megatron pretrain
+./examples/megatron/run_pretrain.sh
 ```
-
-
-

examples/megatron/run_pretrain.sh

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,10 @@
11
#!/bin/bash
22
# shellcheck disable=SC2086
3+
###############################################################################
4+
# Copyright (c) 2025, Advanced Micro Devices, Inc. All rights reserved.
5+
#
6+
# See LICENSE for license information.
7+
#################################################################################
38

49
# available models: primus/configs/models/megatron
510
export MODEL_CONFIG=${MODEL_CONFIG:-deepseek_v2_lite}

examples/megatron/run_slurm_pretrain.sh

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,10 @@
11
#!/bin/bash
22
# shellcheck disable=SC2086
3+
###############################################################################
4+
# Copyright (c) 2025, Advanced Micro Devices, Inc. All rights reserved.
5+
#
6+
# See LICENSE for license information.
7+
#################################################################################
38

49
export RUN_ENV=slurm
510
export MODEL_CONFIG=deepseek_v2_lite

primus/configs/models/megatron/deepseek_v2.yaml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ qk_pos_emb_head_dim: 64
2121
v_head_dim: 128
2222
kv_channels: 128
2323
# moe
24+
moe_layer_freq: "([0]*1+[1]*59)"
2425
num_experts: 160
2526
# num_shared_experts: 2
2627
moe_ffn_hidden_size: 1536 # moe_intermediate_size

primus/configs/models/megatron/deepseek_v2_lite.yaml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ qk_pos_emb_head_dim: 64
2121
v_head_dim: 128
2222
kv_channels: 128
2323
# moe
24+
moe_layer_freq: "([0]*1+[1]*26)"
2425
num_experts: 64
2526
# num_shared_experts: 2
2627
moe_ffn_hidden_size: 1408
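
For readers checking the two `moe_layer_freq` values, the sketch below expands such a string into a per-layer pattern. It assumes the Megatron-LM convention that the string is a Python list expression in which `1` marks an MoE layer and `0` a dense layer. The helper `expand_moe_layer_freq` is hypothetical and is not part of Primus or Megatron-LM. It parses the expression with `ast` rather than calling `eval`.

```python
import ast


def _eval_node(node):
    """Evaluate a tiny subset of Python: int literals, list literals, + and *."""
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.List):
        return [_eval_node(elt) for elt in node.elts]
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
        left, right = _eval_node(node.left), _eval_node(node.right)
        return left + right if isinstance(node.op, ast.Add) else left * right
    raise ValueError(f"unsupported syntax in pattern: {ast.dump(node)}")


def expand_moe_layer_freq(pattern: str) -> list:
    """Hypothetical helper: expand a list-expression string into a 0/1 list.

    Assumption: 1 means the layer is MoE, 0 means the layer is dense.
    """
    layers = _eval_node(ast.parse(pattern, mode="eval"))
    if not isinstance(layers, list) or any(flag not in (0, 1) for flag in layers):
        raise ValueError("pattern must expand to a list of 0/1 flags")
    return layers


if __name__ == "__main__":
    # The two strings below are the values added in this commit.
    for name, pattern in [
        ("deepseek_v2", "([0]*1+[1]*59)"),
        ("deepseek_v2_lite", "([0]*1+[1]*26)"),
    ]:
        layers = expand_moe_layer_freq(pattern)
        print(f"{name}: {len(layers)} layers, "
              f"{layers.count(0)} dense, {layers.count(1)} MoE")
```

Under that assumption, both patterns describe one leading dense layer followed only by MoE layers, for totals of 60 and 27 layers. Those totals should equal the `num_layers` value in each model config, which this diff does not show.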
