3 files changed, +65 −0 lines changed
@@ -93,6 +93,7 @@ PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
```

For larger models, you can use `torchrun` to launch the conversion script and convert with multiple GPUs or even multiple nodes.
+ Note: when converting the kimi-k2 model weights, you need to open `config.json` in the model path and change `"model_type": "kimi_k2"` to `"model_type": "deepseek_v3"`.

### Convert from Megatron Format to Hugging Face Format

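The `config.json` edit described in the note above can also be scripted. A minimal sketch, assuming a GNU `sed` environment; the model directory below is a temporary stand-in, not a real checkpoint path:

```shell
# Stand-in model directory with a minimal config.json; in practice this
# would be the actual kimi-k2 checkpoint path.
MODEL_PATH=$(mktemp -d)
echo '{"model_type": "kimi_k2"}' > "$MODEL_PATH/config.json"

# sed suffices here because the key/value pair sits on one line; for
# arbitrarily formatted JSON, prefer a JSON-aware tool such as jq or
# a short python -c snippet.
sed -i 's/"model_type": "kimi_k2"/"model_type": "deepseek_v3"/' "$MODEL_PATH/config.json"
cat "$MODEL_PATH/config.json"   # {"model_type": "deepseek_v3"}
```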
@@ -93,6 +93,7 @@ PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
```

For larger models, you can use `torchrun` to launch the conversion script and convert with multiple GPUs or even multiple machines.
+ Note: when converting the kimi-k2 model weights, open `config.json` in the model path and change `"model_type": "kimi_k2"` to `"model_type": "deepseek_v3"`.

### Convert from Megatron Format to Hugging Face Format

+ NLAYERS=61
+ FIRST_K_DENSE_REPLACE=1
+
+ arr=()
+ for ((i = 0; i < NLAYERS; i++)); do
+     if ((i < FIRST_K_DENSE_REPLACE)); then
+         arr+=(0)
+     else
+         arr+=(1)
+     fi
+ done
+
+ printf -v MOE_LAYER_FREQ "[%s]" "$(IFS=','; echo "${arr[*]}")"
+
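Scaled down, the loop above produces a bracketed 0/1 list marking dense layers (0) and MoE layers (1); for example, with 4 layers and 1 leading dense layer:

```shell
# Same construction as in the script above, with small values so the
# resulting string is easy to read.
NLAYERS=4
FIRST_K_DENSE_REPLACE=1
arr=()
for ((i = 0; i < NLAYERS; i++)); do
    if ((i < FIRST_K_DENSE_REPLACE)); then arr+=(0); else arr+=(1); fi
done
# IFS's first character (',') joins the array elements inside "${arr[*]}".
printf -v MOE_LAYER_FREQ "[%s]" "$(IFS=','; echo "${arr[*]}")"
echo "$MOE_LAYER_FREQ"   # [0,1,1,1]
```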
+ # kimi-k2
+ MODEL_ARGS=(
+     --disable-bias-linear
+     --num-layers 61
+     --hidden-size 7168
+     --ffn-hidden-size 18432
+     --num-attention-heads 64
+     --kv-channels 64
+     --normalization RMSNorm
+     --position-embedding-type rope
+     --norm-epsilon 1e-6
+     --swiglu
+     --untie-embeddings-and-output-weights
+     --vocab-size 163840
+
+     --multi-latent-attention
+     --q-lora-rank 1536
+     --kv-lora-rank 512
+     --qk-head-dim 128
+     --qk-pos-emb-head-dim 64
+     --v-head-dim 128
+     --qk-layernorm
+     --rotary-scaling-factor 32.0
+     --rotary-base 50000
+     --mscale 1.0
+     --mscale-all-dim 1.0
+     --attention-softmax-in-fp32
+     --no-rope-fusion
+
+     # moe
+     --num-experts 384
+     --moe-layer-freq $MOE_LAYER_FREQ
+     --moe-ffn-hidden-size 2048
+     --moe-router-topk 8
+     --moe-shared-expert-intermediate-size 2048
+     --moe-router-pre-softmax
+     --moe-router-score-function sigmoid
+     --moe-router-enable-expert-bias
+     --moe-router-load-balancing-type seq_aux_loss
+     --moe-token-dispatcher-type alltoall
+     --moe-aux-loss-coeff 0
+     --moe-router-bias-update-rate 0
+     --moe-router-group-topk 1
+     --moe-router-num-groups 1
+     --moe-grouped-gemm
+     --moe-router-topk-scaling-factor 2.827
+     --moe-router-dtype fp32
+     --moe-permute-fusion
+ )
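A `MODEL_ARGS` bash array like the one above is consumed by expanding it into the converter's command line with `"${MODEL_ARGS[@]}"`. A sketch of that idiom; the converter path and the abbreviated two-flag array here are assumptions, not the real invocation:

```shell
# Abbreviated stand-in for the full argument array above.
MODEL_ARGS=(
    --num-layers 61
    --hidden-size 7168
)
# "${MODEL_ARGS[@]}" expands each element as its own word, preserving
# the flag/value pairing even if a value contained spaces. The script
# path below is an assumption for illustration.
CMD=(python tools/convert_hf_to_torch_dist.py "${MODEL_ARGS[@]}")
echo "${CMD[*]}"   # python tools/convert_hf_to_torch_dist.py --num-layers 61 --hidden-size 7168
```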