@wenxie-amd (Contributor), on Primus/primus/configs/models/megatron/deepseek_v3.yaml:

mtp

```yaml
mtp_num_layers: 1
mtp_loss_scaling_factor: 0.1
```
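For context, these two keys appear to map onto the multi-token prediction (MTP) objective described in the DeepSeek-V3 technical report. There, D sequential MTP modules (here assumed to be `mtp_num_layers`) each produce a cross-entropy loss, and the MTP loss is their average weighted by λ (here assumed to be `mtp_loss_scaling_factor`) before being added to the main next-token loss. The report uses λ = 0.3 for the first 10T tokens and 0.1 afterwards. The following is a minimal Python sketch of that combination under those assumptions; `combine_losses` is a hypothetical helper, not Primus's or Megatron-LM's API, and the exact reduction inside Megatron may differ.

```python
from typing import Sequence


def combine_losses(main_loss: float,
                   mtp_losses: Sequence[float],
                   mtp_loss_scaling_factor: float = 0.1) -> float:
    """Hypothetical sketch: total loss = main loss + (lambda / D) * sum_k L_MTP^k.

    Assumes D = mtp_num_layers = len(mtp_losses) and
    lambda = mtp_loss_scaling_factor, following the DeepSeek-V3 formulation.
    """
    mtp_num_layers = len(mtp_losses)
    if mtp_num_layers == 0:
        # MTP disabled: only the main next-token loss contributes.
        return main_loss
    # Average the per-depth MTP losses, then scale by lambda.
    mtp_loss = mtp_loss_scaling_factor / mtp_num_layers * sum(mtp_losses)
    return main_loss + mtp_loss


# With mtp_num_layers: 1 and mtp_loss_scaling_factor: 0.1, as in the config above.
total = combine_losses(main_loss=2.0, mtp_losses=[2.5], mtp_loss_scaling_factor=0.1)
print(total)
```

Dividing by the number of MTP layers keeps the auxiliary term's magnitude roughly independent of the prediction depth, so with a single MTP layer the factor 0.1 is applied directly to that layer's loss.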

@Xiaoming-AMD merged commit dbb8f97 into main on Mar 25, 2025 (1 check passed).
@wenxie-amd deleted the dev/wenx/support_mtp_config branch on March 31, 2025, 01:57.
