
Support MTP layer for Qwen3.5 series models #98

Open
zpltys wants to merge 2 commits into ISEEKYAN:main from zpltys:qwen35_mtp

Conversation

@zpltys

@zpltys zpltys commented Mar 20, 2026

Support the MTP layer for Qwen3.5 series models.
I have tested the correctness in the example/qwen3_5/test_mtp_logits.py file and with e2e SFT training on Qwen3.5 35ba3b and 9b.

@ArronHZG
Contributor

How does the mtp_loss_scaling_factor take effect?
Also, how do I load a model and disable MTP?
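As a point of reference for the question above, a scaling factor like this is typically applied by adding the scaled MTP auxiliary loss to the main language-modeling loss. The sketch below is a minimal illustration of that convention; the function and argument names (`combine_losses`, `mtp_losses`) are hypothetical and not this repository's actual API.

```python
def combine_losses(main_loss: float, mtp_losses: list[float],
                   mtp_loss_scaling_factor: float = 0.1) -> float:
    """Add the scaled mean of the MTP auxiliary losses to the main LM loss.

    With no MTP layers (empty list), the main loss is returned unchanged,
    which is one natural way "disabling MTP" falls out of the math.
    """
    if not mtp_losses:
        return main_loss
    mtp_loss = sum(mtp_losses) / len(mtp_losses)
    return main_loss + mtp_loss_scaling_factor * mtp_loss
```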

@ArronHZG
Contributor

def _build_config(self):
    """Override to add MTP configuration."""
    hf_config = self.hf_config

    # Add MTP configuration if present
    mtp_args = {}
    if hasattr(hf_config, "num_nextn_predict_layers"):
        mtp_args["mtp_num_layers"] = hf_config.num_nextn_predict_layers
        mtp_args["mtp_loss_scaling_factor"] = self.extra_args.get("mtp_loss_scaling_factor", 0.1)

    return self._build_base_config(
        add_qkv_bias=True,
        qk_layernorm=False,
        **mtp_args,
    )

just like this.
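To make the disable-MTP behavior of the suggested pattern concrete: when the HF config has no `num_nextn_predict_layers` attribute, `mtp_args` stays empty and no MTP options reach the base config. A minimal, self-contained sketch (using `SimpleNamespace` as a stand-in for the real HF config object; names are illustrative):

```python
from types import SimpleNamespace


def build_mtp_args(hf_config, extra_args):
    """Collect MTP kwargs only when the HF config declares MTP layers."""
    mtp_args = {}
    if hasattr(hf_config, "num_nextn_predict_layers"):
        mtp_args["mtp_num_layers"] = hf_config.num_nextn_predict_layers
        mtp_args["mtp_loss_scaling_factor"] = extra_args.get(
            "mtp_loss_scaling_factor", 0.1
        )
    return mtp_args


# MTP enabled: the config declares next-n prediction layers.
enabled = build_mtp_args(SimpleNamespace(num_nextn_predict_layers=1), {})

# MTP disabled: attribute absent, so the dict stays empty.
disabled = build_mtp_args(SimpleNamespace(), {})
```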

@zpltys
Author

zpltys commented Mar 24, 2026

How does the mtp_loss_scaling_factor take effect? Also, how do I load a model and disable MTP?

I have fixed this.
