[megatron] fix: MTP patch for newer mcore #5587
HollowMan6 wants to merge 1 commit into verl-project:main from
Conversation
Code Review
This pull request addresses a regression in multi-token prediction (MTP) when using newer versions of megatron-core. It correctly switches to the new process_mtp_loss helper function while maintaining backward compatibility. My review includes a suggestion to refine the compatibility check to make it more robust and prevent it from unintentionally masking potential bugs, which will improve the code's maintainability and ease of debugging.
Pull request overview
Updates the Megatron GPTModel postprocess monkey-patch to prefer Megatron-LM’s newer process_mtp_loss helper when available, while keeping a fallback implementation for older Megatron-LM versions.
Changes:
- Extend the _megatron_gptmodel_postprocess signature to accept additional keyword arguments (**kwargs) for compatibility with upstream callers.
- Use megatron.core.transformer.multi_token_prediction.process_mtp_loss when present; otherwise fall back to the existing manual MTP loss path.
- Add logic to select cp_group from either self.pg_collection.cp or self.cp_group for the upstream helper path.
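The version dispatch described above can be sketched with a standard feature-detection pattern. This is a minimal illustration, not verl's actual patch: new_lib is a hypothetical stand-in for megatron.core.transformer.multi_token_prediction, and the loss bodies are placeholders.

```python
# Sketch of the version dispatch: prefer the upstream helper when it
# exists, otherwise fall back to a local implementation. "new_lib" is a
# hypothetical stand-in module; the loss bodies are placeholders, not
# verl's actual MTP loss path.
try:
    from new_lib import process_mtp_loss  # newer Megatron-LM helper
except ImportError:
    process_mtp_loss = None  # older Megatron-LM: helper not available


def _fallback_mtp_loss(logits, labels):
    # Stand-in for the pre-existing manual MTP loss path.
    return sum((p - t) ** 2 for p, t in zip(logits, labels))


def compute_mtp_loss(logits, labels):
    if process_mtp_loss is not None:
        # Newer mcore: the upstream helper owns the full loss path.
        return process_mtp_loss(logits, labels)
    # Older mcore: keep the existing manual path.
    return _fallback_mtp_loss(logits, labels)
```

Doing the detection at import time (rather than per call) keeps the hot path branch-cheap and makes the supported-version boundary explicit in one place.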
There's no compute_output_layer_and_language_model_loss in new mcore, and everything needs to be handled by process_mtp_loss.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
What does this PR do?
Fix regression introduced in #5561
As mentioned in #5323 (comment), there's no compute_output_layer_and_language_model_loss in new mcore, and everything needs to be handled by process_mtp_loss:
https://github.com/NVIDIA/Megatron-LM/blob/9a19203af5f8941e4ff9528443e3e5b8928b8c81/megatron/core/models/gpt/gpt_model.py#L651
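Because newer mcore call sites can pass keyword arguments that the older patched signature never declared, the patch also accepts **kwargs so the monkey-patch doesn't raise TypeError on upstream drift. A minimal sketch, with illustrative argument names that are not mcore's actual ones:

```python
# Why the patched postprocess accepts **kwargs: newer upstream callers
# may pass options the original signature predates; swallowing them (or
# selectively consuming them) avoids a TypeError. Names below are
# illustrative only.
def patched_postprocess(hidden_states, labels=None, **kwargs):
    # Selectively consume a hypothetical upstream-only option.
    mtp_enabled = kwargs.get("mtp_in_postprocess", False)
    return hidden_states, labels, mtp_enabled
```

The trade-off is that typos in keyword names are silently ignored, which is why the review suggests keeping the compatibility check narrow enough not to mask real bugs.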
Checklist Before Starting
- Format the PR title as [{modules}] {type}: {description} (this will be checked by the CI).
- {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off, like [megatron, fsdp, doc].
- {type} is in feat, fix, refactor, chore, test.
- If this PR breaks any API, add [BREAKING] to the beginning of the title. Example: [BREAKING][fsdp, megatron] feat: dynamic batching
Test
API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Ensure pre-commit checks pass: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
- Request CI in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If your PR changes the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.