[BugFix] LoRA: Support loading base_layer of experts #31104
base: main
Conversation
Force-pushed from d822bed to 210bc7b.
Code Review
This pull request addresses a bug in weight loading for FusedMoE layers when LoRA is enabled. The changes correctly handle the base_layer component in weight names. The core logic is adjusted in make_expert_params_mapping, and this fix is propagated by adding an is_lora_enabled flag to this function, which is then passed from various model definitions. The overall approach is sound and the widespread changes are necessary boilerplate to support the fix. I have one suggestion to improve the robustness of the string formatting to prevent potential issues with certain model configurations.
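For illustration, here is a minimal sketch of the mapping idea described above, assuming the usual fused names (`w13_` for gate/up, `w2_` for down); the function below is a hypothetical simplification, not vLLM's actual `make_expert_params_mapping`:

```python
# Hypothetical simplification of the described approach: with LoRA enabled,
# checkpoint expert weights live under "<proj>.base_layer.", so both sides of
# the mapping need the extra "base_layer." segment for a plain
# name.replace(weight_name, param_name) to keep producing valid names.
def expert_params_mapping_sketch(num_experts: int, is_lora_enabled: bool):
    infix = "base_layer." if is_lora_enabled else ""
    mapping = []
    for expert_id in range(num_experts):
        for shard_id, proj, fused in (
            ("w1", "gate_proj", "w13_"),
            ("w3", "up_proj", "w13_"),
            ("w2", "down_proj", "w2_"),
        ):
            ckpt_prefix = f"experts.{expert_id}.{proj}.{infix}"
            param_prefix = f"experts.{infix}{fused}"
            mapping.append((param_prefix, ckpt_prefix, expert_id, shard_id))
    return mapping


name = "model.layers.0.mlp.experts.0.up_proj.base_layer.weight"
for param_prefix, ckpt_prefix, _, _ in expert_params_mapping_sketch(8, True):
    if ckpt_prefix in name:
        # Prints "model.layers.0.mlp.experts.base_layer.w13_weight"
        print(name.replace(ckpt_prefix, param_prefix))
```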
Force-pushed from 00c09c7 to f9008c9.
hmellor left a comment
We should not be duplicating this code in every model. It should be abstracted to a util.
Also, please make sure that the fix is also applied to `def load_weights(`.
Force-pushed from f9008c9 to 5c39293.
Force-pushed from 5c39293 to d70645e.
@hmellor Thanks for reviewing, this has now been changed as requested! cc: @jeejeelee
Force-pushed from c1022bd to abc19bf.
Force-pushed from abc19bf to 2c1825b.
Signed-off-by: Hollow Man <[email protected]>
Force-pushed from 2c1825b to b1139f3.
jeejeelee left a comment
LGTM once CI is green
Thank you @jeejeelee! All the CI checks should have passed by now, but auto-merge (squash) is not merging it; maybe this needs some manual work.
cc @hmellor
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Code Review
This pull request fixes loading of LoRA weights for experts in MoE models. The incorrect weight name remapping when a base_layer is present is addressed by a new helper function, remap_expert_weight_name, which correctly inserts base_layer into the parameter name. The fix is applied consistently across the numerous model files, replacing the simple string replacement with the new, more robust logic. The helper's implementation is sound, the changes are well contained, and this is a good and necessary bug fix that improves LoRA support for MoE models.
Codex Review: Didn't find any major issues. 🎉
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
@hmellor could you please take another look?
Purpose
This PR fixes weight loading when LoRA is enabled, i.e., when `base_layer` is added to the module path:

`model.layers.0.mlp.experts.0.up_proj.weight` -> `model.layers.0.mlp.experts.0.up_proj.base_layer.weight`

Before this fix, the code remapped such a name to `model.layers.0.mlp.experts.w13_base_layer.weight`, which is wrong; it should actually be `model.layers.0.mlp.experts.base_layer.w13_weight`.
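A minimal sketch of the required remapping, assuming the usual `gate_proj`/`up_proj` -> `w13_weight` and `down_proj` -> `w2_weight` fusion (the helper name comes from the review discussion above, but the signature here is illustrative, not the exact implementation in this PR):

```python
# Illustrative helper, not the PR's exact code: drop the per-expert index and
# projection name, but keep "base_layer." as a module prefix of the fused
# weight name instead of gluing it onto "w13_"/"w2_".
def remap_expert_weight_name(ckpt_name: str) -> str:
    proj_to_fused = {
        "gate_proj": "w13_weight",
        "up_proj": "w13_weight",
        "down_proj": "w2_weight",
    }
    for proj, fused in proj_to_fused.items():
        marker = f".{proj}."
        if marker not in ckpt_name:
            continue
        prefix, suffix = ckpt_name.split(marker, 1)  # suffix: "weight" or "base_layer.weight"
        prefix = prefix.rsplit(".", 1)[0]            # "experts.0" -> "experts"
        return f"{prefix}.{suffix.replace('weight', fused)}"
    return ckpt_name


assert (
    remap_expert_weight_name("model.layers.0.mlp.experts.0.up_proj.base_layer.weight")
    == "model.layers.0.mlp.experts.base_layer.w13_weight"
)
```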
Test Plan
Test on Qwen3 30B A3B
Test Result
Looks good.
Essential Elements of an Effective PR Description Checklist
- `supported_models.md` and `examples` for a new model.