
Conversation

Contributor

@HollowMan6 commented Dec 22, 2025

Purpose

This PR fixes weight loading when LoRA is enabled, i.e., when LoRA inserts base_layer into expert weight names:

model.layers.0.mlp.experts.0.up_proj.weight -> model.layers.0.mlp.experts.0.up_proj.base_layer.weight

Before this fix, the patched code remapped this name to
model.layers.0.mlp.experts.w13_base_layer.weight, which is wrong; it
should actually be model.layers.0.mlp.experts.base_layer.w13_weight.
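The remapping described above can be sketched as a small helper. This is a hypothetical illustration of the intended transformation, not vLLM's actual implementation; the function name, the regex, and the gate/up → w13, down → w2 fusion table are assumptions based on the FusedMoE naming convention described in this PR.

```python
import re

# Hypothetical sketch: fused expert weight names per FusedMoE convention
# (gate_proj/up_proj fuse into w13, down_proj into w2). Assumption, not
# the real vLLM code.
_FUSED = {"gate_proj": "w13", "up_proj": "w13", "down_proj": "w2"}


def remap_expert_weight_name(name: str) -> str:
    """Move ``base_layer`` in front of the fused expert weight name.

    e.g. ``model.layers.0.mlp.experts.0.up_proj.base_layer.weight``
    ->   ``model.layers.0.mlp.experts.base_layer.w13_weight``
    """
    m = re.match(r"(.*\.experts)\.\d+\.(\w+)\.base_layer\.weight$", name)
    if m is None:
        # Not a LoRA-wrapped expert weight; leave the name untouched.
        return name
    prefix, proj = m.groups()
    return f"{prefix}.base_layer.{_FUSED[proj]}_weight"
```

The key point is that base_layer must be attached to the experts module (where the LoRA wrapper lives), not fused into the projection name.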

Test Plan

Test on Qwen3 30B A3B

Test Result

Looks good.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


@mergify mergify bot added deepseek Related to DeepSeek models llama Related to Llama models qwen Related to Qwen models gpt-oss Related to GPT-OSS models speculative-decoding labels Dec 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in weight loading for FusedMoE layers when LoRA is enabled. The changes correctly handle the base_layer component in weight names. The core logic is adjusted in make_expert_params_mapping, and this fix is propagated by adding an is_lora_enabled flag to this function, which is then passed from various model definitions. The overall approach is sound and the widespread changes are necessary boilerplate to support the fix. I have one suggestion to improve the robustness of the string formatting to prevent potential issues with certain model configurations.

@jeejeelee jeejeelee self-assigned this Dec 22, 2025
@HollowMan6 HollowMan6 force-pushed the lora_base_layer branch 2 times, most recently from 00c09c7 to f9008c9 on December 22, 2025 12:36
@HollowMan6 HollowMan6 changed the title [BugFix] LoRA: FusedMoE make_expert_params_mapping supports base_layer [BugFix] LoRA: Support loading base_layer of experts Dec 22, 2025
Member

@hmellor hmellor left a comment


We should not be duplicating this code in every model. It should be abstracted to a util.

Also, please make sure that the fix is also applied to

@HollowMan6
Contributor Author

@hmellor Thanks for reviewing, now this is changed as requested!

cc: @jeejeelee

Collaborator

@jeejeelee jeejeelee left a comment


LGTM once CI is green

@jeejeelee jeejeelee added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 25, 2025
@jeejeelee jeejeelee enabled auto-merge (squash) December 25, 2025 07:44
@HollowMan6
Contributor Author

Thank you @jeejeelee! All CI checks have now passed, but auto-merge (squash) is not merging it; this may need some manual intervention.

@jeejeelee jeejeelee disabled auto-merge December 26, 2025 00:35
@jeejeelee jeejeelee enabled auto-merge (squash) December 26, 2025 00:36
@jeejeelee
Collaborator

cc @hmellor

Copilot AI review requested due to automatic review settings December 30, 2025 01:32
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@HollowMan6 HollowMan6 requested a review from Copilot December 30, 2025 11:37
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a fix for loading LoRA weights for experts in MoE models. The issue with incorrect weight name remapping when a base_layer is present is addressed by a new helper function, remap_expert_weight_name. This function correctly handles the insertion of base_layer into the parameter name. The fix has been consistently applied across numerous model files, replacing the simple string replacement with the new, more robust logic. The implementation of the new function is sound and correctly addresses the described bug. The changes are well-contained and improve the LoRA support for MoE models. Overall, this is a good and necessary bug fix.
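To illustrate how a helper like remap_expert_weight_name (named in the review above) might slot into weight loading, here is a simplified, hypothetical sketch. The loader structure and parameter-dict names are illustrative assumptions, not vLLM's actual load_weights interface.

```python
import re

# Hypothetical fused-name table and remapper, assumed for illustration only.
_FUSED = {"gate_proj": "w13", "up_proj": "w13", "down_proj": "w2"}


def remap_expert_weight_name(name: str) -> str:
    # Rewrite ...experts.<idx>.<proj>.base_layer.weight to
    # ...experts.base_layer.<fused>_weight; leave other names alone.
    m = re.match(r"(.*\.experts)\.\d+\.(\w+)\.base_layer\.weight$", name)
    if m is None:
        return name
    return f"{m.group(1)}.base_layer.{_FUSED[m.group(2)]}_weight"


def load_expert_weights(checkpoint_weights, params_dict):
    """Toy loader: remap each checkpoint name, then match it against the
    module's parameter dict (a stand-in for the real weight_loader calls)."""
    loaded = []
    for name, tensor in checkpoint_weights:
        name = remap_expert_weight_name(name)
        if name in params_dict:
            params_dict[name] = tensor
            loaded.append(name)
    return loaded
```

Without the remapping step, the LoRA-wrapped checkpoint name would never match the fused parameter name and the weight would be silently skipped.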

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉


Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@jeejeelee
Collaborator

@hmellor could you please take another look?

