
Conversation

@yaoyu-33
Contributor

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line-by-line info on the high-level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

@copy-pr-bot

copy-pr-bot bot commented Dec 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@HollowMan6
Contributor

HollowMan6 commented Dec 18, 2025

Update: all of the issues below are now fixed.

Update: Only linear_proj and linear_fc2 are correctly mapped with the current version for LoRA:

[screenshot]

It looks like the fused weights (e.g., fc1, qkv) in LoRA are still not mapped to hf_name correctly; they still use the Megatron naming:

[screenshot]
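For illustration, a minimal sketch of the fused-to-canonical name splitting being discussed here; the mapping table and helper are hypothetical, not Megatron-Bridge's actual API:

FUSED_SPLITS = {
    # Hypothetical mapping: fused Megatron projections -> per-projection HF names.
    "linear_qkv": ["q_proj", "k_proj", "v_proj"],
    "linear_fc1": ["gate_proj", "up_proj"],
}

def split_fused_adapter_name(megatron_name: str) -> list[str]:
    # Map a fused Megatron LoRA adapter parameter name to one HF-style name per
    # sub-projection; unfused names (e.g. linear_proj, linear_fc2) pass through.
    for fused, parts in FUSED_SPLITS.items():
        if fused in megatron_name:
            return [megatron_name.replace(fused, part) for part in parts]
    return [megatron_name]

print(split_fused_adapter_name("decoder.layers.0.self_attention.linear_qkv.adapter.linear_in.weight"))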

Meanwhile, although Canonical LoRA produces the right names, it doesn't seem to be working correctly; I will investigate this further.

Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
@yaoyu-33
Contributor Author

@HollowMan6 yes, I think I messed up the name conversion for the fused base names a bit. Let me try to fix it.

HollowMan6 added a commit to HollowMan6/Megatron-Bridge that referenced this pull request Dec 21, 2025
@HollowMan6
Contributor

HollowMan6 commented Dec 21, 2025

@yaoyu-33 I've opened PR #1788, targeting the bridge/peft_bridge_1 branch, to fix the expert-layers case; feel free to merge that one or integrate it manually into this PR.

Convergence looks good on dense models for RL with verl: the gray curve is Canonical LoRA with the bridge, the blue curve is normal LoRA with the bridge, and the yellow curve is the LoRA merge.

[convergence plot]

The convergence tests for MoE (qwen3-30b-a3b):

[convergence plot]

if isinstance(adapter, ModuleDict):
    adapter_name = local_param_name.removeprefix(local_base_prefix + ".adapter.").split(".")[0]
    adapter = adapter[adapter_name]
input_is_parallel, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
@HollowMan6
Contributor

HollowMan6 commented Dec 23, 2025


Note: This will need to be updated after #1800 is merged

Suggested change
- input_is_parallel, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
+ input_is_parallel, _, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
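Purely as an aside: if this call site ever needs to work both before and after #1800, and assuming the change only inserts one extra element into the returned tuple, it could unpack by position from the ends instead of hard-coding the arity. A minimal sketch with a hypothetical helper:

def unpack_adapter_attrs(attrs):
    # Hypothetical helper (not part of the bridge): return
    # (input_is_parallel, base_linear_is_parallel) from either the pre- or
    # post-#1800 tuple shape, assuming only the middle elements differ.
    return attrs[0], attrs[-1]

# Works for both tuple shapes:
assert unpack_adapter_attrs((True, 1, 2, 3, False)) == (True, False)
assert unpack_adapter_attrs((True, 1, 2, 3, 4, False)) == (True, False)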

Comment on lines +437 to +438
assert weights[0].param_name.endswith(".linear_in.weight")
assert weights[1].param_name.endswith(".linear_out.weight")

To fix the test cases:

Suggested change
- assert weights[0].param_name.endswith(".linear_in.weight")
- assert weights[1].param_name.endswith(".linear_out.weight")
+ assert weights[0].param_name.endswith("lora_A.weight")
+ assert weights[1].param_name.endswith("lora_B.weight")
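For context, the new suffixes follow the HF PEFT-style lora_A/lora_B convention instead of Megatron's linear_in/linear_out adapter names. A purely illustrative pairing (the full names below are hypothetical; only the suffixes matter to the assertions):

# Hypothetical names, for illustration only.
megatron_name = "decoder.layers.0.mlp.linear_fc2.adapter.linear_in.weight"
hf_name = "model.layers.0.mlp.down_proj.lora_A.weight"
assert hf_name.endswith("lora_A.weight")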

"materialize_adapter_weights",
lambda *_: [adapter_weight],
)


To fix the test cases:

Suggested change
+ # Provide a base HF weight name so stream_adapter_weights_megatron_to_hf can
+ # translate it into lora_A/lora_B names.
+ monkeypatch.setattr(
+     bridge,
+     "_get_base_hf_weight_names_for_adapter",
+     lambda *_args, **_kwargs: ["model.layers.0.mlp.linear_fc1.weight"],
+ )


weights = list(
    bridge.stream_adapter_weights_megatron_to_hf(
        [SimpleNamespace(config=SimpleNamespace())],

To fix the test cases:

Suggested change
- [SimpleNamespace(config=SimpleNamespace())],
+ [SimpleNamespace(config=SimpleNamespace(num_moe_experts=0))],


weights = list(
    bridge.stream_adapter_weights_megatron_to_hf(
        [SimpleNamespace(config=SimpleNamespace())],

To fix the test cases:

Suggested change
- [SimpleNamespace(config=SimpleNamespace())],
+ [SimpleNamespace(config=SimpleNamespace(num_moe_experts=0))],

HollowMan6 added a commit to HollowMan6/Megatron-Bridge that referenced this pull request Dec 29, 2025
HollowMan6 and others added 3 commits December 29, 2025 15:31
# Conflicts:
#	src/megatron/bridge/models/conversion/model_bridge.py
#	tests/unit_tests/models/test_model_bridge_lora.py
)
from megatron.bridge.peft.canonical_lora import ModuleDict
from megatron.bridge.peft.lora import LoRAMerge
from megatron.bridge.peft.utils import get_adapter_attributes_from_linear, is_expert_linear

Suggested change
- from megatron.bridge.peft.utils import get_adapter_attributes_from_linear, is_expert_linear
+ from megatron.bridge.peft.utils import ParallelLinearAdapter, get_adapter_attributes_from_linear, is_expert_linear

if isinstance(adapter, ModuleDict):
    adapter_name = local_param_name.removeprefix(local_base_prefix + ".adapter.").split(".")[0]
    adapter = adapter[adapter_name]
input_is_parallel, _, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)

For ParallelLinearAdapter, base_linear_is_parallel can be different from the base layer (e.g. for linear_kv_down_proj).

Suggested change
- input_is_parallel, _, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
+ if isinstance(adapter, ParallelLinearAdapter):
+     input_is_parallel = adapter.input_is_parallel
+     base_linear_is_parallel = True
+ else:
+     input_is_parallel, _, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
