[Draft] Peft Bridge #1766
base: main
Conversation
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <[email protected]> Signed-off-by: Yu Yao <[email protected]>
@HollowMan6 yes, I think I messed up the name conversion for fused base names a bit. Let me try to fix it.
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Hollow Man <[email protected]>
@yaoyu-33 I've opened PR #1788 that targets this branch. Convergence is good on dense models for RL on verl: the gray curve is Canonical LoRA with the bridge, the blue curve is normal LoRA with the bridge, and the yellow curve is the LoRA merge.
The convergence tests for MoE (qwen3-30b-a3b):
if isinstance(adapter, ModuleDict):
    adapter_name = local_param_name.removeprefix(local_base_prefix + ".adapter.").split(".")[0]
    adapter = adapter[adapter_name]
input_is_parallel, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
Note: This will need to be updated after #1800 is merged
Suggested change:
- input_is_parallel, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
+ input_is_parallel, _, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
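If this branch needs to keep working both before and after #1800 lands, one option is a transitional shim that tolerates both return shapes. A minimal sketch, assuming only the five- vs. six-element tuple difference shown in the suggestion above (the helper name is made up, and it relies on the surrounding code's `get_adapter_attributes_from_linear` import):

```python
# Sketch only: normalize get_adapter_attributes_from_linear across the
# pre-#1800 (5-tuple) and post-#1800 (6-tuple) return shapes.
def _input_and_base_parallel_flags(to_wrap):
    attrs = get_adapter_attributes_from_linear(to_wrap)
    if len(attrs) == 6:
        input_is_parallel, _, _, _, _, base_linear_is_parallel = attrs
    else:
        input_is_parallel, _, _, _, base_linear_is_parallel = attrs
    return input_is_parallel, base_linear_is_parallel
```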
assert weights[0].param_name.endswith(".linear_in.weight")
assert weights[1].param_name.endswith(".linear_out.weight")
To fix the test cases:
Suggested change:
- assert weights[0].param_name.endswith(".linear_in.weight")
- assert weights[1].param_name.endswith(".linear_out.weight")
+ assert weights[0].param_name.endswith("lora_A.weight")
+ assert weights[1].param_name.endswith("lora_B.weight")
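For reference, HF/PEFT-style checkpoints store each adapted linear as a `lora_A` / `lora_B` pair, which is why the asserted suffixes change. A tiny self-contained illustration (the base weight prefix below is made up, not taken from this PR):

```python
# Illustrative only: typical PEFT-style adapter parameter names end in
# lora_A.weight / lora_B.weight.
names = [
    "base_model.model.model.layers.0.mlp.down_proj.lora_A.weight",
    "base_model.model.model.layers.0.mlp.down_proj.lora_B.weight",
]
assert names[0].endswith("lora_A.weight")
assert names[1].endswith("lora_B.weight")
```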
| "materialize_adapter_weights", | ||
| lambda *_: [adapter_weight], | ||
| ) | ||
|
|
To fix the test cases:
Suggested change:
+ # Provide a base HF weight name so stream_adapter_weights_megatron_to_hf can
+ # translate it into lora_A/lora_B names.
+ monkeypatch.setattr(
+     bridge,
+     "_get_base_hf_weight_names_for_adapter",
+     lambda *_args, **_kwargs: ["model.layers.0.mlp.linear_fc1.weight"],
+ )
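For context, pytest's `monkeypatch` fixture undoes each `setattr` at test teardown, so stacking this patch on top of the `materialize_adapter_weights` one is safe. A generic, self-contained sketch of the pattern (the class and method here are illustrative, not the bridge API):

```python
class _FakeBridge:
    def weight_names(self):
        return ["real.weight"]

def test_patch_is_scoped_to_this_test(monkeypatch):
    bridge = _FakeBridge()
    # Replace the method for this test only; pytest restores it afterwards.
    monkeypatch.setattr(
        _FakeBridge,
        "weight_names",
        lambda self: ["model.layers.0.mlp.linear_fc1.weight"],
    )
    assert bridge.weight_names() == ["model.layers.0.mlp.linear_fc1.weight"]
```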
weights = list(
    bridge.stream_adapter_weights_megatron_to_hf(
        [SimpleNamespace(config=SimpleNamespace())],
To fix the test cases:
Suggested change:
- [SimpleNamespace(config=SimpleNamespace())],
+ [SimpleNamespace(config=SimpleNamespace(num_moe_experts=0))],
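The stub config is just a `types.SimpleNamespace`, so any attribute the streaming path reads must be set explicitly; presumably it now checks `num_moe_experts`, hence the suggested default of 0. A quick self-contained illustration of the failure mode:

```python
from types import SimpleNamespace

bare = SimpleNamespace()                    # no num_moe_experts attribute
fixed = SimpleNamespace(num_moe_experts=0)  # what the suggestion provides

assert fixed.num_moe_experts == 0
try:
    bare.num_moe_experts
except AttributeError:
    # A bare SimpleNamespace raises AttributeError for unset attributes.
    pass
```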
weights = list(
    bridge.stream_adapter_weights_megatron_to_hf(
        [SimpleNamespace(config=SimpleNamespace())],
To fix the test cases:
Suggested change:
- [SimpleNamespace(config=SimpleNamespace())],
+ [SimpleNamespace(config=SimpleNamespace(num_moe_experts=0))],
Signed-off-by: Hollow Man <[email protected]>
…when EP > 1 (#1817) Signed-off-by: Hollow Man <[email protected]>
# Conflicts:
#   src/megatron/bridge/models/conversion/model_bridge.py
#   tests/unit_tests/models/test_model_bridge_lora.py
Signed-off-by: yaoyu-33 <[email protected]>
)
from megatron.bridge.peft.canonical_lora import ModuleDict
from megatron.bridge.peft.lora import LoRAMerge
from megatron.bridge.peft.utils import get_adapter_attributes_from_linear, is_expert_linear
Suggested change:
- from megatron.bridge.peft.utils import get_adapter_attributes_from_linear, is_expert_linear
+ from megatron.bridge.peft.utils import ParallelLinearAdapter, get_adapter_attributes_from_linear, is_expert_linear
if isinstance(adapter, ModuleDict):
    adapter_name = local_param_name.removeprefix(local_base_prefix + ".adapter.").split(".")[0]
    adapter = adapter[adapter_name]
input_is_parallel, _, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
For ParallelLinearAdapter, base_linear_is_parallel can be different from the base layer (e.g. for linear_kv_down_proj).
Suggested change:
- input_is_parallel, _, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
+ if isinstance(adapter, ParallelLinearAdapter):
+     input_is_parallel = adapter.input_is_parallel
+     base_linear_is_parallel = True
+ else:
+     input_is_parallel, _, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
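In other words, when the wrapped adapter is a `ParallelLinearAdapter`, the suggestion trusts the flags the adapter itself was built with instead of re-deriving them from the base linear layer. A small sketch of that intent (the attribute access mirrors the suggestion above and is not verified against the library API; the helper name is made up):

```python
# Sketch of the dispatch above; names and attributes mirror the suggestion, not a verified API.
def _resolve_parallelism(adapter, to_wrap):
    if isinstance(adapter, ParallelLinearAdapter):
        # The adapter records how it was constructed, which can differ from the
        # base layer it wraps (e.g. linear_kv_down_proj).
        return adapter.input_is_parallel, True
    input_is_parallel, _, _, _, _, base_linear_is_parallel = get_adapter_attributes_from_linear(to_wrap)
    return input_is_parallel, base_linear_is_parallel
```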




What does this PR do?
Add a one-line overview of what this PR aims to accomplish.
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre-checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information