[megatron] feat: Share actor and ref in LoRA #4673
Conversation
Code Review
This pull request introduces a valuable feature by enabling LoRA for Megatron models, allowing the actor and reference models to share weights. The implementation correctly uses a context manager to disable LoRA adapters for computing reference log probabilities. My review includes two high-severity suggestions: one to address duplicated code for determining lora_rank across multiple files, and another to fix a performance issue where entropy is unnecessarily calculated and then discarded.
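The entropy point could be addressed with a guard like the following minimal sketch (the function and flag names here are assumptions for illustration, not the PR's actual code):

```python
import torch

# Hypothetical sketch of the suggestion: only materialize entropy when the
# caller asks for it, instead of always computing it and discarding it.
def forward_outputs(logits: torch.Tensor, labels: torch.Tensor,
                    calculate_entropy: bool = False):
    log_probs = torch.log_softmax(logits, dim=-1)
    # log-probability of each target token: shape [batch, seq_len]
    token_log_probs = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    entropy = None
    if calculate_entropy:
        # entropy over the vocabulary, computed only on request
        entropy = -(log_probs.exp() * log_probs).sum(dim=-1)
    return token_log_probs, entropy
```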
Force-pushed from 60ce274 to aac6651.
Code Review
This pull request refactors the LoRA configuration handling to support a nested structure and introduces a mechanism to share the actor and reference models in megatron_workers.py when using LoRA. This is achieved by temporarily disabling the LoRA adapter to compute reference log probabilities, which is a clean and efficient approach.
My main feedback concerns the duplicated logic for determining `lora_rank`, which is repeated across four trainer files. I've left suggestions to refactor it into a shared utility function for better maintainability. The rest of the changes look good.
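A shared helper along these lines would remove the duplication (a sketch only; the helper name and the nested config layout are assumptions based on the review, not the PR's actual code):

```python
# Hypothetical shared utility replacing the four per-trainer copies of the
# lookup. Assumes the nested LoRA config structure described in the review.
def get_lora_rank(model_config) -> int:
    lora_cfg = getattr(model_config, "lora", None)
    return 0 if lora_cfg is None else getattr(lora_cfg, "rank", 0)
```

Each trainer would then call this helper (e.g. on `config.actor_rollout_ref.model`, path assumed) instead of re-implementing the lookup.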
Force-pushed from a59fe4a to 6a59c97.
Force-pushed from f231448 to e8ad2a0.
For `compute_ref_log_prob`, we can do that by temporarily disabling the LoRA layers for the forward pass, since the base weights are frozen and only the LoRA layers are trained. This has already been supported in FSDP LoRA. Signed-off-by: Hollow Man <[email protected]>
Force-pushed from e8ad2a0 to 4d9bd5c.
What does this PR do?
Waiting for:
For `compute_ref_log_prob`, we can share the actor and reference models by temporarily disabling the LoRA layers for the forward pass, since the base weights are frozen and only the LoRA layers are trained. This has already been supported in FSDP LoRA.
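As a rough sketch of the idea, modeled on the FSDP path where PEFT's `disable_adapter()` context manager is used (the Megatron implementation in this PR may differ, and `forward_log_prob` is an assumed helper, not verl's actual API):

```python
# With LoRA, the frozen base weights are exactly the reference policy, so
# temporarily disabling the adapters recovers the ref model from the actor.
def compute_ref_log_prob(actor_module, data):
    with actor_module.disable_adapter():
        return forward_log_prob(actor_module, data)  # assumed forward helper
```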
Checklist Before Starting
- Format the PR title as `[{modules}] {type}: {description}` (this will be checked by the CI).
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`; multiple modules look like `[megatron, fsdp, doc]`.
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`.
  - If this PR breaks any API, add `[BREAKING]` to the beginning of the title, e.g. `[BREAKING][fsdp, megatron] feat: dynamic batching`.

Test
API and Usage Example
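As a standalone illustration of the underlying mechanism (plain PEFT, not this PR's Megatron code): the same module yields actor or reference logits depending on whether the LoRA adapters are active.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

input_ids = torch.tensor([[1, 2, 3, 4]])
actor_logits = model(input_ids).logits    # adapters on  -> actor policy
with model.disable_adapter():             # adapters off -> frozen base = ref
    ref_logits = model(input_ids).logits
```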
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review; otherwise the reviewer might deprioritize this PR.
- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`.
- Request CI in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)