[ci] chore: migrate all rm related ci to reward loop #4520
Code Review
This pull request migrates all reward-model-related CI to the new reward_loop feature. The changes span example scripts, test scripts, and configuration files, consistently replacing the old reward model setup with the reward_loop configuration. The core logic in verl/trainer/ppo/ray_trainer.py is updated to handle both the legacy and the new reward_loop paths. I found one minor issue: a parameter is duplicated in a test script. Overall, the changes look good and align with the PR's objective.
    reward_model.profiler.enable=$PROFILE_ENABLE \
    reward_model.profiler.ranks=$PROFILE_RANKS \
    reward_model.profiler.all_ranks=$PROFILE_RANKS_ALL \
    reward_model.use_reward_loop=True \
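A duplicated key=value override like the one flagged in the review is easy to miss in a long, backslash-continued command. The following is a minimal sketch of a checker, not part of this PR or of verl. The function name find_duplicate_overrides, the module name some.trainer, and the example command are all hypothetical. The example also picks reward_model.use_reward_loop as the repeated key purely for illustration, since the review does not say which parameter was duplicated.

```python
import re
import shlex
from collections import Counter

# Matches "key=value" tokens, optionally prefixed with Hydra-style "+" or "++".
OVERRIDE_RE = re.compile(r"^\+{0,2}([A-Za-z_][\w.]*)=")


def find_duplicate_overrides(command: str) -> list[str]:
    """Return the override keys that are assigned more than once in a command."""
    # Join backslash-continued lines so the command is one logical line.
    joined = command.replace("\\\n", " ")
    keys = []
    for token in shlex.split(joined):
        match = OVERRIDE_RE.match(token)
        if match:
            keys.append(match.group(1))
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical command with one override repeated (illustration only).
example = " \\\n".join(
    [
        "python3 -m some.trainer",
        "reward_model.profiler.enable=$PROFILE_ENABLE",
        "reward_model.use_reward_loop=True",
        "reward_model.use_reward_loop=True",
    ]
)

print(find_duplicate_overrides(example))  # ['reward_model.use_reward_loop']
```

The sketch only looks at the keys, so it reports a repeat even when both occurrences carry the same value. It does not expand shell variables or handle commands split across several invocations.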
What does this PR do?
Checklist Before Starting
PR title format: [{modules}] {type}: {description} (this will be checked by the CI)
{modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
Multiple modules are comma-separated, like [megatron, fsdp, doc]
{type} is one of feat, fix, refactor, chore, test
For a breaking change, add [BREAKING] to the beginning of the title.
Example: [BREAKING][fsdp, megatron] feat: dynamic batching
Test
API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Request CI in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
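The title convention from "Checklist Before Starting" ([{modules}] {type}: {description}, with an optional leading [BREAKING]) can be checked mechanically. The following is a minimal sketch under stated assumptions, not the project's actual CI check. The function name check_pr_title is hypothetical, and the sketch assumes that the module and type lists in the checklist are exhaustive.

```python
import re

# Module and type names copied from the PR checklist (assumed to be exhaustive).
ALLOWED_MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
ALLOWED_TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Optional "[BREAKING]", then "[modules] type: description".
TITLE_RE = re.compile(
    r"^(?P<breaking>\[BREAKING\])?\s*"
    r"\[(?P<modules>[^\]]+)\]\s+"
    r"(?P<type>\w+):\s+"
    r"(?P<description>\S.*)$"
)


def check_pr_title(title: str) -> list[str]:
    """Return a list of problems found in the title (empty if it is valid)."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return ["title does not match '[{modules}] {type}: {description}'"]

    errors = []
    modules = [name.strip() for name in match.group("modules").split(",")]
    unknown = [name for name in modules if name not in ALLOWED_MODULES]
    if unknown:
        errors.append(f"unknown modules: {unknown}")
    if match.group("type") not in ALLOWED_TYPES:
        errors.append(f"unknown type: {match.group('type')!r}")
    return errors


print(check_pr_title("[ci] chore: migrate all rm related ci to reward loop"))  # []
print(check_pr_title("[BREAKING][fsdp, megatron] feat: dynamic batching"))  # []
```

Returning a list of problems rather than a boolean lets a caller report every issue with a title at once.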