Add support for DGPO (ICLR 2026) to GRPO #5102
Open
YanqiDai wants to merge 28 commits into huggingface:main from YanqiDai:grpo-dgpo
+223
−2
Commits
4a8bf02 Add DGPO (Difficulty-Aware Group Policy Optimization, ICLR 2026) supp… (YanqiDai)
0383a57 Merge branch 'main' into grpo-dgpo (YanqiDai)
b0f72ef Revise DGPO description and usage instructions (YanqiDai)
90fc3f5 Remove DGPO section from grpo_trainer.md (YanqiDai)
34a69eb Remove ICLR 2026 (YanqiDai)
da8c445 Apply all other suggestions from code review (YanqiDai)
721c3eb Merge branch 'main' into grpo-dgpo (YanqiDai)
25def33 Polish the description of accuracy handling logic in DQW (YanqiDai)
1ecdae6 Rewrite the DGPO code (YanqiDai)
6dd54db Merge branch 'huggingface:main' into grpo-dgpo (YanqiDai)
6291ae3 Remove ICLR 2026 (YanqiDai)
df1fb48 Recover the code position of is_std_zero (YanqiDai)
e2b254d Merge branch 'main' into grpo-dgpo (YanqiDai)
fb95c87 Merge branch 'main' into grpo-dgpo (YanqiDai)
89e1384 Apply suggestions from code review (YanqiDai)
e07db90 Remove repeated use_bias_correction_kl in suggestions from grpo_config (YanqiDai)
f34d3f1 Modify _compute_advantages_with_dgae and implement it directly in _ge… (YanqiDai)
1535983 Merge branch 'main' into grpo-dgpo (YanqiDai)
2137c9d Merge branch 'main' into grpo-dgpo (YanqiDai)
7c5acea Merge branch 'main' into grpo-dgpo (YanqiDai)
ecdbe7e Merge branch 'main' into grpo-dgpo (YanqiDai)
4a2c30c Resolve conflicts between the grpo-dgpo branch and the main branch. (YanqiDai)
aec3912 Remove an extra blank line in grpo_config.py (YanqiDai)
4b864c1 Merge branch 'main' into grpo-dgpo (YanqiDai)
4420d51 Remove redundant gather() for is_std_zero found by cursor (YanqiDai)
158b3de Merge branch 'main' into grpo-dgpo (YanqiDai)
56e489a Fix type casting for `global_completion_length_sum`, `local_completio… (YanqiDai)
87782d9 Refactor standard deviation calculation in GRPOTrainer to use nanstd … (YanqiDai)