-
I found that the authors' custom settings are published as a modified version of open-r1.
-
According to the recently published CPPO paper and its GitHub repository:
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
https://arxiv.org/abs/2503.22342
https://github.com/lzhxmu/CPPO
it seems that CPPO could be supported by adding three parameters to GRPOConfig (see the CPPO training script):
https://github.com/lzhxmu/CPPO/blob/main/scripts/CPPO.sh
- metric = 'smallest'
- pruning = 0.5
- allocation = True
Currently, none of these are available as GRPOConfig options.
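To illustrate what I mean, here is a minimal sketch written as a GRPOConfig subclass; the metric, pruning, and allocation fields simply mirror the CPPO script, and their names and defaults are my assumptions rather than an existing API:

```python
from dataclasses import dataclass
from trl import GRPOConfig

# Hypothetical sketch only: these three fields mirror the CPPO script's
# settings and are NOT part of the current GRPOConfig API.
@dataclass
class CPPOConfig(GRPOConfig):
    metric: str = "smallest"   # criterion used to rank completions for pruning
    pruning: float = 0.5       # fraction of completions pruned from each group
    allocation: bool = True    # whether to re-allocate the pruned completion budget
```

Exposing them directly as GRPOConfig arguments, rather than via a subclass, would of course work just as well.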
In v0.16.0, it became possible to set scale_rewards in response to the Dr. GRPO paper, and I was impressed by how quickly that option was added.
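For reference, that existing option can already be used like this (if I understand correctly, scale_rewards=False corresponds to the Dr. GRPO recommendation of not dividing advantages by the group's reward standard deviation):

```python
from trl import GRPOConfig

# Existing precedent (TRL >= 0.16.0): scale_rewards=False disables scaling
# advantages by the group reward standard deviation.
config = GRPOConfig(output_dir="grpo-output", scale_rewards=False)
```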
Are there any plans to add parameters for CPPO as well?