What supervision RL is used for GRPO? #37
Unanswered
helperfunc asked this question in Q&A
Replies: 1 comment
https://github.com/huggingface/open-r1/blob/main/src/open_r1/grpo.py#L41
For different datasets (e.g. coding), which method is used to compute the rewards? Does GRPO use outcome-supervision RL or process-supervision RL?
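To make the two options in the question concrete, here is a minimal sketch of how group-relative advantages differ under outcome supervision (one reward per sampled completion) and process supervision (one reward per reasoning step). It follows my reading of the GRPO description in the DeepSeekMath paper and is not taken from the open-r1 code. The function names, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def outcome_advantages(rewards, eps=1e-8):
    """Outcome supervision: one scalar reward per sampled completion.

    Rewards are normalized within the group; every token of completion i
    would then share the single advantage returned at index i.
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def process_advantages(step_rewards, eps=1e-8):
    """Process supervision: one reward per reasoning step.

    `step_rewards[i]` lists the step rewards of completion i. Rewards are
    normalized over all steps in the group, and each step's advantage is
    the sum of the normalized rewards from that step to the end.
    """
    flat = [r for steps in step_rewards for r in steps]
    mu, sigma = mean(flat), pstdev(flat)
    advantages = []
    for steps in step_rewards:
        normalized = [(r - mu) / (sigma + eps) for r in steps]
        suffix_sums, running = [], 0.0
        for r in reversed(normalized):
            running += r
            suffix_sums.append(running)
        advantages.append(suffix_sums[::-1])
    return advantages


# Hypothetical group of 4 completions scored 1.0 if the final answer is correct.
print(outcome_advantages([1.0, 0.0, 0.0, 1.0]))

# Hypothetical group of 2 completions with 2 scored steps each.
print(process_advantages([[1.0, 0.0], [0.0, 1.0]]))
```

For a coding dataset, an outcome-style reward could be as simple as "did the generated program pass the tests", which needs no per-step reward model; process supervision would need a scorer for intermediate steps.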