Need help to understand #3372
FloSophoraeX asked this question in Q&A (unanswered, 0 replies).
I'm using the GRPO algorithm. Inside the `compute_policy_loss_vanilla()` function in `/verl/trainer/ppo/core_algos.py`, I print `log_prob.shape` and see `[1, max_prompt_length]`, i.e. the batch size is 1. How is the averaging over G computed and reflected in the GRPO implementation?
Could anyone kindly help answer this? Thank you very much!
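For context on what "averaging over G" refers to: in the GRPO paper, the group of G responses per prompt enters through the advantage estimate, where each response's reward is normalized by the mean and standard deviation of its group, before the policy-loss function ever sees the data. The sketch below illustrates that group-relative normalization in plain NumPy; it is a hypothetical minimal example, not the actual verl implementation, and the function name `grpo_advantages` is my own.

```python
import numpy as np

def grpo_advantages(rewards, group_ids, eps=1e-6):
    """Group-relative advantages as in GRPO: each reward is normalized
    by the mean and std of the G rewards sampled for the same prompt.

    rewards   : per-response scalar rewards, flattened across prompts
    group_ids : same length; responses sharing an id came from one prompt
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    group_ids = np.asarray(group_ids)
    adv = np.empty_like(rewards)
    for gid in np.unique(group_ids):
        mask = group_ids == gid
        group = rewards[mask]
        # Center on the group mean and scale by the group std:
        # this is where the averaging over the G samples happens.
        adv[mask] = (group - group.mean()) / (group.std() + eps)
    return adv

# Example: two prompts, G = 3 responses each.
rewards = [1.0, 0.0, 2.0, 5.0, 5.0, 5.0]
group_ids = [0, 0, 0, 1, 1, 1]
print(grpo_advantages(rewards, group_ids))
```

Because the group statistics are baked into the advantages this way, the per-sample policy loss downstream can operate on a batch of size 1 without re-doing any group averaging, which may explain the `[1, ...]` shape you observe.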