Need help to understand #3372
FloSophoraeX asked this question in Q&A (unanswered, 0 replies).
I'm using the GRPO algorithm. Inside the `compute_policy_loss_vanilla()` function in `/verl/trainer/ppo/core_algos.py`, I print `log_prob.shape` and see `[1, max_prompt_length]`, i.e. the batch size is 1. How is the averaging over G computed and reflected in the GRPO implementation?
Could anyone kindly help answer this? Thank you very much!
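For context on what "averaging over G" refers to: in the GRPO paper, the group of G responses per prompt enters through the advantage estimate, where each response's reward is normalized by the mean and standard deviation of its group, before the policy-loss function ever sees the data. The sketch below illustrates that group-relative normalization in plain NumPy; it is a hypothetical minimal example, not the actual verl implementation, and the function name `grpo_advantages` is my own.

```python
import numpy as np

def grpo_advantages(rewards, group_ids, eps=1e-6):
    """Group-relative advantages as in GRPO: each reward is normalized
    by the mean and std of the G rewards sampled for the same prompt.

    rewards   : per-response scalar rewards, flattened across prompts
    group_ids : same length; responses sharing an id came from one prompt
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    group_ids = np.asarray(group_ids)
    adv = np.empty_like(rewards)
    for gid in np.unique(group_ids):
        mask = group_ids == gid
        group = rewards[mask]
        # Center on the group mean and scale by the group std:
        # this is where the averaging over the G samples happens.
        adv[mask] = (group - group.mean()) / (group.std() + eps)
    return adv

# Example: two prompts, G = 3 responses each.
rewards = [1.0, 0.0, 2.0, 5.0, 5.0, 5.0]
group_ids = [0, 0, 0, 1, 1, 1]
print(grpo_advantages(rewards, group_ids))
```

Because the group statistics are baked into the advantages this way, the per-sample policy loss downstream can operate on a batch of size 1 without re-doing any group averaging, which may explain the `[1, ...]` shape you observe.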