Hello,
I have noticed some variability in benchmark scores when running the same GRPO experiment multiple times. I understand that the GRPO code is not entirely reproducible, but have you also observed large run-to-run variation?
Also, have you tried any additional ways to make the runs more reproducible?
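For context, here is a minimal sketch of the kind of seeding I have in mind (the PyTorch/CUDA lines are assumptions about the trainer setup and are shown as comments, since they depend on the environment):

```python
import random

def set_seed(seed: int) -> None:
    """Seed the RNGs that typically drive run-to-run variation."""
    random.seed(seed)
    # In a PyTorch-based GRPO setup (assumption), one would also do:
    # import numpy as np; np.random.seed(seed)
    # import torch; torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
    # torch.use_deterministic_algorithms(True)  # may slow training

set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
assert first == second  # identical draws after re-seeding
```

Even with all of this, I understand GPU nondeterminism can still cause some drift, which is why I am asking how large the variation typically is.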
Thank you!