Revert "[reward] fix: fix reward computation in _validate when use_r… #1
setup
e2e_ppo_trainer_fsdp_sglang: 0s
e2e_ppo_trainer_fsdp-qwen2_5vl-3b
cleanup: 4s