Dr. GRPO Open-Source Implementation

https://github.com/sail-sg/understand-r1-zero

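The paper behind this repository argues that two normalizations in GRPO, averaging the loss over each response's length and dividing advantages by the group standard deviation, bias the policy gradient. Dr. GRPO removes both to obtain an unbiased policy gradient estimate.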

Configuration

actor_rollout_ref:
  actor:
    loss_agg_mode: "seq-mean-token-sum-norm" # turn off seq-dim averaging
    use_kl_loss: False
algorithm:
  norm_adv_by_std_in_grpo: False # turn off standard deviation norm

All other parameters are set the same as for GRPO.
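
To make the two switches above concrete, here is a minimal, dependency-free sketch of what they change. It is an illustration under stated assumptions, not the library's implementation: the function names `group_advantages` and `aggregate_loss` are hypothetical, the population standard deviation and `eps` value are assumptions, and `max_len` stands in for whatever fixed constant the normalized token sum is divided by.

```python
from statistics import mean, pstdev


def group_advantages(rewards, norm_by_std=False, eps=1e-6):
    """Center rewards within one prompt's group of sampled responses.

    norm_by_std=True mimics GRPO-style scaling by the group std (assumed
    here to be the population std); False keeps only the mean baseline,
    which is what norm_adv_by_std_in_grpo: False is meant to select.
    """
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if norm_by_std:
        scale = pstdev(rewards) + eps
        return [c / scale for c in centered]
    return centered


def aggregate_loss(token_losses, mode, max_len=None):
    """Reduce per-token losses (one list per response, no padding) to a scalar.

    "seq-mean-token-mean": divide each response's token sum by its own
    length, so per-token weight depends on response length.
    "seq-mean-token-sum-norm": divide each token sum by the same constant
    max_len (hypothetical parameter), so response length no longer
    rescales the per-token contribution.
    """
    if mode == "seq-mean-token-mean":
        return mean([sum(seq) / len(seq) for seq in token_losses])
    if mode == "seq-mean-token-sum-norm":
        if max_len is None:
            raise ValueError("max_len is required for seq-mean-token-sum-norm")
        return mean([sum(seq) / max_len for seq in token_losses])
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_advantages(rewards))                    # mean baseline only
    print(group_advantages(rewards, norm_by_std=True))  # also scaled by std

    losses = [[1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]         # a short and a long response
    print(aggregate_loss(losses, "seq-mean-token-mean"))
    print(aggregate_loss(losses, "seq-mean-token-sum-norm", max_len=4))
```

With length averaging, the short and the long response contribute equally per sequence, so each token of the short response carries twice the weight of a token in the long one. With the constant divisor, every token carries the same weight regardless of which response it belongs to.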
