Replies: 1 comment
Hi, same question here. Have you figured it out? Thank you!
Hi, trl community. In PPO, trl has the following code, which I think might be problematic, because the reward is added one position after the last EOS token, not on the EOS token as it used to be. As you can see, the `actual_end` below doesn't seem right. What does `padding_mask_p1` mean? The link attached above doesn't give a detailed explanation, just one graph. Thank you for your help.
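To make the concern concrete, here is a minimal sketch of the two placements I mean. It is a toy illustration in plain Python, not trl's code: the function names, the 6-position response, and the fallback to the EOS index when EOS is already the last position are all my own assumptions.

```python
# Toy illustration only; NOT trl's actual implementation.
# Assumption: eos_index is the position of the EOS token inside a response
# that has resp_len positions in total. All names here are hypothetical.


def reward_index_on_eos(eos_index: int, resp_len: int) -> int:
    """Placement A: put the scalar reward on the EOS token itself."""
    return eos_index


def reward_index_after_eos(eos_index: int, resp_len: int) -> int:
    """Placement B: put the reward one position after EOS.

    Assumed fallback: if EOS is already the last position, stay on EOS
    so the index never runs past the end of the response.
    """
    candidate = eos_index + 1
    return candidate if candidate < resp_len else eos_index


def place_reward(per_token_rewards, index, score):
    """Return a copy of the per-token rewards with `score` added at `index`."""
    out = list(per_token_rewards)
    out[index] += score
    return out


resp_len = 6
eos_index = 3
per_token = [0.0] * resp_len  # per-token terms set to zero for clarity

on_eos = place_reward(per_token, reward_index_on_eos(eos_index, resp_len), 1.0)
after_eos = place_reward(per_token, reward_index_after_eos(eos_index, resp_len), 1.0)

print("reward on EOS:      ", on_eos)
print("reward after EOS:   ", after_eos)
```

With these toy values, placement A puts the score at index 3 (the EOS token) and placement B puts it at index 4. My question is which of the two trl intends, and why.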