feat: DeepSeek V3.2 Off-policy sequence masking #4689
Ensures the threshold is >= 0 in GRPOConfig to prevent invalid configuration.
…ernel` Since we need logprobs, compatibility should be handled on the Liger side. This check prevents users from thinking it would work with Liger loss.
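The two commit notes above describe configuration checks: a non-negative threshold, and a refusal to combine the feature with Liger loss. A minimal sketch of that kind of validation follows. `ToyConfig`, `mask_threshold`, and `use_liger_loss` are hypothetical names chosen for illustration and are not the actual `GRPOConfig` fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    # Hypothetical fields, for illustration only; not the real GRPOConfig API.
    mask_threshold: Optional[float] = None  # None disables off-policy masking
    use_liger_loss: bool = False

    def __post_init__(self) -> None:
        if self.mask_threshold is None:
            return  # feature disabled, nothing to validate
        if self.mask_threshold < 0:
            raise ValueError("mask_threshold must be >= 0")
        if self.use_liger_loss:
            # The mask needs per-token logprobs, which this sketch assumes
            # a fused Liger loss does not expose.
            raise ValueError("off-policy masking is not supported with Liger loss")


ok = ToyConfig(mask_threshold=0.5)
```

Failing early in `__post_init__` surfaces an invalid combination when the config is built rather than partway through training.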
What does this PR do?
Fixes #4697
This PR aims to implement off-policy sequence masking from the DeepSeek V3.2 paper.
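For readers unfamiliar with the technique, here is a rough sketch of off-policy sequence masking as I understand it from the paper: a sequence is dropped from the loss when its advantage is negative and the mean per-token log-ratio between the sampling policy and the current policy exceeds a threshold. The function name, argument names, and use of plain Python lists are assumptions made for illustration; this is not the code in this PR.

```python
from typing import List


def off_policy_sequence_mask(
    old_logps: List[List[float]],  # per-token logprobs under the sampling policy
    new_logps: List[List[float]],  # per-token logprobs under the current policy
    advantages: List[float],       # one advantage per sequence
    threshold: float,              # hypothetical name for the divergence cutoff
) -> List[float]:
    """Return 1.0 to keep a sequence in the loss and 0.0 to mask it out."""
    if threshold < 0:
        raise ValueError("threshold must be >= 0")
    mask = []
    for old, new, adv in zip(old_logps, new_logps, advantages):
        # Mean of log(pi_old / pi_theta) over the tokens of this sequence.
        divergence = sum(o - n for o, n in zip(old, new)) / max(len(old), 1)
        # Only negative-advantage sequences that drifted too far are masked.
        drop = adv < 0 and divergence > threshold
        mask.append(0.0 if drop else 1.0)
    return mask


old = [[-1.0, -1.0], [-1.0, -1.0], [-1.0, -1.0]]
new = [[-3.0, -3.0], [-3.0, -3.0], [-1.1, -1.1]]
adv = [-0.5, 0.5, -0.5]
mask = off_policy_sequence_mask(old, new, adv, threshold=0.5)
print(mask)  # [0.0, 1.0, 1.0]
```

Only the first sequence is masked: it has a negative advantage and a mean log-ratio of 2.0. The second has a positive advantage, and the third stays within the threshold.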
Changes
- `_get_off_policy_mask` staticmethod to compute the off-policy mask
- added logic in `_compute_loss` to inject the off-policy mask into the loss mask
- `_compute_loss` to inject the off-policy mask only to the surrogate loss
- `__post_init__` checks and some tests (including not allowing with Liger loss atm)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.