@felipemello1 (Contributor) commented Jan 29, 2026

  • I have personally reviewed this PR and description before asking others to do so. It meets the quality bar I expect from others. I understand that if this PR is perceived as unverified AI-generated code, it will be closed without further explanation.
  • I have run tests and confirmed that this code works

Description

  • We want to drop truncated samples, since we cannot properly compute rewards for them.
  • Previously, we dropped every episode in a group if any episode in that group was truncated.
  • Eventually, this left the buffer with no episodes available.

Fix: instead of dropping all samples in the group, we set the advantage (adv) of the truncated samples to 0.
You may ask: why not just drop the truncated samples?
Because if the batch size (bsz) is 8 and we drop 1, we are left with only 7 samples. The trainer may then have to wait for the next batch before it can train on the previous one, and by that point the replay buffer may have evicted samples from older policies, leaving us with a mess.
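
A minimal sketch of the idea, not the code in this PR: the `Episode` record, its field names, and `zero_truncated_advantages` are hypothetical and only illustrate that zeroing the advantage keeps the group size unchanged, whereas dropping would shrink it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Episode:
    """Hypothetical episode record; the field names are illustrative only."""
    episode_id: int
    advantage: float
    is_truncated: bool


def zero_truncated_advantages(group):
    """Return a new list of the same length where truncated episodes get advantage 0.0."""
    return [
        replace(ep, advantage=0.0) if ep.is_truncated else ep
        for ep in group
    ]


def drop_truncated(group):
    """Alternative shown for contrast: dropping shrinks the group."""
    return [ep for ep in group if not ep.is_truncated]


# Assumed example: a group of 8 episodes (bsz=8) with one truncated episode.
group = [Episode(i, advantage=0.5, is_truncated=(i == 3)) for i in range(8)]

masked = zero_truncated_advantages(group)
dropped = drop_truncated(group)

print(len(masked), len(dropped))
```

With the advantage set to 0, the truncated sample still fills its slot in the batch, so the trainer does not need to wait for another sample to complete the batch.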

image

Test plan

image image

@meta-cla (bot) added the "CLA Signed" label Jan 29, 2026