
Fix nan leak from masked positions in compute_approx_kl #1635

Merged: erictang000 merged 2 commits into NovaSky-AI:main from EdisonScientific:fix-compute-approx-kl-nan-leak on May 8, 2026

Conversation

@jamesbraza (Contributor) commented May 8, 2026

Replace `kld * loss_mask` with `torch.where(loss_mask.bool(), kld * loss_mask, 0.0)` in `compute_approx_kl`, so that `nan` at masked positions (where `0 * nan = nan`) can no longer leak through into `policy_kl` / `final_loss`.

Closes #1633
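The failure mode and the fix can be sketched in a few standalone lines (a minimal illustration, not the repo's code; tensor values are made up):

```python
import torch

kld = torch.tensor([0.1, 0.2, float("nan")])   # nan lands at a masked position
loss_mask = torch.tensor([1.0, 1.0, 0.0])

# Mask-by-multiplication: IEEE 754 gives 0 * nan = nan, so the nan survives.
leaky = kld * loss_mask

# torch.where forces masked positions to 0.0 regardless of the value there.
fixed = torch.where(loss_mask.bool(), kld * loss_mask, torch.zeros_like(kld))

# The nan in `leaky` poisons any downstream reduction such as masked_mean,
# while `fixed` stays finite.
leaky_mean = leaky.sum() / loss_mask.sum()   # nan
fixed_mean = fixed.sum() / loss_mask.sum()   # 0.15
```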

Masking via `kld * loss_mask` propagates `nan` from masked positions
because IEEE 754 defines `0 * nan = nan`, poisoning the downstream
masked_mean and any metric (e.g. policy_kl, final_loss) that consumes
the KL scalar. Switch to `masked_fill` so masked positions are forced
to 0.0 regardless of the input value there. Autograd is unaffected.

Add a parametrized regression test covering all four estimator types
(k1, k2, k3, abs) that injects `nan` at a masked position and asserts
the output and downstream `masked_mean` stay finite.
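A test of that shape might look like the sketch below. The `compute_approx_kl` here is a local stand-in for illustration (the real one lives in `skyrl/backends/skyrl_train/utils/ppo_utils.py`, and its actual signature and estimator formulas may differ); only the estimator names (`k1`, `k2`, `k3`, `abs`) and the mask handling mirror the PR:

```python
import torch
import pytest

def compute_approx_kl(log_probs, log_probs_base, loss_mask, estimator="k1"):
    """Stand-in for the real function, using common KL-estimator forms."""
    log_ratio = log_probs - log_probs_base
    if estimator == "k1":
        kld = log_ratio
    elif estimator == "k2":
        kld = log_ratio ** 2 / 2
    elif estimator == "k3":
        kld = torch.expm1(-log_ratio) + log_ratio
    elif estimator == "abs":
        kld = log_ratio.abs()
    # The fix under test: masked positions are forced to 0.0.
    return torch.where(loss_mask.bool(), kld * loss_mask, torch.zeros_like(kld))

@pytest.mark.parametrize("estimator", ["k1", "k2", "k3", "abs"])
def test_compute_approx_kl_masks_out_nan(estimator):
    log_probs = torch.tensor([-0.1, -0.2, float("nan")])
    log_probs_base = torch.tensor([-0.1, -0.3, -0.2])
    loss_mask = torch.tensor([1.0, 1.0, 0.0])  # nan sits at the masked slot
    kl = compute_approx_kl(log_probs, log_probs_base, loss_mask, estimator)
    assert torch.isfinite(kl).all()
    masked_mean = (kl * loss_mask).sum() / loss_mask.sum()
    assert torch.isfinite(masked_mean)
```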

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist (Bot) left a comment


Code Review

This pull request updates the KL divergence computation to prevent nan leakage from masked positions by replacing direct multiplication with masked_fill. A corresponding unit test was added to verify the fix. The reviewer recommended using torch.where instead of masked_fill to preserve potential soft masking functionality while still addressing the nan leakage issue.

Comment thread on `skyrl/backends/skyrl_train/utils/ppo_utils.py` (outdated)
Switch the mask-sanitization step from `masked_fill(~mask.bool(), 0.0)`
to `torch.where(mask.bool(), kld * mask, 0.0)` so non-binary mask values
still scale the kept positions multiplicatively, while masked (mask==0)
positions are still forced to 0.0 so non-finite inputs there cannot leak.

Combine the two prior regression test cases into one parametrized test
(`test_compute_approx_kl_applies_loss_mask`) that exercises both
invariants in one shot: a soft mask `{1.0, 0.5, 0.25, 0.0}` with `nan`
injected at the masked position, asserting kept-position scaling and
masked-position zeroing for all four estimator types.
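The behavioral difference between the two forms can be shown directly (a standalone sketch with made-up values, using the soft mask `{1.0, 0.5, 0.25, 0.0}` from the test):

```python
import torch

kld = torch.tensor([4.0, 4.0, 4.0, float("nan")])
mask = torch.tensor([1.0, 0.5, 0.25, 0.0])

# masked_fill zeroes only where mask == 0, discarding the 0.5/0.25 scaling.
masked_fill_version = kld.masked_fill(~mask.bool(), 0.0)

# torch.where keeps the multiplicative scaling at kept positions while still
# forcing masked positions to 0.0, so the nan cannot leak.
where_version = torch.where(mask.bool(), kld * mask, torch.zeros_like(kld))

# masked_fill_version -> [4.0, 4.0, 4.0, 0.0]   (soft mask ignored)
# where_version       -> [4.0, 2.0, 1.0, 0.0]   (soft mask respected)
```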

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erictang000 (Collaborator) left a comment


nice, this lgtm, thanks!

@erictang000 erictang000 merged commit 2df8f51 into NovaSky-AI:main on May 8, 2026
4 of 5 checks passed
@jamesbraza jamesbraza deleted the fix-compute-approx-kl-nan-leak branch May 8, 2026 23:52


Development

Successfully merging this pull request may close these issues.

Bug: compute_approx_kl mask-by-multiplication leaks nan from masked positions into policy_kl / final_loss
