Commit ad404cb

Authored by albertcity and albertyi
[megatron] fix: add protection for `logits_processor_args.pop("loss_mask")`, which may cause the `forward_fn` of the value net to collapse (verl-project#5204)
### What does this PR do?

Fixes a bug in `gpt_model_forward_no_padding`. The `MegatronEngineWithValueHead` class fails to pass `logits_processor_args` to `forward_fn`, causing a crash when `gpt_model_forward_no_padding` attempts to pop the `loss_mask`.

### Test

> No need.

### Design & Code Changes

> Add an `if logits_processor_args and "loss_mask" in logits_processor_args:` check before calling `logits_processor_args.pop("loss_mask")`.

Co-authored-by: albertyi <albertyi@tencent.com>
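The failure mode and the guard can be sketched in isolation. This is a minimal illustration of the pattern the diff adds, not the actual verl code: `pop_loss_mask` is a hypothetical helper, and the point is only that an unconditional `.pop()` raises `AttributeError` when the caller passes `None` (as the value-head engine effectively does), while the guarded form is safe for `None`, empty, and populated dicts alike.

```python
# Hypothetical standalone sketch of the guard added in this PR.
# `logits_processor_args` may be None when the caller does not supply
# processor kwargs, so an unconditional .pop("loss_mask") would raise
# AttributeError (on None) instead of silently doing nothing.

def pop_loss_mask(logits_processor_args):
    """Safely drop "loss_mask" from the kwargs forwarded to the logits processor."""
    if logits_processor_args and "loss_mask" in logits_processor_args:
        logits_processor_args.pop("loss_mask")
    return logits_processor_args

# None and empty dicts pass through untouched; a populated dict loses only "loss_mask".
assert pop_loss_mask(None) is None
assert pop_loss_mask({}) == {}
assert pop_loss_mask({"loss_mask": 1, "temperature": 0.7}) == {"temperature": 0.7}
```

The `in` check alongside the truthiness test also keeps the pop from raising `KeyError` when the dict exists but carries no `loss_mask` entry.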
1 parent 2320603 commit ad404cb

File tree: 1 file changed (+4, −2 lines)


verl/models/mcore/model_forward.py (4 additions, 2 deletions)
```diff
@@ -198,7 +198,8 @@ def gptmodel_forward_no_padding(
         }
         model_kwargs["labels"] = args["label"].contiguous()
         model_kwargs["loss_mask"] = args["loss_mask"].contiguous()
-        logits_processor_args.pop("loss_mask")
+        if logits_processor_args and 'loss_mask' in logits_processor_args:
+            logits_processor_args.pop("loss_mask")
 
     # For VLM model, need to pass bshd format `input_ids` and `attention_mask`.
     attention_mask = None
@@ -251,7 +252,8 @@ def gptmodel_forward_no_padding(
         }
         model_kwargs["labels"] = args["label"].contiguous()
         model_kwargs["loss_mask"] = args["loss_mask"].contiguous()
-        logits_processor_args.pop("loss_mask")
+        if logits_processor_args and 'loss_mask' in logits_processor_args:
+            logits_processor_args.pop("loss_mask")
 
     output_orig = model(
         input_ids=input_ids_bshd,
```
