
Commit fd9a5d9

[algo] fix gspo sequence ratio broadcast (#596)
1 parent 55b40c8 commit fd9a5d9

1 file changed

Lines changed: 1 addition & 1 deletion

verl/trainer/core_algos.py

@@ -469,7 +469,7 @@ def compute_policy_loss(
         if loss_type == "gspo_token":
             log_importance_ratio = negative_approx_kl_in_seq.detach().unsqueeze(-1) + log_probs - log_probs.detach()
         else:
-            log_importance_ratio = negative_approx_kl_in_seq * response_mask
+            log_importance_ratio = negative_approx_kl_in_seq.unsqueeze(-1) * response_mask
     else:
         log_importance_ratio = negative_approx_kl
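
The shape rule behind this one-line change can be sketched without any tensor library. The snippet below is only an illustration, not code from verl: `broadcast_shape` is a hypothetical helper that mimics NumPy/PyTorch-style broadcasting, and the shapes `(batch,)` for `negative_approx_kl_in_seq` and `(batch, response_length)` for `response_mask` are assumptions inferred from the `.unsqueeze(-1)` calls in the diff.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Hypothetical helper: result shape of an elementwise op on shapes a and b."""
    out = []
    # Shapes are aligned from the trailing dimension; missing leading
    # dimensions are treated as size 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} do not broadcast")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


batch, response_length = 4, 6              # assumed sizes, for illustration only
mask_shape = (batch, response_length)      # assumed shape of response_mask

# After the fix: (batch, 1) expands across the token axis.
fixed = broadcast_shape((batch, 1), mask_shape)

# Before the fix: (batch,) is aligned against response_length and fails.
try:
    broadcast_shape((batch,), mask_shape)
    old_failed = False
except ValueError:
    old_failed = True

# Corner case: if batch happens to equal response_length, the old expression
# broadcasts without error, but along the token axis instead of the batch axis.
silent = broadcast_shape((6,), (6, 6))
```

Under these assumptions, the unsqueezed form is the only one that repeats each per-sequence value across that sequence's tokens. The corner case suggests why such a bug could go unnoticed in a run where the two sizes coincide.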
