When I set loss_type==gspo, running an experiment fails with:

```
File "EasyR1/verl/trainer/core_algos.py", line 681, in compute_policy_loss
    log_importance_ratio = negative_approx_kl_in_seq * response_mask
                           ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (5) must match the size of tensor b (5000) at non-singleton dimension 1
```

The error is located in compute_policy_loss(). It is caused by this line:

```python
negative_approx_kl_in_seq = VF.masked_mean(negative_approx_kl, response_mask, dim=-1)
```

which collapses the last (sequence) dimension of negative_approx_kl_in_seq. In the loss_type==gspo branch, shouldn't the tensor be expanded back to the sequence dimension before it is multiplied by response_mask?
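To illustrate the shape mismatch, here is a minimal NumPy sketch (NumPy follows the same broadcasting rules as PyTorch); the shapes batch=5 and seq_len=5000 are taken from the error message, and `masked_mean` is re-implemented inline as an assumption about what `VF.masked_mean` does:

```python
import numpy as np

# Hypothetical shapes matching the report: batch=5, seq_len=5000.
batch, seq_len = 5, 500  # smaller seq_len, same broadcasting behavior
negative_approx_kl = np.random.randn(batch, seq_len)
response_mask = np.ones((batch, seq_len))

# Assumed equivalent of VF.masked_mean(..., dim=-1): the sequence
# dimension is collapsed, (batch, seq_len) -> (batch,).
negative_approx_kl_in_seq = (
    (negative_approx_kl * response_mask).sum(-1) / response_mask.sum(-1)
)
assert negative_approx_kl_in_seq.shape == (batch,)

# Multiplying (batch,) by (batch, seq_len) cannot broadcast: the trailing
# dims are batch vs seq_len, which is the reported RuntimeError in torch
# (NumPy raises an analogous ValueError).
try:
    _ = negative_approx_kl_in_seq * response_mask
except ValueError as e:
    print("broadcast error:", e)

# A likely fix: restore the sequence dimension (torch: .unsqueeze(-1))
# so (batch, 1) broadcasts against (batch, seq_len).
log_importance_ratio = negative_approx_kl_in_seq[:, None] * response_mask
assert log_importance_ratio.shape == (batch, seq_len)
```

This only demonstrates the broadcasting failure and one plausible fix; whether the actual gspo branch should unsqueeze here depends on the intended per-sequence importance-ratio definition in core_algos.py.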