GSPO: error when loss_type==gspo #595

@taoszhang

When I set loss_type==gspo, the experiment fails with the following error:
File "EasyR1/verl/trainer/core_algos.py", line 681, in compute_policy_loss
log_importance_ratio = negative_approx_kl_in_seq * response_mask
~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (5) must match the size of tensor b (5000) at non-singleton dimension 1
I traced the error to the function compute_policy_loss().
The cause is this line:
negative_approx_kl_in_seq = VF.masked_mean(negative_approx_kl, response_mask, dim=-1)
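This collapses the last dimension of negative_approx_kl_in_seq, so it has one value per sequence. In the loss_type==gspo case, is that dimension not being restored correctly before the multiplication with response_mask?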
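The shape mismatch can be reproduced outside the trainer. The sketch below is not the EasyR1 code: `masked_mean` is a hypothetical stand-in for `VF.masked_mean`, the tensors are random, and the shapes (5 sequences, 5000 response tokens) are taken from the error message. It shows that the per-sequence mean has shape `(5,)`, that multiplying it by a `(5, 5000)` mask fails, and that adding a trailing dimension with `unsqueeze(-1)` is one possible way to make the product broadcast. Whether that is the intended GSPO computation is for the maintainers to confirm.

```python
import torch


def masked_mean(values, mask, dim=-1, eps=1e-8):
    # Hypothetical stand-in for VF.masked_mean; the real helper may differ.
    return (values * mask).sum(dim=dim) / (mask.sum(dim=dim) + eps)


batch_size, response_len = 5, 5000  # shapes taken from the error message
negative_approx_kl = torch.randn(batch_size, response_len)
response_mask = torch.ones(batch_size, response_len)

# Reducing over the last dimension leaves one value per sequence: shape (5,).
negative_approx_kl_in_seq = masked_mean(negative_approx_kl, response_mask, dim=-1)

# (5,) against (5, 5000) aligns 5 with 5000 on the last axis, so this raises.
try:
    negative_approx_kl_in_seq * response_mask
    raised = False
except RuntimeError as exc:
    raised = True
    print(exc)

# Assumed fix: restore the token dimension so (5, 1) broadcasts to (5, 5000).
log_importance_ratio = negative_approx_kl_in_seq.unsqueeze(-1) * response_mask
print(log_importance_ratio.shape)
```

Passing `keepdim=True` to the reduction would have the same effect as `unsqueeze(-1)`, provided the helper used for the masked mean supports it.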
