
Customized loss value #52

@ZN1010

Description


In the full parameter update setting, I have recently been experimenting with a new loss function: on top of the original next-token-prediction loss I add a regularization term, so that the weights of certain layers are pushed to be as small as possible. However, I keep running into a strange bug. Below are the code I added and the error I get:

For example, in lomo_trainer.py:

# regularization term that encourages the q_proj weights to stay small
# (GatheredParameters here is deepspeed.zero.GatheredParameters)
lamda = 1.0
regularization = torch.tensor(0.0, requires_grad=True)
self.model.train()
for name, param in self.model.named_parameters():
    if "self_attn.q_proj" in name:
        # gather the ZeRO-3 partitioned weight so it has its full shape here
        with GatheredParameters(param):
            regularization = regularization + torch.mean(param)
...
loss = get_loss(outs.logits, batch['labels'], self.training_args.clip_loss_value) + lamda * regularization

However, with this change, loss.backward(retain_graph=True) inside grad_norm() in lomo.py raises RuntimeError: The size of tensor a (0) must match the size of tensor b (4096) at non-singleton dimension 1. My guess is that at backward time the weights of the layers I added can no longer be found. Could you advise how to fix this bug, or whether there is a better way to implement this?
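In case it helps, here is a minimal standalone sketch of the direction I was considering (the function name compute_loss_with_reg is just for illustration, I am assuming GatheredParameters is deepspeed.zero.GatheredParameters, and I assume the batch carries input_ids / attention_mask / labels as in the trainer): keep the q_proj weights gathered until after backward has run, so the graph built on the full-size weights is still valid, instead of exiting the context before lomo.py calls backward.

import torch
import deepspeed

def compute_loss_with_reg(model, batch, get_loss, clip_loss_value, lamda=1.0):
    # parameters the regularizer touches
    reg_params = [p for n, p in model.named_parameters() if "self_attn.q_proj" in n]

    # gather once and stay inside the context until backward is done,
    # instead of gathering per parameter and exiting before backward
    with deepspeed.zero.GatheredParameters(reg_params):
        regularization = torch.stack([torch.mean(p) for p in reg_params]).sum()
        outs = model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'])
        loss = get_loss(outs.logits, batch['labels'], clip_loss_value) + lamda * regularization
        # backward runs while the weights are still full-size
        loss.backward()
    return loss.detach()

I realize this does not map directly onto LOMO's fused backward in grad_norm(), so I would also appreciate guidance on the right place to hook such a term in.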

Thank you very much!
