In the full parameter update setting, I've recently been experimenting with a new loss function: on top of the original next-token-prediction loss I add a regularization term, hoping to push the weights of certain layers to be as small as possible. I keep running into strange bugs, though. The code I added and the error I hit are below.
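Concretely, the objective I'm trying to optimize is (writing $\mathcal{L}_{\text{NTP}}$ for the usual next-token cross-entropy; in my code the penalty is the sum of per-matrix means of the `q_proj` weights, with $\lambda = 1$):

$$\mathcal{L} \;=\; \mathcal{L}_{\text{NTP}} \;+\; \lambda \sum_{W \,\in\, \texttt{q\_proj}} \operatorname{mean}(W)$$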
For example, in lomo_trainer.py:
```python
lamda = 1
regularization = torch.tensor(0.0, requires_grad=True, dtype=torch.float32)
self.model.train()
for name, param in self.model.named_parameters():
    if "self_attn.q_proj" in name:
        # Gather the ZeRO-3-partitioned weight so it is materialized,
        # then add its mean to the penalty term.
        with GatheredParameters(param):
            regularization = regularization + torch.mean(param)
...
loss = get_loss(outs.logits, batch['labels'], self.training_args.clip_loss_value) + lamda * regularization
```

With this change, the `loss.backward(retain_graph=True)` call inside `grad_norm()` in lomo.py fails with:

```
RuntimeError: The size of tensor a (0) must match the size of tensor b (4096) at non-singleton dimension 1
```

My guess is that at backward time the weights of the layers I added can no longer be found in the graph. Could you advise how to fix this bug, or is there a better implementation?
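For reference, here is a minimal standalone sketch of the same regularized-loss pattern outside LOMO/DeepSpeed (the toy model, data, and plain cross-entropy loss are stand-ins I made up, not from the repo). In plain PyTorch the double backward goes through, so the size mismatch seems specific to how the gathered weights interact with LOMO's backward:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in; only the "self_attn.q_proj" parameter-name pattern matters.
class ToyBlock(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.self_attn = nn.ModuleDict({"q_proj": nn.Linear(dim, dim)})
        self.head = nn.Linear(dim, dim)

    def forward(self, x):
        return self.head(self.self_attn["q_proj"](x))

model = ToyBlock()
x = torch.randn(4, 16)
labels = torch.randint(0, 16, (4,))

lamda = 1
regularization = torch.tensor(0.0, requires_grad=True)
for name, param in model.named_parameters():
    if "self_attn.q_proj" in name:
        # No GatheredParameters here: without ZeRO-3 the weight is
        # already a materialized (dim, dim) tensor.
        regularization = regularization + torch.mean(param)

loss = F.cross_entropy(model(x), labels) + lamda * regularization

# grad_norm() in lomo.py does a first backward with retain_graph=True;
# the training step then backprops through the same graph again.
loss.backward(retain_graph=True)
loss.backward()
print("double backward succeeds in plain PyTorch")
```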
Thanks very much!
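P.S. One workaround I've been considering but haven't tested: since the penalty is linear in the weights, its gradient is the constant $\lambda / \operatorname{numel}(W)$ for every entry of each `q_proj` matrix, so it could be added analytically to the gradients inside LOMO's fused backward instead of going through autograd at all. A rough sketch (where exactly this would hook into lomo.py is my guess, not verified):

```python
# Hypothetical helper: add the penalty's analytic gradient to a
# parameter's grad during LOMO's fused backward, with no extra
# backward pass. d(mean(W))/dW = 1 / W.numel() for every entry.
def add_penalty_grad(name: str, param: torch.Tensor, lamda: float = 1.0):
    if "self_attn.q_proj" in name and param.grad is not None:
        param.grad.add_(lamda / param.numel())
```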