In the full parameter update setting, I've recently been experimenting with a new loss function: on top of the original next-token-prediction loss I add a regularization term, hoping to push the weights of certain layers to be as small as possible. I keep running into strange bugs, though. The code I added and the error I hit are below.
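Concretely, the objective I'm trying to optimize is (writing $\mathcal{L}_{\text{NTP}}$ for the usual next-token cross-entropy; in my code the penalty is the sum of per-matrix means of the `q_proj` weights, with $\lambda = 1$):

$$\mathcal{L} \;=\; \mathcal{L}_{\text{NTP}} \;+\; \lambda \sum_{W \,\in\, \texttt{q\_proj}} \operatorname{mean}(W)$$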
For example, in lomo_trainer.py:
```python
lamda = 1
regularization = torch.tensor(0.0, requires_grad=True, dtype=torch.float32)
self.model.train()
for name, param in self.model.named_parameters():
    if "self_attn.q_proj" in name:
        # Gather the ZeRO-3-partitioned weight so it is materialized,
        # then add its mean to the penalty term.
        with GatheredParameters(param):
            regularization = regularization + torch.mean(param)
...
loss = get_loss(outs.logits, batch['labels'], self.training_args.clip_loss_value) + lamda * regularization
```

With this change, the `loss.backward(retain_graph=True)` call inside `grad_norm()` in lomo.py fails with:

```
RuntimeError: The size of tensor a (0) must match the size of tensor b (4096) at non-singleton dimension 1
```

My guess is that at backward time the weights of the layers I added can no longer be found in the graph. Could you advise how to fix this bug, or is there a better implementation?
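For reference, here is a minimal standalone sketch of the same regularized-loss pattern outside LOMO/DeepSpeed (the toy model, data, and plain cross-entropy loss are stand-ins I made up, not from the repo). In plain PyTorch the double backward goes through, so the size mismatch seems specific to how the gathered weights interact with LOMO's backward:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in; only the "self_attn.q_proj" parameter-name pattern matters.
class ToyBlock(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.self_attn = nn.ModuleDict({"q_proj": nn.Linear(dim, dim)})
        self.head = nn.Linear(dim, dim)

    def forward(self, x):
        return self.head(self.self_attn["q_proj"](x))

model = ToyBlock()
x = torch.randn(4, 16)
labels = torch.randint(0, 16, (4,))

lamda = 1
regularization = torch.tensor(0.0, requires_grad=True)
for name, param in model.named_parameters():
    if "self_attn.q_proj" in name:
        # No GatheredParameters here: without ZeRO-3 the weight is
        # already a materialized (dim, dim) tensor.
        regularization = regularization + torch.mean(param)

loss = F.cross_entropy(model(x), labels) + lamda * regularization

# grad_norm() in lomo.py does a first backward with retain_graph=True;
# the training step then backprops through the same graph again.
loss.backward(retain_graph=True)
loss.backward()
print("double backward succeeds in plain PyTorch")
```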
Thanks very much!
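P.S. One workaround I've been considering but haven't tested: since the penalty is linear in the weights, its gradient is the constant $\lambda / \operatorname{numel}(W)$ for every entry of each `q_proj` matrix, so it could be added analytically to the gradients inside LOMO's fused backward instead of going through autograd at all. A rough sketch (where exactly this would hook into lomo.py is my guess, not verified):

```python
# Hypothetical helper: add the penalty's analytic gradient to a
# parameter's grad during LOMO's fused backward, with no extra
# backward pass. d(mean(W))/dW = 1 / W.numel() for every entry.
def add_penalty_grad(name: str, param: torch.Tensor, lamda: float = 1.0):
    if "self_attn.q_proj" in name and param.grad is not None:
        param.grad.add_(lamda / param.numel())
```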