Description
I have read the code and I am confused about the part that adds the shift to the bias.

When changing the FFN's linear module to `shiftedlinear`, the idea is:

$$
\begin{aligned}
XW + B &= (X + \mathrm{shift})\,W + B - \mathrm{shift}\cdot W \\
&= (X + \mathrm{shift})\,s^{-1}(L_1 L_2 + R) + B - \mathrm{shift}\cdot W \\
&= X\,s^{-1}(L_1 L_2 + R) + \mathrm{shift}\cdot s^{-1}(L_1 L_2 + R) + B - \mathrm{shift}\cdot W
\end{aligned}
$$

But in convert.py, we only add $\mathrm{shift}\cdot s^{-1} L_1 L_2$ to $B$ (from which $\mathrm{shift}\cdot W$ has already been subtracted). I cannot find where the $\mathrm{shift}\cdot s^{-1} R$ part is handled. Is this a bug?
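To make the algebra concrete, here is a minimal numeric sketch of the identity in question. All tensors are hypothetical toy values: `s`, `L1`, `L2`, and `R` are stand-ins for the smoothing factor, low-rank factors, and quantized residual, not the actual converted weights.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy shapes: in_features=4, out_features=3, lora rank=2.
X = torch.randn(5, 4, dtype=torch.float64)
B = torch.randn(3, dtype=torch.float64)
s = torch.rand(4, dtype=torch.float64) + 0.5        # per-channel smoothing factor
L1 = torch.randn(3, 2, dtype=torch.float64)         # lora_up
L2 = torch.randn(2, 4, dtype=torch.float64)         # lora_down
R = torch.randn(3, 4, dtype=torch.float64)          # stand-in for the quantized residual
shift = torch.rand(4, dtype=torch.float64)

# Full weight, decomposed per input channel: W = s^-1 (L1 L2 + R).
W = (L1 @ L2 + R) / s                               # shape [out, in]

# Shifting the input and correcting the bias leaves the output unchanged.
ref = X @ W.T + B
shifted_form = (X + shift) @ W.T + (B - W @ shift)
print(torch.allclose(ref, shifted_form))            # True

# The bias correction shift*W splits into a low-rank part and a residual part;
# the question is that convert.py only adds back the low-rank part.
lowrank_part = L1 @ (L2 @ (shift / s))              # shift * s^-1 * L1 L2
residual_part = R @ (shift / s)                     # shift * s^-1 * R
print(torch.allclose(W @ shift, lowrank_part + residual_part))  # True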
In convert.py, lines 42~57:
```python
if lora is not None and (smooth is not None or shift is not None):
    # unsmooth lora down projection
    dtype = weight.dtype
    lora_down, lora_up = lora
    lora_down = lora_down.to(dtype=torch.float64)
    if smooth is not None and not smooth_fused:
        lora_down = lora_down.div_(smooth.to(torch.float64).unsqueeze(0))
    if shift is not None:
        bias = torch.zeros([lora_up.shape[0]], dtype=torch.float64) if bias is None else bias.to(torch.float64)
        if shift.numel() == 1:
            shift = shift.view(1, 1).expand(lora_down.shape[1], 1).to(torch.float64)
        else:
            shift = shift.view(-1, 1).to(torch.float64)
        bias = bias.add_((lora_up.to(dtype=torch.float64) @ lora_down @ shift).view(-1))
        bias = bias.to(dtype=dtype)
    lora = (lora_down.to(dtype=dtype), lora_up)
```
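To trace what this excerpt does to the bias, here is a self-contained sketch, again with hypothetical toy tensors, and assuming (as stated above) that `bias` arrives here with shift*W already subtracted:

```python
import torch

torch.manual_seed(0)
out_f, in_f, rank = 3, 4, 2
B = torch.randn(out_f, dtype=torch.float64)
s = torch.rand(in_f, dtype=torch.float64) + 0.5      # smooth
lora_up = torch.randn(out_f, rank, dtype=torch.float64)
lora_down = torch.randn(rank, in_f, dtype=torch.float64)
R = torch.randn(out_f, in_f, dtype=torch.float64)    # stand-in quantized residual
shift = torch.rand(in_f, dtype=torch.float64)
W = (lora_up @ lora_down + R) / s

# Assumed starting point: bias already holds B - shift*W.
bias = B - W @ shift

# The excerpt's steps: unsmooth lora_down, then add lora_up @ lora_down @ shift.
lora_down = lora_down / s.unsqueeze(0)
bias = bias + (lora_up @ lora_down @ shift.view(-1, 1)).view(-1)

# What remains missing from the bias is exactly the shift * s^-1 * R term.
print(torch.allclose(bias, B - R @ (shift / s)))     # True
```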