I have a question about the `_yarn_linear_ramp_mask` implementation, which computes `linear_func = (torch.arange(dim, dtype=torch.float32) - min) / (max - min)`. Here the ramp is computed over the dimension index rather than over the number of rotations. However, when I checked the paper's definition of the ramp function, r, alpha, and beta all seem to refer to the number of rotations rather than the dimension: r(d) = L/lambda is the number of rotations at dimension d, and that value is what gets compared against alpha and beta.
So does the implementation match what the paper states?
Could anyone help me understand this part?
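The two kinds of ramp can be compared numerically. Below is a minimal pure-Python sketch (no torch) built on illustrative values that are assumptions, not taken from this issue: head dimension |D| = 128, base b = 10000, original context L = 4096, alpha = 1, beta = 32. It inverts r(d) = L / lambda_d, with lambda_d = 2*pi*b^(2d/|D|), to turn the rotation thresholds into (fractional) dimension indices, then compares a ramp that is linear in r (the paper's form) against one that is linear in the index d (the form of the quoted `linear_func` line). It also assumes the index-based ramp is applied flipped (1 - ramp), so that both ramps equal 1 for high-rotation dimensions. All helper names are hypothetical.

```python
import math

# Illustrative settings (assumptions, not from the issue): head dimension |D|,
# RoPE base b, original context length L, and rotation thresholds alpha < beta.
HEAD_DIM = 128
BASE = 10000.0
MAX_POS = 4096
ALPHA, BETA = 1.0, 32.0


def num_rotations(d: float) -> float:
    """r(d) = L / lambda_d, with wavelength lambda_d = 2*pi*b**(2d/|D|)."""
    return MAX_POS / (2 * math.pi * BASE ** (2 * d / HEAD_DIM))


def gamma_rotations(r: float) -> float:
    """Ramp on the number of rotations: 0 below alpha, 1 above beta, linear in r."""
    if r < ALPHA:
        return 0.0
    if r > BETA:
        return 1.0
    return (r - ALPHA) / (BETA - ALPHA)


def dim_for_rotations(num_rot: float) -> float:
    """Solve r(d) = num_rot for d: d = |D| * ln(L / (2*pi*num_rot)) / (2 * ln b)."""
    return HEAD_DIM * math.log(MAX_POS / (2 * math.pi * num_rot)) / (2 * math.log(BASE))


# r(d) decreases as d grows, so beta maps to the lower index and alpha to the higher.
LOW = dim_for_rotations(BETA)
HIGH = dim_for_rotations(ALPHA)


def gamma_dim_index(d: float) -> float:
    """Ramp linear in the index d, clamped to [0, 1] like (arange(dim) - min) / (max - min).
    Flipped (1 - ramp) by assumption so that 1 means "many rotations"."""
    ramp = min(max((d - LOW) / (HIGH - LOW), 0.0), 1.0)
    return 1.0 - ramp


if __name__ == "__main__":
    for d in range(0, HEAD_DIM // 2, 4):
        r = num_rotations(d)
        print(f"d={d:2d}  r={r:9.3f}  linear-in-r={gamma_rotations(r):.3f}  "
              f"linear-in-d={gamma_dim_index(d):.3f}")
```

Under these assumptions the two ramps agree at the endpoints and outside the transition band, because r(d) is monotonic in d and the thresholds map to the same switch-over dimensions. Inside the band they differ: r(d) falls off exponentially in d, so a ramp that is linear in d is not linear in r.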