Question related to _yarn_linear_ramp_mask #60

I have a question about the _yarn_linear_ramp_mask implementation, specifically this line: linear_func = (torch.arange(dim, dtype=torch.float32) - min) / (max - min). Here the ramp is computed over the dimension index rather than over the number of rotations. In the paper's definition of the ramp function, however, r, alpha, and beta all relate to the number of rotations: r(d) = L/lambda is the number of rotations for dimension d, and it is this value that is compared with alpha and beta.

So is the implementation equivalent to the ramp function described in the paper?

Could anyone help me understand this part?
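
To make the question concrete, here is a minimal sketch comparing the two ramps. It rests on assumptions that are not shown in this issue: that the RoPE wavelength for pair index d is lambda_d = 2*pi*base**(2d/dim), that min and max passed to the ramp are the indices at which r(d) equals beta and alpha respectively, and that any rounding of those indices is ignored. All function names and default values below are hypothetical, chosen only for illustration, and plain Python is used instead of torch so the snippet runs on its own.

```python
import math


def rotations(d, dim, base=10000.0, orig_len=2048):
    """r(d) = L / lambda_d, assuming lambda_d = 2*pi*base**(2d/dim)."""
    wavelength = 2 * math.pi * base ** (2 * d / dim)
    return orig_len / wavelength


def index_for_rotations(r, dim, base=10000.0, orig_len=2048):
    """Inverse of rotations(): the (fractional) index d where r(d) == r."""
    return dim * math.log(orig_len / (2 * math.pi * r)) / (2 * math.log(base))


def clamp01(x):
    return min(1.0, max(0.0, x))


def gamma_paper(r, alpha, beta):
    """Ramp as stated in the question: linear in r between alpha and beta."""
    return clamp01((r - alpha) / (beta - alpha))


def ramp_index(d, lo, hi):
    """Ramp shaped like the quoted expression: linear in the index d."""
    return clamp01((d - lo) / (hi - lo))


if __name__ == "__main__":
    dim, alpha, beta = 128, 1.0, 32.0        # illustrative values only
    lo = index_for_rotations(beta, dim)      # many rotations -> small index
    hi = index_for_rotations(alpha, dim)     # few rotations  -> large index
    for d in (lo, (lo + hi) / 2, hi):
        r = rotations(d, dim)
        # 1 - ramp_index so both columns are 1 at r = beta and 0 at r = alpha
        print(f"d={d:6.2f}  r={r:6.2f}  "
              f"paper={gamma_paper(r, alpha, beta):.3f}  "
              f"index={1 - ramp_index(d, lo, hi):.3f}")
```

Under these assumptions the two ramps agree at both endpoints, since the index bounds are just alpha and beta mapped through the inverse of r(d). They differ in between: r(d) is exponential in d, so a ramp that is linear in the index is linear in log r rather than in r. At the midpoint index, r is the geometric mean of alpha and beta (about 5.66 here), where the paper-style ramp gives roughly 0.15 and the index-style ramp gives 0.5.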
