
Clarification on Position Encoding in the Proof #6

@rababit


Hello,

I was reviewing the proof presented in the paper regarding how NoPE (No Positional Encoding) can recover absolute positions in the hidden states, and I have some concerns and questions about the underlying assumptions and the conclusions drawn. Specifically, I believe the proof mostly demonstrates that the `<bos>` token can recover positional information, rather than proving that positional information is recovered for all tokens in the sequence.

Here are my thoughts:

Issue with the Embedding Design:
The proof seems to suggest that by giving the `<bos>` token a special embedding, its positional information can be retained. However, the embedding matrix is designed so that only the `<bos>` token actually encodes positional information (in the second and third dimensions). All other tokens in the sequence carry no explicit positional encoding in their embeddings, which suggests that only `<bos>`'s position is encoded, rather than the positions of all tokens.
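For concreteness, here is a minimal NumPy sketch of how I read the embedding construction. The hidden size, the concrete values, and the dimension layout are my own illustrative assumptions, not the paper's verbatim construction:

```python
import numpy as np

d = 3   # toy hidden size (assumed)
T = 5   # sequence length; position 1 is <bos>

# My reading of the construction: every token's embedding has a constant
# first component, and only <bos> carries anything in dimensions 2 and 3.
# (The concrete values here are illustrative assumptions.)
e_bos   = np.array([1.0, 1.0, 0.0])   # special <bos> embedding
e_other = np.array([1.0, 0.0, 0.0])   # every non-<bos> token looks the same

X = np.stack([e_bos] + [e_other] * (T - 1))   # (T, d) input embeddings
print(X)
# Note that the non-<bos> rows are indistinguishable from one another,
# which is the crux of my concern: their positions are not explicit anywhere.
```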

Self-Attention Mechanism:
The self-attention mechanism applies the same query, key, and value transformations to all tokens, `<bos>` included. While the `<bos>` token has a specially designed embedding, the other tokens' embeddings lack explicit positional information. This leads me to believe that while `<bos>` might retain positional information through attention, the proof does not show that the same is true for the other tokens in the sequence; see the toy attention pass sketched below.
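To make the attention step concrete, here is a toy causal self-attention pass over those embeddings, with queries and keys set so that attention over the causal prefix is uniform (again my assumption about the intended construction, not the paper's exact parameters):

```python
import numpy as np

d, T = 3, 5
e_bos, e_other = np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0])
X = np.stack([e_bos] + [e_other] * (T - 1))   # (T, d)

# Zero queries/keys => softmax over the causal prefix is uniform,
# so position t simply averages the values of tokens 1..t.
W_V = np.eye(d)                               # identity value map (assumed)
out = np.zeros_like(X)
for t in range(T):
    attn = np.full(t + 1, 1.0 / (t + 1))      # uniform weights over prefix
    out[t] = attn @ (X[: t + 1] @ W_V)

print(out[:, 1])   # <bos>-indicator dimension: [1, 1/2, 1/3, 1/4, 1/5]
```

The only position-dependent quantity I can see in this toy run is the 1/t that each position picks up by attending to `<bos>`, which is why I read the proof as hinging entirely on `<bos>`'s special embedding rather than on any positional information carried by the other tokens themselves.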

Potential Misinterpretation:
As I understand it, the proof mainly demonstrates that the `<bos>` token's position is encoded, owing to its special embedding. It does not seem to show that the same happens for the other tokens in the sequence, which is a critical part of the NoPE model's claim.

I would appreciate clarification on these points, or any additional explanation that might resolve these concerns. In particular, it would help to understand whether the proof is intended to apply only to the `<bos>` token, or whether the proposed method indeed recovers positional information for all tokens, as claimed.

Thank you very much for your time and for the work on this interesting approach!
