
Clarification on Position Encoding in the Proof #6

@rababit


Hello,

I was reviewing the proof presented in the paper regarding how NoPE (No Positional Encoding) can recover absolute positions in the hidden states, and I have some concerns and questions about the underlying assumptions and the conclusions drawn. Specifically, I believe the proof mostly demonstrates that the `<bos>` token can recover positional information, rather than proving that positional information is recovered for all tokens in the sequence.

Here are my thoughts:

Issue with the Embedding Design:
The proof seems to suggest that by giving the `<bos>` token a special embedding, its positional information can be retained. However, the embedding matrix is designed so that only the `<bos>` token actually encodes positional information (in the second and third dimensions). All other tokens in the sequence carry no explicit positional encoding in their embeddings, which suggests that only `<bos>`'s position is encoded, rather than the positions of all tokens.
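For concreteness, here is a minimal NumPy sketch of how I read the embedding construction. The hidden size, the concrete values, and the dimension layout are my own illustrative assumptions, not the paper's verbatim construction:

```python
import numpy as np

d = 3   # toy hidden size (assumed)
T = 5   # sequence length; position 1 is <bos>

# My reading of the construction: every token's embedding has a constant
# first component, and only <bos> carries anything in dimensions 2 and 3.
# (The concrete values here are illustrative assumptions.)
e_bos   = np.array([1.0, 1.0, 0.0])   # special <bos> embedding
e_other = np.array([1.0, 0.0, 0.0])   # every non-<bos> token looks the same

X = np.stack([e_bos] + [e_other] * (T - 1))   # (T, d) input embeddings
print(X)
# Note that the non-<bos> rows are indistinguishable from one another,
# which is the crux of my concern: their positions are not explicit anywhere.
```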

Self-Attention Mechanism:
The self-attention mechanism applies the same query, key, and value transformations to all tokens, `<bos>` included. While the `<bos>` token has a specially designed embedding, the other tokens' embeddings lack explicit positional information. This leads me to believe that while `<bos>` might retain positional information through attention, the proof does not show that the same is true for the other tokens in the sequence; see the toy attention pass sketched below.
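To make the attention step concrete, here is a toy causal self-attention pass over those embeddings, with queries and keys set so that attention over the causal prefix is uniform (again my assumption about the intended construction, not the paper's exact parameters):

```python
import numpy as np

d, T = 3, 5
e_bos, e_other = np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0])
X = np.stack([e_bos] + [e_other] * (T - 1))   # (T, d)

# Zero queries/keys => softmax over the causal prefix is uniform,
# so position t simply averages the values of tokens 1..t.
W_V = np.eye(d)                               # identity value map (assumed)
out = np.zeros_like(X)
for t in range(T):
    attn = np.full(t + 1, 1.0 / (t + 1))      # uniform weights over prefix
    out[t] = attn @ (X[: t + 1] @ W_V)

print(out[:, 1])   # <bos>-indicator dimension: [1, 1/2, 1/3, 1/4, 1/5]
```

The only position-dependent quantity I can see in this toy run is the 1/t that each position picks up by attending to `<bos>`, which is why I read the proof as hinging entirely on `<bos>`'s special embedding rather than on any positional information carried by the other tokens themselves.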

Potential Misinterpretation:
As I understand it, the proof mainly demonstrates that the `<bos>` token's position is encoded, owing to its special embedding. It does not seem to show that the same happens for the other tokens in the sequence, which is a critical part of the NoPE model's claim.

I would appreciate clarification on these points, or any additional explanation that might resolve these concerns. In particular, it would help to understand whether the proof is intended to apply only to the `<bos>` token, or whether the proposed method indeed recovers positional information for all tokens, as claimed.

Thank you very much for your time and for the work on this interesting approach!
