In your paper, you mention that "In practice, the projection layer can transform the queries to any desired output, making the self-attention module redundant". However, self-attention contains a softmax, so it is non-linear in general, while the projection layer can only apply a linear transformation. I don't understand why you say the projection layer can transform the queries to any desired output. A minimal sketch of what I mean is below.
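The following is a small illustrative sketch (single query, single head; the matrices `K`, `V`, `W` and the dimensions are made up for illustration, not taken from your code) of the distinction I have in mind: a projection `W @ q` satisfies additivity in `q`, whereas the softmax makes the attention output non-linear in `q`.

```python
# Minimal sketch: projection is linear in q, attention (with softmax) is not.
import torch

torch.manual_seed(0)
d, n = 8, 4                      # hypothetical head dim and number of keys
K = torch.randn(n, d)            # keys
V = torch.randn(n, d)            # values
W = torch.randn(d, d)            # a projection matrix

def attention(q):
    # single-query scaled dot-product attention
    scores = (K @ q) / d ** 0.5            # shape (n,)
    weights = torch.softmax(scores, dim=0)  # softmax introduces the non-linearity
    return weights @ V                      # shape (d,)

def projection(q):
    return W @ q                            # linear in q

q1, q2 = torch.randn(d), torch.randn(d)

# Additivity check: f(q1 + q2) == f(q1) + f(q2)?
print(torch.allclose(projection(q1 + q2), projection(q1) + projection(q2)))  # True
print(torch.allclose(attention(q1 + q2), attention(q1) + attention(q2)))     # False in general
```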