Hi,
I noticed there seems to be an inconsistency between the implementation and what is described in the paper.
In the paper, the self-attention module self-attends among the keypoint features (which are extracted from the support features), and the cross-attention module then cross-attends the resulting keypoint features to the query features.
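To make sure I am reading the paper correctly, here is a minimal sketch of that ordering (a paraphrase with hypothetical names and a plain `nn.MultiheadAttention` layer, not code from this repository):

```python
import torch.nn as nn

class PaperStyleDecoderLayer(nn.Module):
    """Sketch of the ordering described in the paper (hypothetical names)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads)
        self.cross_attn = nn.MultiheadAttention(dim, heads)

    def forward(self, keypoint_feats, query_feats):
        # keypoint_feats: (num_keypoints, batch, dim), from the support features
        # query_feats:    (H*W, batch, dim), from the query image
        # 1) self-attention among the keypoint features
        kp = self.self_attn(keypoint_feats, keypoint_feats, keypoint_feats)[0]
        # 2) cross-attention: keypoint features attend to the query features
        kp = self.cross_attn(kp, query_feats, query_feats)[0]
        return kp
```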
However, in the implementation, the self-attention module is used to self-attend among the query features:
And the cross-attention module is used to cross-attend the resulting query features to the keypoint features:
In this file, x is the query features, and query_embed is in fact the keypoint features.
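For comparison, this is how I read the ordering in the code, again only a paraphrased sketch (same hypothetical layer as above, using the x / query_embed naming from that file), not the exact implementation:

```python
import torch.nn as nn

class ObservedDecoderLayer(nn.Module):
    """Sketch of the ordering I see in the code (paraphrased, not the repo's code)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads)
        self.cross_attn = nn.MultiheadAttention(dim, heads)

    def forward(self, x, query_embed):
        # x:           (H*W, batch, dim), the query features
        # query_embed: (num_keypoints, batch, dim), in fact the keypoint features
        # 1) self-attention among the query features
        x = self.self_attn(x, x, x)[0]
        # 2) cross-attention: query features attend to the keypoint features
        x = self.cross_attn(x, query_embed, query_embed)[0]
        return x
```

If I have misread either the paper or the code, could you point out where? Otherwise, could you clarify which ordering is the intended one?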