How are "Token-to-Image Cross-Attention and Image-to-Token Cross-Attention" handled in PerSAM's decoder?

Thank you for your interesting work "PERSONALIZE SEGMENT ANYTHING MODEL WITH ONE SHOT". I have two questions.

First, how are "Token-to-Image Cross-Attention and Image-to-Token Cross-Attention" handled in PerSAM's decoder? I cannot find the relevant code in the released repository, although I did find the following:

    sim = (sim - sim.mean()) / torch.std(sim)
    sim = F.interpolate(sim.unsqueeze(0).unsqueeze(0), size=(64, 64), mode="bilinear")
    attn_sim = sim.sigmoid_().unsqueeze(0).flatten(3)

Second, how is formula (8) implemented? I cannot find its implementation in the released code either.
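
For reference, here is a minimal stand-alone sketch of the snippet I found, assuming `sim` is a 2-D similarity map. The helper name `build_attn_sim` and the random 256×256 input are hypothetical and only there to make it runnable. It shows the shape of `attn_sim`, but not how the decoder's cross-attention layers consume it, which is the part I cannot locate.

```python
import torch
import torch.nn.functional as F


def build_attn_sim(sim: torch.Tensor, size=(64, 64)) -> torch.Tensor:
    """Wrap the three quoted lines; `sim` is assumed to be a 2-D (H, W) map."""
    # Standardize the map to zero mean and unit standard deviation.
    sim = (sim - sim.mean()) / torch.std(sim)
    # Add batch and channel dims, then resize: (H, W) -> (1, 1, 64, 64).
    sim = F.interpolate(sim.unsqueeze(0).unsqueeze(0), size=size, mode="bilinear")
    # Squash to (0, 1), add a leading dim, flatten spatial dims: -> (1, 1, 1, 64 * 64).
    return sim.sigmoid_().unsqueeze(0).flatten(3)


sim = torch.rand(256, 256)  # hypothetical similarity map, not real PerSAM output
attn_sim = build_attn_sim(sim)
print(attn_sim.shape)  # torch.Size([1, 1, 1, 4096])
```

So `attn_sim` holds one value in (0, 1) per location of the 64×64 feature grid.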

How are "Token-to-Image Cross-Attention and Image-to-Token Cross-Attention" handled in PerSAM's decoder? #55
