Hi, I need some clarification regarding block selection in the "swinv2_base_window12_192.ms_in22k" model when setting […]. The Swin v2 architecture has a series of stages, and each stage has a series of blocks. From my understanding, this setup extracts features from the four different stages of the model. It actually seems that the maximum allowed index in […]. Additionally, is this a general rule, where the last block in a stage is always used for feature extraction? I understand that this issue does not arise in ViT architectures, as they consist of blocks without distinct stages, which allows precise control over which block to select. Thanks for your insights!
@vadori Yes. When out indices are matched to stages (and thus to strides, meaning the level of resolution reduction), the feature is always taken from the deepest block at that stride, so the last block of the stage.
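To make the stage-to-block mapping described in the reply concrete, here is a minimal sketch in plain Python. It is not timm's implementation. The helper name `describe_out_indices` is hypothetical, and the layout (four stages with depths `(2, 2, 18, 2)`, patch size 4, 2x downsampling between stages, 192 px input) is an assumption based on the usual Swin base configuration, so check it against the actual model before relying on the numbers.

```python
# Illustrative only: plain Python, not timm's implementation.
# Assumed Swin base layout: 4 stages with depths (2, 2, 18, 2),
# patch size 4, and 2x downsampling between consecutive stages.
DEPTHS = (2, 2, 18, 2)
PATCH_SIZE = 4
IMG_SIZE = 192


def describe_out_indices(out_indices, depths=DEPTHS,
                         patch_size=PATCH_SIZE, img_size=IMG_SIZE):
    """Map stage-level out indices to the block that produces each feature.

    Hypothetical helper: each index selects a stage, and the feature for a
    stage comes from its last (deepest) block.
    """
    num_stages = len(depths)
    results = []
    for idx in out_indices:
        if not 0 <= idx < num_stages:
            raise IndexError(
                f"out index {idx} is out of range for {num_stages} stages"
            )
        stride = patch_size * 2 ** idx
        results.append({
            "stage": idx,
            # Last block inside this stage (0-based).
            "block_in_stage": depths[idx] - 1,
            # Same block, counted across all stages (0-based).
            "global_block": sum(depths[:idx + 1]) - 1,
            "stride": stride,
            "feature_size": img_size // stride,
        })
    return results


if __name__ == "__main__":
    for row in describe_out_indices((0, 1, 2, 3)):
        print(row)
```

Under these assumptions the largest valid index is 3 (the last stage), and index 2 always refers to block 17 of the third stage. Selecting an intermediate block inside a stage is not expressible with stage-level indices.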