
some questions... #3

@lovekittynine

Description

Hello, I have some small questions about the code.

  • First, the MLP block uses a self.pos layer that isn't mentioned in the paper. Together with self.fc2 it acts like a depth-wise separable convolution, but it also adds extra parameters. Is the effect of this layer really that significant? (A rough parameter count is sketched after the code block below.)
  • Second, in the Block code the default kernel_size for self.a is 11 with padding 5, but in the last stage (stage 4) the feature map is only 7x7 for 224x224 inputs, so applying an 11x11 convolution there seems strange (see the shape check right after this list).
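
On the second point, the convolution still runs: with padding 5, an 11x11 depth-wise convolution maps a 7x7 input back to 7x7 (7 + 2*5 - 11 + 1 = 7). PyTorch simply zero-pads the small map, although much of the 11x11 window then covers padding. A minimal shape check, assuming a hypothetical stage-4 width of 512 channels (this only mirrors the quoted defaults, not the repo's exact Block code):

import torch
import torch.nn as nn

# Shape check with the quoted Block defaults: kernel_size=11, padding=5.
dim = 512                                  # hypothetical stage-4 channel width
a = nn.Conv2d(dim, dim, kernel_size=11, padding=5, groups=dim)

x = torch.randn(1, dim, 7, 7)              # stage-4 feature map for a 224x224 input
print(a(x).shape)                          # torch.Size([1, 512, 7, 7]) -- spatial size preserved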

Thanks for your reply!

import torch.nn as nn

# LayerNorm below is the repository's channels-first LayerNorm
# (it normalizes NCHW tensors over the channel dimension).

class MLP(nn.Module):
    def __init__(self, dim, mlp_ratio=4):
        super().__init__()

        self.norm = LayerNorm(dim, eps=1e-6, data_format="channels_first")

        self.fc1 = nn.Conv2d(dim, dim * mlp_ratio, 1)                   # 1x1 expansion
        self.pos = nn.Conv2d(dim * mlp_ratio, dim * mlp_ratio, 3,
                             padding=1, groups=dim * mlp_ratio)         # 3x3 depth-wise conv
        self.fc2 = nn.Conv2d(dim * mlp_ratio, dim, 1)                   # 1x1 projection
        self.act = nn.GELU()

    def forward(self, x):
        B, C, H, W = x.shape

        x = self.norm(x)
        x = self.fc1(x)
        x = self.act(x)
        x = x + self.act(self.pos(x))                                   # residual depth-wise branch
        x = self.fc2(x)

        return x
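
On the first point, a rough count suggests the 3x3 depth-wise self.pos layer is small next to the two 1x1 layers around it: it has 9 * dim * mlp_ratio weights (plus biases), while fc1 and fc2 each have dim * dim * mlp_ratio weights. A minimal sketch, using dim = 512 only as an example (real widths depend on the stage):

import torch.nn as nn

dim, mlp_ratio = 512, 4

fc1 = nn.Conv2d(dim, dim * mlp_ratio, 1)
pos = nn.Conv2d(dim * mlp_ratio, dim * mlp_ratio, 3, padding=1, groups=dim * mlp_ratio)
fc2 = nn.Conv2d(dim * mlp_ratio, dim, 1)

def count(m):
    return sum(p.numel() for p in m.parameters())

print(count(fc1), count(pos), count(fc2))
# 1050624 20480 1049088 -> self.pos is roughly 1% of the MLP's parameters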
