Description
Thanks for providing an implementation of this architecture. I'm currently implementing a variant of it and wondered why padding is used in the conv layers even though it is not mentioned in the LiLaNet paper (at least I couldn't find it):
pytorch-LiLaNet/lilanet/model/lilanet.py, lines 78 to 81 at f68aae9
I think it might make sense to apply padding along the axis where the kernel size is 7, so that the spatial dimensions decrease by the same amount as along the axis with kernel size 3. But this isn't mentioned in the paper, or am I missing something?
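To make what I mean concrete, here is a minimal sketch (my own assumption about the padding scheme, not the repo's actual code): with padding (2, 0) on the (7, 3) branch and (0, 2) on the (3, 7) branch, every branch shrinks exactly like an unpadded 3x3 conv, so the parallel outputs stay concatenable.

```python
# Hypothetical sketch, not the repo's code: asymmetric padding so that the
# (7, 3) and (3, 7) branches shrink like an unpadded (3, 3) branch.
# Output size per dim (stride 1, dilation 1): out = in + 2*pad - kernel + 1
import torch
import torch.nn as nn

x = torch.randn(1, 1, 64, 512)  # dummy (N, C, H, W); sizes are arbitrary

branch_7x3 = nn.Conv2d(1, 8, kernel_size=(7, 3), padding=(2, 0))  # H: -2, W: -2
branch_3x7 = nn.Conv2d(1, 8, kernel_size=(3, 7), padding=(0, 2))  # H: -2, W: -2
branch_3x3 = nn.Conv2d(1, 8, kernel_size=(3, 3), padding=0)       # H: -2, W: -2

print(branch_7x3(x).shape, branch_3x7(x).shape, branch_3x3(x).shape)
# all torch.Size([1, 8, 62, 510]) -> the branches can be concatenated along dim=1
```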
Also, why is a padding of 1 applied in the 1x1 convolution?
Bonus question: both spatial dimensions decrease by one after each LiLaBlock. Why not use an (additional) padding of 1 so that the size is preserved throughout the network?
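Again just a sketch of what I mean, not a proposal for the actual code: with "same"-style padding on every branch (and no padding on the 1x1), a LiLaBlock-like module would keep H and W unchanged.

```python
# Hypothetical sketch: size-preserving padding on every branch of a
# LiLaBlock-like module, so the input spatial size is kept end to end.
import torch
import torch.nn as nn

x = torch.randn(1, 1, 64, 512)  # dummy (N, C, H, W)

same_7x3 = nn.Conv2d(1, 8, kernel_size=(7, 3), padding=(3, 1))
same_3x7 = nn.Conv2d(1, 8, kernel_size=(3, 7), padding=(1, 3))
same_3x3 = nn.Conv2d(1, 8, kernel_size=3, padding=1)
fuse_1x1 = nn.Conv2d(24, 8, kernel_size=1, padding=0)  # 1x1 needs no padding to keep size

y = fuse_1x1(torch.cat([same_7x3(x), same_3x7(x), same_3x3(x)], dim=1))
print(y.shape)  # torch.Size([1, 8, 64, 512]) -> spatial size preserved
```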
I'm just a beginner in deep learning, so any help is appreciated.