RuntimeError "Wrong input shape" when RandomBatchGeoSampler patch_size < 32 #1610

roybenhayun · 2023-10-02T20:28:51Z


We are training a segmentation model (FWIW: model="unet", backbone="resnext50_32x4d", weights="imagenet"). While experimenting with different model parameters to improve results, we tried reducing the patch_size used by RandomBatchGeoSampler. We initially used 256, as in most examples we've seen. However, the elements we try to segment may be in 10m-20m sizes.
Reducing the patch_size to 128, 64, and 32 worked. But with 16 we get RuntimeError: Wrong input shape height=16, width=16. Expected image height and width divisible by 32. Consider pad your images to shape (32, 32).

From debugging a bit, the exception is raised by check_input_shape() in \segmentation_models_pytorch\base\model.py, and it is related to the ResNet encoder's output_stride, which is 32 by default (see output_stride==32 in get_encoder() in \segmentation_models_pytorch\encoders\__init__.py). It seems the TorchGeo RandomBatchGeoSampler patch_size is not compatible with the model/encoder stride.
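For reference, the check described above boils down to a divisibility test on the patch height and width. The following is a minimal sketch in plain Python, not the library's actual code: the function names `is_valid_shape` and `padded_shape` are hypothetical, and the stride of 32 is taken from the error message.

```python
import math


def is_valid_shape(height, width, output_stride=32):
    """Return True if both dimensions are divisible by the encoder stride."""
    return height % output_stride == 0 and width % output_stride == 0


def padded_shape(height, width, output_stride=32):
    """Smallest shape >= (height, width) whose sides are divisible by the stride."""
    new_h = math.ceil(height / output_stride) * output_stride
    new_w = math.ceil(width / output_stride) * output_stride
    return new_h, new_w


print(is_valid_shape(16, 16))   # False: 16 is not divisible by 32
print(padded_shape(16, 16))     # (32, 32), matching the hint in the error message
print(is_valid_shape(64, 128))  # True
```

Padding a 16 x 16 patch up to 32 x 32 would satisfy the check, but most of the padded input would then be filler rather than imagery.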

As we need to identify objects of 10m-20m, I assume we should use a small patch_size. For example, if I understand correctly, in Sentinel-2 with 10m resolution, a patch_size of 32 means 32 pixels of 10m each, i.e. 320m per side. Even at 1m resolution we would need a patch_size of 8 or 16. Please correct me or suggest otherwise if that is not the case.
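The arithmetic behind this assumption can be sketched as follows. This assumes patch_size is measured in pixels and that pixels are square; the helper names `patch_footprint_m` and `pixels_spanned` are hypothetical, introduced only for illustration.

```python
def patch_footprint_m(patch_size_px, resolution_m):
    """Ground length of one patch side in meters."""
    return patch_size_px * resolution_m


def pixels_spanned(object_size_m, resolution_m):
    """Number of pixels an object of the given size covers along one axis."""
    return object_size_m / resolution_m


# Patch footprint for a few patch sizes at 10 m and 1 m resolution.
for resolution in (10.0, 1.0):
    for patch_size in (16, 32, 256):
        side = patch_footprint_m(patch_size, resolution)
        print(f"{patch_size} px at {resolution} m/px covers {side} m per side")

# A 10-20 m object at 10 m resolution spans only 1-2 pixels,
# whatever patch size is chosen.
print(pixels_spanned(10, 10.0), pixels_spanned(20, 10.0))
```

Under these assumptions, the patch size sets how much context surrounds an object, while the number of pixels the object itself occupies depends only on the image resolution.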

So the question is whether there is a way to change the default output_stride of the encoder, or another way to sample patches smaller than 32: 16, or even 8, 4 and 2.

thanks!

adamjstewart · 2023-10-03T11:26:45Z

adamjstewart
Oct 3, 2023
Maintainer

I don't think it's possible to use a U-Net with patches smaller than 32 x 32 px.
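One way to see where the 32 px limit comes from, as a rough sketch: assuming the encoder halves the spatial size five times (consistent with an output stride of 32 = 2^5), the deepest feature map of a 16 px patch would be smaller than one pixel. The function name `feature_map_sizes` is hypothetical, and fractional sizes are kept on purpose to make the problem visible.

```python
def feature_map_sizes(patch_size, num_stages=5):
    """Side length of the feature map after each stride-2 stage."""
    sizes = []
    size = float(patch_size)
    for _ in range(num_stages):
        size /= 2  # each stage halves height and width
        sizes.append(size)
    return sizes


print(feature_map_sizes(32))  # [16.0, 8.0, 4.0, 2.0, 1.0]
print(feature_map_sizes(16))  # [8.0, 4.0, 2.0, 1.0, 0.5]
```

A 32 px patch ends at exactly 1 px, so it is the smallest size that survives all five halvings under this assumption.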

> the elements we try to segment may be in 10m-20m sizes.

Can you clarify what you mean by this? It's pretty normal to segment images with objects smaller than the dimensions of the image. I don't see why you can't use 256 for those as well.

1 reply

roybenhayun Oct 3, 2023
Author

I tried switching the model from unet to deeplabv3, which has a stride of 16, and technically training and prediction worked with patch_size 16.
However, I am not too familiar with the differences between unet and deeplabv3. I would be happy to get an explanation or thoughts on why to use either with torchgeo and remote sensing.

As for the object dimensions, it may be a wrong assumption on our part.
I've seen 256 patches used in the training examples for big objects with a lot of ground truth for training, and was assuming we would need smaller patches. So I'll try to rephrase that part of the question. Thanks.
