Using MaxViT as a backbone for object detection #2430
-
Hello, I am trying to use MaxViT ('maxvit_base_tf_512.in1k') as a backbone for object detection with a FasterRCNN detector. I'm basically following the approach shown here: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html#modifying-the-model-to-add-a-different-backbone. However, when I test the model's inference with some example images, I get the error message: Can anybody shed some light on this issue? Where is this height=200 coming from? Or, more generally, can someone point me to an example of using MaxViT as a backbone for object detection? Here's my code:
Replies: 1 comment 2 replies
-
I don't know what's going on inside your combined model, but the backbone itself works fine when it gets a 512x512 image, so it seems the images reaching it aren't actually 512. A height of 200 would suggest an input of around 800 if it's after the first stage (total stride 4), or around 400 if it's right after the stem (stride 2). Either that, or something within FasterRCNN is altering them.
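To make that back-of-the-envelope reasoning concrete, here is a minimal sketch of the arithmetic. It assumes the usual MaxViT layout where the stem downsamples by 2 and the end of the first stage sits at a total stride of 4; the names `ASSUMED_STRIDES` and `implied_input_height` are made up for illustration and are not part of timm or torchvision.

```python
# Assumed cumulative downsampling factors (not read from the model itself):
# the stem halves the resolution, the first stage halves it again.
ASSUMED_STRIDES = {"stem": 2, "stage1": 4}


def implied_input_height(feature_height: int, stride: int) -> int:
    """Input height that would produce `feature_height` at the given stride."""
    return feature_height * stride


# Which input heights would explain a feature map of height 200?
candidates = {
    name: implied_input_height(200, stride)
    for name, stride in ASSUMED_STRIDES.items()
}
print(candidates)
```

Neither candidate is 512, which is why the reported height points at the images being resized somewhere before they reach the backbone.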
Took a quick look while waiting for a result: the FasterRCNN wrapper does its own transforms, so you have to change that: https://github.com/pytorch/vision/blob/0d68c7df8640abff43355afd57c494cf5d74f4a9/torchvision/models/detection/faster_rcnn.py#L171-L175
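For anyone landing here later, a rough sketch of why the wrapper's transform turns a 512x512 image into something the backbone rejects. It assumes torchvision's defaults of `min_size=800` and `max_size=1333` and models the resize as "scale the shorter side up to `min_size` unless the longer side would exceed `max_size`". The function `rcnn_resized_hw` is a hypothetical helper written for this illustration, not a torchvision API, and it ignores the padding the real transform applies when batching.

```python
import math


def rcnn_resized_hw(height, width, min_size=800, max_size=1333):
    """Approximate the output size of the detector's internal resize.

    Simplified model: one scale factor, limited by both min_size and max_size.
    Flooring the result is an assumption of this sketch.
    """
    scale = min(min_size / min(height, width), max_size / max(height, width))
    return math.floor(height * scale), math.floor(width * scale)


# With the assumed defaults, a 512x512 image is upscaled before the backbone.
default_hw = rcnn_resized_hw(512, 512)
# Pinning both limits to 512 leaves the image at the size MaxViT expects.
fixed_hw = rcnn_resized_hw(512, 512, min_size=512, max_size=512)

# Feature-map height at an assumed total stride of 4 (end of the first stage).
print(default_hw, default_hw[0] // 4)
print(fixed_hw, fixed_hw[0] // 4)
```

If this model of the transform is right, passing `min_size=512, max_size=512` when constructing `FasterRCNN` should be one way to keep inputs at the resolution the backbone was built for. Note that non-square images would still come out non-square, so they may need to be resized or padded to 512x512 beforehand.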