resize with pad transformation #6236
Comments
Hi @oekosheri, you can check the following function, which pads on the bottom and right: vision/torchvision/models/detection/transform.py, lines 25 to 71 in d6e39ff
I also wrote a similar letterboxing mode here: https://github.com/zhiqwang/yolov5-rt-stack/blob/main/yolort/models/transform.py#L65-L109 |
Hi @zhiqwang, thanks! Do you mean I can use torch.nn.functional.interpolate? I just tried it on an image tensor, and it consistently raises a ValueError about the input and output sizes not matching. |
Let's invite @datumbox to this discussion and hear his viewpoint on this problem. |
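One possible cause of that error, assuming the tensor was an unbatched (C, H, W) image: torch.nn.functional.interpolate treats the first two dimensions as batch and channels, so a 3D tensor is seen as having only one spatial dimension, and a two-element size then does not match. A minimal sketch of the workaround follows; the image shape and target size are made-up values for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical unbatched image in (C, H, W) layout.
img = torch.rand(3, 480, 640)

# For 2D resizing, interpolate expects (N, C, H, W), so add a batch
# dimension before the call and drop it again afterwards.
resized = F.interpolate(
    img.unsqueeze(0),        # (1, 3, 480, 640)
    size=(240, 320),         # target (height, width)
    mode="bilinear",
    align_corners=False,
).squeeze(0)                 # back to (3, 240, 320)
```

If the tensor is already batched, the unsqueeze and squeeze calls are not needed.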
Just FYI, a previous issue #3286 also has some relevance to the discussion here. |
@oekosheri Thanks for the proposal. I would like to understand more about the use case. Why can't we just use resize in combination with pad? It should be two relatively straightforward calls. Maintaining TorchVision is a balancing act between providing the necessary primitives for people to build upon and avoiding bloat in the library. A good reason to add functionality is that it's very popular or that there are tricky corner cases that need to be handled carefully. Is this the case here? @zhiqwang I wouldn't recommend using the method from detection, as it's private and might change in the near future. Though you are right that the detection transforms file does what @oekosheri wants (resize, then batch and pad), the code does too many things and is tightly coupled to the Detection logic. We've started moving some of this logic to the references, and in the near future we plan to start porting it into main TorchVision. @vfdev-5 is currently working on the prototype transforms to finalize the API. |
Hi @datumbox, imagine you have input images from different sources with different sizes and aspect ratios. You want to transform them all to one final size without distortion. If you separate pad and resize, you need to manually apply different transforms to different images. With a single transform applied to all inputs, however, the transform itself can check whether to pad and how. Example code would look something like this:
|
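A minimal sketch of what such a single transform could look like. This is not the code posted in the thread, and not a TorchVision API: the function name resize_with_pad, its parameters, and the choice of centered zero padding are assumptions made for illustration. It scales a (C, H, W) float image to fit inside the target size while keeping its aspect ratio, then pads the remainder evenly on both sides.

```python
import torch
import torch.nn.functional as F


def resize_with_pad(img: torch.Tensor, target_h: int, target_w: int,
                    fill: float = 0.0) -> torch.Tensor:
    """Hypothetical helper: resize a (C, H, W) float image without
    distortion, then pad it to exactly (target_h, target_w)."""
    _, h, w = img.shape

    # Use the smaller scale factor so the whole image fits in the target.
    scale = min(target_h / h, target_w / w)
    new_h = max(1, min(target_h, round(h * scale)))
    new_w = max(1, min(target_w, round(w * scale)))

    # interpolate needs a batch dimension for 2D resizing.
    resized = F.interpolate(
        img.unsqueeze(0), size=(new_h, new_w),
        mode="bilinear", align_corners=False,
    ).squeeze(0)

    # Split the leftover space evenly; any odd pixel goes to bottom/right.
    pad_h, pad_w = target_h - new_h, target_w - new_w
    top, left = pad_h // 2, pad_w // 2

    # F.pad takes (left, right, top, bottom) for the last two dimensions.
    return F.pad(resized, (left, pad_w - left, top, pad_h - top), value=fill)


# Usage: a wide 100x200 image is scaled to 112x224 and padded to 224x224.
out = resize_with_pad(torch.ones(3, 100, 200), 224, 224)
```

Because the padding amounts are derived from each input's own shape, the same call can be applied to every image regardless of its original size or aspect ratio.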
@oekosheri I understand this is strongly motivated for the Detection use-case where things need to be resized to a maximum size proportionally and then padded to ensure we can produce batches, right? |
@datumbox They are padded so that images whose original aspect ratio differs from the final one don't get distorted. Distorted images may not work well with CNNs. I fixed a mistake in the code above. As it is now, it produces the exact output that tf.image.resize_with_pad does. |
@oekosheri Thanks for the references and context. I'll sync with @vfdev-5 offline to see if we can add this on the new API and how. I'll leave the issue open to ensure it stays on our radar. |
Yes, I also think such a transformation would be very useful. I have also had cases with images of different resolutions and aspect ratios where cropping could lose pieces important for classification (it was a defect classification task, and the defects could be on the edge of the image), and where I wanted to maintain the aspect ratio to avoid strong distortions. So I had to use a combination of LongestMaxSize and PadIfNeeded from the Albumentations library. I would like something similar here; it could be implemented as a single transformation, as suggested above. |
I strongly second this feature. It is a very important transformation to have. |
ditto, it'd be really handy to have one. curious, do you think the torchvision team could implement it soon? @datumbox @zhiqwang |
@AsiaCao Thanks for the input. Right now we are focusing on finalizing the Transforms V2 API. Once we complete the work on that front, we can review this request and see what's the best way forwards. |
thanks @datumbox |
Any plans for this to be implemented now? This would be convenient to have. |
Any update to this? |
🚀 The feature
In TensorFlow, tf.image has a method, tf.image.resize_with_pad, that resizes and pads when the aspect ratios of the input and output images differ, to avoid distortion. I couldn't find an equivalent in the torch transformations and had to write it myself. I think it would be a useful feature to have.
Motivation, pitch
When moving to PyTorch from TensorFlow, one does not want to lose handy features!
Alternatives
No response
Additional context
No response
cc @vfdev-5 @datumbox