resize with pad transformation #6236

Open · oekosheri opened this issue Jul 5, 2022 · 16 comments

oekosheri commented Jul 5, 2022

🚀 The feature

In TensorFlow, tf.image has a method, tf.image.resize_with_pad, that pads and resizes when the aspect ratios of the input and output images differ, to avoid distortion. I couldn't find an equivalent among the torch transformations and had to write it myself. I think it would be a useful feature to have.
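For reference, a minimal sketch of the TensorFlow call being described (sizes are placeholders):

import tensorflow as tf

image = tf.random.uniform([600, 800, 3])          # HWC input
# letterboxes (pads) instead of distorting when aspect ratios differ
out = tf.image.resize_with_pad(image, 768, 1024)  # -> shape (768, 1024, 3)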

Motivation, pitch

When moving to PyTorch from TensorFlow, one does not want to lose handy features!

Alternatives

No response

Additional context

No response

cc @vfdev-5 @datumbox

zhiqwang (Contributor) commented Jul 5, 2022

Hi @oekosheri, you can check the following function; it does bottom-right padding:

def _resize_image_and_masks(
    image: Tensor,
    self_min_size: float,
    self_max_size: float,
    target: Optional[Dict[str, Tensor]] = None,
    fixed_size: Optional[Tuple[int, int]] = None,
) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]:
    if torchvision._is_tracing():
        im_shape = _get_shape_onnx(image)
    else:
        im_shape = torch.tensor(image.shape[-2:])

    size: Optional[List[int]] = None
    scale_factor: Optional[float] = None
    recompute_scale_factor: Optional[bool] = None
    if fixed_size is not None:
        size = [fixed_size[1], fixed_size[0]]
    else:
        min_size = torch.min(im_shape).to(dtype=torch.float32)
        max_size = torch.max(im_shape).to(dtype=torch.float32)
        scale = torch.min(self_min_size / min_size, self_max_size / max_size)

        if torchvision._is_tracing():
            scale_factor = _fake_cast_onnx(scale)
        else:
            scale_factor = scale.item()
        recompute_scale_factor = True

    image = torch.nn.functional.interpolate(
        image[None],
        size=size,
        scale_factor=scale_factor,
        mode="bilinear",
        recompute_scale_factor=recompute_scale_factor,
        align_corners=False,
    )[0]

    if target is None:
        return image, target

    if "masks" in target:
        mask = target["masks"]
        mask = torch.nn.functional.interpolate(
            mask[:, None].float(), size=size, scale_factor=scale_factor, recompute_scale_factor=recompute_scale_factor
        )[:, 0].byte()
        target["masks"] = mask
    return image, target
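A minimal usage sketch of this helper (note it is a private function in torchvision.models.detection.transform, so the import path and signature may change between releases; the sizes below are placeholders):

import torch
from torchvision.models.detection.transform import _resize_image_and_masks

image = torch.rand(3, 600, 800)  # CHW tensor
resized, _ = _resize_image_and_masks(image, self_min_size=800.0, self_max_size=1333.0)
print(resized.shape)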

And I wrote a similar letterboxing mode here:

https://github.com/zhiqwang/yolov5-rt-stack/blob/main/yolort/models/transform.py#L65-L109

oekosheri (Author) commented:

Hi @zhiqwang, thanks! You mean I can use torch.nn.functional.interpolate? I tried it on an image tensor just now and it consistently gives a ValueError about the input/output sizes not matching.
Also, this is pretty hidden. Why not add a simple wrapper around resize that pads when the aspect ratio can't be preserved?
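For what it's worth, one common cause of that error is passing a 3-D CHW tensor: interpolate expects a batch dimension, and exactly one of size / scale_factor. A minimal working sketch:

import torch
import torch.nn.functional as F

img = torch.rand(3, 600, 800)  # CHW image tensor
# add a batch dimension, resize, then drop it again
out = F.interpolate(img[None], size=(768, 1024), mode="bilinear", align_corners=False)[0]
print(out.shape)  # torch.Size([3, 768, 1024])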

zhiqwang (Contributor) commented Jul 5, 2022

Also, this is pretty hidden. Why not add a simple wrapper around resize that pads when the aspect ratio can't be preserved?

Let's invite @datumbox to this discussion and hear his viewpoint on this problem.

zhiqwang (Contributor) commented Jul 6, 2022

Just FYI, a previous issue #3286 also has some relevance to the discussion here.

datumbox (Contributor) commented Jul 6, 2022

@oekosheri Thanks for the proposal.

I would like to understand more about the use case. Why can't we just use resize in combination with pad? It should be two relatively straightforward calls. Maintaining TorchVision is a balancing act between providing the necessary primitives for people to build upon and avoiding bloat in the library. A good reason to add functionality is that it's very popular, or that there are specific tricky corner cases that need to be handled carefully. Is this the case here?
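For a single image of known size, the two calls could look like this (a sketch with placeholder target sizes, assuming a PIL input):

import torchvision.transforms.functional as F

target_w, target_h = 1024, 768  # placeholder target size

def resize_then_pad(img):
    w, h = img.size
    scale = min(target_w / w, target_h / h)
    img = F.resize(img, [int(h * scale), int(w * scale)])  # call 1: aspect-preserving resize
    new_w, new_h = img.size
    pad_w, pad_h = target_w - new_w, target_h - new_h
    # call 2: pad to the final canvas; padding order is (left, top, right, bottom)
    return F.pad(img, [pad_w // 2, pad_h // 2, pad_w - pad_w // 2, pad_h - pad_h // 2])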

@zhiqwang I wouldn't recommend using the method from detection, as it's private and might change in the near future. Though you are right that the detection transforms file does what @oekosheri wants (resize and then batch + pad), the code does too many things and is tightly coupled to the logic of detection. We've started moving some of this logic to the references, and in the near future we plan to start porting it into main TorchVision. @vfdev-5 is currently working on the prototype transforms to finalize the API.

oekosheri (Author) commented Jul 6, 2022

Hi @datumbox, imagine you have input images from different sources with different sizes and aspect ratios, and you want to transform them all to one final size without distortion. If you separate out pad and resize, you need to manually apply different transforms to different images. However, with one transform applied to all inputs, it can check whether to pad and how to pad. Example code would look something like this:

import torchvision.transforms.functional as F


class Resize_with_pad:
    def __init__(self, w=1024, h=768):
        self.w = w
        self.h = h

    def __call__(self, image):

        w_1, h_1 = image.size
        ratio_f = self.w / self.h
        ratio_1 = w_1 / h_1

        # check if the original and final aspect ratios are the same within a margin
        if round(ratio_1, 2) != round(ratio_f, 2):

            # padding to preserve aspect ratio
            hp = int(w_1 / ratio_f - h_1)
            wp = int(ratio_f * h_1 - w_1)
            if hp > 0 and wp < 0:
                hp = hp // 2
                image = F.pad(image, (0, hp, 0, hp), 0, "constant")
            elif hp < 0 and wp > 0:
                wp = wp // 2
                image = F.pad(image, (wp, 0, wp, 0), 0, "constant")

        # resizing always happens, so the call never falls through returning None
        return F.resize(image, [self.h, self.w])
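Hypothetical usage in a standard pipeline:

from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    Resize_with_pad(w=1024, h=768),
    transforms.ToTensor(),
])
tensor = transform(Image.open("example.jpg"))  # placeholder file name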

datumbox (Contributor) commented Jul 6, 2022

@oekosheri I understand this is strongly motivated by the detection use case, where things need to be resized proportionally to a maximum size and then padded to ensure we can produce batches, right?

oekosheri (Author) commented:

@datumbox They are padded to ensure that images whose original aspect ratio differs from the final one don't get distorted. Distorted images may not work well with CNNs. I fixed a mistake in the code above; as it is now, it produces the exact output that tf.image.resize_with_pad does.

datumbox (Contributor) commented Jul 6, 2022

@oekosheri Thanks for the references and context. I'll sync with @vfdev-5 offline to see if and how we can add this to the new API. I'll leave the issue open to ensure it stays on our radar.

Inkorak commented Jul 9, 2022

Yes, I also think such a transformation would be very useful. I've also had cases with images of different resolutions and aspect ratios where cropping could lose pieces important for classification (it was defect classification, and defects could be at the edge of the image), and I wanted to maintain the aspect ratio to avoid strong distortion. So I had to use a combination of LongestMaxSize and PadIfNeeded from the Albumentations library; a sketch follows below. I would like something similar, implemented as suggested here in the form of one transformation.
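For reference, a sketch of the Albumentations combination described above (sizes are placeholders):

import albumentations as A
import cv2

transform = A.Compose([
    A.LongestMaxSize(max_size=1024),                # aspect-preserving resize
    A.PadIfNeeded(min_height=1024, min_width=1024,  # then pad to the target canvas
                  border_mode=cv2.BORDER_CONSTANT, value=0),
])
padded = transform(image=image_np)["image"]  # image_np: HWC numpy array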

H-Sorkatti commented Aug 7, 2022

I strongly second this feature. It is a very important transformation to have.
I almost always have to use hacks to work around this when handling images.

AsiaCao commented Dec 5, 2022

Ditto, it'd be really handy to have one.
Our team also uses such a feature. We currently have a custom version implemented with Albumentations that only works on numpy arrays (not torch tensors), and we are looking for an alternative that works with torch tensors and can be converted/embedded into an ONNX graph via torch.onnx.export.

Curious, do you think the torchvision team could implement it soon? @datumbox @zhiqwang
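A pure-tensor sketch along those lines (a hypothetical helper, not a torchvision API; with fixed input sizes it traces through torch.onnx.export, while dynamic shapes would need scripting or dynamic axes):

import torch
import torch.nn.functional as F

def resize_with_pad(image: torch.Tensor, target_h: int, target_w: int) -> torch.Tensor:
    # image: CHW float tensor
    h, w = image.shape[-2:]
    scale = min(target_h / h, target_w / w)
    new_h, new_w = int(h * scale), int(w * scale)
    image = F.interpolate(image[None], size=(new_h, new_w), mode="bilinear", align_corners=False)[0]
    pad_h, pad_w = target_h - new_h, target_w - new_w
    # F.pad on the last two dims takes (left, right, top, bottom)
    return F.pad(image, (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2))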

datumbox (Contributor) commented Dec 5, 2022

@AsiaCao Thanks for the input. Right now we are focusing on finalizing the Transforms V2 API. Once we complete the work on that front, we can review this request and see what the best way forward is.

AsiaCao commented Dec 6, 2022

thanks @datumbox

swap-10 commented Jun 30, 2023

Any plans for this to be implemented now? This would be convenient to have.
Thanks!

amanikiruga (Contributor) commented Sep 24, 2023

Any update on this?
