Added the KeyPoints TVTensor #8817

Open · wants to merge 60 commits into main

Conversation

Alexandre-SCHOEPP

Description

Adds and integrates the KeyPoints TVTensor (requested in #8728), a representation of points (or vertices) attached to a picture.

Details

Inner workings

KeyPoints represents a tensor of shape [..., 2], which allows arbitrarily complex structures to be represented (polygons, skeletons, or even SAM-like point prompts). Whenever __new__ is called, the shape of the source tensor is checked.

Tensors of shape [2] are reshaped to [1, 2], similarly to BoundingBoxes.

KeyPoints, like BoundingBoxes, carries around a canvas_size attribute which represents the (height, width) of the typical picture in the batch.
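The shape rule above can be sketched in isolation (a minimal illustration with a hypothetical helper name, not the PR's actual implementation, which subclasses torch.Tensor):

```python
def normalize_keypoints_shape(shape):
    """Mimic the shape check described above: a bare point of shape
    [2] is promoted to [1, 2]; any other shape must end in 2."""
    if shape == (2,):
        return (1, 2)
    if len(shape) < 2 or shape[-1] != 2:
        raise ValueError(f"KeyPoints expects a [..., 2] shape, got {shape}")
    return shape
```

A flat list of N points keeps shape [N, 2], while e.g. M polygons of K vertices each can be stored as [M, K, 2].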

Kernels

Kernels for all operations should be supported (if I missed one, I will fix it). They are mostly adaptations of the BoundingBoxes code.

Particularities

Maintainers may notice that a TYPE_CHECKING section was added that differs significantly from the implementation:

class KeyPoints(TVTensor):
    if TYPE_CHECKING:
        # EVIL: just so that mypy, Pylance and others stop shouting that everything
        # is wrong when initializing the TVTensor.
        # Not read or defined at runtime (only at linting time).
        # TODO: BoundingBoxes needs something similar
        def __init__(
            self,
            data: Any,
            *,
            dtype: Optional[torch.dtype] = None,
            device: Optional[Union[torch.device, str, int]] = None,
            requires_grad: Optional[bool] = None,
            canvas_size: Tuple[int, int],
        ): ...

I marked this section as EVIL since it is a trick, but it cannot cause problems at runtime: TYPE_CHECKING is always False at runtime, and only ever True for the type checker.
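That runtime-safety claim is easy to check in plain Python, independent of torchvision:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # This branch is only seen by static type checkers;
    # the interpreter never executes it.
    def linting_only() -> None: ...

# At runtime, TYPE_CHECKING is False and the name was never defined.
runtime_value = TYPE_CHECKING
defined_at_runtime = "linting_only" in globals()
```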

For the last few months, I had issues with BoundingBoxes initialization in my weird Pylance + mypy mix: no overload is ever detected to match it. By "re-defining" it, I got it solved on my machine.

Converters

Added a converter convert_box_to_points in torchvision.transforms.v2.functional._meta, exported in torchvision.transforms.v2, which (as its name states) converts an [N, 4] BoundingBoxes TVTensor into an [N, 4, 2] KeyPoints TVTensor.
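The core of that conversion can be sketched for a single box (an illustration assuming an XYXY input; the corner ordering produced by the PR's _xyxy_to_points may differ):

```python
def xyxy_to_points(box):
    """Turn one XYXY box into its 4 corner points,
    here ordered clockwise from the top-left corner."""
    x1, y1, x2, y2 = box
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
```

Applying this to all N boxes of an [N, 4] tensor yields the [N, 4, 2] KeyPoints layout.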

Other changes

For the purposes of my custom type checking, I also changed tv_tensors.wrap to use Python 3.8-compatible generics.

Since wrap only ever outputs a subclass of its like argument, I used a TypeVar bound to TVTensor to ensure that type-checking passes no matter the checker used.
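The idea can be sketched with stand-in classes (TVTensor and KeyPoints below are dummy placeholders, not the real torchvision classes; the real wrap also carries over metadata such as canvas_size):

```python
from typing import Any, TypeVar

class TVTensor:
    def __init__(self, data: Any) -> None:
        self.data = data

class KeyPoints(TVTensor):
    pass

T = TypeVar("T", bound=TVTensor)

def wrap(data: Any, *, like: T) -> T:
    # Because the return type is the TypeVar of `like`, every checker
    # infers wrap(x, like=some_keypoints) -> KeyPoints, not TVTensor.
    return type(like)(data)
```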

Methodology

  • Formatted using ufmt
  • Flake8 compliance with line-length 120 enforced by editor
  • Documented the classes

Discussion

Since many BoundingBoxes converters are based on changing the bboxes to polygons and then operating on the points, I believe there is an opportunity to lower line count and increase reliability, for a negligible computational latency cost, by using the KeyPoints kernels and converting with the method described in the details above.
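As a toy sketch of that idea (not code from the PR): a geometric transform on a box can be expressed as the same transform on its corner points, followed by re-boxing the result. The helper below assumes an XYXY box and a per-point callable.

```python
def transform_box_via_points(box, point_fn):
    """Apply a per-point transform to a box's corners,
    then re-box the moved corners with min/max."""
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    moved = [point_fn(p) for p in corners]
    xs = [p[0] for p in moved]
    ys = [p[1] for p in moved]
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, a horizontal flip on a canvas of width 10 is point_fn = lambda p: (10 - p[0], p[1]).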


pytorch-bot bot commented Dec 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8817

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit a30737c with merge base 428a54c:

NEW FAILURE - The following job has failed:

  • Docs / build / linux-job (gh)
    RuntimeError: Command docker exec -t abd4b52b09d3369cf81d4f14e8cd13b6ed2cf5daa8dd8ed35c4c036742e3ba78 /exec failed with exit code 2

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

Hi @Alexandre-SCHOEPP!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@broomhead

I'd love to see this get included in a future release.

@Alexandre-SCHOEPP
Author

I'd love to see this get included in a future release.

Given that the PR is about 4 months old now... me too?

@NicolasHug
Member

Thank you for your patience @Alexandre-SCHOEPP, I don't have much bandwidth at the moment, but I will try to get this reviewed and merged for 0.23.

@Alexandre-SCHOEPP
Author

No worries @NicolasHug, the project is very large given the size of the team behind it; I understand that prioritization needs to happen.

Member

@AntoineSimoulin AntoineSimoulin left a comment


Hey! Thanks a lot for the cool PR. I do think it will be a very relevant addition to torchvision, and I feel the PR is well aligned with the structure of the codebase. However, it introduces a large number of changes, so I propose we make a couple of iterations to fully align with the main branch before merging. How does that sound?

I left some comments in the code, but here are my high level questions and remarks.

  • Regarding parameter naming, let's make sure we are sticking to "keypoint" for user-facing functions and not use "kp" and "kpoint" there;
  • I am not sure about the _xyxy_to_points output. Can we include some tests for both _xyxy_to_points and convert_box_to_points to make sure the input/output is as expected?
  • We have a Keypoint R-CNN model that supports keypoints in torchvision. I'm not sure we should change its code because of the risk of breaking it, but can we use it as an advanced integration test?
  • I think we are missing a certain number of tests for the transforms: TestResize.test_kernel_key_points, TestResize.test_functional for key_points, TestResize.test_key_points_correctness, TestResize.test_noop, TestResize.test_no_regression_5405, TestHorizontalFlip.test_kernel_key_points, TestHorizontalFlip.test_bounding_boxes_correctness ... Would it be eventually possible to add those?
  • More generally, the keypoints are defined here as a set of independent points. Is that the usual definition? Can those be used to define arbitrary polygons? In that case, will some other functionality be needed?

Thanks again for your time submitting this PR. Let me know what you think about my comments. On my side, I am excited to work on this with you and make sure we can merge the feature into torchvision.

Best,
Antoine

Comment on lines 165 to 178
def get_all_keypoints(flat_inputs: List[Any]) -> Iterable[tv_tensors.KeyPoints]:
    """Yields all KeyPoints in the input.

    Raises:
        ValueError: No KeyPoints can be found
    """
    generator = (inpt for inpt in flat_inputs if isinstance(inpt, tv_tensors.KeyPoints))
    try:
        yield next(generator)
    except StopIteration:
        raise ValueError("No KeyPoints were found in the sample.")
    yield from generator


Member


It seems this get_all_keypoints function is not used elsewhere. Should we delete it if this is not needed in the current state of the PR?

Author


This function is meant to parallel the one for bounding boxes, whose structure I followed here. I don't have a particular attachment to it, but I felt it was better to keep it in case someone who worked on the BBoxes had feedback on it.

dtype: Optional[torch.dtype] = None,
device: Optional[Union[torch.device, str, int]] = None,
requires_grad: Optional[bool] = None,
canvas_size: Tuple[int, int],
Member


Maybe align the order of the arguments with the BoundingBoxes class? In this case, canvas_size should be the 4th argument and not the last.

Author


I agree for the sake of consistent code styling; however, given that this is a keyword-only argument, the difference isn't visible at runtime outside of inspecting the function's attributes.

Author


I'm putting it in third position, since in BoundingBoxes the 3rd argument is the format, which does not exist in KeyPoints.

# Since a Tensor is a sequence of Tensor, had it not been the case, we may have had silent
# or complex errors
output = tuple(KeyPoints(part, canvas_size=canvas_size) for part in output)
elif isinstance(output, MutableSequence):
Member


What is the purpose of this case here?

Author

@Alexandre-SCHOEPP Alexandre-SCHOEPP Apr 29, 2025


This case handles lists and list-like objects.

Unlike in the case of tuples (which are immutable sequences), where I can't reuse the container object, here I can.

I agree that it needs to be changed, however: custom-made immutable sequences are not supported, which is an oversight on my part.

The valid logic would be:

    if isinstance(output, torch.Tensor) and not isinstance(output, KeyPoints):
        output = KeyPoints(output, canvas_size=canvas_size)
    elif isinstance(output, MutableSequence):
        # For lists and list-like objects we don't create a new container;
        # we just set the values in place.
        for i, part in enumerate(output):
            output[i] = KeyPoints(part, canvas_size=canvas_size)
    elif isinstance(output, Sequence):  # Immutable sequences (like tuples) handled here
        # NB: output is checked against Sequence only after being checked against Tensor,
        # since a Tensor is a sequence of Tensors; had it not been the case, we might
        # have had silent or complex errors.
        output = tuple(KeyPoints(part, canvas_size=canvas_size) for part in output)
    return output

new_format=BoundingBoxFormat.XYXY,
inplace=False,
)
return tv_tensors.KeyPoints(_xyxy_to_points(bbox), canvas_size=bounding_boxes.canvas_size)
Member


We should also make sure this function is covered in the test case.

Author


I agree


@Alexandre-SCHOEPP Alexandre-SCHOEPP left a comment


Thanks for the review. I marked every comment where I agreed with the need to change things with the word "agree" in my response, just so I can make sure not to miss one whenever I get the time to fix it.

points.canvas_size = canvas_size
return points

if TYPE_CHECKING:
Author


I had no need for type: ignore. This solution works and, quite frankly, I think it should be the standard in the library (despite me jokingly marking it as EVIL).



@Alexandre-SCHOEPP Alexandre-SCHOEPP left a comment


_geometry.py was collapsed by the review tool, and so I failed to answer a few comments. Here's the change.

@broomhead

@Alexandre-SCHOEPP I pulled this fork and compiled it for my own project, as I really need keypoints that match the TVTensor design, particularly for transforms.

It's working well for me. One thing I did find is that KeyPoints needs to be added to transforms like SanitizeBoundingBoxes. The issue is that when a transform removes data based on some criterion, like bounding box size, the associated data should be removed with it; currently, KeyPoints are not getting removed. I looked at the code and it was as simple as adding KeyPoints to a check in one line. For now I just overrode the class and made the one-line change locally:

# This is a direct copy-and-paste of SanitizeBoundingBoxes, with KeyPoints added.
@register(name="SanitizeBoundingBoxesWithKeyPoints")
class SanitizeBoundingBoxesWithKeyPoints(T.SanitizeBoundingBoxes):
    def transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
        is_label = params["labels"] is not None and any(inpt is label for label in params["labels"])
        is_bounding_boxes_or_mask_or_keypoints = isinstance(
            inpt, (BoundingBoxes, Mask, KeyPoints)
        )

        if not (is_label or is_bounding_boxes_or_mask_or_keypoints):
            return inpt

        output = inpt[params["valid"]]
        if is_label:
            return output
        else:
            wrapped = wrap(output, like=inpt)
            return wrapped

That isinstance check is all we need; I just added KeyPoints to the list. Now, I'm new to PyTorch, so I'm not sure whether other transforms require this sort of check as well. None of the others used in my project appear to.

@Alexandre-SCHOEPP
Author

Alexandre-SCHOEPP commented Apr 30, 2025

@Alexandre-SCHOEPP I pulled this fork and compiled it for my own project, as I really need keypoints that match the TVTensor design, particularly for transforms.

Thanks! Always good to have this feedback.

One thing I did find is KeyPoints needs to be added to transforms like SanitizeBoundingBoxes

I am waiting for feedback on it by the maintainers (@NicolasHug, @AntoineSimoulin). I am willing to do it as I correct all other mistakes, but they may want it to be another PR.

@Alexandre-SCHOEPP
Author

@NicolasHug Why remove the type annotation for the init method of KeyPoints (commit 08e8843)? Is it breaking some things? (I use that specific trick a lot)

@NicolasHug
Member

Hi @Alexandre-SCHOEPP ,

I'm in the process of finalizing this PR; there are a few tests I need to add or fix, and for some functionalities I need to make sure we're supporting only what's strictly needed. The type annotations tend to get in the way of doing that, and they're not strictly needed for this feature. I don't mind re-considering in the future, but for the sake of this PR, my job as a reviewer is a lot simpler if I can ignore the annotation logic.


index_xy = keypoints.to(dtype=torch.long)
index_x, index_y = index_xy[:, 0], index_xy[:, 1]
# Unlike bounding boxes, this may not work well.
Member


@Alexandre-SCHOEPP can you share more about this comment? What may not work well, and how? Thanks!

Author


It's a rather old comment, so I don't fully remember. The issue I see right now is that the code relies on keypoints being 2D (of shape [N, 2]), which is not always correct.

This comment also precedes a clamping step that does not occur in the bounding boxes version. I think this is because for bounding boxes, points outside the edge are erased, whereas for KeyPoints they aren't necessarily.
