Torchvision objective API #6139
Conversation
- Selected operators that enable implementing the TIMM pipeline
- Unit tests
- Test scripts

Signed-off-by: Marek Dabek <[email protected]>
Greptile Summary

This PR introduces a torchvision-compatible API for DALI, enabling users to use familiar torchvision transforms with DALI's accelerated execution. The implementation covers essential transforms (Resize, CenterCrop, ColorJitter, Normalize, Pad, GaussianBlur, and flip operations) that can be composed into pipelines.

Key Changes:
Implementation Details:
Limitations:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
participant User
participant Compose
participant PipelineHWC/CHW
participant DALI Pipeline
participant Transform Ops
participant Output
User->>Compose: __call__(PIL Image or torch.Tensor)
Compose->>Compose: _build_pipeline() on first call
alt PIL Image Input
Compose->>PipelineHWC: Create pipeline with HWC layout
else torch.Tensor Input
Compose->>PipelineCHW: Create pipeline with CHW layout
end
Compose->>PipelineHWC/CHW: run(data_input)
alt PIL Image
PipelineHWC/CHW->>PipelineHWC/CHW: Convert PIL to torch tensor
else torch.Tensor
PipelineHWC/CHW->>PipelineHWC/CHW: Add batch dimension if needed
end
PipelineHWC/CHW->>DALI Pipeline: _pipeline_function(op_list)
DALI Pipeline->>DALI Pipeline: fn.external_source(input_data)
loop For each transform in op_list
DALI Pipeline->>Transform Ops: Apply transform (Resize, CenterCrop, etc.)
Transform Ops->>Transform Ops: __call__(data_input)
alt GPU device specified
Transform Ops->>Transform Ops: data_input.gpu()
end
Transform Ops->>DALI Pipeline: Return transformed DataNode
end
DALI Pipeline->>PipelineHWC/CHW: Return output
PipelineHWC/CHW->>PipelineHWC/CHW: to_torch_tensor()
alt ToTensor in pipeline
PipelineHWC/CHW->>Output: Return torch.Tensor
else PIL Image mode
PipelineHWC/CHW->>PipelineHWC/CHW: Convert to PIL Image
PipelineHWC/CHW->>Output: Return PIL Image
else Batch mode
PipelineHWC/CHW->>Output: Return list or torch.Tensor
end
Output->>User: Transformed output
```
Additional Comments (7)
- dali/python/nvidia/dali/experimental/torchvision/v2/compose.py, line 137 (link) syntax: typo: "cound" should be "count"
- dali/python/nvidia/dali/experimental/torchvision/v2/compose.py, line 138 (link) syntax: typo: "unsable" should be "unusable"
- dali/python/nvidia/dali/experimental/torchvision/v2/compose.py, line 59 (link) syntax: typo: "runst" should be "runs"
- dali/python/nvidia/dali/experimental/torchvision/v2/compose.py, line 186 (link) logic: parentheses mismatch: the `append()` call has wrong parentheses placement, causing `layout` to be passed as the second argument to `range()` instead of `_convert_tensor_to_image()`
- dali/python/nvidia/dali/experimental/torchvision/v2/resize.py, line 40 (link) syntax: typo: "resizedimage" should be "resized image" (missing space)
- dali/python/nvidia/dali/experimental/torchvision/v2/resize.py, lines 108-113 (link) logic: the condition will never be false: checking `len(self.size) == 2` after already confirming `not isinstance(self.size, (tuple, list))` is contradictory
- dali/python/nvidia/dali/experimental/torchvision/v2/pad.py, line 29 (link) syntax: typo: "boarder_type" should be "border_type"
21 files reviewed, 7 comments
Signed-off-by: Marek Dabek <[email protected]>
Additional Comments (6)
- dali/python/nvidia/dali/experimental/torchvision/v2/compose.py, line 48 (link) logic: this assumes `tensor_or_tl` has a `__len__` method in the else branch, but not all inputs may support this (e.g., a single TensorList rather than a tuple). Consider checking the type before calling `len()`.
- dali/python/nvidia/dali/experimental/torchvision/v2/pad.py, lines 56-57 (link) logic: missing validation: when `padding` is a sequence, its length should be either 2 or 4 (per the torchvision API). Length 2 means left/right and top/bottom, while length 4 is left, top, right, bottom. This unpacking will fail for sequences of length 2.
- dali/python/nvidia/dali/experimental/torchvision/v2/pad.py, lines 65-66 (link) style: the layout check is fragile: it compares against the encoded byte of 'C' to detect CHW layout. Consider using string comparison or DALI's layout constants for better readability and maintainability.
- dali/python/nvidia/dali/experimental/torchvision/v2/resize.py, lines 76-77 (link) style: this validation is incomplete: when `size > max_size`, it should raise a more descriptive error message explaining why the configuration is invalid.
- dali/python/nvidia/dali/experimental/torchvision/v2/compose.py, lines 117-118 (link) style: this restriction prevents `ToTensor` from being used mid-pipeline. Consider whether this should raise a more specific exception class or provide guidance on workarounds.
- dali/python/nvidia/dali/experimental/torchvision/v2/color.py, line 138 (link) style: a comment asks "what if it is HSV?" but no validation is performed. If non-RGB color spaces are unsupported, validate and raise an error.
21 files reviewed, 6 comments
```python
elif in_tensor.shape[channels] == 3:
    mode = "RGB"
else:
    raise ValueError(f"Unsupported channels count: {channels}")
```
Suggested change:

```python
raise ValueError(f"Unsupported channel count: {channels}")
```

or

```python
raise ValueError(f"Unsupported number of channels: {channels}")
```
Also, I'd add some info like "Expected 1 or 3." to make users' lives easier.
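Combining both suggestions, the final message might read (hypothetical wording, not the PR's actual code):

```python
raise ValueError(f"Unsupported number of channels: {channels}. Expected 1 or 3.")
```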
Done
```python
return _to_torch_tensor(tensor_or_tl)


class Compose(Pipeline):
```
Since it's not going to be used as a regular pipeline (?), why inheritance? Composition would work much better here, since you'd be able to create the inner pipeline with @pipeline_def and the conditional execution would be handled for you.
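A rough sketch of the composition approach, assuming DALI's public `@pipeline_def` decorator and `fn.external_source`; the `ComposeByComposition` name and its interface are hypothetical, not the PR's actual code:

```python
from nvidia.dali import fn, pipeline_def

class ComposeByComposition:
    """Holds an inner DALI pipeline instead of inheriting from Pipeline."""

    def __init__(self, op_list, batch_size=1, num_threads=1, device_id=None):
        @pipeline_def(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
        def _pipe():
            data = fn.external_source(name="input", layout="HWC")
            for op in op_list:
                data = op(data)  # each transform maps DataNode -> DataNode
            return data

        self._pipeline = _pipe()
        self._pipeline.build()

    def __call__(self, batch):
        # Feed the externally supplied batch and run the prebuilt pipeline.
        self._pipeline.feed_input("input", batch)
        return self._pipeline.run()
```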
Done
```python
def test_compose_tensor():
    test_tensor = make_test_tensor(shape=(5, 5, 5, 3))
    dali_out = Compose([RandomHorizontalFlip(p=1.0)], batch_size=test_tensor.shape[0])(test_tensor)
```
I don't think we should perpetuate this pattern even in the tests - creating an ad-hoc pipeline like this is extremely expensive.
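One way to avoid the per-call cost, sketched with the names from the test above (whether `Compose` instances are reusable like this is an assumption):

```python
# Build the transform once at module scope and reuse it across test cases,
# so the underlying DALI pipeline is constructed a single time.
_FLIP = Compose([RandomHorizontalFlip(p=1.0)], batch_size=5)

def test_compose_tensor():
    test_tensor = make_test_tensor(shape=(5, 5, 5, 3))
    dali_out = _FLIP(test_tensor)
```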
Done
```python
"""
TODO: DALI ColorJitter does not work on tensors
```
What do you mean by that? That it doesn't work with HWC layout?
HWC layout works, but CHW does not.
The API transforms a PIL image to HWC, but expects tensor input to be CHW, so passing a tensor will not work. This needs to be solved at some point if the API is to work efficiently with batches.
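A small illustration of the layout mismatch described above (shapes are just examples):

```python
import numpy as np
import torch
from PIL import Image

pil = Image.new("RGB", (224, 224))
hwc = np.asarray(pil)           # PIL decodes to HWC: shape (224, 224, 3)
chw = torch.rand(3, 224, 224)   # torchvision-style tensors are CHW: shape (3, 224, 224)
# DALI's ColorJitter handles the HWC case; the CHW tensor path is what fails.
```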
```python
def build_centercrop_transform(
    size: Union[int, Sequence[int]], batch_size: int = 1, device: Literal["cpu", "gpu"] = "cpu"
```
Nitpick: I think we can use the `|` syntax already:

```python
size: int | Sequence[int]
```
```python
def __init__(
    self,
    op_list: List[Callable[..., Union[Sequence[_DataNode], _DataNode]]],
    batch_size: int = 1,
```
Food for thought:
Generally, the batch_size argument in the DALI Pipeline ctor denotes max_batch_size rather than the actual batch size, and handling batches smaller than the max does not introduce observable memory overhead. Would it be a good idea to set batch_size here to some default value (what value is a good question) so that the user won't need to set it in most cases? Like 256, off the top of my head?
And the same might apply to num_threads, like nproc-1 or something?
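A sketch of such defaults; the exact values are the open question raised above:

```python
import os

DEFAULT_MAX_BATCH_SIZE = 256                              # assumption, per the comment above
DEFAULT_NUM_THREADS = max(1, (os.cpu_count() or 2) - 1)   # "nproc - 1", floored at 1
```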
```python
batch_size: int = 1,
num_threads: int = 1,
*args,
**kwargs,
```
Are those args and kwargs the arguments I'd pass to the DALI pipeline? If so, it might be good to provide this info in the documentation. Maybe even name it as such: **dali_pipeline_args.
Also, I'm not sure that having *args after defined keyword arguments is a good pattern. Do we have any specific usages for it? Maybe we can just omit *args and have keyword args only?
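A keyword-only variant of the signature along the lines the reviewer suggests (the forwarding and the `dali_pipeline_args` name are assumptions):

```python
class Compose(Pipeline):  # signature sketch only; body elided
    def __init__(
        self,
        op_list,
        batch_size: int = 1,
        num_threads: int = 1,
        **dali_pipeline_args,  # forwarded verbatim to the DALI Pipeline ctor
    ):
        super().__init__(batch_size=batch_size, num_threads=num_threads, **dali_pipeline_args)
```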
```python
if not isinstance(data_input, (Image.Image, torch.Tensor)):
    raise TypeError(f"input should be PIL Image or torch.Tensor. Got {type(data_input)}")
```
Would it make sense to also allow numpy array and nvimgcodec image?
The vanilla torchvision currently supports only the following formats:
- Images as pure tensors, Image, or PIL image
- Videos as Video
- Axis-aligned and rotated bounding boxes as BoundingBoxes
- Segmentation and detection masks as Mask
- KeyPoints as KeyPoints

I chose to support images and tensors for now, since these seemed to be supported out of the box or with minimal effort by DALI.
```python
if self.device == "gpu":
    data = data.gpu()

return fn.crop(
```
Would it work on bounding boxes? Shouldn't there be detection of the contents and rescaling of the bbox to the cropped coordinates?
No, bounding boxes need separate handling and are not supported yet.
Signed-off-by: Marek Dabek <[email protected]>
```python
return output


def get_layout(self) -> str: ...
```
Check notice (Code scanning / CodeQL): Statement has no effect
```python
def get_layout(self) -> str: ...


def get_channel_reverse_idx(self) -> int: ...
```
Check notice (Code scanning / CodeQL): Statement has no effect
Additional Comments (4)
- dali/python/nvidia/dali/experimental/torchvision/v2/compose.py, lines 148-151 (link) logic: `ValueError` is instantiated but never raised, silently failing validation.
- dali/python/nvidia/dali/experimental/torchvision/v2/compose.py, lines 208-211 (link) logic: `ValueError` is instantiated but never raised, silently failing validation.
- dali/python/nvidia/dali/experimental/torchvision/v2/compose.py, line 260 (link) syntax: typo in error message: "torch.Tesors" should be "torch.Tensors"
- dali/python/nvidia/dali/experimental/torchvision/v2/pad.py, lines 65-66 (link) syntax: the comment says "CWH" but should be "CHW" (channels-height-width)
21 files reviewed, 4 comments
Signed-off-by: Marek Dabek <[email protected]>
Additional Comments (1)
- dali/python/nvidia/dali/experimental/torchvision/v2/compose.py, lines 93-94 (link) logic: indexing with `-4` may fail if the output has fewer than 4 dimensions. Add a bounds check or handle IndexError.
21 files reviewed, 1 comment
```python
or validated_param[1] < 0
or validated_param[0] > validated_param[1]
):
    raise ValueError("Parameters must be > 0")
```
Nitpick, but maybe propagate the name of the parameter here, and also report that it should form a correct range when a sequence is provided. So we would write:

```python
raise ValueError(f"Parameter {name} must be > 0, got {param}.")
```

or

```python
raise ValueError(f"Parameter {name} must form a range, got {param}.")
```
```python
self.saturation = self._create_validated_param(saturation)

if isinstance(hue, float):
    self.hue = (-hue, hue)
```
hue hue :D
```python
self.hue = hue

if self.hue is not None and (len(self.hue) != 2 or self.hue[0] < -0.5 or self.hue[1] > 0.5):
    raise ValueError(f"hue values should be between (-0.5, 0.5) but got {self.hue}")
```
I know this looks like a list, but the 0.5 is supposed to be inclusive:

```python
raise ValueError(f"hue values should be between [-0.5, 0.5] but got {self.hue}")
```
```python
if isinstance(kernel_size, int) and kernel_size <= 0:
    raise ValueError("Kernel size value should be an odd and positive number")
```
There is no check for odd.
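A minimal version of the missing check, keeping the existing message (a sketch, not the PR's actual fix):

```python
# Reject non-positive and even kernel sizes; only odd, positive values pass.
if isinstance(kernel_size, int) and (kernel_size <= 0 or kernel_size % 2 == 0):
    raise ValueError("Kernel size value should be an odd and positive number")
```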
```python
mean (sequence) – Sequence of means for each channel.
std (sequence) – Sequence of standard deviations for each channel.
inplace (bool, optional) – Bool to make this operation in-place.
```
Should we just hard error on this?
```python
if self.size is None:
    if orig_h > orig_w:
        target_w = (self.max_size * orig_w) / orig_h
    else:
        target_h = (self.max_size * orig_h) / orig_w
```
I think we are missing a calculation for the case when size is an int and size * max(height, width) / min(height, width) > max_size (the longer edge crosses the max_size threshold): the image should then be max_size on the longer edge and max_size * min(height, width) / max(height, width) on the shorter one.
Also, you are setting target_w or target_h here, but only target_h is used in the `if self.mode == "resize_shorter":` branch.
Looks like we are missing some cases here, unless I'm mistaken.
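A sketch of the computation the comment describes, mirroring torchvision's shorter-edge resize semantics (variable names follow the snippet above; this is not the PR's code):

```python
short, long = min(orig_h, orig_w), max(orig_h, orig_w)
new_short, new_long = size, int(size * long / short)  # scale the shorter edge to `size`
if max_size is not None and new_long > max_size:
    # Longer edge crossed the cap: pin it to max_size and rescale the shorter edge.
    new_short, new_long = int(max_size * short / long), max_size
target_h, target_w = (new_short, new_long) if orig_h <= orig_w else (new_long, new_short)
```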
```python
loop_images_test(t, td)


@params((480, 512), (100, 124), (None, 512), (1024, 512), ([256, 256], 512))
```
Do we have test images for which, after scaling the shorter edge to `size`, the max_size threshold would be crossed?
```python
def _to_torch_tensor(tensor_or_tl: Union[TensorListGPU, TensorListCPU]) -> torch.Tensor:
    if isinstance(tensor_or_tl, (TensorListGPU, TensorListCPU)):
        dali_tensor = tensor_or_tl.as_tensor()
    else:
        dali_tensor = tensor_or_tl

    return torch.from_dlpack(dali_tensor)


def to_torch_tensor(tensor_or_tl: Union[tuple, TensorListGPU, TensorListCPU]) -> torch.Tensor:
    if isinstance(tensor_or_tl, tuple) and len(tensor_or_tl) > 1:
        tl = []
        for elem in tensor_or_tl:
            tl.append(_to_torch_tensor(elem))
        return tuple(tl)
    else:
        if len(tensor_or_tl) == 1:
            tensor_or_tl = tensor_or_tl[0]
        return _to_torch_tensor(tensor_or_tl)
```
Can you add a bit of a doc here? Is it just to convert DALI pipeline output to a torch Tensor?
Also, the batch-of-1 case is not reflected in the signature of _to_torch_tensor.
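A possible docstring answering the question above (the wording is an assumption, based on the code):

```python
def to_torch_tensor(tensor_or_tl):
    """Convert DALI pipeline output to torch.

    A tuple of TensorLists becomes a tuple of torch.Tensors; a single-element
    tuple (or a bare TensorList) is unwrapped and returned as one torch.Tensor,
    zero-copy via DLPack where possible.
    """
```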
```python
return output

if isinstance(output, tuple):
    output = self._convert_tensor_to_image(output[0])
```
This is because it's always one output, right?
```python
# We need to convert tensor to CPU, otherwise it will be unsable
return Image.fromarray(in_tensor.cpu().numpy(), mode=mode)


def run(self, data_input):
```
Would it be cleaner if both of the Pipelines implemented
- convert_input_to_dali_batch
- convert_dali_batch_to_output

or something similar, and we just called them in the base run()?
Btw, I'm a bit lost about what the inputs and outputs are expected to be. Is it just a torch tensor or PIL image that can have a leading batch dimension?
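A sketch of the template-method structure this suggests, using the method names from the comment (everything else, including `_run_dali`, is an assumption):

```python
class _BasePipeline:
    def run(self, data_input):
        batch = self.convert_input_to_dali_batch(data_input)   # subclass-specific
        raw_output = self._run_dali(batch)                     # shared DALI execution
        return self.convert_dali_batch_to_output(raw_output)   # subclass-specific

    def convert_input_to_dali_batch(self, data_input):
        raise NotImplementedError

    def convert_dali_batch_to_output(self, raw_output):
        raise NotImplementedError
```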
Category:
New feature
Description:
Torchvision objective API.
The implementation is currently limited to a few selected operators and to composing them into a single pipeline. The operators have been selected to enable a TIMM pipeline implementation. Input data support is limited to PIL Images and PyTorch tensors.
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
There will be an additional PR with documentation.
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-4309