Transforms tutorial #1123
@@ -0,0 +1,318 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
=======================================================
Decoder Transforms: Applying transforms during decoding
=======================================================

In this example, we will demonstrate how to use the ``transforms`` parameter of
the :class:`~torchcodec.decoders.VideoDecoder` class. This parameter allows us
to specify a list of :class:`~torchcodec.transforms.DecoderTransform` or
:class:`~torchvision.transforms.v2.Transform` objects. These objects serve as
transform specifications that the :class:`~torchcodec.decoders.VideoDecoder`
will apply during the decoding process.
"""

# %%
# First, a bit of boilerplate and definitions that we will use later:

# %%
# First, a bit of boilerplate: we'll download a video from the web, and define a
# plotting utility. You can ignore that part and jump right below to
# :ref:`sampling_tuto_start`.
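To ground the review discussion below, here is a minimal sketch of the usage the docstring describes, assuming the ``transforms`` parameter accepts torchvision v2 transforms as stated; the file name and target size are placeholders rather than values from the tutorial.

from torchcodec.decoders import VideoDecoder
from torchvision.transforms import v2

# Hypothetical usage based on the docstring above; "video.mp4" and the
# (270, 480) target size are placeholders.
decoder = VideoDecoder("video.mp4", transforms=[v2.Resize((270, 480))])

# The decoder applies the resize during decoding, so the frame that
# reaches Python is already at the smaller resolution.
frame = decoder.get_frame_at(0)
print(frame.data.shape)  # expected: (3, 270, 480) for this placeholder size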
It usually makes more sense to first crop and then resize, because resize will then work on a smaller surface.
Indeed it does, and curiously, it actually makes decoder transforms faster than the TorchVision version now (at least on my dev machine).
Results with the old way:
0:
decoder transforms: times_med = 1474.17ms +- 79.85
torchvision transform: times_med = 4683.55ms +- 28.71
1:
decoder transforms: times_med = 18486.50ms +- 165.66
torchvision transform: times_med = 16066.02ms +- 164.19
Results with the new way:
0:
decoder transforms: times_med = 1352.46ms +- 34.86
torchvision transform: times_med = 4077.44ms +- 45.63
1:
decoder transforms: times_med = 14771.99ms +- 148.83
torchvision transform: times_med = 16112.88ms +- 62.15
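For context, a sketch of the ordering change being discussed, assuming the tutorial composes the crop and resize as torchvision v2 transforms; the sizes here are illustrative, not the tutorial's values.

from torchvision.transforms import v2

# Resize-then-crop: the resize operates on the full-resolution frame.
old_order = [v2.Resize((270, 480)), v2.CenterCrop((200, 300))]

# Crop-then-resize: the resize only touches the cropped region, so it does
# less work per frame.
new_order = [v2.CenterCrop((540, 960)), v2.Resize((270, 480))]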
I think there's another core reason why that's more memory efficient: the decompressed RGB frame is never materialized in its original resolution.
Without decoder-native transform we have:
YUV compressed frame in original res -> RGB decompressed frame in original res -> RGB decompressed frame in final (smaller) res
With the decoder-native transform we have:
YUV compressed frame in original res -> RGB decompressed frame in final (smaller) res
i.e. we can skip the "RGB decompressed frame in original res" materialization, which is the most memory-expensive bit.
The garbage collector being under less pressure is a consequence of that.
That's not entirely accurate - we definitely never get the "RGB decompressed frame in original res" in the Python layer, but it exists in FFmpeg. This is because we ensure that the FFmpeg filters get applied in the output color space. So without decoder transforms we have (parentheses indicate where each step happens, TC or TV):
YUV compressed, original res (TC) ->
RGB decompressed, original res (TC) ->
RGB decompressed, smaller res (TV)
With decoder transforms it's:
YUV compressed, original res (TC) ->
RGB decompressed, original res (TC) ->
RGB decompressed, smaller res (TC)
So we really do go through the same steps in decoder transforms. That middle step - getting the RGB image in the original resolution - is because of this line:
filters_ = "format=rgb24," + filters.str();
Eliminating the explicit "format=rgb24" does improve performance a lot, but at the cost of similarity with using TorchVision transforms on full frames.
Since the filtergraph inputs and outputs are known statically, I suspect they're able to optimize things and reuse memory. That is, it's possible for them to allocate exactly the memory they need for each step and reuse it every time. But I don't know whether that's the case. I'll try to say something about all this.
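To make the filtergraph discussion concrete, a rough sketch mirroring the composition in the line quoted above; the "scale=480:270" filter is an assumed example of what a resize transform might translate to, not taken from the code.

# Assumed example of a user-specified filter chain produced by a resize.
user_filters = "scale=480:270"

# As implemented: the conversion to the output pixel format is prepended,
# so the user filters run on full-resolution RGB frames.
with_explicit_format = "format=rgb24," + user_filters   # "format=rgb24,scale=480:270"

# The alternative mentioned above: omit the explicit conversion and let
# FFmpeg insert it where needed, which may move it after the scale so the
# scale runs before the RGB conversion.
without_explicit_format = user_filters                  # "scale=480:270"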
Would it be worth mentioning decoder-native transforms in the performance tips docs?
@mollyxu, yes, absolutely. I'd like to do that in a follow-up PR.