Contributor

mollyxu commented Dec 13, 2025

Summary

This PR implements resize transforms by calling sws_scale() twice instead of using filtergraph:

  1. First sws_scale() call: color conversion (YUV → RGB24) at the original resolution
  2. Second sws_scale() call: resize in RGB24 space

This ensures transforms happen in the output color space (RGB24), matching filtergraph behavior while potentially offering better performance.
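
To make the ordering of the two passes concrete, here is a minimal pure-Python sketch. It is not the PR's implementation and does not call FFmpeg: it assumes per-pixel (4:4:4) full-range BT.601 YUV input and uses nearest-neighbor resizing for brevity, and every function name in it is hypothetical.

```python
def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV pixel (0-255 each) to an RGB tuple."""
    d, e = u - 128, v - 128
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d

    def clamp(x):
        return max(0, min(255, int(round(x))))

    return clamp(r), clamp(g), clamp(b)


def convert_frame(yuv_frame):
    """Pass 1: color conversion at the original resolution."""
    return [[yuv_to_rgb(*px) for px in row] for row in yuv_frame]


def resize_nearest(rgb_frame, out_h, out_w):
    """Pass 2: resize in RGB space (nearest neighbor, for illustration only)."""
    in_h, in_w = len(rgb_frame), len(rgb_frame[0])
    return [
        [rgb_frame[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def convert_then_resize(yuv_frame, out_h, out_w):
    """Run both passes in the order described above."""
    return resize_nearest(convert_frame(yuv_frame), out_h, out_w)


# A 4x4 mid-gray frame, resized to 2x2 after conversion.
gray = [[(128, 128, 128)] * 4 for _ in range(4)]
small = convert_then_resize(gray, 2, 2)
```

Because the resize runs on RGB data, any interpolation happens in the output color space, which is the property the summary relies on.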

Note: this optimization is not yet on the active code path. The intention is to benchmark sws_scale against filtergraph before deciding whether to switch over.

Benchmark Results

Key findings:

  • Larger resize targets show significant speedup (up to 2.2× faster at 135×240)
  • Speedup increases with more frames (better amortization of context caching)

| Sampling | Resize dims | swscale (ms) | filtergraph (ms) | Speedup |
| --- | --- | ---: | ---: | ---: |
| 0.5% (2 frames) | 135×240 | 49.55 | 86.15 | 1.7× |
| 0.5% (2 frames) | 67×120 | 37.72 | 45.07 | 1.2× |
| 0.5% (2 frames) | 33×60 | 36.79 | 41.11 | 1.1× |
| 1% (4 frames) | 135×240 | 68.32 | 103.19 | 1.5× |
| 1% (4 frames) | 67×120 | 56.19 | 72.24 | 1.3× |
| 1% (4 frames) | 33×60 | 51.93 | 58.95 | 1.1× |
| 5% (20 frames) | 135×240 | 114.87 | 217.67 | 1.9× |
| 5% (20 frames) | 67×120 | 83.11 | 112.06 | 1.3× |
| 5% (20 frames) | 33×60 | 75.27 | 83.32 | 1.1× |
| 10% (39 frames) | 135×240 | 183.67 | 396.93 | 2.2× |
| 10% (39 frames) | 67×120 | 127.18 | 180.97 | 1.4× |
| 10% (39 frames) | 33×60 | 111.35 | 125.17 | 1.1× |

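
The Speedup column appears to be the filtergraph time divided by the swscale time, a reading that is consistent with the values listed. A short sketch of that arithmetic, using two rows from the table (the `speedup` helper is hypothetical):

```python
def speedup(swscale_ms: float, filtergraph_ms: float) -> float:
    """Return how many times faster swscale is than filtergraph."""
    return filtergraph_ms / swscale_ms


# (sampling, resize dims, swscale ms, filtergraph ms), copied from the table.
rows = [
    ("10% (39 frames)", "135×240", 183.67, 396.93),
    ("0.5% (2 frames)", "33×60", 36.79, 41.11),
]

for sampling, dims, sws_ms, fg_ms in rows:
    print(f"{sampling:>16} {dims:>8} {speedup(sws_ms, fg_ms):.1f}×")
```
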
Benchmarking nasa_13013.mp4, duration: 13.013, codec: h264, averaging over 10 runs:

======================================================================
SWSCALE VS FILTERGRAPH BENCHMARKS
======================================================================

Sampling 0.5%, 2, of 390 frames

--- Resize to (135, 240) ---
swscale resize(135, 240)                      med = 61.19, mean = 61.35 +- 0.52, min = 60.68, max = 62.40 - in ms
filtergraph resize(135, 240)                  med = 77.22, mean = 77.54 +- 2.72, min = 73.85, max = 81.12 - in ms

--- Resize to (67, 120) ---
swscale resize(67, 120)                       med = 63.59, mean = 63.53 +- 1.61, min = 60.36, max = 65.37 - in ms
filtergraph resize(67, 120)                   med = 62.77, mean = 63.26 +- 1.68, min = 61.82, max = 67.65 - in ms

--- Resize to (33, 60) ---
swscale resize(33, 60)                        med = 61.76, mean = 62.36 +- 2.20, min = 58.92, max = 66.76 - in ms
filtergraph resize(33, 60)                    med = 61.23, mean = 62.88 +- 3.77, min = 60.19, max = 73.20 - in ms

Sampling 1.0%, 4, of 390 frames

--- Resize to (135, 240) ---
swscale resize(135, 240)                      med = 86.18, mean = 89.46 +- 6.01, min = 83.14, max = 98.62 - in ms
filtergraph resize(135, 240)                  med = 99.92, mean = 107.05 +- 14.53, min = 90.91, max = 130.27 - in ms

--- Resize to (67, 120) ---
swscale resize(67, 120)                       med = 92.55, mean = 96.65 +- 13.01, min = 82.60, max = 123.35 - in ms
filtergraph resize(67, 120)                   med = 86.76, mean = 86.40 +- 1.48, min = 84.11, max = 88.42 - in ms

--- Resize to (33, 60) ---
swscale resize(33, 60)                        med = 85.48, mean = 85.29 +- 1.13, min = 82.53, max = 86.33 - in ms
filtergraph resize(33, 60)                    med = 84.64, mean = 85.15 +- 1.51, min = 83.15, max = 87.77 - in ms

Sampling 5.0%, 20, of 390 frames

--- Resize to (135, 240) ---
swscale resize(135, 240)                      med = 137.73, mean = 137.98 +- 1.08, min = 136.12, max = 140.18 - in ms
filtergraph resize(135, 240)                  med = 240.32, mean = 239.19 +- 31.94, min = 192.18, max = 286.86 - in ms

--- Resize to (67, 120) ---
swscale resize(67, 120)                       med = 130.74, mean = 130.36 +- 1.54, min = 126.45, max = 131.81 - in ms
filtergraph resize(67, 120)                   med = 133.50, mean = 133.49 +- 2.21, min = 129.31, max = 137.92 - in ms

--- Resize to (33, 60) ---
swscale resize(33, 60)                        med = 128.08, mean = 127.12 +- 2.42, min = 122.83, max = 129.82 - in ms
filtergraph resize(33, 60)                    med = 132.55, mean = 133.43 +- 2.49, min = 130.11, max = 137.67 - in ms

Sampling 10.0%, 39, of 390 frames

--- Resize to (135, 240) ---
swscale resize(135, 240)                      med = 154.48, mean = 155.21 +- 2.05, min = 153.36, max = 160.70 - in ms
filtergraph resize(135, 240)                  med = 202.84, mean = 204.87 +- 20.35, min = 165.98, max = 230.64 - in ms

--- Resize to (67, 120) ---
swscale resize(67, 120)                       med = 141.14, mean = 141.52 +- 0.82, min = 140.48, max = 143.27 - in ms
filtergraph resize(67, 120)                   med = 145.69, mean = 145.92 +- 1.15, min = 143.44, max = 147.54 - in ms

--- Resize to (33, 60) ---
swscale resize(33, 60)                        med = 137.54, mean = 137.72 +- 1.13, min = 135.47, max = 139.86 - in ms
filtergraph resize(33, 60)                    med = 140.16, mean = 140.25 +- 1.25, min = 138.31, max = 142.17 - in ms
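
For readers who want to produce summary lines in the same shape, here is a minimal standard-library sketch. It is not the benchmark script from this PR: the `bench` and `summarize` names are hypothetical, and the use of the sample standard deviation for the `+-` term is an assumption, since the log does not say which estimator was used.

```python
import statistics
import time


def bench(fn, num_runs=10):
    """Call fn num_runs times and return the per-run wall-clock times in ms."""
    times_ms = []
    for _ in range(num_runs):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1e3)
    return times_ms


def summarize(label, times_ms):
    """Format timings as median, mean +- std, min, and max, all in ms."""
    med = statistics.median(times_ms)
    mean = statistics.mean(times_ms)
    # Assumption: sample standard deviation; needs at least two data points.
    std = statistics.stdev(times_ms) if len(times_ms) > 1 else 0.0
    return (
        f"{label:<45} med = {med:.2f}, mean = {mean:.2f} +- {std:.2f}, "
        f"min = {min(times_ms):.2f}, max = {max(times_ms):.2f} - in ms"
    )


# Example with a stand-in workload; a real run would call the decode path.
print(summarize("dummy workload", bench(lambda: sum(range(10_000)))))
```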

meta-cla bot added the CLA Signed label on Dec 13, 2025. (This label is managed by the Meta Open Source bot.)
    path: Path, pts_seconds: list[float], dims: tuple[int, int], num_threads: int
) -> Tensor:
    height, width = dims
    decoder = create_from_file(str(path), seek_mode="approximate")
Contributor Author

Using core API calls here to make it easier to benchmark the two code paths against each other. Will modify if we decide to switch to sws_scale.

// path). Like the color conversion context above, we cache this to avoid
// recreating it for every frame.
UniqueSwsContext resizeSwsContext_;
SwsFrameContext prevResizeSwsFrameContext_;
Contributor

We should probably refactor the sws logic into its own class, similar to what we did for filtergraph. But let's tackle that as a follow-up PR to this functionality.
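
As a language-neutral illustration of the caching pattern in the snippet above, and of what a dedicated wrapper class might encapsulate, here is a small Python sketch. It is not the PR's C++ code: `FrameParams`, `CachedContext`, and the choice of cache key are all hypothetical stand-ins for the cached context and the previous-frame parameters.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

C = TypeVar("C")


@dataclass(frozen=True)
class FrameParams:
    """Hypothetical cache key: the properties that determine a context."""
    in_width: int
    in_height: int
    in_format: str
    out_width: int
    out_height: int


class CachedContext(Generic[C]):
    """Rebuild the context only when the frame parameters change."""

    def __init__(self, factory: Callable[[FrameParams], C]):
        self._factory = factory
        self._prev_params: Optional[FrameParams] = None
        self._context: Optional[C] = None
        self.num_creations = 0  # exposed only to show cache behavior

    def get(self, params: FrameParams) -> C:
        if self._context is None or params != self._prev_params:
            self._context = self._factory(params)
            self._prev_params = params
            self.num_creations += 1
        return self._context


# Two frames with identical parameters share a single context.
cache = CachedContext(lambda params: object())
first = cache.get(FrameParams(640, 480, "rgb24", 135, 240))
second = cache.get(FrameParams(640, 480, "rgb24", 135, 240))
third = cache.get(FrameParams(640, 480, "rgb24", 67, 120))
```

With a one-time setup cost spread over every frame that reuses the context, the per-frame overhead shrinks as more frames are decoded, which is consistent with the amortization noted in the key findings.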

Contributor

scotts commented Dec 15, 2025

@mollyxu, great work! Yes, let's move forward. This is very convincing evidence that this optimization is worth doing.
