feat(benchmark): add video CPU benchmarks and kornia transforms#45
Conversation
Made-with: Cursor
Reviewer's Guide

Updates the video benchmark pipeline to support explicit CPU/GPU selection, align kornia video tensor dtypes and torchvision video loading behavior, work around problematic kornia transforms for batched video, adjust virtualenv base requirements for torch-based video libs, and refresh the README video benchmark table plus output JSON layout to reflect new CPU runs.

Sequence diagram for video benchmark execution with explicit video_device and env override

sequenceDiagram
actor User
participant CLI as BenchmarkCLI
participant Venv as VirtualEnvManager
participant RunnerProc as BenchmarkRunnerMain
participant BR as BenchmarkRunner
participant Utils as VideoUtils
participant Kornia as KorniaVideoImpl
User->>CLI: cmd_run --media video --video-device cpu
CLI->>Venv: _run_single(media=video, video_device=cpu)
Venv->>Venv: _ensure_venv(library, media)
Venv->>RunnerProc: python benchmark/runner.py --media video --video-device cpu
RunnerProc->>RunnerProc: build_parser()
RunnerProc->>RunnerProc: parse_args()
RunnerProc->>RunnerProc: media_type = MediaType.VIDEO
RunnerProc->>RunnerProc: setenv BENCHMARK_VIDEO_DEVICE=cpu
RunnerProc->>BR: BenchmarkRunner(..., media_type=VIDEO, video_device=cpu)
User->>BR: run(output_path)
BR->>Utils: _load_videos()
Utils->>Utils: import torch
Utils->>Utils: device = torch.device(cpu)
Utils->>Utils: gpu_available = False
Utils-->>BR: list of video tensors
note over Kornia,RunnerProc: In kornia_video_impl
RunnerProc->>Kornia: import kornia_video_impl
Kornia->>Kornia: _dev = BENCHMARK_VIDEO_DEVICE
Kornia->>Kornia: device = torch.device(cpu)
BR->>Kornia: create_transform(spec)
Kornia-->>BR: transform or None
loop for each run
BR->>Kornia: create_tensor(frame_batch, device=cpu)
Kornia-->>BR: tensor float16
BR->>BR: call_fn(transform, video)
end
BR->>RunnerProc: results, metadata(video_device=cpu)
RunnerProc-->>User: write results JSON and summary
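The `BENCHMARK_VIDEO_DEVICE` resolution step shown in the diagram can be sketched as a small pure helper. This is an illustrative sketch, not the PR's exact code: the function name `device_from_env` and the string return type (instead of a `torch.device`) are assumptions made to keep it self-contained.

```python
import os

def device_from_env(cuda_available: bool) -> str:
    """Resolve the benchmark device the way the diagram describes.

    An explicit BENCHMARK_VIDEO_DEVICE of "cpu" or "cuda" wins; anything
    else falls back to CUDA when available, otherwise CPU.
    """
    dev = (os.environ.get("BENCHMARK_VIDEO_DEVICE") or "").strip().lower()
    if dev in ("cpu", "cuda"):
        return dev
    return "cuda" if cuda_available else "cpu"
```

In the real pipeline the returned string would be passed to `torch.device(...)` by the kornia backend at import time.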
Updated class diagram for BenchmarkRunner, CLI, and kornia video backend

classDiagram
class BenchmarkRunner {
+str library
+Path data_dir
+list[Any] transforms
+Callable call_fn
+MediaType media_type
+int num_items
+int num_runs
+int max_warmup
+float warmup_threshold
+int min_warmup_windows
+int num_channels
+Optional~str~ _video_device
+__init__(library, data_dir, transforms, call_fn, media_type, num_items, num_runs, max_warmup, warmup_threshold, min_warmup_windows, num_channels, video_device)
+run(output_path) dict~str,Any~
-_load_videos() list~Any~
+filter_transforms(transforms, filter_names) list~TransformSpec~
}
class RunnerMain {
+main()
-build_parser() argparse_ArgumentParser
}
class BenchmarkCLI {
+build_parser() argparse_ArgumentParser
+cmd_run(args)
-_ensure_venv(library, media, repo_root) Path
-_run_single(library, media, repo_root, specs_file, output_dir, num_items, num_runs, max_warmup, warmup_threshold, min_warmup_windows, transforms_filter, verbose, num_channels, video_device)
-_cmd_run_gcp(args, repo_root, local_output_dir)
}
class KorniaVideoImpl {
+torch_device device
+str LIBRARY
+call(transform, video) Any
+create_tensor(data, device) torch_Tensor
+create_transform(spec) Any
}
class VideoUtils {
+read_video_cv2(path) np_ndarray
+read_video_torch(path) Any
}
class MediaType {
<<enumeration>>
IMAGE
VIDEO
}
BenchmarkCLI --> RunnerMain : invokes python module
RunnerMain --> BenchmarkRunner : constructs
BenchmarkRunner --> VideoUtils : uses _load_videos
RunnerMain --> KorniaVideoImpl : imports when library == kornia
BenchmarkRunner --> KorniaVideoImpl : uses create_transform and create_tensor
RunnerMain --> MediaType : selects
BenchmarkCLI --> MediaType : parses from args
class RequirementsManager {
+_ensure_venv(library, media, repo_root) Path
}
RequirementsManager <|-- BenchmarkCLI
RequirementsManager : selects base_req
class BaseRequirements {
+requirements_txt
+requirements_base_video_torch_txt
}
RequirementsManager --> BaseRequirements : chooses file based on media and library
File-Level Changes
Hey - I've left some high level feedback:

- In `BenchmarkRunner._load_videos`, when `--video-device cuda` is requested but `torch.cuda.is_available()` is false, `device` is still set to CUDA; consider either failing fast with a clear error or falling back to CPU so you don't hit a runtime device error later.
- For Kornia `Elastic` and `Perspective` you now return `None` from `create_transform`; if the rest of the pipeline treats this the same as a valid transform it may silently skip these cases; consider explicitly marking them as unsupported (e.g., via specs or filtering) so the omission is visible to callers.
- The updated `create_tensor` always forces `float16`, including on CPU runs; if some Kornia ops are slower or less well supported in FP16 on CPU, you might want to keep CPU tensors in `float32` and handle any necessary casts at the video loading boundary instead.
- Fail fast when --video-device cuda requested but CUDA unavailable
- Add unsupported_transforms to output for Kornia Elastic/Perspective
- Use float32 on CPU, float16 on GPU for Kornia create_tensor and video loading

Made-with: Cursor
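The float32-on-CPU / float16-on-GPU rule from the last bullet reduces to a one-line policy. The helper name `dtype_for_device` is illustrative, and it returns the dtype as a string so the sketch stays torch-free; the actual `create_tensor` would map this to `torch.float16` / `torch.float32`.

```python
def dtype_for_device(device_type: str) -> str:
    """Pick a tensor dtype for video benchmark tensors.

    float16 halves memory and bandwidth on CUDA, while float32 avoids
    slow or unsupported FP16 kernels in some ops on CPU.
    """
    return "float16" if device_type == "cuda" else "float32"
```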
Pull request overview
This PR extends the benchmark tooling and artifacts to support CPU video benchmarking (alongside existing GPU-focused results), adds a torch-video base requirements set to avoid macOS PyAV/OpenCV conflicts, and updates some video-loading and kornia video transform behaviors.
Changes:
- Add `--video-device {cpu,cuda}` plumbing through `benchmark.cli` → `benchmark.runner` and persist the setting in results metadata.
- Update torchvision video loading to pass `pts_unit="sec"` to avoid torchvision warnings.
- Add new/updated video benchmark result JSONs (including CPU runs) and refresh the README’s embedded video benchmark table.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| requirements/requirements-base-video-torch.txt | New “no-opencv” base requirements for torch-based video venvs (torchvision/kornia) to avoid macOS conflicts. |
| benchmark/cli.py | Uses the new base requirements for torch video venvs; forwards --video-device into the runner (local + GCP). |
| benchmark/runner.py | Adds --video-device option, device selection logic, and writes video_device into metadata. |
| benchmark/utils.py | Uses torchvision.io.read_video(..., pts_unit="sec") for video loading. |
| benchmark/transforms/kornia_video_impl.py | Attempts to respect BENCHMARK_VIDEO_DEVICE; disables problematic transforms; changes RandomRotate90 behavior. |
| output_videos/cpu/torchvision_video_results.json | New CPU torchvision video benchmark output artifact. |
| output_videos/cpu/albumentationsx_video_results.json | New CPU albumentationsx video benchmark output artifact. |
| output_videos/albumentationsx_video_results.json | Updates existing albumentationsx video results (version + numbers). |
| README.md | Refreshes the embedded video benchmark comparison table (albumentationsx column/version). |
Comments suppressed due to low confidence (2)
benchmark/cli.py:110
- Selecting a base requirements file without OpenCV won’t remove an already-installed `opencv-python(-headless)` from an existing venv (since this uses `uv pip install`, not a sync). That means the macOS libavdevice conflict may still occur unless the venv is recreated. Consider switching to `uv pip sync` (or explicitly uninstalling OpenCV / recreating the venv when the base requirements change) so the environment actually matches the intended base set.
```python
# Torch video venvs use a base without opencv to avoid duplicate libavdevice (av vs cv2)
if media == "video" and library in ("torchvision", "kornia"):
    base_req = repo_root / "requirements" / "requirements-base-video-torch.txt"
else:
    base_req = repo_root / "requirements" / "requirements.txt"
subprocess.run(
    [str(python), "-m", "uv", "pip", "install", "-q", "-U", "-r", str(base_req)],
    check=True,
)
```
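One way to act on the sync suggestion is to swap the install invocation for `uv pip sync`, which makes the venv match the requirements file exactly and uninstalls anything not listed (such as a stale OpenCV). The helper below only builds the command list; the function name is an assumption for illustration, not code from the PR.

```python
def build_sync_cmd(python_bin: str, req_file: str) -> list[str]:
    # `uv pip sync` installs exactly what req_file lists and removes the rest,
    # so a previously installed opencv-python would actually be uninstalled
    return [python_bin, "-m", "uv", "pip", "sync", req_file]
```

The resulting list would be passed to `subprocess.run(..., check=True)` in place of the `uv pip install` call above.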
benchmark/runner.py:557
- `BENCHMARK_VIDEO_DEVICE` is set directly from `--video-device`. If a user passes `--video-device cuda` on a host without CUDA, kornia_video_impl will still pick `torch.device('cuda')` at import time and crash when moving tensors/transforms. Consider guarding this with `torch.cuda.is_available()` (fall back to CPU with a clear warning/error) before exporting the env var.
```python
default=3,
help=(
    "Number of image channels. Must be a multiple of 3. "
```
```diff
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+# Get device (respect BENCHMARK_VIDEO_DEVICE for CPU-only runs)
+_dev = (os.environ.get("BENCHMARK_VIDEO_DEVICE") or "").strip().lower()
+device = torch.device(_dev if _dev in ("cpu", "cuda") else ("cuda" if torch.cuda.is_available() else "cpu"))
```
If BENCHMARK_VIDEO_DEVICE is set to "cuda" on a machine without CUDA, this will still select torch.device("cuda"), and subsequent .to(device) calls will crash. Consider validating torch.cuda.is_available() when _dev == "cuda" and falling back to CPU (and/or emitting a clear warning).
Suggested change:

```diff
-device = torch.device(_dev if _dev in ("cpu", "cuda") else ("cuda" if torch.cuda.is_available() else "cpu"))
+if _dev == "cpu":
+    device = torch.device("cpu")
+elif _dev == "cuda":
+    if torch.cuda.is_available():
+        device = torch.device("cuda")
+    else:
+        # Fall back to CPU if CUDA is requested but not available
+        print(
+            "BENCHMARK_VIDEO_DEVICE is set to 'cuda' but CUDA is not available; falling back to CPU.",
+        )
+        device = torch.device("cpu")
+else:
+    # Default behavior: prefer CUDA if available, otherwise CPU
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```
```python
register_unsupported_reason(
    LIBRARY,
    "Elastic",
    "RandomElasticTransform per-frame displacement fields cause OOM when T frames treated as batch",
)
```
The comment says this transform will be processed frame-by-frame and repeated, but the implementation returns None, which marks the transform as unsupported (it will be excluded entirely). Either implement the described fallback behavior or adjust the comment to reflect that Elastic is currently disabled for videos.
Suggested change:

```diff
-    "RandomElasticTransform per-frame displacement fields cause OOM when T frames treated as batch",
+    "Elastic video transform is currently disabled: RandomElasticTransform per-frame displacement fields cause OOM when T frames are treated as a batch",
```
```diff
@@ -316,7 +313,7 @@ def create_transform(spec: TransformSpec) -> Any | None:
     ).to(device)
```
RandomRotate90 is expected to rotate by multiples of 90°, but this uses RandomRotation(degrees=(0.0, 360.0)), which produces arbitrary angles and changes the transform semantics vs the spec/name and other libraries. Consider using a 90°-step rotation op (or sampling from {0,90,180,270} and applying a deterministic rotation).
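A 90°-step sampler along the lines of that suggestion could look like this sketch (the function name is illustrative; applying the sampled angle to a frame batch would then use something like `torch.rot90(frames, angle // 90, dims=(-2, -1))`):

```python
import random

def sample_rot90_angle(rng: random.Random) -> int:
    # RandomRotate90 semantics: pick one of the four axis-aligned rotations,
    # rather than an arbitrary angle from a continuous range
    return rng.choice([0, 90, 180, 270])
```

Sampling from the discrete set keeps the output lossless (no interpolation) and matches what other libraries' RandomRotate90 implementations do.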
```python
if self._video_device:
    device = torch.device(self._video_device)
    gpu_available = device.type == "cuda" and torch.cuda.is_available()
    if device.type == "cuda" and not torch.cuda.is_available():
        raise RuntimeError(
            "--video-device cuda was requested but torch.cuda.is_available() is False. "
```
When video_device is explicitly set to cuda but CUDA isn’t actually available, this silently proceeds with device=torch.device('cuda') while leaving gpu_available=False (so tensors stay on CPU). That can lead to confusing behavior/metadata (and can still break libraries that honor BENCHMARK_VIDEO_DEVICE). Consider explicitly erroring or warning and falling back to CPU when video_device=='cuda' but torch.cuda.is_available() is false.
Summary by Sourcery
Add CPU-targeted video benchmark support and improve Kornia/torchvision video transform handling in the benchmarking pipeline.
New Features:
Bug Fixes:
Enhancements:
Build:
Documentation:
Chores: