
feat(benchmark): add video CPU benchmarks and kornia transforms #45

Open
ternaus wants to merge 2 commits into main from feat/video-cpu-benchmarks

Conversation


@ternaus ternaus commented Mar 1, 2026

Made-with: Cursor

Summary by Sourcery

Add CPU-targeted video benchmark support and improve Kornia/torchvision video transform handling in the benchmarking pipeline.

New Features:

  • Allow forcing the device for video benchmarks via a --video-device option and BENCHMARK_VIDEO_DEVICE environment variable.
  • Add separate CPU video benchmark result outputs alongside existing GPU results.
  • Introduce a video-specific base requirements file for torchvision and Kornia benchmarks without OpenCV to avoid conflicting libav dependencies.

Bug Fixes:

  • Align Kornia video tensor creation to always use float16 to match the video loader output and avoid dtype mismatches.
  • Disable problematic Kornia Elastic and Perspective video transforms that fail or OOM with batched video tensors.
  • Update torchvision video reading to use pts_unit='sec' and correct tensor layout handling to avoid warnings and shape issues.

Enhancements:

  • Expose the selected video device in benchmark metadata for video runs.
  • Propagate video device selection through the CLI, runner, and GCP run paths for consistent configuration.
  • Refresh README video benchmark table with updated performance numbers for AlbumentationsX video transforms.

Build:

  • Adjust virtual environment dependency installation to use a video-focused base requirements file for torchvision and Kornia video benchmarks when needed.

Documentation:

  • Update README video benchmark table values to reflect the latest measurements.

Chores:

  • Reorganize output video benchmark JSON files into a cpu subdirectory to separate CPU video benchmark results.
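The device plumbing summarized above (a `--video-device` flag exported as `BENCHMARK_VIDEO_DEVICE` for child processes) can be sketched as follows. This is a minimal stand-in, not the PR's actual parser; only the flag and env-var names come from the description.

```python
import argparse
import os

def build_parser() -> argparse.ArgumentParser:
    # Reduced sketch of the runner's parser: only the new option is shown.
    parser = argparse.ArgumentParser(prog="benchmark-runner")
    parser.add_argument(
        "--video-device",
        choices=("cpu", "cuda"),
        default=None,
        help="Force the device used for video benchmarks (overrides auto-detection).",
    )
    return parser

args = build_parser().parse_args(["--video-device", "cpu"])
if args.video_device:
    # Per-library venv subprocesses read this env var at import time.
    os.environ["BENCHMARK_VIDEO_DEVICE"] = args.video_device

print(os.environ["BENCHMARK_VIDEO_DEVICE"])  # cpu
```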

@ternaus ternaus requested a review from Copilot March 1, 2026 10:41

sourcery-ai bot commented Mar 1, 2026

Reviewer's Guide

Updates the video benchmark pipeline to support explicit CPU/GPU selection, align kornia video tensor dtypes and torchvision video loading behavior, work around problematic kornia transforms for batched video, adjust virtualenv base requirements for torch-based video libs, and refresh the README video benchmark table plus output JSON layout to reflect new CPU runs.

Sequence diagram for video benchmark execution with explicit video_device and env override

sequenceDiagram
    actor User
    participant CLI as BenchmarkCLI
    participant Venv as VirtualEnvManager
    participant RunnerProc as BenchmarkRunnerMain
    participant BR as BenchmarkRunner
    participant Utils as VideoUtils
    participant Kornia as KorniaVideoImpl

    User->>CLI: cmd_run --media video --video-device cpu
    CLI->>Venv: _run_single(media=video, video_device=cpu)
    Venv->>Venv: _ensure_venv(library, media)
    Venv->>RunnerProc: python benchmark/runner.py --media video --video-device cpu

    RunnerProc->>RunnerProc: build_parser()
    RunnerProc->>RunnerProc: parse_args()
    RunnerProc->>RunnerProc: media_type = MediaType.VIDEO
    RunnerProc->>RunnerProc: setenv BENCHMARK_VIDEO_DEVICE=cpu

    RunnerProc->>BR: BenchmarkRunner(..., media_type=VIDEO, video_device=cpu)
    User->>BR: run(output_path)

    BR->>Utils: _load_videos()
    Utils->>Utils: import torch
    Utils->>Utils: device = torch.device(cpu)
    Utils->>Utils: gpu_available = False
    Utils-->>BR: list of video tensors

    note over Kornia,RunnerProc: In kornia_video_impl
    RunnerProc->>Kornia: import kornia_video_impl
    Kornia->>Kornia: _dev = BENCHMARK_VIDEO_DEVICE
    Kornia->>Kornia: device = torch.device(cpu)

    BR->>Kornia: create_transform(spec)
    Kornia-->>BR: transform or None

    loop for each run
        BR->>Kornia: create_tensor(frame_batch, device=cpu)
        Kornia-->>BR: tensor float16
        BR->>BR: call_fn(transform, video)
    end

    BR->>RunnerProc: results, metadata(video_device=cpu)
    RunnerProc-->>User: write results JSON and summary

Updated class diagram for BenchmarkRunner, CLI, and kornia video backend

classDiagram
    class BenchmarkRunner {
        +str library
        +Path data_dir
        +list[Any] transforms
        +Callable call_fn
        +MediaType media_type
        +int num_items
        +int num_runs
        +int max_warmup
        +float warmup_threshold
        +int min_warmup_windows
        +int num_channels
        +Optional~str~ _video_device
        +__init__(library, data_dir, transforms, call_fn, media_type, num_items, num_runs, max_warmup, warmup_threshold, min_warmup_windows, num_channels, video_device)
        +run(output_path) dict~str,Any~
        -_load_videos() list~Any~
        +filter_transforms(transforms, filter_names) list~TransformSpec~
    }

    class RunnerMain {
        +main()
        -build_parser() argparse_ArgumentParser
    }

    class BenchmarkCLI {
        +build_parser() argparse_ArgumentParser
        +cmd_run(args)
        -_ensure_venv(library, media, repo_root) Path
        -_run_single(library, media, repo_root, specs_file, output_dir, num_items, num_runs, max_warmup, warmup_threshold, min_warmup_windows, transforms_filter, verbose, num_channels, video_device)
        -_cmd_run_gcp(args, repo_root, local_output_dir)
    }

    class KorniaVideoImpl {
        +torch_device device
        +str LIBRARY
        +call(transform, video) Any
        +create_tensor(data, device) torch_Tensor
        +create_transform(spec) Any
    }

    class VideoUtils {
        +read_video_cv2(path) np_ndarray
        +read_video_torch(path) Any
    }

    class MediaType {
        <<enumeration>>
        IMAGE
        VIDEO
    }

    BenchmarkCLI --> RunnerMain : invokes python module
    RunnerMain --> BenchmarkRunner : constructs
    BenchmarkRunner --> VideoUtils : uses _load_videos
    RunnerMain --> KorniaVideoImpl : imports when library == kornia
    BenchmarkRunner --> KorniaVideoImpl : uses create_transform and create_tensor
    RunnerMain --> MediaType : selects
    BenchmarkCLI --> MediaType : parses from args

    class RequirementsManager {
        +_ensure_venv(library, media, repo_root) Path
    }

    RequirementsManager <|-- BenchmarkCLI
    RequirementsManager : selects base_req

    class BaseRequirements {
        +requirements_txt
        +requirements_base_video_torch_txt
    }

    RequirementsManager --> BaseRequirements : chooses file based on media and library

File-Level Changes

Change Details Files
Allow forcing video benchmark device and propagate it through runner, CLI, and environment.
  • Add optional video_device parameter to BenchmarkRunner, store it, and include it in video benchmark metadata.
  • Update _load_videos to derive torch device from video_device when provided while still checking CUDA availability.
  • Extend CLI/main argument parsing with --video-device, wire it into BenchmarkRunner construction, and set BENCHMARK_VIDEO_DEVICE env var for video runs.
  • Plumb video_device through benchmark.cli _run_single, cmd_run, and GCP run helpers so local and remote invocations can pass the flag.
benchmark/runner.py
benchmark/cli.py
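The device-resolution behavior described in this change (explicit flag wins, CUDA misuse fails fast, otherwise auto-detect) can be sketched without torch installed. `resolve_video_device` and the `cuda_available` parameter are illustrative stand-ins; the real code consults `torch.cuda.is_available()` inside `BenchmarkRunner._load_videos`.

```python
from typing import Optional

def resolve_video_device(video_device: Optional[str], cuda_available: bool) -> str:
    """Pick the device string for a video run.

    `video_device` mirrors the --video-device flag; `cuda_available` stands in
    for torch.cuda.is_available() so this sketch runs without torch.
    """
    if video_device == "cuda" and not cuda_available:
        # Fail fast rather than crash later on .to("cuda") calls.
        raise RuntimeError("--video-device cuda requested but CUDA is unavailable")
    if video_device in ("cpu", "cuda"):
        return video_device
    # No explicit request: prefer CUDA when present, otherwise CPU.
    return "cuda" if cuda_available else "cpu"

print(resolve_video_device("cpu", cuda_available=True))   # cpu
print(resolve_video_device(None, cuda_available=False))   # cpu
```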
Adjust kornia video implementation device selection and tensor dtype, and disable problematic transforms for video.
  • Change kornia_video_impl device selection to honor BENCHMARK_VIDEO_DEVICE when present before falling back to CUDA-if-available logic.
  • Simplify create_tensor helper to always create float16 tensors (matching kornia video loader) and accept numpy arrays or simple float lists.
  • Return None for Elastic and Perspective specs with comments explaining OOM and type issues when used with batched video tensors, effectively skipping them in kornia video benchmarks.
  • Switch RandomRotate90 transform to RandomRotation with degrees=(0.0, 360.0) instead of times=(0, 3) for video usage.
benchmark/transforms/kornia_video_impl.py
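The simplified `create_tensor` helper this change describes (always float16, accepting numpy arrays or plain float lists) can be sketched with numpy standing in for torch; the real helper would call something like `torch.as_tensor(data, dtype=torch.float16).to(device)`.

```python
import numpy as np

def create_tensor(data, device: str = "cpu"):
    """Sketch of the kornia helper: always emit float16 so tensors match the
    kornia video loader's output dtype. numpy stands in for torch here, so the
    `device` argument is accepted but unused in this sketch."""
    return np.asarray(data, dtype=np.float16)

batch = create_tensor([[0.0, 0.5], [1.0, 0.25]])
print(batch.dtype)  # float16
```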
Improve torchvision video reading behavior for benchmarks.
  • Call torchvision.io.read_video with pts_unit='sec' to avoid wrong-result warnings and align with recommended API usage.
  • Update the docstring for read_video_torch to document the pts_unit change and returned shape before permutation.
benchmark/utils.py
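The layout handling this change touches can be illustrated as follows: `torchvision.io.read_video(path, pts_unit="sec")` returns frames shaped `(T, H, W, C)`, which the benchmark permutes to a batched `(T, C, H, W)` layout. numpy stands in for torch in this sketch; a torch version would use `frames.permute(0, 3, 1, 2)`.

```python
import numpy as np

def to_benchmark_layout(frames: np.ndarray) -> np.ndarray:
    """Convert (T, H, W, C) frames, as returned by torchvision's read_video,
    to the (T, C, H, W) layout batched video transforms expect."""
    return np.transpose(frames, (0, 3, 1, 2))

frames = np.zeros((8, 64, 48, 3), dtype=np.uint8)  # T, H, W, C
print(to_benchmark_layout(frames).shape)  # (8, 3, 64, 48)
```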
Use a torch-specific base requirements file for video benchmarks with torchvision or kornia to avoid libavdevice conflicts.
  • In _ensure_venv, choose requirements-base-video-torch.txt instead of the default base requirements when media is video and library is torchvision or kornia.
  • Add the new requirements-base-video-torch.txt file to the repo as the base for these video torch environments.
benchmark/cli.py
requirements/requirements-base-video-torch.txt
Refresh video benchmark results artifacts and documentation for the new CPU video runs.
  • Update README video benchmark table numbers and albumentationsx version tag from 2.0.18 to 2.0.20 to reflect new measurements, including speedup recalculations.
  • Reorganize video benchmark outputs under output_videos/cpu/ and add new CPU result JSON files for albumentationsx and torchvision video benchmarks, replacing/augmenting the old location.
README.md
output_videos/albumentationsx_video_results.json
output_videos/cpu/albumentationsx_video_results.json
output_videos/cpu/torchvision_video_results.json


@sourcery-ai sourcery-ai bot left a comment


Hey - I've left some high level feedback:

  • In BenchmarkRunner._load_videos, when --video-device cuda is requested but torch.cuda.is_available() is false, device is still set to CUDA; consider either failing fast with a clear error or falling back to CPU so you don't hit a runtime device error later.
  • For Kornia Elastic and Perspective you now return None from create_transform; if the rest of the pipeline treats this the same as a valid transform it may silently skip these cases—consider explicitly marking them as unsupported (e.g., via specs or filtering) so the omission is visible to callers.
  • The updated create_tensor always forces float16, including on CPU runs; if some Kornia ops are slower or less well supported in FP16 on CPU, you might want to keep CPU tensors in float32 and handle any necessary casts at the video loading boundary instead.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `BenchmarkRunner._load_videos`, when `--video-device cuda` is requested but `torch.cuda.is_available()` is false, `device` is still set to CUDA; consider either failing fast with a clear error or falling back to CPU so you don't hit a runtime device error later.
- For Kornia `Elastic` and `Perspective` you now return `None` from `create_transform`; if the rest of the pipeline treats this the same as a valid transform it may silently skip these cases—consider explicitly marking them as unsupported (e.g., via specs or filtering) so the omission is visible to callers.
- The updated `create_tensor` always forces `float16`, including on CPU runs; if some Kornia ops are slower or less well supported in FP16 on CPU, you might want to keep CPU tensors in `float32` and handle any necessary casts at the video loading boundary instead.
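The third comment's dtype-by-device suggestion could be sketched as a small helper: float16 only on CUDA (many CPU kernels lack fast fp16 paths), float32 otherwise, with the cast handled at the video-loading boundary. This is a sketch of the reviewer's suggestion, not the PR's code, and the name `tensor_dtype` is illustrative.

```python
def tensor_dtype(device: str) -> str:
    """Choose the tensor dtype for a benchmark run by device: fp16 is a good
    fit for CUDA, while CPU runs stay in fp32 to avoid slow or unsupported
    fp16 kernels."""
    return "float16" if device == "cuda" else "float32"

print(tensor_dtype("cpu"))   # float32
print(tensor_dtype("cuda"))  # float16
```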


- Fail fast when --video-device cuda requested but CUDA unavailable
- Add unsupported_transforms to output for Kornia Elastic/Perspective
- Use float32 on CPU, float16 on GPU for Kornia create_tensor and video loading

Made-with: Cursor

Copilot AI left a comment


Pull request overview

This PR extends the benchmark tooling and artifacts to support CPU video benchmarking (alongside existing GPU-focused results), adds a torch-video base requirements set to avoid macOS PyAV/OpenCV conflicts, and updates some video-loading and kornia video transform behaviors.

Changes:

  • Add --video-device {cpu,cuda} plumbing through benchmark.cli and benchmark.runner and persist the setting in results metadata.
  • Update torchvision video loading to pass pts_unit="sec" to avoid torchvision warnings.
  • Add new/updated video benchmark result JSONs (including CPU runs) and refresh the README’s embedded video benchmark table.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
requirements/requirements-base-video-torch.txt New “no-opencv” base requirements for torch-based video venvs (torchvision/kornia) to avoid macOS conflicts.
benchmark/cli.py Uses the new base requirements for torch video venvs; forwards --video-device into the runner (local + GCP).
benchmark/runner.py Adds --video-device option, device selection logic, and writes video_device into metadata.
benchmark/utils.py Uses torchvision.io.read_video(..., pts_unit="sec") for video loading.
benchmark/transforms/kornia_video_impl.py Attempts to respect BENCHMARK_VIDEO_DEVICE; disables problematic transforms; changes RandomRotate90 behavior.
output_videos/cpu/torchvision_video_results.json New CPU torchvision video benchmark output artifact.
output_videos/cpu/albumentationsx_video_results.json New CPU albumentationsx video benchmark output artifact.
output_videos/albumentationsx_video_results.json Updates existing albumentationsx video results (version + numbers).
README.md Refreshes the embedded video benchmark comparison table (albumentationsx column/version).
Comments suppressed due to low confidence (2)

benchmark/cli.py:110

  • Selecting a base requirements file without OpenCV won’t remove an already-installed opencv-python(-headless) from an existing venv (since this uses uv pip install, not a sync). That means the macOS libavdevice conflict may still occur unless the venv is recreated. Consider switching to uv pip sync (or explicitly uninstalling OpenCV / recreating the venv when the base requirements change) so the environment actually matches the intended base set.
    # Torch video venvs use a base without opencv to avoid duplicate libavdevice (av vs cv2)
    if media == "video" and library in ("torchvision", "kornia"):
        base_req = repo_root / "requirements" / "requirements-base-video-torch.txt"
    else:
        base_req = repo_root / "requirements" / "requirements.txt"
    subprocess.run(
        [str(python), "-m", "uv", "pip", "install", "-q", "-U", "-r", str(base_req)],
        check=True,
    )

benchmark/runner.py:557

  • BENCHMARK_VIDEO_DEVICE is set directly from --video-device. If a user passes --video-device cuda on a host without CUDA, kornia_video_impl will still pick torch.device('cuda') at import time and crash when moving tensors/transforms. Consider guarding this with torch.cuda.is_available() (fall back to CPU with a clear warning/error) before exporting the env var.

Before:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

After:

# Get device (respect BENCHMARK_VIDEO_DEVICE for CPU-only runs)
_dev = (os.environ.get("BENCHMARK_VIDEO_DEVICE") or "").strip().lower()
device = torch.device(_dev if _dev in ("cpu", "cuda") else ("cuda" if torch.cuda.is_available() else "cpu"))

Copilot AI Mar 1, 2026


If BENCHMARK_VIDEO_DEVICE is set to "cuda" on a machine without CUDA, this will still select torch.device("cuda"), and subsequent .to(device) calls will crash. Consider validating torch.cuda.is_available() when _dev == "cuda" and falling back to CPU (and/or emitting a clear warning).

Suggested change

Replace:

device = torch.device(_dev if _dev in ("cpu", "cuda") else ("cuda" if torch.cuda.is_available() else "cpu"))

with:

if _dev == "cpu":
    device = torch.device("cpu")
elif _dev == "cuda":
    if torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        # Fall back to CPU if CUDA is requested but not available
        print(
            "BENCHMARK_VIDEO_DEVICE is set to 'cuda' but CUDA is not available; falling back to CPU.",
        )
        device = torch.device("cpu")
else:
    # Default behavior: prefer CUDA if available, otherwise CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

register_unsupported_reason(
    LIBRARY,
    "Elastic",
    "RandomElasticTransform per-frame displacement fields cause OOM when T frames treated as batch",

Copilot AI Mar 1, 2026


The comment says this transform will be processed frame-by-frame and repeated, but the implementation returns None, which marks the transform as unsupported (it will be excluded entirely). Either implement the described fallback behavior or adjust the comment to reflect that Elastic is currently disabled for videos.

Suggested change

Replace:

"RandomElasticTransform per-frame displacement fields cause OOM when T frames treated as batch",

with:

"Elastic video transform is currently disabled: RandomElasticTransform per-frame displacement fields cause OOM when T frames are treated as a batch",

Comment on lines 308 to 313
@@ -316,7 +313,7 @@ def create_transform(spec: TransformSpec) -> Any | None:
).to(device)

Copilot AI Mar 1, 2026


RandomRotate90 is expected to rotate by multiples of 90°, but this uses RandomRotation(degrees=(0.0, 360.0)), which produces arbitrary angles and changes the transform semantics vs the spec/name and other libraries. Consider using a 90°-step rotation op (or sampling from {0,90,180,270} and applying a deterministic rotation).
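The 90°-step semantics this comment asks for can be sketched in pure Python: sample k from {0, 1, 2, 3} and apply an exact quarter-turn rotation. `rotate90` and `random_rotate90` are illustrative names, not the PR's code; a torch version would use `torch.rot90(video, k, dims=(-2, -1))` on a (T, C, H, W) tensor.

```python
import random

def rotate90(frame, k: int):
    """Rotate a 2D frame (list of rows) counter-clockwise by k * 90 degrees.
    Pure-python stand-in for torch.rot90(frame, k, dims=(-2, -1))."""
    for _ in range(k % 4):
        # one counter-clockwise quarter turn: transpose, then reverse row order
        frame = [list(row) for row in zip(*frame)][::-1]
    return frame

def random_rotate90(frame, rng: random.Random):
    # Sampling k from {0, 1, 2, 3} keeps output angles at exact multiples of
    # 90 degrees, matching RandomRotate90 semantics in other libraries.
    return rotate90(frame, rng.randint(0, 3))

print(rotate90([[1, 2], [3, 4]], 1))  # [[2, 4], [1, 3]]
```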

Comment on lines +169 to +174

if self._video_device:
    device = torch.device(self._video_device)
    gpu_available = device.type == "cuda" and torch.cuda.is_available()
    if device.type == "cuda" and not torch.cuda.is_available():
        raise RuntimeError(
            "--video-device cuda was requested but torch.cuda.is_available() is False. "

Copilot AI Mar 1, 2026


When video_device is explicitly set to cuda but CUDA isn’t actually available, this silently proceeds with device=torch.device('cuda') while leaving gpu_available=False (so tensors stay on CPU). That can lead to confusing behavior/metadata (and can still break libraries that honor BENCHMARK_VIDEO_DEVICE). Consider explicitly erroring or warning and falling back to CPU when video_device=='cuda' but torch.cuda.is_available() is false.

