feat(benchmark): add video CPU benchmarks and kornia transforms#45
Conversation
Made-with: Cursor
Reviewer's Guide

Updates the video benchmark pipeline to support explicit CPU/GPU selection, align kornia video tensor dtypes and torchvision video loading behavior, work around problematic kornia transforms for batched video, adjust virtualenv base requirements for torch-based video libs, and refresh the README video benchmark table plus output JSON layout to reflect new CPU runs.

Sequence diagram for video benchmark execution with explicit video_device and env override

sequenceDiagram
actor User
participant CLI as BenchmarkCLI
participant Venv as VirtualEnvManager
participant RunnerProc as BenchmarkRunnerMain
participant BR as BenchmarkRunner
participant Utils as VideoUtils
participant Kornia as KorniaVideoImpl
User->>CLI: cmd_run --media video --video-device cpu
CLI->>Venv: _run_single(media=video, video_device=cpu)
Venv->>Venv: _ensure_venv(library, media)
Venv->>RunnerProc: python benchmark/runner.py --media video --video-device cpu
RunnerProc->>RunnerProc: build_parser()
RunnerProc->>RunnerProc: parse_args()
RunnerProc->>RunnerProc: media_type = MediaType.VIDEO
RunnerProc->>RunnerProc: setenv BENCHMARK_VIDEO_DEVICE=cpu
RunnerProc->>BR: BenchmarkRunner(..., media_type=VIDEO, video_device=cpu)
User->>BR: run(output_path)
BR->>Utils: _load_videos()
Utils->>Utils: import torch
Utils->>Utils: device = torch.device(cpu)
Utils->>Utils: gpu_available = False
Utils-->>BR: list of video tensors
note over Kornia,RunnerProc: In kornia_video_impl
RunnerProc->>Kornia: import kornia_video_impl
Kornia->>Kornia: _dev = BENCHMARK_VIDEO_DEVICE
Kornia->>Kornia: device = torch.device(cpu)
BR->>Kornia: create_transform(spec)
Kornia-->>BR: transform or None
loop for each run
BR->>Kornia: create_tensor(frame_batch, device=cpu)
Kornia-->>BR: tensor float16
BR->>BR: call_fn(transform, video)
end
BR->>RunnerProc: results, metadata(video_device=cpu)
RunnerProc-->>User: write results JSON and summary
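The `BENCHMARK_VIDEO_DEVICE` resolution step shown in the diagram can be sketched as a small pure helper. This is an illustrative sketch, not the PR's exact code: the function name `device_from_env` and the string return type (instead of a `torch.device`) are assumptions made to keep it self-contained.

```python
import os

def device_from_env(cuda_available: bool) -> str:
    """Resolve the benchmark device the way the diagram describes.

    An explicit BENCHMARK_VIDEO_DEVICE of "cpu" or "cuda" wins; anything
    else falls back to CUDA when available, otherwise CPU.
    """
    dev = (os.environ.get("BENCHMARK_VIDEO_DEVICE") or "").strip().lower()
    if dev in ("cpu", "cuda"):
        return dev
    return "cuda" if cuda_available else "cpu"
```

In the real pipeline the returned string would be passed to `torch.device(...)` by the kornia backend at import time.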
Updated class diagram for BenchmarkRunner, CLI, and kornia video backend

classDiagram
class BenchmarkRunner {
+str library
+Path data_dir
+list[Any] transforms
+Callable call_fn
+MediaType media_type
+int num_items
+int num_runs
+int max_warmup
+float warmup_threshold
+int min_warmup_windows
+int num_channels
+Optional~str~ _video_device
+__init__(library, data_dir, transforms, call_fn, media_type, num_items, num_runs, max_warmup, warmup_threshold, min_warmup_windows, num_channels, video_device)
+run(output_path) dict~str,Any~
-_load_videos() list~Any~
+filter_transforms(transforms, filter_names) list~TransformSpec~
}
class RunnerMain {
+main()
-build_parser() argparse_ArgumentParser
}
class BenchmarkCLI {
+build_parser() argparse_ArgumentParser
+cmd_run(args)
-_ensure_venv(library, media, repo_root) Path
-_run_single(library, media, repo_root, specs_file, output_dir, num_items, num_runs, max_warmup, warmup_threshold, min_warmup_windows, transforms_filter, verbose, num_channels, video_device)
-_cmd_run_gcp(args, repo_root, local_output_dir)
}
class KorniaVideoImpl {
+torch_device device
+str LIBRARY
+call(transform, video) Any
+create_tensor(data, device) torch_Tensor
+create_transform(spec) Any
}
class VideoUtils {
+read_video_cv2(path) np_ndarray
+read_video_torch(path) Any
}
class MediaType {
<<enumeration>>
IMAGE
VIDEO
}
BenchmarkCLI --> RunnerMain : invokes python module
RunnerMain --> BenchmarkRunner : constructs
BenchmarkRunner --> VideoUtils : uses _load_videos
RunnerMain --> KorniaVideoImpl : imports when library == kornia
BenchmarkRunner --> KorniaVideoImpl : uses create_transform and create_tensor
RunnerMain --> MediaType : selects
BenchmarkCLI --> MediaType : parses from args
class RequirementsManager {
+_ensure_venv(library, media, repo_root) Path
}
RequirementsManager <|-- BenchmarkCLI
RequirementsManager : selects base_req
class BaseRequirements {
+requirements_txt
+requirements_base_video_torch_txt
}
RequirementsManager --> BaseRequirements : chooses file based on media and library
File-Level Changes
Hey - I've left some high level feedback:

- In `BenchmarkRunner._load_videos`, when `--video-device cuda` is requested but `torch.cuda.is_available()` is false, `device` is still set to CUDA; consider either failing fast with a clear error or falling back to CPU so you don't hit a runtime device error later.
- For Kornia `Elastic` and `Perspective` you now return `None` from `create_transform`; if the rest of the pipeline treats this the same as a valid transform it may silently skip these cases; consider explicitly marking them as unsupported (e.g., via specs or filtering) so the omission is visible to callers.
- The updated `create_tensor` always forces `float16`, including on CPU runs; if some Kornia ops are slower or less well supported in FP16 on CPU, you might want to keep CPU tensors in `float32` and handle any necessary casts at the video loading boundary instead.
- Fail fast when --video-device cuda requested but CUDA unavailable
- Add unsupported_transforms to output for Kornia Elastic/Perspective
- Use float32 on CPU, float16 on GPU for Kornia create_tensor and video loading

Made-with: Cursor
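The float32-on-CPU / float16-on-GPU rule from the last bullet reduces to a one-line policy. The helper name `dtype_for_device` is illustrative, and it returns the dtype as a string so the sketch stays torch-free; the actual `create_tensor` would map this to `torch.float16` / `torch.float32`.

```python
def dtype_for_device(device_type: str) -> str:
    """Pick a tensor dtype for video benchmark tensors.

    float16 halves memory and bandwidth on CUDA, while float32 avoids
    slow or unsupported FP16 kernels in some ops on CPU.
    """
    return "float16" if device_type == "cuda" else "float32"
```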
Pull request overview
This PR extends the benchmark tooling and artifacts to support CPU video benchmarking (alongside existing GPU-focused results), adds a torch-video base requirements set to avoid macOS PyAV/OpenCV conflicts, and updates some video-loading and kornia video transform behaviors.
Changes:
- Add `--video-device {cpu,cuda}` plumbing through `benchmark.cli` → `benchmark.runner` and persist the setting in results metadata.
- Update torchvision video loading to pass `pts_unit="sec"` to avoid torchvision warnings.
- Add new/updated video benchmark result JSONs (including CPU runs) and refresh the README’s embedded video benchmark table.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| requirements/requirements-base-video-torch.txt | New “no-opencv” base requirements for torch-based video venvs (torchvision/kornia) to avoid macOS conflicts. |
| benchmark/cli.py | Uses the new base requirements for torch video venvs; forwards --video-device into the runner (local + GCP). |
| benchmark/runner.py | Adds --video-device option, device selection logic, and writes video_device into metadata. |
| benchmark/utils.py | Uses torchvision.io.read_video(..., pts_unit="sec") for video loading. |
| benchmark/transforms/kornia_video_impl.py | Attempts to respect BENCHMARK_VIDEO_DEVICE; disables problematic transforms; changes RandomRotate90 behavior. |
| output_videos/cpu/torchvision_video_results.json | New CPU torchvision video benchmark output artifact. |
| output_videos/cpu/albumentationsx_video_results.json | New CPU albumentationsx video benchmark output artifact. |
| output_videos/albumentationsx_video_results.json | Updates existing albumentationsx video results (version + numbers). |
| README.md | Refreshes the embedded video benchmark comparison table (albumentationsx column/version). |
Comments suppressed due to low confidence (2)
benchmark/cli.py:110
- Selecting a base requirements file without OpenCV won’t remove an already-installed `opencv-python(-headless)` from an existing venv (since this uses `uv pip install`, not a sync). That means the macOS libavdevice conflict may still occur unless the venv is recreated. Consider switching to `uv pip sync` (or explicitly uninstalling OpenCV / recreating the venv when the base requirements change) so the environment actually matches the intended base set.
```python
# Torch video venvs use a base without opencv to avoid duplicate libavdevice (av vs cv2)
if media == "video" and library in ("torchvision", "kornia"):
    base_req = repo_root / "requirements" / "requirements-base-video-torch.txt"
else:
    base_req = repo_root / "requirements" / "requirements.txt"
subprocess.run(
    [str(python), "-m", "uv", "pip", "install", "-q", "-U", "-r", str(base_req)],
    check=True,
)
```
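One way to act on the sync suggestion is to swap the install invocation for `uv pip sync`, which makes the venv match the requirements file exactly and uninstalls anything not listed (such as a stale OpenCV). The helper below only builds the command list; the function name is an assumption for illustration, not code from the PR.

```python
def build_sync_cmd(python_bin: str, req_file: str) -> list[str]:
    # `uv pip sync` installs exactly what req_file lists and removes the rest,
    # so a previously installed opencv-python would actually be uninstalled
    return [python_bin, "-m", "uv", "pip", "sync", req_file]
```

The resulting list would be passed to `subprocess.run(..., check=True)` in place of the `uv pip install` call above.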
benchmark/runner.py:557
- `BENCHMARK_VIDEO_DEVICE` is set directly from `--video-device`. If a user passes `--video-device cuda` on a host without CUDA, kornia_video_impl will still pick `torch.device('cuda')` at import time and crash when moving tensors/transforms. Consider guarding this with `torch.cuda.is_available()` (fall back to CPU with a clear warning/error) before exporting the env var.
```python
default=3,
help=(
    "Number of image channels. Must be a multiple of 3. "
```
```diff
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+# Get device (respect BENCHMARK_VIDEO_DEVICE for CPU-only runs)
+_dev = (os.environ.get("BENCHMARK_VIDEO_DEVICE") or "").strip().lower()
+device = torch.device(_dev if _dev in ("cpu", "cuda") else ("cuda" if torch.cuda.is_available() else "cpu"))
```
If BENCHMARK_VIDEO_DEVICE is set to "cuda" on a machine without CUDA, this will still select torch.device("cuda"), and subsequent .to(device) calls will crash. Consider validating torch.cuda.is_available() when _dev == "cuda" and falling back to CPU (and/or emitting a clear warning).
Suggested change:

```diff
-device = torch.device(_dev if _dev in ("cpu", "cuda") else ("cuda" if torch.cuda.is_available() else "cpu"))
+if _dev == "cpu":
+    device = torch.device("cpu")
+elif _dev == "cuda":
+    if torch.cuda.is_available():
+        device = torch.device("cuda")
+    else:
+        # Fall back to CPU if CUDA is requested but not available
+        print(
+            "BENCHMARK_VIDEO_DEVICE is set to 'cuda' but CUDA is not available; falling back to CPU.",
+        )
+        device = torch.device("cpu")
+else:
+    # Default behavior: prefer CUDA if available, otherwise CPU
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```
```python
register_unsupported_reason(
    LIBRARY,
    "Elastic",
    "RandomElasticTransform per-frame displacement fields cause OOM when T frames treated as batch",
)
```
The comment says this transform will be processed frame-by-frame and repeated, but the implementation returns None, which marks the transform as unsupported (it will be excluded entirely). Either implement the described fallback behavior or adjust the comment to reflect that Elastic is currently disabled for videos.
Suggested change:

```diff
-    "RandomElasticTransform per-frame displacement fields cause OOM when T frames treated as batch",
+    "Elastic video transform is currently disabled: RandomElasticTransform per-frame displacement fields cause OOM when T frames are treated as a batch",
```
```diff
@@ -316,7 +313,7 @@ def create_transform(spec: TransformSpec) -> Any | None:
     ).to(device)
```
RandomRotate90 is expected to rotate by multiples of 90°, but this uses RandomRotation(degrees=(0.0, 360.0)), which produces arbitrary angles and changes the transform semantics vs the spec/name and other libraries. Consider using a 90°-step rotation op (or sampling from {0,90,180,270} and applying a deterministic rotation).
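A 90°-step sampler along the lines of that suggestion could look like this sketch (the function name is illustrative; applying the sampled angle to a frame batch would then use something like `torch.rot90(frames, angle // 90, dims=(-2, -1))`):

```python
import random

def sample_rot90_angle(rng: random.Random) -> int:
    # RandomRotate90 semantics: pick one of the four axis-aligned rotations,
    # rather than an arbitrary angle from a continuous range
    return rng.choice([0, 90, 180, 270])
```

Sampling from the discrete set keeps the output lossless (no interpolation) and matches what other libraries' RandomRotate90 implementations do.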
```python
if self._video_device:
    device = torch.device(self._video_device)
    gpu_available = device.type == "cuda" and torch.cuda.is_available()
    if device.type == "cuda" and not torch.cuda.is_available():
        raise RuntimeError(
            "--video-device cuda was requested but torch.cuda.is_available() is False. "
```
When video_device is explicitly set to cuda but CUDA isn’t actually available, this silently proceeds with device=torch.device('cuda') while leaving gpu_available=False (so tensors stay on CPU). That can lead to confusing behavior/metadata (and can still break libraries that honor BENCHMARK_VIDEO_DEVICE). Consider explicitly erroring or warning and falling back to CPU when video_device=='cuda' but torch.cuda.is_available() is false.
Summary by Sourcery
Add CPU-targeted video benchmark support and improve Kornia/torchvision video transform handling in the benchmarking pipeline.
New Features:
Bug Fixes:
Enhancements:
Build:
Documentation:
Chores: