Add torchvision to benchmarks. #76

@ternaus

Benchmark torchvision as a potential backend in albucore low-level ops

Context

In albucore we maintain a set of low-level helpers (e.g. hflip, multiply_by_constant) that dynamically select the fastest implementation depending on shape, dtype, and layout. They currently choose between:

  • OpenCV
  • NumPy
  • SimSIMD (where applicable)

These helpers are performance-critical and sit at the bottom of the AlbumentationsX stack, so backend selection is intentionally pragmatic and benchmark-driven.
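To make "benchmark-driven backend selection" concrete, here is a minimal sketch of what such a dispatching helper might look like. The selection rule below is purely hypothetical and is not albucore's actual logic; the function names are made up for illustration, and OpenCV is treated as optional.

```python
import numpy as np

try:  # OpenCV is optional in this sketch
    import cv2
except ImportError:
    cv2 = None


def hflip_numpy(img: np.ndarray) -> np.ndarray:
    # Reversing the width axis yields a view; copy it to get a contiguous result.
    return np.ascontiguousarray(img[:, ::-1])


def hflip_cv2(img: np.ndarray) -> np.ndarray:
    # flipCode=1 flips around the vertical axis, i.e. a horizontal flip.
    return cv2.flip(img, 1)


def hflip(img: np.ndarray) -> np.ndarray:
    # HYPOTHETICAL rule, for illustration only: prefer OpenCV for contiguous
    # uint8 images with 2 to 4 channels, otherwise fall back to NumPy.
    # A real rule would come out of measured benchmarks.
    channels = 1 if img.ndim == 2 else img.shape[2]
    use_cv2 = (
        cv2 is not None
        and img.dtype == np.uint8
        and 1 < channels <= 4
        and img.flags.c_contiguous
    )
    return hflip_cv2(img) if use_cv2 else hflip_numpy(img)
```

A torchvision backend would become one more branch in a rule like this, which is why the thresholds have to come from measurements rather than guesses.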

Given how widely torchvision is used — and the fact that many users already depend on it transitively — it’s worth evaluating whether torchvision ops should be considered as an additional backend for some of these primitives.

This issue proposes a systematic benchmark, not a commitment.


Goal

Answer a single question with data:

Are there specific low-level operations, shapes, or dtypes where torchvision is meaningfully faster or more robust than our existing OpenCV / NumPy / SimSIMD paths?

If yes → we can consider adding torchvision as an optional backend.
If no → we document the result and move on.


Scope of benchmarking

Suggested initial candidates (non-exhaustive):

  • hflip / vflip
  • elementwise ops (e.g. multiply/add by constant)
  • simple type-preserving transforms
  • batched inputs (N, H, W, C)
  • multi-channel images (C > 4)
  • common dtypes: uint8, float32
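For the elementwise candidates, uint8 overflow behaviour is worth pinning down before timing anything: NumPy integer arithmetic wraps modulo 256, while OpenCV arithmetic such as cv2.multiply saturates. The NumPy-only sketch below shows the two semantics side by side; the function names are illustrative, and which behaviour each torchvision op matches is something the benchmark should check rather than assume.

```python
import numpy as np


def multiply_wrapping(img: np.ndarray, k: int) -> np.ndarray:
    # Stays in uint8, so overflow wraps modulo 256.
    return np.multiply(img, np.uint8(k))


def multiply_saturating(img: np.ndarray, k: float) -> np.ndarray:
    # Compute in a wider type, round, then clip into [0, 255] before casting back.
    wide = img.astype(np.float32) * np.float32(k)
    return np.clip(np.rint(wide), 0, 255).astype(np.uint8)


img = np.array([100, 200], dtype=np.uint8)
wrapped = multiply_wrapping(img, 2)      # 200 * 2 = 400 wraps to 144
saturated = multiply_saturating(img, 2)  # 400 is clipped to 255
```

Two backends that disagree here are not interchangeable, however fast either one is.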

Dimensions to explicitly test:

  • contiguous vs non-contiguous memory
  • small vs large images
  • CPU execution only (no CUDA)
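One possible way to enumerate the grid above is a small case generator. The concrete shapes are assumptions chosen for illustration (small, large, multi-channel, batched), not an agreed benchmark set. The non-contiguous inputs are made by allocating double the width and keeping every other column.

```python
import itertools

import numpy as np

# Illustrative shapes only: small, large, C > 4, and batched (N, H, W, C).
SHAPES = [(64, 64, 3), (1024, 1024, 3), (256, 256, 8), (4, 128, 128, 3)]
DTYPES = [np.uint8, np.float32]


def make_input(shape, dtype, contiguous, seed=0):
    """Return a random array of the given shape, optionally non-contiguous."""
    rng = np.random.default_rng(seed)
    if contiguous:
        alloc_shape = shape
    else:
        # Double the width axis so that slicing [..., ::2, :] restores `shape`.
        alloc_shape = shape[:-2] + (shape[-2] * 2, shape[-1])
    if np.issubdtype(dtype, np.integer):
        data = rng.integers(0, 256, size=alloc_shape, dtype=dtype)
    else:
        data = rng.random(size=alloc_shape, dtype=dtype)
    return data if contiguous else data[..., ::2, :]


def iter_cases():
    """Yield (shape, dtype, contiguous) descriptors without allocating arrays."""
    yield from itertools.product(SHAPES, DTYPES, [True, False])
```

Keeping the descriptors separate from allocation lets each case be created right before it is timed, so large inputs do not all sit in memory at once.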

Comparison targets

For each operation:

  • OpenCV implementation
  • NumPy implementation
  • SimSIMD (where available)
  • torchvision functional equivalent
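As a sketch of how the comparison targets could be lined up for one operation, the registry below collects hflip implementations for HWC arrays and skips backends that are not installed. The registry name and wrapper functions are assumptions for illustration. SimSIMD is left out because this sketch does not assume it offers a flip primitive. The torchvision wrapper includes the HWC to CHW conversion, which is part of the cost a NumPy-based caller would pay.

```python
import numpy as np


def hflip_numpy(img):
    return np.ascontiguousarray(img[:, ::-1])


# Maps backend name to a callable taking and returning an HWC NumPy array.
BACKENDS = {"numpy": hflip_numpy}

try:
    import cv2

    BACKENDS["opencv"] = lambda img: cv2.flip(img, 1)
except ImportError:
    pass

try:
    import torch
    from torchvision.transforms import functional as TF

    def hflip_torchvision(img):
        # The tensor path of hflip expects (..., H, W), so view HWC as CHW first.
        chw = torch.from_numpy(img).permute(2, 0, 1)
        flipped = TF.hflip(chw)
        # Convert back to a contiguous HWC array so outputs are comparable.
        return np.ascontiguousarray(flipped.permute(1, 2, 0).numpy())

    BACKENDS["torchvision"] = hflip_torchvision
except ImportError:
    pass
```

Whether the conversion overhead should count against torchvision, or be measured separately, is a design decision the benchmark write-up should state explicitly.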

Metrics:

  • wall-clock time
  • allocation behavior (extra copies?)
  • constraints on shape / dtype
  • semantic equivalence (exactness vs approximation)
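A rough sketch of how these metrics might be captured for a single backend and input follows. The function name and returned keys are made up for illustration. Aliasing is checked with np.may_share_memory, which only compares memory bounds, so it is a cheap hint rather than proof of a copy or a view.

```python
import timeit

import numpy as np


def measure(fn, img, reference, repeat=5, number=20):
    """Time fn(img) and compare its output against a reference result."""
    out = fn(img)
    times = timeit.repeat(lambda: fn(img), repeat=repeat, number=number)
    if out.shape == reference.shape:
        diff = np.abs(out.astype(np.float64) - reference.astype(np.float64))
        max_abs_diff = float(diff.max())
    else:
        max_abs_diff = float("inf")
    return {
        # Best-of-N is less sensitive to scheduler noise than the mean.
        "best_us": min(times) / number * 1e6,
        # True suggests a view was returned instead of a fresh allocation.
        "may_alias_input": bool(np.may_share_memory(out, img)),
        "output_contiguous": bool(out.flags.c_contiguous),
        "exact": bool(np.array_equal(out, reference)),
        "max_abs_diff": max_abs_diff,
    }
```

For example, `measure(lambda a: a[:, ::-1], img, img[:, ::-1])` reports a result that aliases its input, while wrapping the same expression in `np.ascontiguousarray` reports a separate allocation.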

Non-goals

  • No GPU / CUDA benchmarks in this issue
  • No API changes proposed yet
  • No commitment to add torchvision as a hard dependency

Acceptance criteria

This issue is considered resolved when:

  • Benchmarks are reproducible and documented
  • We have a clear table of “torchvision wins / losses”
  • A decision is made: add torchvision backend or explicitly reject

Either outcome is useful.
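The wins / losses table could be produced from per-case timings with something like the sketch below. The 10% threshold for what counts as a win is an assumption, not a value agreed in this issue, and the function names are illustrative.

```python
def classify(timings_us, candidate="torchvision", threshold=1.10):
    """Compare the candidate against the fastest existing backend for one case.

    timings_us maps backend name to its best time in microseconds.
    threshold is an ASSUMED cutoff: speedups within 10% either way are ties.
    """
    baseline = min((b for b in timings_us if b != candidate), key=timings_us.get)
    speedup = timings_us[baseline] / timings_us[candidate]
    if speedup >= threshold:
        verdict = "win"
    elif speedup <= 1 / threshold:
        verdict = "loss"
    else:
        verdict = "tie"
    return baseline, speedup, verdict


def format_table(results):
    """results maps a case label to a timings_us dict; returns a text table."""
    lines = [f"{'case':<28}{'best existing':<16}{'speedup':>8}  verdict"]
    for case, timings in results.items():
        baseline, speedup, verdict = classify(timings)
        lines.append(f"{case:<28}{baseline:<16}{speedup:>7.2f}x  {verdict}")
    return "\n".join(lines)
```

Comparing against the fastest existing backend per case, rather than against a fixed one, matches how the helpers already pick an implementation.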


Why this matters

At this level of the stack, performance differences compound.
If torchvision provides a measurable win in specific regimes (e.g. large tensors, specific layouts), we should know — and if it doesn’t, we should stop wondering.

Benchmarks > intuition.


References

  • Existing backend selection logic in albucore
  • torchvision functional transforms documentation
  • Prior OpenCV / NumPy / SimSIMD benchmarks in the repo
