Benchmark torchvision as a potential backend in albucore low-level ops
Context
In albucore we maintain a set of low-level helpers (e.g. hflip, multiply_by_constant) that dynamically select the fastest implementation depending on shape, dtype, and layout, currently choosing between:
- OpenCV
- NumPy
- SimSIMD (where applicable)
These helpers are performance-critical and sit at the bottom of the AlbumentationsX stack, so backend selection is intentionally pragmatic and benchmark-driven.
Given how widely torchvision is used — and the fact that many users already depend on it transitively — it’s worth evaluating whether torchvision ops should be considered as an additional backend for some of these primitives.
This issue proposes a systematic benchmark, not a commitment.
Goal
Answer a single question with data:
Are there specific low-level operations, shapes, or dtypes where torchvision is meaningfully faster or more robust than our existing OpenCV / NumPy / SimSIMD paths?
If yes → we can consider adding torchvision as an optional backend.
If no → we document the result and move on.
Scope of benchmarking
Suggested initial candidates (non-exhaustive):
- hflip / vflip
- elementwise ops (e.g. multiply/add by constant)
- simple type-preserving transforms
- batched inputs (N, H, W, C)
- multi-channel images (C > 4)
- common dtypes: uint8, float32
Dimensions to explicitly test:
- contiguous vs non-contiguous memory
- small vs large images
- CPU execution only (no CUDA)
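To make the test dimensions concrete, here is a minimal sketch of an input grid built with NumPy only. The sizes, channel counts, and the names `SIZES`, `CHANNELS`, `make_image`, and `iter_cases` are illustrative assumptions, not existing albucore code; batched (N, H, W, C) inputs are left out to keep it short.

```python
import itertools

import numpy as np

# Hypothetical grid values; the issue does not prescribe exact sizes.
SIZES = {"small": (64, 64), "large": (1024, 1024)}
CHANNELS = (1, 3, 8)  # 8 covers the "C > 4" case
DTYPES = (np.uint8, np.float32)
LAYOUTS = ("contiguous", "non_contiguous")


def make_image(height, width, channels, dtype, layout, seed=0):
    """Return a random HWC image with the requested dtype and memory layout."""
    rng = np.random.default_rng(seed)
    # For the non-contiguous case, allocate twice the width and keep every
    # other column, which yields a strided view of the target shape.
    alloc_width = width * 2 if layout == "non_contiguous" else width
    shape = (height, alloc_width, channels)
    if np.issubdtype(dtype, np.integer):
        base = rng.integers(0, 256, size=shape, dtype=dtype)
    else:
        base = rng.random(size=shape, dtype=dtype)
    return base[:, ::2] if layout == "non_contiguous" else base


def iter_cases():
    """Yield one dict per (size, channels, dtype, layout) combination."""
    grid = itertools.product(SIZES.items(), CHANNELS, DTYPES, LAYOUTS)
    for (size_name, (h, w)), c, dtype, layout in grid:
        yield {
            "size": size_name,
            "channels": c,
            "dtype": np.dtype(dtype).name,
            "layout": layout,
            "image": make_image(h, w, c, dtype, layout),
        }
```

Generating cases lazily keeps peak memory low, since only one image is alive at a time.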
Comparison targets
For each operation:
- OpenCV implementation
- NumPy implementation
- SimSIMD (where available)
- torchvision functional equivalent
Metrics:
- wall-clock time
- allocation behavior (extra copies?)
- constraints on shape / dtype
- semantic equivalence (exactness vs approximation)
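A rough sketch of how the comparison could be run for hflip, covering wall-clock time, exactness, and whether the output aliases the input. The names `BACKENDS`, `hflip_numpy`, `hflip_torchvision`, `bench`, and `compare` are hypothetical and do not reflect albucore's actual dispatch code. OpenCV and torchvision are treated as optional and skipped when not installed. Note the assumption that torchvision's tensor path expects a (..., H, W) layout, so an HWC array needs a permute on the way in and out; that conversion is part of the cost being measured here.

```python
import timeit

import numpy as np


def hflip_numpy(img):
    # Reference: reverse the width axis and materialize a contiguous result.
    return np.ascontiguousarray(img[:, ::-1])


# Hypothetical registry; albucore's real selection logic is not shown here.
BACKENDS = {"numpy": hflip_numpy}

try:
    import cv2

    # flipCode=1 flips around the vertical axis, i.e. a horizontal flip.
    BACKENDS["opencv"] = lambda img: cv2.flip(img, 1)
except ImportError:
    pass

try:
    import torch
    from torchvision.transforms import functional as F

    def hflip_torchvision(img):
        # HWC -> CHW view, flip, then back to a contiguous HWC array.
        chw = torch.from_numpy(img).permute(2, 0, 1)
        return F.hflip(chw).permute(1, 2, 0).contiguous().numpy()

    BACKENDS["torchvision"] = hflip_torchvision
except ImportError:
    pass


def bench(fn, img, repeat=5, number=20):
    """Best-of-`repeat` average seconds per call."""
    timer = timeit.Timer(lambda: fn(img))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def compare(img, reference="numpy"):
    """Time every registered backend and check it against the reference."""
    expected = BACKENDS[reference](img)
    rows = []
    for name, fn in BACKENDS.items():
        out = fn(img)
        rows.append({
            "backend": name,
            "seconds": bench(fn, img),
            "exact": (
                out.shape == expected.shape
                and out.dtype == expected.dtype
                and np.array_equal(out, expected)
            ),
            # True would mean the backend returned a view instead of a copy.
            "aliases_input": np.shares_memory(out, img),
        })
    return rows
```

Taking the minimum over several repeats reduces noise from other processes. Shape and dtype are compared explicitly because a backend that silently drops a singleton channel axis or upcasts would otherwise go unnoticed.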
Non-goals
- No GPU / CUDA benchmarks in this issue
- No API changes proposed yet
- No commitment to add torchvision as a hard dependency
Acceptance criteria
This issue is considered resolved when:
- Benchmarks are reproducible and documented
- We have a clear table of “torchvision wins / losses”
- A decision is made: add torchvision backend or explicitly reject
Either outcome is useful.
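One possible way to turn raw timings into the wins / losses table, sketched in plain Python. The function names, the record keys (`case`, `backend`, `seconds`), and the 1.1x threshold standing in for "meaningfully faster" are all assumptions for illustration; the actual threshold should be agreed on as part of this issue.

```python
def summarize(results, candidate="torchvision", min_speedup=1.1):
    """Compare `candidate` against the best other backend for each case.

    `results` is an iterable of dicts with keys 'case', 'backend', 'seconds'.
    Returns (case, best_baseline, speedup, verdict) tuples, where speedup is
    best_baseline_seconds / candidate_seconds.
    """
    by_case = {}
    for r in results:
        by_case.setdefault(r["case"], {})[r["backend"]] = r["seconds"]

    rows = []
    for case, timings in by_case.items():
        baselines = {b: s for b, s in timings.items() if b != candidate}
        if candidate not in timings or not baselines:
            continue  # nothing to compare for this case
        best = min(baselines, key=baselines.get)
        speedup = baselines[best] / timings[candidate]
        if speedup >= min_speedup:
            verdict = "win"
        elif speedup <= 1 / min_speedup:
            verdict = "loss"
        else:
            verdict = "tie"  # within the noise band around 1.0x
        rows.append((case, best, speedup, verdict))
    return rows


def to_markdown(rows):
    """Render summarize() output as a Markdown table."""
    lines = [
        "| case | best existing backend | speedup | verdict |",
        "| --- | --- | --- | --- |",
    ]
    for case, best, speedup, verdict in rows:
        lines.append(f"| {case} | {best} | {speedup:.2f}x | {verdict} |")
    return "\n".join(lines)
```

Comparing against the best existing backend per case, rather than against each backend separately, matches how the dispatch would actually be affected: torchvision only matters where it beats whatever is currently selected.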
Why this matters
At this level of the stack, performance differences compound.
If torchvision provides a measurable win in specific regimes (e.g. large tensors, specific layouts), we should know — and if it doesn’t, we should stop wondering.
Benchmarks > intuition.
References
- existing backend selection logic in albucore
- torchvision functional transforms documentation
- prior OpenCV / NumPy / SimSIMD benchmarks in the repo