Image and Video Augmentation Library Benchmarks

A comprehensive benchmarking suite for comparing the performance of popular image and video augmentation libraries including AlbumentationsX, torchvision, and Kornia.

Overview

This benchmark suite measures the throughput and performance characteristics of common augmentation operations across different libraries. It features:

Benchmarks for both image and video augmentation
Adaptive warmup to ensure stable measurements
Multiple runs for statistical significance
Detailed performance metrics and system information
Thread control settings for consistent performance
Support for multiple image/video formats and loading methods

Benchmark Types

Image Benchmarks

The image benchmarks compare the performance of various libraries on standard image transformations. All benchmarks are run on a single CPU thread to ensure consistent and comparable results.
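One common way to pin a process to a single thread is to cap the thread pools of the numerical backends before they are imported. The sketch below is an assumption about how that could be done, not the suite's actual code: `pin_to_single_thread` is a hypothetical helper, and the variable names are the standard ones read by OpenMP, OpenBLAS, MKL, NumExpr, and Accelerate.

```python
import os

# Standard environment variables read by common numerical backends.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
)


def pin_to_single_thread(env=None):
    """Set every known thread-count variable to "1" and return the mapping.

    Hypothetical helper: pass a dict to fill it instead of os.environ.
    """
    env = os.environ if env is None else env
    for name in THREAD_ENV_VARS:
        env[name] = "1"
    return env


if __name__ == "__main__":
    pin_to_single_thread()
    print({name: os.environ[name] for name in THREAD_ENV_VARS})
```

These variables only take effect if they are set before the libraries that read them are imported. OpenCV and PyTorch also expose runtime controls (`cv2.setNumThreads` and `torch.set_num_threads`) that can be used in addition.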

Transform	albumentationsx 2.0.18 [img/s]	kornia 0.8.2 [img/s]	torchvision 0.25.0 [img/s]	Speedup (albx/fastest other)
Affine	1428 ± 2	-	264 ± 16	5.40x
AutoContrast	1666 ± 15	576 ± 18	178 ± 2	2.89x
Blur	7592 ± 285	365 ± 8	-	20.80x
Brightness	12784 ± 1017	2276 ± 169	1681 ± 21	5.62x
CLAHE	633 ± 3	109 ± 2	-	5.81x
CenterCrop128	115895 ± 4274	-	203348 ± 7429	0.57x
ChannelDropout	12420 ± 866	3065 ± 179	-	4.05x
ChannelShuffle	8075 ± 291	1446 ± 115	4290 ± 303	1.88x
ColorJitter	1132 ± 23	100 ± 3	88 ± 3	11.33x
Contrast	14165 ± 104	2159 ± 193	870 ± 26	6.56x
CornerIllumination	468 ± 11	350 ± 4	-	1.34x
Equalize	1243 ± 6	310 ± 17	588 ± 17	2.11x
Erasing	26411 ± 4926	776 ± 45	10421 ± 629	2.53x
GaussianBlur	2429 ± 9	353 ± 13	124 ± 17	6.89x
GaussianIllumination	772 ± 17	428 ± 16	-	1.80x
GaussianNoise	343 ± 4	121 ± 2	-	2.82x
Grayscale	20430 ± 2245	1574 ± 77	2206 ± 179	9.26x
HorizontalFlip	13654 ± 353	1128 ± 42	2234 ± 27	6.11x
Hue	1917 ± 31	123 ± 7	-	15.55x
Invert	32495 ± 6354	4412 ± 293	22891 ± 2484	1.42x
JpegCompression	1321 ± 9	117 ± 5	826 ± 11	1.60x
LinearIllumination	485 ± 9	849 ± 22	-	0.57x
LongestMaxSize	3840 ± 68	481 ± 36	-	7.99x
MotionBlur	4385 ± 110	117 ± 6	-	37.55x
Normalize	1602 ± 9	1173 ± 39	947 ± 33	1.37x
OpticalDistortion	801 ± 2	193 ± 4	-	4.14x
Pad	47542 ± 820	-	4480 ± 129	10.61x
Perspective	1173 ± 3	170 ± 5	217 ± 8	5.40x
PhotoMetricDistort	943 ± 18	-	80 ± 3	11.74x
PlankianJitter	3138 ± 69	1578 ± 100	-	1.99x
PlasmaBrightness	170 ± 8	76 ± 2	-	2.24x
PlasmaContrast	156 ± 2	75 ± 6	-	2.07x
PlasmaShadow	196 ± 2	211 ± 5	-	0.93x
Posterize	13203 ± 680	709 ± 27	17723 ± 1380	0.74x
RGBShift	2252 ± 23	1787 ± 71	-	1.26x
Rain	2064 ± 15	1591 ± 61	-	1.30x
RandomCrop128	113953 ± 2731	2802 ± 40	112838 ± 2384	1.01x
RandomGamma	13280 ± 1279	226 ± 5	-	58.64x
RandomResizedCrop	4322 ± 9	579 ± 6	789 ± 27	5.48x
Resize	3502 ± 52	648 ± 15	271 ± 4	5.40x
Rotate	2981 ± 11	330 ± 7	319 ± 8	9.02x
SaltAndPepper	613 ± 4	450 ± 5	-	1.36x
Saturation	1328 ± 45	132 ± 4	-	10.09x
Sharpen	2251 ± 15	263 ± 14	274 ± 9	8.20x
Shear	1290 ± 9	358 ± 11	-	3.60x
SmallestMaxSize	2621 ± 31	375 ± 10	-	6.99x
Snow	723 ± 5	129 ± 4	-	5.60x
Solarize	12811 ± 785	262 ± 3	1117 ± 35	11.47x
ThinPlateSpline	89 ± 2	61 ± 2	-	1.45x
VerticalFlip	31055 ± 325	2387 ± 58	26928 ± 4799	1.15x
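The speedup column divides the AlbumentationsX throughput by the fastest throughput among the other libraries. A minimal sketch of that arithmetic follows; the function names are hypothetical, and `None` stands in for the "-" entries that mark unsupported transforms.

```python
def speedup(albx, others):
    """Return albx throughput divided by the fastest competing throughput.

    `others` may contain None for libraries that do not support the
    transform (shown as "-" in the table). Returns None if no competitor
    supports it.
    """
    available = [t for t in others if t is not None]
    if not available:
        return None
    return albx / max(available)


def format_speedup(value):
    """Format a ratio the way the table does, e.g. 20.80x."""
    return "-" if value is None else f"{value:.2f}x"


if __name__ == "__main__":
    # Blur row: albumentationsx 7592, kornia 365, torchvision unsupported.
    print(format_speedup(speedup(7592, [365, None])))       # 20.80x
    # CenterCrop128 row: kornia unsupported, torchvision 203348.
    print(format_speedup(speedup(115895, [None, 203348])))  # 0.57x
```

Recomputing from the rounded table entries can differ from the published ratio in the last digit (for Affine, 1428 / 264 rounds to 5.41x while the table shows 5.40x), presumably because the published ratios come from unrounded measurements.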

Multi-Channel Image Benchmarks (9ch)

These benchmarks run on 9-channel images (three stacked RGB copies) to test OpenCV chunking and library support for more than 4 channels.

Transform	albumentationsx 2.0.18 [img/s]	kornia 0.8.2 [img/s]	torchvision 0.25.0 [img/s]	Speedup (albx/fastest other)
Affine	640 ± 7	228 ± 3	143 ± 3	2.81x
AutoContrast	436 ± 4	374 ± 3	-	1.17x
Blur	2307 ± 34	186 ± 3	-	12.37x
Brightness	3746 ± 22	1350 ± 40	-	2.77x
CenterCrop128	48885 ± 772	-	223574 ± 5049	0.22x
ChannelDropout	5853 ± 152	2179 ± 95	-	2.69x
ChannelShuffle	2282 ± 54	929 ± 25	1600 ± 41	1.43x
Contrast	3756 ± 76	1346 ± 31	-	2.79x
CornerIllumination	209 ± 2	181 ± 3	-	1.16x
Erasing	9957 ± 240	426 ± 10	4321 ± 384	2.30x
GaussianBlur	757 ± 4	188 ± 2	49 ± 6	4.03x
GaussianIllumination	251 ± 1	212 ± 15	-	1.18x
GaussianNoise	96 ± 2	65 ± 0	-	1.47x
HorizontalFlip	2436 ± 204	2286 ± 557	15102 ± 3640	0.16x
Invert	9859 ± 1220	2774 ± 169	15806 ± 3070	0.62x
LinearIllumination	146 ± 2	491 ± 12	-	0.30x
LongestMaxSize	835 ± 17	376 ± 2	-	2.22x
MotionBlur	1489 ± 22	63 ± 1	-	23.66x
Normalize	386 ± 4	1402 ± 64	795 ± 22	0.28x
OpticalDistortion	466 ± 4	157 ± 4	-	2.97x
Pad	8573 ± 797	-	9112 ± 704	0.94x
Perspective	581 ± 7	149 ± 1	129 ± 2	3.91x
PlasmaBrightness	86 ± 0	24 ± 1	-	3.55x
PlasmaContrast	69 ± 1	24 ± 1	-	2.85x
PlasmaShadow	127 ± 1	224 ± 2	-	0.57x
Posterize	4088 ± 111	317 ± 16	12018 ± 1989	0.34x
RandomCrop128	47928 ± 912	2566 ± 75	124539 ± 2345	0.38x
RandomGamma	4161 ± 170	83 ± 0	-	50.43x
RandomResizedCrop	970 ± 8	309 ± 2	297 ± 3	3.14x
Resize	744 ± 6	297 ± 3	194 ± 1	2.51x
Rotate	1729 ± 53	172 ± 1	152 ± 10	10.06x
Sharpen	723 ± 3	140 ± 6	-	5.16x
Shear	658 ± 6	250 ± 2	163 ± 6	2.63x
SmallestMaxSize	583 ± 7	187 ± 3	-	3.12x
Solarize	4048 ± 129	339 ± 4	456 ± 11	8.88x
ThinPlateSpline	79 ± 2	62 ± 0	-	1.27x
VerticalFlip	8577 ± 77	2296 ± 118	15409 ± 890	0.56x

Video Benchmarks

The video benchmarks compare CPU-based processing (AlbumentationsX) with GPU-accelerated processing (Kornia) for video transformations; the table also reports torchvision throughput. The benchmarks use the UCF101 dataset, which contains realistic videos from 101 action categories.

Transform	albumentationsx (video) 2.0.20 [vid/s]	kornia (video) 0.8.0 [vid/s]	torchvision (video) 0.21.0 [vid/s]	Speedup (albx/fastest other)
Affine	17 ± 1	21 ± 0	453 ± 0	0.04x
AutoContrast	13 ± 1	21 ± 0	578 ± 17	0.02x
Blur	52 ± 4	21 ± 0	-	2.53x
Brightness	58 ± 3	22 ± 0	756 ± 435	0.08x
CenterCrop128	574 ± 7	70 ± 1	1133 ± 235	0.51x
ChannelDropout	66 ± 2	22 ± 0	-	3.02x
ChannelShuffle	47 ± 3	20 ± 0	958 ± 0	0.05x
ColorJitter	10 ± 1	19 ± 0	69 ± 0	0.15x
Contrast	50 ± 7	22 ± 0	547 ± 13	0.09x
CornerIllumination	5 ± 0	3 ± 0	-	2.10x
Elastic	5 ± 0	-	127 ± 1	0.04x
Equalize	9 ± 1	4 ± 0	192 ± 1	0.05x
Erasing	63 ± 3	-	255 ± 7	0.25x
GaussianBlur	23 ± 0	22 ± 0	543 ± 11	0.04x
GaussianIllumination	7 ± 0	20 ± 0	-	0.37x
GaussianNoise	3 ± 0	22 ± 0	-	0.13x
Grayscale	65 ± 4	22 ± 0	838 ± 467	0.08x
HorizontalFlip	55 ± 1	22 ± 0	978 ± 49	0.06x
Hue	15 ± 1	20 ± 0	-	0.77x
Invert	63 ± 5	22 ± 0	843 ± 176	0.07x
LinearIllumination	5 ± 0	4 ± 0	-	1.23x
MedianBlur	18 ± 0	8 ± 0	-	2.13x
Normalize	12 ± 1	22 ± 0	461 ± 0	0.03x
Pad	59 ± 3	-	760 ± 338	0.08x
Perspective	15 ± 0	-	435 ± 0	0.03x
PlankianJitter	21 ± 3	11 ± 0	-	1.92x
PlasmaBrightness	1 ± 0	17 ± 0	-	0.06x
PlasmaContrast	1 ± 0	17 ± 0	-	0.07x
PlasmaShadow	1 ± 0	19 ± 0	-	0.07x
Posterize	44 ± 6	-	631 ± 15	0.07x
RGBShift	20 ± 2	22 ± 0	-	0.90x
Rain	23 ± 1	4 ± 0	-	6.01x
RandomCrop128	541 ± 9	65 ± 0	1133 ± 15	0.48x
RandomGamma	43 ± 4	22 ± 0	-	1.99x
RandomResizedCrop	15 ± 1	6 ± 0	182 ± 16	0.08x
Resize	15 ± 0	6 ± 0	140 ± 35	0.11x
Rotate	27 ± 1	22 ± 0	534 ± 0	0.05x
SaltAndPepper	7 ± 0	9 ± 0	-	0.78x
Saturation	9 ± 1	37 ± 0	-	0.23x
Sharpen	23 ± 1	18 ± 0	420 ± 9	0.05x
Solarize	51 ± 2	21 ± 0	628 ± 6	0.08x
ThinPlateSpline	1 ± 0	45 ± 1	-	0.03x
VerticalFlip	69 ± 2	22 ± 0	978 ± 5	0.07x

Performance Highlights

Image Augmentation Performance

See the full benchmark table above for image results.

Video Augmentation Performance

See the full benchmark table above for video results.

Requirements

The benchmark automatically creates isolated virtual environments for each library and installs the necessary dependencies. Base requirements:

Python 3.10+
uv (for fast package installation)
Disk space for virtual environments
Image/video dataset in a supported format

Supported Libraries

Each library's specific dependencies are managed through separate requirements files in the requirements/ directory.

Setup

Getting Started

For testing and comparison purposes, you can use standard datasets:

For image benchmarks:

wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
tar -xf ILSVRC2012_img_val.tar -C /path/to/your/target/directory

For video benchmarks:

# UCF101 dataset
wget https://www.crcv.ucf.edu/data/UCF101/UCF101.rar
unrar x UCF101.rar -d /path/to/your/target/directory

Using Your Own Data

We strongly recommend running the benchmarks on your own dataset that matches your use case:

Use images/videos that are representative of your actual workload
Consider sizes and formats you typically work with
Include edge cases specific to your application

This will give you more relevant performance metrics for your specific use case.

Running Benchmarks

All benchmarks use the unified CLI: python -m benchmark.cli run. Use --media for image vs video, --multichannel for 9-channel image benchmarks, and --libraries to restrict to one or more libraries.

RGB image benchmarks (all libraries)

python -m benchmark.cli run -d /path/to/images -o /path/to/output

RGB image benchmarks (single library)

python -m benchmark.cli run -d /path/to/images -o /path/to/output --libraries albumentationsx
python -m benchmark.cli run -d /path/to/images -o /path/to/output --libraries torchvision
python -m benchmark.cli run -d /path/to/images -o /path/to/output --libraries kornia

Multi-channel image benchmarks (9ch, all libraries)

python -m benchmark.cli run -d /path/to/images -o /path/to/output --multichannel

Multi-channel image benchmarks (9ch, single library)

python -m benchmark.cli run -d /path/to/images -o /path/to/output --multichannel --libraries albumentationsx
python -m benchmark.cli run -d /path/to/images -o /path/to/output --multichannel --libraries torchvision
python -m benchmark.cli run -d /path/to/images -o /path/to/output --multichannel --libraries kornia

Video benchmarks (all libraries)

python -m benchmark.cli run -d /path/to/videos -o /path/to/output --media video

Video benchmarks (single library)

python -m benchmark.cli run -d /path/to/videos -o /path/to/output --media video --libraries albumentationsx
python -m benchmark.cli run -d /path/to/videos -o /path/to/output --media video --libraries torchvision
python -m benchmark.cli run -d /path/to/videos -o /path/to/output --media video --libraries kornia

After running benchmarks, update the README tables with:

./tools/update_docs.sh
# Or with custom result dirs:
./tools/update_docs.sh --image-results output/ --video-results output_videos/

Using Custom Transforms

To benchmark custom transforms, create a Python file that defines LIBRARY and CUSTOM_TRANSFORMS:

# my_transforms.py
import albumentations as A

# Specify the library
LIBRARY = "albumentationsx"

CUSTOM_TRANSFORMS = [
    # Test different parameters of the same transform
    A.ToGray(method="weighted_average", p=1),
    A.ToGray(method="pca", p=1),

    # Different noise levels
    A.GaussNoise(var_limit=(10.0, 50.0), p=1),
    A.GaussNoise(var_limit=(100.0, 200.0), p=1),

    # Any other transforms...
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=1),
]

Then run:

python -m benchmark.cli run -d /path/to/videos -o output/ --media video --spec my_transforms.py

The results will show each transform with all its parameters:

ToGray(method=weighted_average, p=1)
ToGray(method=pca, p=1)
GaussNoise(var_limit=(10.0, 50.0), mean=0, p=1, per_channel=True)

See examples/custom_video_specs_template.py and example_direct_transforms.py for more examples.

To analyze parametric results:

python tools/analyze_parametric_results.py parametric_results.json

This will show:

Best and worst configurations for each transform
Performance differences between parameter choices
Optimal settings for your use case
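The kind of comparison listed above can be sketched in a few lines. This is only an illustration: the layout of parametric_results.json is not described here, so the flat mapping from a configuration label to a throughput value is an assumption, and `best_and_worst` is a hypothetical helper, not part of the tool.

```python
from collections import defaultdict


def best_and_worst(results):
    """Group configurations by transform name and pick the fastest and slowest.

    `results` maps a label such as "Name(param=1)" to a throughput value.
    This flat layout is an assumption made for illustration.
    """
    groups = defaultdict(dict)
    for label, throughput in results.items():
        name = label.split("(", 1)[0]  # text before the first parenthesis
        groups[name][label] = throughput

    summary = {}
    for name, configs in groups.items():
        best = max(configs, key=configs.get)
        worst = min(configs, key=configs.get)
        summary[name] = {
            "best": best,
            "worst": worst,
            "ratio": configs[best] / configs[worst],
        }
    return summary


if __name__ == "__main__":
    # Placeholder labels and numbers, not measurements.
    demo = {
        "TransformA(mode=fast)": 200.0,
        "TransformA(mode=slow)": 50.0,
        "TransformB(level=1)": 80.0,
    }
    for name, info in best_and_worst(demo).items():
        print(name, info)
```

The ratio between the best and worst configuration gives a quick sense of how much a parameter choice matters for a given transform.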

Methodology

The benchmark methodology is designed to ensure fair and reproducible comparisons:

Data Loading: Data is loaded using library-specific loaders to ensure optimal format compatibility
Warmup Phase: Adaptive warmup until performance variance stabilizes
Measurement Phase: Multiple runs with statistical analysis
Environment Control: Consistent thread settings and hardware utilization
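The warmup and measurement phases can be sketched as follows. This is a minimal illustration under stated assumptions, not the suite's implementation: the function names, the window size, the 5% tolerance, and the stopping rule (relative spread of recent timings) are all hypothetical choices.

```python
import statistics
import time


def warmup(fn, window=5, max_iters=200, rel_tol=0.05):
    """Call fn until the last `window` timings have a relative spread below rel_tol.

    Returns the number of warmup iterations performed (at most max_iters).
    """
    timings = []
    for i in range(1, max_iters + 1):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
        if len(timings) >= window:
            recent = timings[-window:]
            mean = statistics.fmean(recent)
            if mean > 0 and statistics.pstdev(recent) / mean < rel_tol:
                return i
    return max_iters


def measure(fn, items_per_call, runs=5):
    """Time `runs` calls (runs >= 2) and return (mean, stdev) throughput in items/s."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-12)
        rates.append(items_per_call / elapsed)
    return statistics.fmean(rates), statistics.stdev(rates)


if __name__ == "__main__":
    def workload():
        # Stand-in for applying one transform to a batch of 100 items.
        for _ in range(100):
            sum(range(1000))

    print("warmup iterations:", warmup(workload))
    mean, std = measure(workload, items_per_call=100)
    print(f"{mean:.0f} ± {std:.0f} items/s")
```

Reporting the mean together with the standard deviation across runs mirrors the "value ± spread" format used in the result tables.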

Contributing

Contributions are welcome! If you'd like to add support for a new library, improve the benchmarking methodology, or fix issues, please submit a pull request.

When contributing, please:

Follow the existing code style
Add tests for new functionality
Update documentation as needed
Ensure all tests pass
