A comprehensive benchmarking suite for comparing the performance of popular image and video augmentation libraries including AlbumentationsX, torchvision, and Kornia.
This benchmark suite measures the throughput and performance characteristics of common augmentation operations across different libraries. It features:
- Benchmarks for both image and video augmentation
- Adaptive warmup to ensure stable measurements
- Multiple runs for statistical significance
- Detailed performance metrics and system information
- Thread control settings for consistent performance
- Support for multiple image/video formats and loading methods
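Thread control usually comes down to pinning each numerical backend to one thread before the libraries are imported. The following is a minimal sketch, assuming the environment variables commonly read by OpenMP, OpenBLAS, and MKL; the helper name `pin_single_thread` is hypothetical, and the suite's own settings may differ.

```python
import os

# Environment variables commonly read by native numerical backends.
# Which of these the benchmark suite actually sets is an assumption.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def pin_single_thread(env=None):
    """Set each thread-count variable to "1" and return the mapping that was modified."""
    target = os.environ if env is None else env
    for name in THREAD_ENV_VARS:
        target[name] = "1"
    return target


# Demonstrate on a plain dict so the current process is left untouched.
print(pin_single_thread({}))
```

Calling `pin_single_thread()` with no argument modifies `os.environ`, so it should run before importing the augmentation libraries. OpenCV and PyTorch also expose runtime switches (`cv2.setNumThreads(0)` and `torch.set_num_threads(1)`) that can be applied after import.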
The image benchmarks compare the performance of various libraries on standard image transformations. All benchmarks are run on a single CPU thread to ensure consistent and comparable results.
| Transform | albumentationsx 2.0.18 [img/s] | kornia 0.8.2 [img/s] | torchvision 0.25.0 [img/s] | Speedup (albx/fastest other) |
|---|---|---|---|---|
| Affine | 1428 ± 2 | - | 264 ± 16 | 5.40x |
| AutoContrast | 1666 ± 15 | 576 ± 18 | 178 ± 2 | 2.89x |
| Blur | 7592 ± 285 | 365 ± 8 | - | 20.80x |
| Brightness | 12784 ± 1017 | 2276 ± 169 | 1681 ± 21 | 5.62x |
| CLAHE | 633 ± 3 | 109 ± 2 | - | 5.81x |
| CenterCrop128 | 115895 ± 4274 | - | 203348 ± 7429 | 0.57x |
| ChannelDropout | 12420 ± 866 | 3065 ± 179 | - | 4.05x |
| ChannelShuffle | 8075 ± 291 | 1446 ± 115 | 4290 ± 303 | 1.88x |
| ColorJitter | 1132 ± 23 | 100 ± 3 | 88 ± 3 | 11.33x |
| Contrast | 14165 ± 104 | 2159 ± 193 | 870 ± 26 | 6.56x |
| CornerIllumination | 468 ± 11 | 350 ± 4 | - | 1.34x |
| Equalize | 1243 ± 6 | 310 ± 17 | 588 ± 17 | 2.11x |
| Erasing | 26411 ± 4926 | 776 ± 45 | 10421 ± 629 | 2.53x |
| GaussianBlur | 2429 ± 9 | 353 ± 13 | 124 ± 17 | 6.89x |
| GaussianIllumination | 772 ± 17 | 428 ± 16 | - | 1.80x |
| GaussianNoise | 343 ± 4 | 121 ± 2 | - | 2.82x |
| Grayscale | 20430 ± 2245 | 1574 ± 77 | 2206 ± 179 | 9.26x |
| HorizontalFlip | 13654 ± 353 | 1128 ± 42 | 2234 ± 27 | 6.11x |
| Hue | 1917 ± 31 | 123 ± 7 | - | 15.55x |
| Invert | 32495 ± 6354 | 4412 ± 293 | 22891 ± 2484 | 1.42x |
| JpegCompression | 1321 ± 9 | 117 ± 5 | 826 ± 11 | 1.60x |
| LinearIllumination | 485 ± 9 | 849 ± 22 | - | 0.57x |
| LongestMaxSize | 3840 ± 68 | 481 ± 36 | - | 7.99x |
| MotionBlur | 4385 ± 110 | 117 ± 6 | - | 37.55x |
| Normalize | 1602 ± 9 | 1173 ± 39 | 947 ± 33 | 1.37x |
| OpticalDistortion | 801 ± 2 | 193 ± 4 | - | 4.14x |
| Pad | 47542 ± 820 | - | 4480 ± 129 | 10.61x |
| Perspective | 1173 ± 3 | 170 ± 5 | 217 ± 8 | 5.40x |
| PhotoMetricDistort | 943 ± 18 | - | 80 ± 3 | 11.74x |
| PlankianJitter | 3138 ± 69 | 1578 ± 100 | - | 1.99x |
| PlasmaBrightness | 170 ± 8 | 76 ± 2 | - | 2.24x |
| PlasmaContrast | 156 ± 2 | 75 ± 6 | - | 2.07x |
| PlasmaShadow | 196 ± 2 | 211 ± 5 | - | 0.93x |
| Posterize | 13203 ± 680 | 709 ± 27 | 17723 ± 1380 | 0.74x |
| RGBShift | 2252 ± 23 | 1787 ± 71 | - | 1.26x |
| Rain | 2064 ± 15 | 1591 ± 61 | - | 1.30x |
| RandomCrop128 | 113953 ± 2731 | 2802 ± 40 | 112838 ± 2384 | 1.01x |
| RandomGamma | 13280 ± 1279 | 226 ± 5 | - | 58.64x |
| RandomResizedCrop | 4322 ± 9 | 579 ± 6 | 789 ± 27 | 5.48x |
| Resize | 3502 ± 52 | 648 ± 15 | 271 ± 4 | 5.40x |
| Rotate | 2981 ± 11 | 330 ± 7 | 319 ± 8 | 9.02x |
| SaltAndPepper | 613 ± 4 | 450 ± 5 | - | 1.36x |
| Saturation | 1328 ± 45 | 132 ± 4 | - | 10.09x |
| Sharpen | 2251 ± 15 | 263 ± 14 | 274 ± 9 | 8.20x |
| Shear | 1290 ± 9 | 358 ± 11 | - | 3.60x |
| SmallestMaxSize | 2621 ± 31 | 375 ± 10 | - | 6.99x |
| Snow | 723 ± 5 | 129 ± 4 | - | 5.60x |
| Solarize | 12811 ± 785 | 262 ± 3 | 1117 ± 35 | 11.47x |
| ThinPlateSpline | 89 ± 2 | 61 ± 2 | - | 1.45x |
| VerticalFlip | 31055 ± 325 | 2387 ± 58 | 26928 ± 4799 | 1.15x |
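
The Speedup column can be reproduced from the throughput columns: it is the AlbumentationsX throughput divided by the highest throughput among the other libraries. The sketch below uses a hypothetical function name; because the table shows rounded throughputs, recomputed ratios may differ from the published ones in the last digit.

```python
def speedup(albx, others):
    """Return albx throughput divided by the fastest other library's throughput.

    `others` holds the competing throughputs; None marks an unsupported
    transform (shown as "-" in the table).
    """
    supported = [value for value in others if value is not None]
    if not supported:
        return None  # nothing to compare against
    return albx / max(supported)


# Mean throughputs taken from the image table above (kornia, torchvision).
print(f"Blur: {speedup(7592, [365, None]):.2f}x")
print(f"CenterCrop128: {speedup(115895, [None, 203348]):.2f}x")
print(f"Resize: {speedup(3502, [648, 271]):.2f}x")
```

A value below 1.0 means another library was faster for that transform.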
The multichannel benchmarks run on 9-channel images (three stacked RGB images) to test OpenCV chunking and library support for more than 4 channels.
| Transform | albumentationsx 2.0.18 [img/s] | kornia 0.8.2 [img/s] | torchvision 0.25.0 [img/s] | Speedup (albx/fastest other) |
|---|---|---|---|---|
| Affine | 640 ± 7 | 228 ± 3 | 143 ± 3 | 2.81x |
| AutoContrast | 436 ± 4 | 374 ± 3 | - | 1.17x |
| Blur | 2307 ± 34 | 186 ± 3 | - | 12.37x |
| Brightness | 3746 ± 22 | 1350 ± 40 | - | 2.77x |
| CenterCrop128 | 48885 ± 772 | - | 223574 ± 5049 | 0.22x |
| ChannelDropout | 5853 ± 152 | 2179 ± 95 | - | 2.69x |
| ChannelShuffle | 2282 ± 54 | 929 ± 25 | 1600 ± 41 | 1.43x |
| Contrast | 3756 ± 76 | 1346 ± 31 | - | 2.79x |
| CornerIllumination | 209 ± 2 | 181 ± 3 | - | 1.16x |
| Erasing | 9957 ± 240 | 426 ± 10 | 4321 ± 384 | 2.30x |
| GaussianBlur | 757 ± 4 | 188 ± 2 | 49 ± 6 | 4.03x |
| GaussianIllumination | 251 ± 1 | 212 ± 15 | - | 1.18x |
| GaussianNoise | 96 ± 2 | 65 ± 0 | - | 1.47x |
| HorizontalFlip | 2436 ± 204 | 2286 ± 557 | 15102 ± 3640 | 0.16x |
| Invert | 9859 ± 1220 | 2774 ± 169 | 15806 ± 3070 | 0.62x |
| LinearIllumination | 146 ± 2 | 491 ± 12 | - | 0.30x |
| LongestMaxSize | 835 ± 17 | 376 ± 2 | - | 2.22x |
| MotionBlur | 1489 ± 22 | 63 ± 1 | - | 23.66x |
| Normalize | 386 ± 4 | 1402 ± 64 | 795 ± 22 | 0.28x |
| OpticalDistortion | 466 ± 4 | 157 ± 4 | - | 2.97x |
| Pad | 8573 ± 797 | - | 9112 ± 704 | 0.94x |
| Perspective | 581 ± 7 | 149 ± 1 | 129 ± 2 | 3.91x |
| PlasmaBrightness | 86 ± 0 | 24 ± 1 | - | 3.55x |
| PlasmaContrast | 69 ± 1 | 24 ± 1 | - | 2.85x |
| PlasmaShadow | 127 ± 1 | 224 ± 2 | - | 0.57x |
| Posterize | 4088 ± 111 | 317 ± 16 | 12018 ± 1989 | 0.34x |
| RandomCrop128 | 47928 ± 912 | 2566 ± 75 | 124539 ± 2345 | 0.38x |
| RandomGamma | 4161 ± 170 | 83 ± 0 | - | 50.43x |
| RandomResizedCrop | 970 ± 8 | 309 ± 2 | 297 ± 3 | 3.14x |
| Resize | 744 ± 6 | 297 ± 3 | 194 ± 1 | 2.51x |
| Rotate | 1729 ± 53 | 172 ± 1 | 152 ± 10 | 10.06x |
| Sharpen | 723 ± 3 | 140 ± 6 | - | 5.16x |
| Shear | 658 ± 6 | 250 ± 2 | 163 ± 6 | 2.63x |
| SmallestMaxSize | 583 ± 7 | 187 ± 3 | - | 3.12x |
| Solarize | 4048 ± 129 | 339 ± 4 | 456 ± 11 | 8.88x |
| ThinPlateSpline | 79 ± 2 | 62 ± 0 | - | 1.27x |
| VerticalFlip | 8577 ± 77 | 2296 ± 118 | 15409 ± 890 | 0.56x |
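
Some OpenCV routines accept at most 4 channels, so a 9-channel image is typically processed in channel groups whose results are then concatenated. The sketch below shows only the index arithmetic for such chunking; the chunk size of 4 and the function name are assumptions, not the libraries' actual implementation.

```python
def channel_chunks(num_channels, chunk_size=4):
    """Return (start, stop) channel ranges covering num_channels in groups of chunk_size."""
    if num_channels <= 0 or chunk_size <= 0:
        raise ValueError("num_channels and chunk_size must be positive")
    return [
        (start, min(start + chunk_size, num_channels))
        for start in range(0, num_channels, chunk_size)
    ]


# A 9-channel image (three stacked RGB images) splits into three groups.
print(channel_chunks(9))  # [(0, 4), (4, 8), (8, 9)]
```

With NumPy arrays in height × width × channel layout, each range would select `image[..., start:stop]`, and the processed pieces would be joined back along the channel axis.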
The video benchmarks compare CPU-based processing (AlbumentationsX) with GPU-accelerated processing (Kornia) for video transformations. The benchmarks use the UCF101 dataset, which contains realistic videos from 101 action categories.
| Transform | albumentationsx (video) 2.0.20 [vid/s] | kornia (video) 0.8.0 [vid/s] | torchvision (video) 0.21.0 [vid/s] | Speedup (albx/fastest other) |
|---|---|---|---|---|
| Affine | 17 ± 1 | 21 ± 0 | 453 ± 0 | 0.04x |
| AutoContrast | 13 ± 1 | 21 ± 0 | 578 ± 17 | 0.02x |
| Blur | 52 ± 4 | 21 ± 0 | - | 2.53x |
| Brightness | 58 ± 3 | 22 ± 0 | 756 ± 435 | 0.08x |
| CenterCrop128 | 574 ± 7 | 70 ± 1 | 1133 ± 235 | 0.51x |
| ChannelDropout | 66 ± 2 | 22 ± 0 | - | 3.02x |
| ChannelShuffle | 47 ± 3 | 20 ± 0 | 958 ± 0 | 0.05x |
| ColorJitter | 10 ± 1 | 19 ± 0 | 69 ± 0 | 0.15x |
| Contrast | 50 ± 7 | 22 ± 0 | 547 ± 13 | 0.09x |
| CornerIllumination | 5 ± 0 | 3 ± 0 | - | 2.10x |
| Elastic | 5 ± 0 | - | 127 ± 1 | 0.04x |
| Equalize | 9 ± 1 | 4 ± 0 | 192 ± 1 | 0.05x |
| Erasing | 63 ± 3 | - | 255 ± 7 | 0.25x |
| GaussianBlur | 23 ± 0 | 22 ± 0 | 543 ± 11 | 0.04x |
| GaussianIllumination | 7 ± 0 | 20 ± 0 | - | 0.37x |
| GaussianNoise | 3 ± 0 | 22 ± 0 | - | 0.13x |
| Grayscale | 65 ± 4 | 22 ± 0 | 838 ± 467 | 0.08x |
| HorizontalFlip | 55 ± 1 | 22 ± 0 | 978 ± 49 | 0.06x |
| Hue | 15 ± 1 | 20 ± 0 | - | 0.77x |
| Invert | 63 ± 5 | 22 ± 0 | 843 ± 176 | 0.07x |
| LinearIllumination | 5 ± 0 | 4 ± 0 | - | 1.23x |
| MedianBlur | 18 ± 0 | 8 ± 0 | - | 2.13x |
| Normalize | 12 ± 1 | 22 ± 0 | 461 ± 0 | 0.03x |
| Pad | 59 ± 3 | - | 760 ± 338 | 0.08x |
| Perspective | 15 ± 0 | - | 435 ± 0 | 0.03x |
| PlankianJitter | 21 ± 3 | 11 ± 0 | - | 1.92x |
| PlasmaBrightness | 1 ± 0 | 17 ± 0 | - | 0.06x |
| PlasmaContrast | 1 ± 0 | 17 ± 0 | - | 0.07x |
| PlasmaShadow | 1 ± 0 | 19 ± 0 | - | 0.07x |
| Posterize | 44 ± 6 | - | 631 ± 15 | 0.07x |
| RGBShift | 20 ± 2 | 22 ± 0 | - | 0.90x |
| Rain | 23 ± 1 | 4 ± 0 | - | 6.01x |
| RandomCrop128 | 541 ± 9 | 65 ± 0 | 1133 ± 15 | 0.48x |
| RandomGamma | 43 ± 4 | 22 ± 0 | - | 1.99x |
| RandomResizedCrop | 15 ± 1 | 6 ± 0 | 182 ± 16 | 0.08x |
| Resize | 15 ± 0 | 6 ± 0 | 140 ± 35 | 0.11x |
| Rotate | 27 ± 1 | 22 ± 0 | 534 ± 0 | 0.05x |
| SaltAndPepper | 7 ± 0 | 9 ± 0 | - | 0.78x |
| Saturation | 9 ± 1 | 37 ± 0 | - | 0.23x |
| Sharpen | 23 ± 1 | 18 ± 0 | 420 ± 9 | 0.05x |
| Solarize | 51 ± 2 | 21 ± 0 | 628 ± 6 | 0.08x |
| ThinPlateSpline | 1 ± 0 | 45 ± 1 | - | 0.03x |
| VerticalFlip | 69 ± 2 | 22 ± 0 | 978 ± 5 | 0.07x |
See the full benchmark table above for image results.
See the full benchmark table above for video results.
The benchmark automatically creates isolated virtual environments for each library and installs the necessary dependencies. Base requirements:
- Python 3.10+
- uv (for fast package installation)
- Disk space for virtual environments
- Image/video dataset in a supported format
Each library's specific dependencies are managed through separate requirements files in the requirements/ directory.
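As a rough sketch of what per-library environment setup could look like with uv: the directory layout (`.venv_<library>`), the requirements file names (`requirements/<library>.txt`), and the POSIX interpreter path are all hypothetical; only `uv venv` and `uv pip install` are real uv commands. The function builds the commands without running them.

```python
from pathlib import Path


def env_commands(library, root="."):
    """Build the uv commands that would create and populate one library's environment.

    The venv location and requirements file name are illustrative assumptions.
    """
    root = Path(root)
    venv_dir = root / f".venv_{library}"
    interpreter = venv_dir / "bin" / "python"  # POSIX layout assumed
    requirements = root / "requirements" / f"{library}.txt"
    return [
        ["uv", "venv", venv_dir.as_posix()],
        ["uv", "pip", "install", "--python", interpreter.as_posix(), "-r", requirements.as_posix()],
    ]


for library in ("albumentationsx", "torchvision", "kornia"):
    for command in env_commands(library):
        print(" ".join(command))
```

Keeping one environment per library avoids version conflicts between their dependencies, which is presumably why the suite isolates them.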
For testing and comparison purposes, you can use standard datasets:
For image benchmarks:

    wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
    tar -xf ILSVRC2012_img_val.tar -C /path/to/your/target/directory

For video benchmarks:

    # UCF101 dataset
    wget https://www.crcv.ucf.edu/data/UCF101/UCF101.rar
    unrar x UCF101.rar -d /path/to/your/target/directory

We strongly recommend running the benchmarks on your own dataset that matches your use case:
- Use images/videos that are representative of your actual workload
- Consider sizes and formats you typically work with
- Include edge cases specific to your application
This will give you more relevant performance metrics for your specific use case.
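Before benchmarking on your own data, it can help to check which file formats the dataset actually contains. A small sketch, using only the standard library; the function name is hypothetical and is not part of the benchmark suite.

```python
from collections import Counter
from pathlib import Path


def count_formats(dataset_dir):
    """Count files under dataset_dir by lowercase extension, searching recursively."""
    return Counter(
        path.suffix.lower()
        for path in Path(dataset_dir).rglob("*")
        if path.is_file()
    )
```

For example, `count_formats("/path/to/images").most_common()` lists the extensions in descending order of frequency.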
All benchmarks use the unified CLI: python -m benchmark.cli run. Use --media for image vs video, --multichannel for 9-channel image benchmarks, and --libraries to restrict to one or more libraries.
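When scripting many runs, the flags described above can be assembled programmatically. This is a sketch with a hypothetical helper name; it uses only the flags named in this README and passes a single library per command, since the exact syntax for listing several libraries is not shown here.

```python
def build_command(data_dir, output_dir, media=None, multichannel=False, library=None):
    """Assemble the argv list for one `python -m benchmark.cli run` invocation."""
    command = ["python", "-m", "benchmark.cli", "run", "-d", data_dir, "-o", output_dir]
    if media is not None:
        command += ["--media", media]
    if multichannel:
        command.append("--multichannel")
    if library is not None:
        command += ["--libraries", library]
    return command


# One video run per library, printed rather than executed.
for library in ("albumentationsx", "torchvision", "kornia"):
    argv = build_command("/path/to/videos", "/path/to/output", media="video", library=library)
    print(" ".join(argv))
```

To execute a command instead of printing it, pass the list to `subprocess.run(argv, check=True)`.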
Image benchmarks, all libraries:

    python -m benchmark.cli run -d /path/to/images -o /path/to/output

Image benchmarks, one library at a time:

    python -m benchmark.cli run -d /path/to/images -o /path/to/output --libraries albumentationsx
    python -m benchmark.cli run -d /path/to/images -o /path/to/output --libraries torchvision
    python -m benchmark.cli run -d /path/to/images -o /path/to/output --libraries kornia

9-channel image benchmarks, all libraries:

    python -m benchmark.cli run -d /path/to/images -o /path/to/output --multichannel

9-channel image benchmarks, one library at a time:

    python -m benchmark.cli run -d /path/to/images -o /path/to/output --multichannel --libraries albumentationsx
    python -m benchmark.cli run -d /path/to/images -o /path/to/output --multichannel --libraries torchvision
    python -m benchmark.cli run -d /path/to/images -o /path/to/output --multichannel --libraries kornia

Video benchmarks, all libraries:

    python -m benchmark.cli run -d /path/to/videos -o /path/to/output --media video

Video benchmarks, one library at a time:

    python -m benchmark.cli run -d /path/to/videos -o /path/to/output --media video --libraries albumentationsx
    python -m benchmark.cli run -d /path/to/videos -o /path/to/output --media video --libraries torchvision
    python -m benchmark.cli run -d /path/to/videos -o /path/to/output --media video --libraries kornia

After running benchmarks, update the README tables with:

    ./tools/update_docs.sh
    # Or with custom result dirs:
    ./tools/update_docs.sh --image-results output/ --video-results output_videos/

To benchmark transforms, create a Python file defining LIBRARY and CUSTOM_TRANSFORMS:
    # my_transforms.py
    import albumentations as A

    # Specify the library
    LIBRARY = "albumentationsx"

    CUSTOM_TRANSFORMS = [
        # Test different parameters of the same transform
        A.ToGray(method="weighted_average", p=1),
        A.ToGray(method="pca", p=1),
        # Different noise levels
        A.GaussNoise(var_limit=(10.0, 50.0), p=1),
        A.GaussNoise(var_limit=(100.0, 200.0), p=1),
        # Any other transforms...
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=1),
    ]

Then run:

    python -m benchmark.cli run -d /path/to/videos -o output/ --media video --spec my_transforms.py

The results will show each transform with all its parameters:

    ToGray(method=weighted_average, p=1)
    ToGray(method=pca, p=1)
    GaussNoise(var_limit=(10.0, 50.0), mean=0, p=1, per_channel=True)
See examples/custom_video_specs_template.py and example_direct_transforms.py for more examples.
To analyze parametric results:

    python tools/analyze_parametric_results.py parametric_results.json

This will show:
- Best and worst configurations for each transform
- Performance differences between parameter choices
- Optimal settings for your use case
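The kind of comparison listed above can be sketched as follows. The input layout, a flat mapping from configuration string to throughput, is an assumed stand-in for the real results file, whose schema is not documented here; the function name and the sample numbers are placeholders for illustration only.

```python
from collections import defaultdict


def best_and_worst(throughputs):
    """Group configurations by transform name and return (best, worst) per transform.

    `throughputs` maps a configuration string such as "Foo(level=1)" to a
    throughput value; higher is better.
    """
    groups = defaultdict(dict)
    for config, value in throughputs.items():
        name = config.split("(", 1)[0]  # text before the first parenthesis
        groups[name][config] = value
    return {
        name: (max(configs, key=configs.get), min(configs, key=configs.get))
        for name, configs in groups.items()
    }


# Placeholder configurations and numbers, not measured results.
example = {"Foo(level=1)": 2.0, "Foo(level=2)": 1.0, "Bar(mode=a)": 3.0}
print(best_and_worst(example))
```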
The benchmark methodology is designed to ensure fair and reproducible comparisons:
- Data Loading: Data is loaded using library-specific loaders to ensure optimal format compatibility
- Warmup Phase: Adaptive warmup until performance variance stabilizes
- Measurement Phase: Multiple runs with statistical analysis
- Environment Control: Consistent thread settings and hardware utilization
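The warmup and measurement phases above can be sketched as follows. This is an illustration under stated assumptions, not the suite's implementation: stability is judged by the coefficient of variation over a sliding window, the window size, tolerance, and iteration cap are arbitrary, and the summary assumes that reported values of the form "mean ± spread" use the sample standard deviation across runs.

```python
import statistics


def is_stable(samples, window=5, tolerance=0.05):
    """Return True when the last `window` samples have std/mean at or below `tolerance`."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    mean = statistics.fmean(recent)
    return mean > 0 and statistics.pstdev(recent) / mean <= tolerance


def adaptive_warmup(measure, max_iterations=100, window=5, tolerance=0.05):
    """Call `measure()` until throughput stabilizes or the cap is hit; return all samples."""
    samples = []
    for _ in range(max_iterations):
        samples.append(measure())
        if is_stable(samples, window, tolerance):
            break
    return samples


def summarize(runs):
    """Return (mean, sample standard deviation) over per-run throughputs."""
    return statistics.fmean(runs), statistics.stdev(runs)
```

Here `measure` is any zero-argument callable that returns one throughput reading, for example the number of images processed divided by the elapsed `time.perf_counter()` interval. Passing it in as a parameter keeps the stopping rule testable with synthetic readings.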
Contributions are welcome! If you'd like to add support for a new library, improve the benchmarking methodology, or fix issues, please submit a pull request.
When contributing, please:
- Follow the existing code style
- Add tests for new functionality
- Update documentation as needed
- Ensure all tests pass