Summary
OpenEXR supports different tile sizes in its file data layout. A smaller tile size provides finer granularity of the image data, facilitating localized pixel reads/writes with less overhead from unwanted pixels. A larger tile size, when compression is involved, provides larger, uninterrupted encoding and decoding contexts and therefore better performance when the entire image is read or written.
It is expected that whole-image read/write performance will be lower when the tile size is small, compared to larger tiles or the scanline data layout. (For the scanline layout, each compression codec can decide on its own how much data it takes in as one continuous encoding context.) But it is unclear how much performance degradation is introduced by shrinking the tile size, and also how much the compression ratio suffers when tile sizes are reduced.
Here, I provide an easy-to-reproduce experiment to measure these factors. It shows that the performance degradation of whole-image read/write correlates linearly with the tile's edge length, and that the compression ratio also takes a hit when the tile size is small. See the "Results" section.
Experiment setup
Both oiio's and exr's binary tools should be on the command line's PATH. Specifically, the following tools are used in this experiment:
oiiotool
exrmaketiled
exrinfo
exrmetrics
The following experiments were done using fresh builds of OpenEXR 3.4.4 and OpenImageIO 3.1.8.
First, create a 4K half-RGBA EXR image with a scanline data layout and no compression. The image's content is white noise. Let's also verify that the file's data layout is exactly what we want.
$ oiiotool --pattern noise:type=uniform:min=0:max=1 4096x2160 4 -d half --compression none -o whitenoise_4k.exr
$ exrinfo -v whitenoise_4k.exr
File 'whitenoise_4k.exr': ver 2 flags longnames
parts: 1
part 1: <single>
Software: string 'OpenImageIO 3.1.8.0 : 1FC23AF3783083E5519C95F8D3DD1B975C666586'
capDate: string '2026:01:09 11:51:37'
channels: chlist 4 channels
'A': half samp 1 1
'B': half samp 1 1
'G': half samp 1 1
'R': half samp 1 1
compression: compression 'none' (0x00)
dataWindow: box2i [ 0, 0 - 4095 2159 ] 4096 x 2160
displayWindow: box2i [ 0, 0 - 4095 2159 ] 4096 x 2160
lineOrder: lineOrder 0 (increasing)
pixelAspectRatio: float 1
screenWindowCenter: v2f [ 0, 0 ]
screenWindowWidth: float 1
The next step is to use the following bash script to convert this scanline image into all the tile sizes we want to test -- that is, from 8 to 1024, at every power of 2.
#!/bin/bash
for size in 8 16 32 64 128 256 512 1024; do
echo "Making tile size: $size"
exrmaketiled -t "$size" "$size" -z none "whitenoise_4k.exr" "whitenoise_tiled${size}.exr"
done
Note that all generated files have no compression, so they're all approximately the same size of around 70 MB (2 bytes per sample, 4 samples per pixel, 4096x2160 pixels).
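The ~70 MB figure, and the number of tiles each size produces, can be sanity-checked with a quick back-of-the-envelope script:

```python
# Back-of-the-envelope check of the uncompressed file size and per-size
# tile counts for a 4096x2160 half-RGBA image.
import math

width, height = 4096, 2160
bytes_per_sample = 2   # half float
samples_per_pixel = 4  # R, G, B, A

pixel_bytes = width * height * bytes_per_sample * samples_per_pixel
print(f"uncompressed pixel data: {pixel_bytes / 1e6:.1f} MB")  # 70.8 MB

for size in [8, 16, 32, 64, 128, 256, 512, 1024]:
    tiles = math.ceil(width / size) * math.ceil(height / size)
    print(f"tile size {size:4d}: {tiles:6d} tiles")
```

Note how drastic the spread is: an 8x8 tiling yields 138,240 tiles, while 1024x1024 yields only 12 (with partial tiles at the edges, since 2160 is not a multiple of 1024).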
Then, we run exrmetrics on all the result files, including the original whitenoise_4k.exr which represents the scanline data layout.
#!/bin/bash
for size in 8 16 32 64 128 256 512 1024; do
echo "Benchmarking tile size: $size"
echo "Starting time:"
date +"%Y-%m-%d %H:%M:%S.%3N"
exrmetrics -z all "whitenoise_tiled${size}.exr" > "result_tile${size}.json"
done
echo "Benchmarking scanline exr"
echo "Starting time:"
date +"%Y-%m-%d %H:%M:%S.%3N"
exrmetrics -z all "whitenoise_4k.exr" > "result_scanline.json"
On my AMD Ryzen 5 PRO 4650U laptop under WSL2 on Windows 11, the output timing is the following. Note that this experiment is single-threaded.
Benchmarking tile size: 8
Starting time:
2026-01-09 14:35:58.220
Benchmarking tile size: 16
Starting time:
2026-01-09 14:54:02.882
Benchmarking tile size: 32
Starting time:
2026-01-09 14:58:49.307
Benchmarking tile size: 64
Starting time:
2026-01-09 15:00:25.546
Benchmarking tile size: 128
Starting time:
2026-01-09 15:01:09.968
Benchmarking tile size: 256
Starting time:
2026-01-09 15:01:38.774
Benchmarking tile size: 512
Starting time:
2026-01-09 15:02:03.815
Benchmarking tile size: 1024
Starting time:
2026-01-09 15:02:25.136
Benchmarking scanline exr
Starting time:
2026-01-09 15:02:45.905
One can see, without going into details, that the tile8 benchmark run took a whopping 18 minutes to finish! Oops!
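Since the script only prints start times, each run's duration is the gap to the next stamp. A small script recovers the per-run wall-clock times from the log above (the last entry, scanline, has no following stamp, so its duration is unknown here):

```python
# Derive per-run durations from the consecutive "Starting time" stamps
# printed by the benchmark script. Each run's wall-clock time is the gap
# between its stamp and the next one.
from datetime import datetime

stamps = {
    "tile8":    "2026-01-09 14:35:58.220",
    "tile16":   "2026-01-09 14:54:02.882",
    "tile32":   "2026-01-09 14:58:49.307",
    "tile64":   "2026-01-09 15:00:25.546",
    "tile128":  "2026-01-09 15:01:09.968",
    "tile256":  "2026-01-09 15:01:38.774",
    "tile512":  "2026-01-09 15:02:03.815",
    "tile1024": "2026-01-09 15:02:25.136",
    "scanline": "2026-01-09 15:02:45.905",
}

names = list(stamps)
times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f") for t in stamps.values()]
for name, start, end in zip(names, times, times[1:]):
    print(f"{name:8s}: {(end - start).total_seconds():8.1f} s")
```

This confirms the tile8 run took about 1085 seconds (18 minutes), versus roughly 21 seconds for tile1024.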
Results
I used a Jupyter notebook to collate all the JSON files into one CSV file, which is also uploaded to this issue page. Then I worked with Google Gemini to produce the following visualizations.
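For reference, the collation step amounts to flattening one record per (tile size, compression type) pair into CSV rows. A minimal sketch follows; the field names here ("compression", "read time", "write time") are illustrative stand-ins, not the exact keys exrmetrics emits, so adapt them to your build's actual JSON output:

```python
# Minimal sketch of collating per-tile-size benchmark records into one CSV.
# sample_runs stands in for json.load()-ing each result_tile{size}.json file;
# the record keys are hypothetical and must be matched to real exrmetrics output.
import csv
import io

sample_runs = {
    8:  [{"compression": "zip", "read time": 1.23, "write time": 2.34}],
    16: [{"compression": "zip", "read time": 0.61, "write time": 1.15}],
}

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["tile size", "compression", "read time", "write time"]
)
writer.writeheader()
for size, records in sample_runs.items():
    for rec in records:
        writer.writerow({"tile size": size, **rec})
print(buf.getvalue())
```

In the real notebook, `sample_runs` would be built by looping over the `result_tile*.json` files and the scanline result.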
Read timing
First, let's plot the average "read time" of all compression types against the tile size.
It almost looks logarithmic to me. Let's change the vertical axis to a log scale.
Now, the horizontal axis spans powers of 2 of the tile's edge length (8x8, 16x16, 32x32, etc.). A straight line on the log scale means the performance curve is, in general, still linear in the tile's edge length, and sublinear in (linear in the square root of) the number of pixels inside a tile.
Next, let's break down the performance by compression types:
Also changing the vertical axis to be logarithmic:
There are some differences in behaviour among the compression types, but in general they all follow the previous observation. One can see that these lines are mostly flat, and evenly spaced on the log-scale chart.
The most curious part here, however, is that the none compression type still follows the same performance degradation curve against tile size. No codec is involved when there is no compression at all, so why does the read-whole-image operation still slow down so much as the tile size gets smaller? This seems to indicate that the performance degradation for smaller tile sizes comes at least partly from the EXR framework itself, rather than being entirely attributable to the compression codecs.
Compression ratio
Now let's take a look at compression ratio. In these plots, the compression ratio is defined as the uncompressed size divided by the compressed size, so the compression is more efficient (better) when the ratio is larger. Here is the average compression ratio plotted against tile size:
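The ratio definition above can be made concrete with a one-liner; the 50 MB compressed size below is a made-up example, not a measured result:

```python
# Compression ratio as used in these plots: uncompressed size divided by
# compressed size, so larger is better and 1.0 means no size reduction.
def compression_ratio(uncompressed_bytes: int, compressed_bytes: int) -> float:
    return uncompressed_bytes / compressed_bytes

# Hypothetical example: the ~70.8 MB image compressed down to 50 MB.
print(f"{compression_ratio(70_778_880, 50_000_000):.2f}")  # prints 1.42
```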
Here is a detail view of all compression types' compression ratio performance against tile size:
There are quite a few striking wiggles here and there that may be worth a technical deep dive if certain compression types are of special interest to a particular studio or project. There also seems to be a bug somewhere in the pipeline that did not report rle's compression ratio, so rle shows up as a flat 1.0 line at the bottom, same as the none compression. But one general observation stands: tile size does significantly impact the compression ratio, by almost 20% on average.
Even the b44 compression, which is supposed to be fixed-rate, ends up slightly worse when there are many tiles, due to the per-tile overhead.
Takeaways
The performance degradation was expected, but it was larger than I thought. In particular, the read performance of small-tiled images scales with the tile's edge length for all compression types, even when no compression is involved. 18+ minutes for a single exrmetrics run on a 70 MB 4K image sounds quite slow by modern standards, even considering that this experiment is single-threaded. The compression ratio, on average, can also take a 20% hit when the tile size is small (e.g. 8x8).
My result csv file
Note that rle's output size seems to be identical to the original uncompressed file size in every case. This might be a bug in the pipeline, or it might be because the source is white noise, on which run-length encoding performs particularly badly.
exr_tiled_exrmetric_results.csv