Description
Hello!
I just wanted to share some data and thoughts that stem from my experiments with using different table sizes in `CompressedBlock`. This is very exploratory/preliminary and I hope that we can discuss various ideas and directions before we commit to any particular solution (a random internet blogpost tells me that this may result in a better outcome :-P).
One experiment I've done is using the default table sizes (4096/512) when the estimated image size in bytes (width x height x samples-per-pixel x bits-per-sample / 8) is above a certain threshold, but using smaller tables (512/128) otherwise (see the Chromium CL here). The results I've got (see section "Measurements 2024-12-30 - 2025-01-02" in my doc) have been positive, but the magnitude of the improvement has been disappointing to me.
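For concreteness, the heuristic boils down to something like the sketch below (a minimal illustration only - the function names and the threshold constant are made up here, and the actual cutoff value is the one in the CL):

```rust
/// Estimated decompressed image size in bytes:
/// width x height x samples-per-pixel x bits-per-sample / 8.
fn estimated_image_bytes(width: u64, height: u64, samples_per_pixel: u64, bits_per_sample: u64) -> u64 {
    width * height * samples_per_pixel * bits_per_sample / 8
}

/// Hypothetical cutoff - the experiment only cares about "above" vs "below"
/// a certain threshold; this concrete value is illustrative, not the CL's.
const TABLE_SIZE_THRESHOLD: u64 = 64 * 1024;

/// Returns the pair of Huffman table sizes to use for decoding.
fn pick_table_sizes(estimated_bytes: u64) -> (usize, usize) {
    if estimated_bytes > TABLE_SIZE_THRESHOLD {
        (4096, 512) // default table sizes
    } else {
        (512, 128) // smaller tables for (estimated) small images
    }
}
```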
The results above were also surprisingly flat - I had expected that small images would significantly benefit from small tables (and big images from big/default tables). One hypothesis that could explain this is that image size is not a good predictor of the size of zlib-compressed blocks - e.g. maybe some big images use lots of relatively short compressed zlib blocks. So I tried another experiment to gather this kind of data on my old 2023 corpus of ~1650 PNG images from the top 500 websites (see also the tool bits here and here) - the results can be found in a spreadsheet here. I think the following bits of data are interesting:
- There is quite a wide range of compressed block sizes. Even when looking at the 100 biggest images, the block sizes range from ~3kB (at the 10%-ile) to ~44kB (at the 90%-ile).
- Some images use a mix of compressed blocks with 1) fixed/default-symbol-encoding and 2) custom Huffman trees.
- Some images use uncompressed blocks.
I also think that it is a bit icky that in my experiments the public API of `fdeflate` "leaks" the implementation detail of Huffman table sizes. One idea to avoid this is to:
- Decouple `CompressedBlock` and `fn read_compressed` from `Decompressor`, so that `Decompressor` can internally choose to use small or big table sizes (with dynamic dispatch via something like `Box<dyn CompressedBlockRead[er]>`). I think that moving `fn read_compressed` to `impl ... CompressedBlock` can be made easier by packaging/encapsulating bits of `Decompressor` (to make it easier to pass them as `&mut` references to `fn read_compressed`) - for example, maybe `buffer` + `nbits` can become fields of a `BitBuffer` struct, and `queued_rle` + `queued_backref` can become variants of an `enum QueuedOutput` (see the sketch after this list).
- Add `fdeflate::Decompressor::set_output_size_estimate(estimate: usize)`, which can be used to decide the initial table sizes. (Note that `png::ZlibStream` already has such an estimate available - it calls it `max_total_output`.)
- Track the size of the last compressed block and switch the table sizes if that size is below/above a certain threshold.
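To make these ideas a bit more concrete, here is a minimal Rust sketch of what the decoupling could look like. Everything below is an illustrative guess rather than fdeflate's actual internals: the field layouts, `TableReader`, `BIG_TABLE_THRESHOLD`, and the helper names are made up, and `set_output_size_estimate` is the *proposed* API from the list above, not something fdeflate has today.

```rust
/// Stand-in for fdeflate's error type; only here so the sketch compiles.
struct DecompressionError;

/// `buffer` + `nbits` packaged up so the bit-reading state can be passed
/// around as a single `&mut` reference.
struct BitBuffer {
    buffer: u64,
    nbits: u8,
}

/// `queued_rle` + `queued_backref` packaged up as variants of one enum.
enum QueuedOutput {
    None,
    Rle { data: u8, length: usize },
    Backref { distance: usize, length: usize },
}

/// Counterpart of today's `fn read_compressed`, decoupled from `Decompressor`
/// so that readers with different table sizes can live behind one trait object.
trait CompressedBlockReader {
    fn read_compressed(
        &mut self,
        bits: &mut BitBuffer,
        queued: &mut QueuedOutput,
        output: &mut [u8],
    ) -> Result<usize, DecompressionError>;
}

/// Placeholder for `CompressedBlock` parameterized over its two Huffman table
/// sizes (e.g. 4096/512 vs 512/128).
struct TableReader<const A: usize, const B: usize>;

impl<const A: usize, const B: usize> CompressedBlockReader for TableReader<A, B> {
    fn read_compressed(
        &mut self,
        _bits: &mut BitBuffer,
        _queued: &mut QueuedOutput,
        _output: &mut [u8],
    ) -> Result<usize, DecompressionError> {
        unimplemented!("sketch only - the real body would decode one block")
    }
}

/// Made-up cutoff; the right value would have to come from measurements.
const BIG_TABLE_THRESHOLD: usize = 64 * 1024;

struct Decompressor {
    bits: BitBuffer,
    queued: QueuedOutput,
    // Dynamic dispatch keeps the table-size choice an internal detail rather
    // than something that leaks into the public API.
    block_reader: Box<dyn CompressedBlockReader>,
    // ... other fields ...
}

impl Decompressor {
    /// Idea 2: let the caller hint at the expected total output size so the
    /// initial table sizes can be chosen accordingly. `png::ZlibStream` could
    /// pass its `max_total_output` here.
    pub fn set_output_size_estimate(&mut self, estimate: usize) {
        self.block_reader = Self::pick_reader(estimate);
    }

    /// Idea 3: track the size of the last compressed block and switch the
    /// table sizes if it ends up below/above the threshold.
    fn after_block(&mut self, last_block_size: usize) {
        self.block_reader = Self::pick_reader(last_block_size);
    }

    fn pick_reader(size_hint: usize) -> Box<dyn CompressedBlockReader> {
        if size_hint >= BIG_TABLE_THRESHOLD {
            Box::new(TableReader::<4096, 512>)
        } else {
            Box::new(TableReader::<512, 128>)
        }
    }
}
```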