Skip to content

support palette expansion, CIE L*a*b*, transparency mask, and lzw end…#386

Open
aarshivv wants to merge 3 commits into
image-rs:mainfrom
aarshivv:palette-cielab-transparency-lzw
Open

support palette expansion, CIE L*a*b*, transparency mask, and lzw end…#386
aarshivv wants to merge 3 commits into
image-rs:mainfrom
aarshivv:palette-cielab-transparency-lzw

Conversation

@aarshivv
Copy link
Copy Markdown

Adds support for three previously unsupported photometric interpretations and makes LZW decoding more tolerant of non-compliant streams.

Palette (RGBPalette, photometric=3)

New public method Decoder::expand_palette_to_rgb() that converts raw palette indices returned by read_image() into 8-bit RGB using the image's color map. Handles 1, 2, 4, 8, and 16-bit indices with proper row-aligned sub-byte unpacking.

The color map and ColorType::Palette were already wired up — this fills in the missing expansion step that callers need to get usable pixel data.

CIE Lab* (photometric=8)

8-bit 3-channel CIE Lab* images now decode as Lab(8). The signed a/b channels are re-biased to unsigned (+128) during chunk expansion, matching the ICCLab encoding (photometric=9). Callers can handle both the same way.

Only 8-bit is supported. Other depths still return UnsupportedError since the spec gets complicated with the Decode tag for ITU Lab*.

TransparencyMask (photometric=4)

Treated as grayscale (Gray(bits_per_sample)) instead of returning UnsupportedError. It's a single-channel mask — grayscale is the natural mapping.

LZW end marker tolerance

Some TIFF encoders don't write the LZW end-of-information code. Previously this caused an UnexpectedEof error even when all pixel data had been successfully decoded. Now if NoProgress is reported but output was already produced in that call, it's treated as EOF.

Testing

All existing tests pass (301 tests, --all-features). No new dependencies.

Fixes #202
Related to #217, #364

… marker tolerance

Add expand_palette_to_rgb() for converting palette indices to RGB using the color map (1/2/4/8/16-bit indices, row-aligned sub-byte).

Accept CIE L*a*b* (photometric=8) as Lab(8) for 8-bit 3-channel images. Re-bias a/b from signed to unsigned during expand_chunk so callers handle CIELab and ICCLab the same way.

Treat TransparencyMask as grayscale instead of erroring.

Handle missing LZW end-of-information code gracefully when output was already produced.
@197g
Copy link
Copy Markdown
Member

197g commented Apr 11, 2026

Too much in one PR. Neither end-marker tolerance nor Lab come with tests, it does not attempt to answer the question for lab of heterogeneous SampleType (like at least a link to a reference or so), and I don't get what gap is being closed (take LinkedIn language to LinkedIn) in palette expansion—it's not accessing internal information here, simple transformations (e.g. merging planes) have been implemented outside already and the implementation just isn't particularly good either.

@aarshivv
Copy link
Copy Markdown
Author

Fair points, thanks, I will split this up and add tests.

For Lab: the re-biasing follows the TIFF6 spec Section 23 (pp 110-115) and the Adobe TIFF Technical Notes (2002). CIELab (photometric=8) stores a/b as signed int8 (-128..127), ICCLab (photometric=9) stores them as unsigned with +128 bias. The byte-level conversion is just wrapping_add(128).

Refs:

The heterogeneous SampleType concern from the existing comment is valid for the general case, but for 8-bit the signed/unsigned distinction is purely interpretive — the bytes are the same size. The re-biasing converts CIELab encoding to ICCLab encoding so callers don't need to care which variant it was.

On palette — you're right, it doesn't touch anything private. Happy to drop it. Callers can combine color_map() + read_image() themselves.

Want me to split into per-feature PRs, or just trim this one down to the parts that make sense together?

@197g
Copy link
Copy Markdown
Member

197g commented Apr 11, 2026

The reference to CieLab is clear, what's not clear is how to interpret SampleType if it occurs in the file (or if not, since the 'default' by the spec is UInt which it very clearly is not). All the samples that I have just omit it entirely for that color type yet for 16-bit color that seems odd (f16 and u/i16 would both be reasonable to me). Sometimes Tiff features are quite ad-hoc and non-orthogonal (LogLuv compression for instance) but the maintenance burden of that is unclear. Since heterogeneous sample types are not well-supported anywhere I know of, a source recommending that the tags should not be used in combination with CIELab would be enough reasoning for me but I'd like to read some (preferably authoritative) resource about the meaning of the combination instead of ignoring its existence and having to do weird backport support when inevitably someone with a file with that tag gets data in the wrong color type (or, uses the bytes API and just gets garbage data very silently).

@aarshivv
Copy link
Copy Markdown
Author

CIELab: I don't have a good answer for the SampleFormat interaction and no test files to validate against. For what it's worth, libtiff doesn't check SampleFormat for CIELab either (tif_getimage.c), but that's not enough to justify the maintenance risk here.

Palette: agreed, callers already have color_map() + read_image(), nothing here needs internal access.

I'll trim this down to TransparencyMask + LZW tolerance with tests for both. Will update shortly

@197g
Copy link
Copy Markdown
Member

197g commented Apr 11, 2026

libtiff is better than nothing. That part of the PR augmented with code-comments referencing the behavior through stable links or docs would be nice.

Re: Lwz tolerance, we're already using weezl's mode that yields on a full buffer fill; I seem to remember that any remaining decompression stalls indicate a short stream that did not write all required samples. That should be checked with the PR since we probably do want to error in this case. We had discussed some recovery for partially bad files and this would fit the bill but ignoring missing data here should not be the default mode imo.

@aarshivv
Copy link
Copy Markdown
Author

Got it, I'll keep CIELab in with code comments linking to libtiff's handling. Will use the gitlab docs mirror for stable URLs.

On LZW — to clarify what the change actually does: it only affects the case where the last batch of valid decompressed bytes and NoProgress happen in the same read() call. Without it, those bytes are thrown away. read_exact in expand_chunk still enforces the full expected byte count downstream, so a genuinely short stream still errors — read_exact calls read() again, gets 0 (NoProgress with consumed_out == 0), and returns UnexpectedEof as before.

We hit this with a batch of scanned TIFFs where a bunch of LZW streams had all pixel data present but no end-of-information code.
The files were failing to decode despite the image data being complete.

Does that change your concern, or would you still rather keep LZW strict and handle this kind of recovery as a separate discussion?

Either way, the PR will have TransparencyMask + CIELab + tests.
LZW depending on your call.

@197g
Copy link
Copy Markdown
Member

197g commented Apr 11, 2026

The fault seems to lie with using Read for this, no? There are a number of other reasons for getting rid of this chosen abstraction; for instance transparently expanding subsampling is not reasonable to do with it either; it were faster if the API could re-initialize the decoder state instead of throwing it away and constructing a new one; we should be able to rely on the out buffer being in whole units of the sample type; almost all correct usage uses read_exact not read anyways..

@aarshivv
Copy link
Copy Markdown
Author

That makes sense — patching read() is the wrong layer for this.

I'll drop the LZW change. PR will be just TransparencyMask + CIELab with tests for both.

For reference, libtiff had the same problem — missing EOI markers caused crashes fixed in 4.5.1 (https://libtiff.gitlab.io/libtiff/releases/v4.5.1.html).
Their tif_lzw.c has a LZW_CHECKEOS path for this

Our production pipeline processes lot of document TIFFs — mostly scanned/faxed pages originally handled through Aspose (Java). We've seen the missing EOI issue across a good number of files in that dataset. We're working around it in our own fallback layer for now, but might be worth a separate issue when the decoder abstraction gets reworked.

@aarshivv
Copy link
Copy Markdown
Author

@197g , can you check the PR and let me know if anything needs adjusting.

karijd added a commit to unlock-ceo/image-tiff that referenced this pull request Apr 14, 2026
Some TIFF encoders (e.g., UK Environment Agency LiDAR DTM tiles)
omit the LZW end-of-information code. Previously `LZWReader::read`
returned UnexpectedEof on `NoProgress` even when that same call had
already decoded valid pixel bytes — discarding them and failing the
whole image.

This change returns any output produced in a `NoProgress` call.
A subsequent read that produces zero output still errors, so a
genuinely truncated stream is still caught by `read_exact`.

Matches the LZW portion of image-rs#386 (dropped upstream
pending a wider decoder-API rework). Carried by madri until upstream
lands an equivalent fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Grey scale image decoder color type : InterpretationWithBits(RGBPalette, [8])

2 participants