support palette expansion, CIE L*a*b*, transparency mask, and lzw end…#386
support palette expansion, CIE L*a*b*, transparency mask, and lzw end…#386aarshivv wants to merge 3 commits into
Conversation
… marker tolerance Add expand_palette_to_rgb() for converting palette indices to RGB using the color map (1/2/4/8/16-bit indices, row-aligned sub-byte). Accept CIE L*a*b* (photometric=8) as Lab(8) for 8-bit 3-channel images. Re-bias a/b from signed to unsigned during expand_chunk so callers handle CIELab and ICCLab the same way. Treat TransparencyMask as grayscale instead of erroring. Handle missing LZW end-of-information code gracefully when output was already produced.
|
Too much in one PR. Neither end-marker tolerance nor Lab come with tests, it does not attempt to answer the question for lab of heterogeneous SampleType (like at least a link to a reference or so), and I don't get what gap is being closed (take LinkedIn language to LinkedIn) in palette expansion—it's not accessing internal information here, simple transformations (e.g. merging planes) have been implemented outside already and the implementation just isn't particularly good either. |
|
Fair points, thanks, I will split this up and add tests. For Lab: the re-biasing follows the TIFF6 spec Section 23 (pp 110-115) and the Adobe TIFF Technical Notes (2002). CIELab (photometric=8) stores a/b as signed int8 (-128..127), ICCLab (photometric=9) stores them as unsigned with +128 bias. The byte-level conversion is just wrapping_add(128). Refs:
The heterogeneous SampleType concern from the existing comment is valid for the general case, but for 8-bit the signed/unsigned distinction is purely interpretive — the bytes are the same size. The re-biasing converts CIELab encoding to ICCLab encoding so callers don't need to care which variant it was. On palette — you're right, it doesn't touch anything private. Happy to drop it. Callers can combine color_map() + read_image() themselves. Want me to split into per-feature PRs, or just trim this one down to the parts that make sense together? |
|
The reference to CieLab is clear, what's not clear is how to interpret |
|
CIELab: I don't have a good answer for the SampleFormat interaction and no test files to validate against. For what it's worth, libtiff doesn't check SampleFormat for CIELab either (tif_getimage.c), but that's not enough to justify the maintenance risk here. Palette: agreed, callers already have color_map() + read_image(), nothing here needs internal access. I'll trim this down to TransparencyMask + LZW tolerance with tests for both. Will update shortly |
|
Re: Lwz tolerance, we're already using weezl's mode that yields on a full buffer fill; I seem to remember that any remaining decompression stalls indicate a short stream that did not write all required samples. That should be checked with the PR since we probably do want to error in this case. We had discussed some recovery for partially bad files and this would fit the bill but ignoring missing data here should not be the default mode imo. |
|
Got it, I'll keep CIELab in with code comments linking to libtiff's handling. Will use the gitlab docs mirror for stable URLs. On LZW — to clarify what the change actually does: it only affects the case where the last batch of valid decompressed bytes and NoProgress happen in the same read() call. Without it, those bytes are thrown away. read_exact in expand_chunk still enforces the full expected byte count downstream, so a genuinely short stream still errors — read_exact calls read() again, gets 0 (NoProgress with consumed_out == 0), and returns UnexpectedEof as before. We hit this with a batch of scanned TIFFs where a bunch of LZW streams had all pixel data present but no end-of-information code. Does that change your concern, or would you still rather keep LZW strict and handle this kind of recovery as a separate discussion? Either way, the PR will have TransparencyMask + CIELab + tests. |
|
The fault seems to lie with using |
|
That makes sense — patching read() is the wrong layer for this. I'll drop the LZW change. PR will be just TransparencyMask + CIELab with tests for both. For reference, libtiff had the same problem — missing EOI markers caused crashes fixed in 4.5.1 (https://libtiff.gitlab.io/libtiff/releases/v4.5.1.html). Our production pipeline processes lot of document TIFFs — mostly scanned/faxed pages originally handled through Aspose (Java). We've seen the missing EOI issue across a good number of files in that dataset. We're working around it in our own fallback layer for now, but might be worth a separate issue when the decoder abstraction gets reworked. |
|
@197g , can you check the PR and let me know if anything needs adjusting. |
Some TIFF encoders (e.g., UK Environment Agency LiDAR DTM tiles) omit the LZW end-of-information code. Previously `LZWReader::read` returned UnexpectedEof on `NoProgress` even when that same call had already decoded valid pixel bytes — discarding them and failing the whole image. This change returns any output produced in a `NoProgress` call. A subsequent read that produces zero output still errors, so a genuinely truncated stream is still caught by `read_exact`. Matches the LZW portion of image-rs#386 (dropped upstream pending a wider decoder-API rework). Carried by madri until upstream lands an equivalent fix.
Adds support for three previously unsupported photometric interpretations and makes LZW decoding more tolerant of non-compliant streams.
Palette (RGBPalette, photometric=3)
New public method
Decoder::expand_palette_to_rgb()that converts raw palette indices returned byread_image()into 8-bit RGB using the image's color map. Handles 1, 2, 4, 8, and 16-bit indices with proper row-aligned sub-byte unpacking.The color map and
ColorType::Palettewere already wired up — this fills in the missing expansion step that callers need to get usable pixel data.CIE Lab* (photometric=8)
8-bit 3-channel CIE Lab* images now decode as
Lab(8). The signed a/b channels are re-biased to unsigned (+128) during chunk expansion, matching the ICCLab encoding (photometric=9). Callers can handle both the same way.Only 8-bit is supported. Other depths still return
UnsupportedErrorsince the spec gets complicated with the Decode tag for ITU Lab*.TransparencyMask (photometric=4)
Treated as grayscale (
Gray(bits_per_sample)) instead of returningUnsupportedError. It's a single-channel mask — grayscale is the natural mapping.LZW end marker tolerance
Some TIFF encoders don't write the LZW end-of-information code. Previously this caused an
UnexpectedEoferror even when all pixel data had been successfully decoded. Now ifNoProgressis reported but output was already produced in that call, it's treated as EOF.Testing
All existing tests pass (301 tests,
--all-features). No new dependencies.Fixes #202
Related to #217, #364