Commit 45c2bd0

Render inline images (BI/ID/EI) and clean up FQN code-style issues
Implements the inline-image roadmap item: instead of pre-stripping inline image blocks, the preprocessor now promotes each one into a synthetic Image XObject and substitutes a `/__inline_image__N Do` invocation into the content stream. The rest of the renderer treats it exactly like a regular Image XObject and reuses the existing buildGrayImage / buildRgbImage / buildCmykImage / ImageIO decode paths.

Two framing strategies are used so the parser doesn't get confused by binary data:

- For DCT / DCTDecode / JPXDecode filters, find the JPEG end-of-image marker (FFD9) instead of scanning for "EI" bounded by whitespace, since JPEG payloads routinely contain byte sequences that look like EI by accident.
- For other filters (including no filter and FlateDecode), keep the whitespace-bounded EI heuristic but stop trimming "trailing whitespace" greedily: image bytes can legitimately be 0x00 or 0x0A, and the spec guarantees exactly one whitespace byte before EI.

Abbreviated dict keys (/W, /H, /BPC, /CS, /F) and full names (/Width, /Height, ...) are both accepted; abbreviated colorspace values (/G, /RGB, /CMYK) and full names map to component counts.

Tests:

- inlineImageRendersAtCtmLocation builds a 2x2 DeviceGray inline image with a [black, white; white, black] checker, scales it 120x via a `cm` operator, and asserts the rendered page contains dark pixels in the right region.
- jpegInlineImageDecodes uses PdfContentByte.addImage(image, ..., true) to embed a green JPEG as an inline image, then asserts the rendered page contains green pixels.

README's status section now says inline images render, and the limitations list no longer mentions them.

Also addresses Codacy's "unnecessary fully qualified name" warning on java.util.List / java.util.Set usage. The class now imports List, Set, Arrays, ByteArrayOutputStream, StandardCharsets and Rectangle2D directly instead of inlining the FQNs; 7 call sites simplified.

Module test suite: 86 tests, 0 failures.
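A minimal sketch of the two framing strategies described above, in Java. It is illustrative rather than the commit's code: `InlineImageFraming`, `findJpegEnd` and `findEiDataEnd` are hypothetical names, and the JPEG scanner walks marker segments instead of searching for the first `FF D9` pair. That is an assumption made here so that an `FFD9` inside a segment payload (an embedded thumbnail, for instance) is not mistaken for the end of the image. It only handles JPEG (`DCT` / `DCTDecode`) data that starts with an SOI marker; JPEG 2000 (`JPXDecode`) codestreams start differently and are not covered.

```java
// Hypothetical helper illustrating inline-image framing; not the renderer's actual class.
final class InlineImageFraming {

    /**
     * DCTDecode framing: walks JPEG marker segments from SOI (FFD8) and returns the index
     * just past the EOI marker (FFD9), or -1 if the data is truncated or not a JPEG.
     * Segment lengths are honored, so an FFD9 inside e.g. an APPn payload is skipped.
     */
    static int findJpegEnd(byte[] data, int start) {
        int n = data.length;
        if (start + 1 >= n || u(data, start) != 0xFF || u(data, start + 1) != 0xD8) {
            return -1;
        }
        int i = start + 2;
        while (i + 1 < n) {
            if (u(data, i) != 0xFF) {
                return -1; // a marker was expected here
            }
            int marker = u(data, i + 1);
            if (marker == 0xFF) { // fill byte preceding a marker
                i++;
            } else if (marker == 0xD9) { // EOI: the image data ends here
                return i + 2;
            } else if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) {
                i += 2; // TEM / RSTn carry no length field
            } else {
                if (i + 3 >= n) {
                    return -1;
                }
                // Segment length is big-endian and includes its own two bytes.
                i += 2 + ((u(data, i + 2) << 8) | u(data, i + 3));
                if (marker == 0xDA) { // SOS: entropy-coded data follows the header
                    i = skipEntropyCodedData(data, i);
                }
            }
        }
        return -1;
    }

    // In entropy-coded data a literal 0xFF is stuffed as FF00 and FFD0-FFD7 are restart
    // markers, so any other FFxx is a real marker that ends the scan.
    private static int skipEntropyCodedData(byte[] data, int i) {
        while (i + 1 < data.length) {
            if (u(data, i) == 0xFF) {
                int next = u(data, i + 1);
                if (next != 0x00 && (next < 0xD0 || next > 0xD7)) {
                    return i;
                }
                i += 2;
            } else {
                i++;
            }
        }
        return data.length;
    }

    /**
     * Non-JPEG framing: finds "EI" preceded by one whitespace byte and followed by whitespace
     * or end of data. Returns the exclusive end of the image bytes, dropping exactly that one
     * whitespace byte instead of trimming greedily; -1 if no EI is found.
     */
    static int findEiDataEnd(byte[] data, int dataStart) {
        for (int i = dataStart; i + 2 < data.length; i++) {
            if (isWhite(data[i]) && data[i + 1] == 'E' && data[i + 2] == 'I'
                    && (i + 3 == data.length || isWhite(data[i + 3]))) {
                return i;
            }
        }
        return -1;
    }

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP.
    private static boolean isWhite(byte b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    private static int u(byte[] data, int i) {
        return data[i] & 0xFF;
    }
}
```

Segment lengths and byte stuffing make the JPEG walk unambiguous; the `EI` scan remains a heuristic and would still stop early if the image bytes themselves happened to contain whitespace, `EI`, whitespace.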
1 parent 2db7c3f commit 45c2bd0

3 files changed

Lines changed: 392 additions & 45 deletions

openpdf-renderer/README.md

Lines changed: 9 additions & 9 deletions
@@ -130,12 +130,15 @@ renderer falls back to a generic Java2D family picked by PostScript-name
 heuristics — glyph widths from the PDF font are still respected,
 but shapes are only approximate.
 
-Inline images (`BI`/`ID`/`EI`) are stripped from the content stream before
-parsing — they aren't rendered, but they don't derail the rest of the
-page either. Shading (`sh`), pattern / shading colors and type 3 font glyph
-operators are silently ignored. Pages that rely heavily on those features
-may render with missing content. Adding more operators is a localized change
-in `OpenPdfCorePageRenderer`.
+Inline images (`BI`/`ID`/`EI`) are now rendered: a preprocess pass promotes
+each inline image into a synthetic Image XObject (with JPEG framing detected
+by the JPEG `FFD9` end-of-image marker when the filter is `DCTDecode` to
+sidestep the ambiguous whitespace-bounded `EI` heuristic), then the rest of
+the renderer treats it like any other XObject. Uncompressed, Flate-decoded
+and JPEG inline images are supported. Shading (`sh`), pattern / shading
+colors and type 3 font glyph operators are silently ignored. Pages that
+rely heavily on those features may render with missing content. Adding more
+operators is a localized change in `OpenPdfCorePageRenderer`.
 
 For pages that need features outside this supported subset and you want
 pixel-perfect output today, the deprecated `PDFFile` / `PDFPage.getImage(...)`
@@ -158,9 +161,6 @@ The legacy in-tree parser still wins on real-world PDFs that exercise:
 no ICC profile, no UCR/BG. Anything color-managed will look noticeably
 wrong. Real fix: respect the ICCBased profile via `java.awt.color.ICC_Profile`.
 - **Pattern and shading paint** (`pattern`, `sh`). Ignored.
-- **Inline images.** Currently dropped; the parser-level strip keeps the
-rest of the page rendering. Re-implementing them on the existing raster
-helpers is straightforward.
 - **Soft masks (`SMask`) and transparency groups.** Ignored; image alpha
 honors `ca` only, not per-pixel masks.
 - **Indexed / Separation / DeviceN color spaces** for images and paths.