Merging pages causes corrupt output (Filter FlateDecode is set, but content is unencoded) #3236

Open
@larsga

Description

I'm stamping a template onto an existing document. The existing document was also produced by pypdf, but by a different organization. Several tools I've used to inspect the input PDF report issues with it.

I read the input pages one by one and stamp a page from a different PDF onto each. The fourth page is displayed as blank because the output for that page is corrupt. If I skip the first three pages, the fourth page is OK.

Environment

I'm using Python 3.12.3 on macOS-12.7.1-x86_64-i386-64bit. pypdf is installed via pip:

pypdf==5.4.0, crypt_provider=('cryptography', '42.0.8'), PIL=10.3.0

Code + PDF

This reproduces the problem:

import pypdf

origin = pypdf.PdfReader('1.pdf')
template = pypdf.PdfReader('MinimalJob.niso.xml.stamp.pdf')
writer = pypdf.PdfWriter()

for ix in range(len(origin.pages)):
    page = origin.get_page(ix)
    # The stamp is always the second page (index 1) of the template.
    stamp = template.get_page(1)
    # Draw the stamp on top of the page without expanding the page size.
    page.merge_page(stamp, expand=False, over=True)
    writer.add_page(page)

writer.write('out.pdf')

The two input files are attached. I'm checking for permission to add those to your tests. Will come back to you on that.

1.pdf

MinimalJob.niso.xml.stamp.pdf

Traceback

There is no traceback. pypdf writes the file without complaint, but when I try to read that file with MuPDF I get:

MuPDF error: library error: zlib error: incorrect header check
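To illustrate what this error means, here is a minimal sketch using only Python's standard `zlib` module. It checks whether a byte string is a complete zlib stream, which is what a reader expects when a stream dictionary declares the FlateDecode filter. The helper name `looks_flate_encoded` and the sample content-stream bytes are made up for illustration; applying the check to the raw stream bytes of `out.pdf` would need a separate step to extract them, which is not shown here.

```python
import zlib


def looks_flate_encoded(raw: bytes) -> bool:
    """Return True if `raw` decompresses as a complete zlib stream."""
    try:
        decompressor = zlib.decompressobj()
        decompressor.decompress(raw)
        decompressor.flush()
        # eof is only True once the end-of-stream marker has been seen.
        return decompressor.eof
    except zlib.error:
        # Plain (unencoded) content typically fails here with
        # "incorrect header check", the same message MuPDF reports.
        return False


# Hypothetical content-stream bytes, used only for this demonstration.
plain = b"q 1 0 0 1 0 0 cm /Fm0 Do Q"
compressed = zlib.compress(plain)

print(looks_flate_encoded(compressed))  # True
print(looks_flate_encoded(plain))       # False
```

A stream for which this returns False while its dictionary still lists FlateDecode would match the symptom in the issue title.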

A PDF comparison tool (Antenna House Regression Test Suite) chokes on the output PDF.

macOS Preview displays the last page as empty because of the corruption.

pdftk refuses to process the file, because it finds errors.

Labels

PdfWriter: The PdfWriter component is affected

is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF