Description
I'm stamping a template onto an existing document. The existing document was also produced by PyPDF, but by a different organization. The input PDF has issues, according to different tools I've used to look at it.
I read the input pages one by one and stamp a page from a different PDF onto each. In the output, the fourth page is displayed as blank because the data for that page is corrupt. If I skip the first three pages, the fourth page is OK.
Environment
I'm using Python 3.12.3 on macOS-12.7.1-x86_64-i386-64bit. PyPDF is installed via pip:
pypdf==5.4.0, crypt_provider=('cryptography', '42.0.8'), PIL=10.3.0
Code + PDF
This reproduces the problem:
import pypdf

origin = pypdf.PdfReader('1.pdf')
template = pypdf.PdfReader('MinimalJob.niso.xml.stamp.pdf')
writer = pypdf.PdfWriter()
for ix in range(len(origin.pages)):
    page = origin.get_page(ix)
    # Index 1 is the second page of the stamp PDF (pages are 0-indexed).
    stamp = template.get_page(1)
    # Positional arguments: expand=False, over=True.
    page.merge_page(stamp, False, True)
    writer.add_page(page)
writer.write('out.pdf')
The two input files are attached. I'm checking whether I have permission to add them to your tests and will come back to you on that.
Traceback
There is no traceback. PyPDF is perfectly happy to output the file, but if I try to read that file with MuPDF I get:
MuPDF error: library error: zlib error: incorrect header check
A PDF comparison tool (Antenna House Regression Test Suite) chokes on the output PDF.
macOS Preview displays the last page as empty because of the corruption.
pdftk refuses to process the file, because it finds errors.
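To help narrow down which object triggers the MuPDF message above, here is a rough, stdlib-only sketch that scans a PDF's raw bytes for streams declared as /FlateDecode and tries to inflate each one. It is not part of pypdf, and the helper name find_bad_flate_streams is made up for this report. It assumes an unencrypted file and relies on a crude regex rather than a real PDF parser, so it only looks at streams whose sole filter is /FlateDecode and can miss or misreport objects.

```python
import re
import zlib

# Crude pattern (an assumption, not a real PDF parser): "N G obj", then the
# object's dictionary without crossing an "endobj", then the stream body.
OBJ_STREAM_RE = re.compile(
    rb"(\d+)\s+\d+\s+obj\b((?:(?!endobj).)*?)stream\r?\n(.*?)endstream",
    re.DOTALL,
)


def find_bad_flate_streams(pdf_bytes: bytes) -> list[tuple[int, str]]:
    """Return (object number, zlib message) for Flate streams that fail to inflate.

    Hypothetical helper for illustration only.
    """
    bad = []
    for match in OBJ_STREAM_RE.finditer(pdf_bytes):
        obj_num = int(match.group(1))
        dictionary = match.group(2)
        body = match.group(3)
        # Only check streams whose single declared filter is /FlateDecode.
        if not re.search(rb"/Filter\s*/FlateDecode\b", dictionary):
            continue
        try:
            # decompressobj tolerates trailing bytes such as the EOL
            # that precedes "endstream".
            zlib.decompressobj().decompress(body)
        except zlib.error as exc:
            bad.append((obj_num, str(exc)))
    return bad
```

Calling find_bad_flate_streams(open('out.pdf', 'rb').read()) should list the object numbers whose data is not zlib-framed, which would show whether the bad stream belongs to the fourth page's contents. I have not confirmed that this is the cause.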