Only last %%EOF is considered, possibly not detecting valid startxref #3238

Open
@TVR1023

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Windows-11-10.0.26100-SP0

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.4.0, crypt_provider=('cryptography', '44.0.2'), PIL=10.4.0

Code + PDF

This is a minimal, complete example that shows the issue:

>>> theFile = r"C:\Users\tvrom\Documents\eqbPDFChartPlus.pdf"
>>> from pypdf import PdfReader 
>>> reader = PdfReader(theFile)

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!

eqbPDFChartPlus.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Users\tvrom\AppData\Local\Programs\Python\Python312\Lib\site-packages\pypdf\_reader.py", line 136, in __init__
    self._initialize_stream(stream)
  File "C:\Users\tvrom\AppData\Local\Programs\Python\Python312\Lib\site-packages\pypdf\_reader.py", line 158, in _initialize_stream
    self.read(stream)
  File "C:\Users\tvrom\AppData\Local\Programs\Python\Python312\Lib\site-packages\pypdf\_reader.py", line 594, in read
    startxref = self._find_startxref_pos(stream)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tvrom\AppData\Local\Programs\Python\Python312\Lib\site-packages\pypdf\_reader.py", line 726, in _find_startxref_pos
    raise PdfReadError("startxref not found")
pypdf.errors.PdfReadError: startxref not found
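
To check whether a file has a usable `startxref` that is simply not next to the last `%%EOF`, the raw bytes can be scanned for every `startxref <offset> %%EOF` trailer. The following is only a minimal diagnostic sketch, not pypdf's implementation: `find_startxref_offsets` is a hypothetical helper, and the sample bytes are a synthetic stand-in, since the structure of the attached PDF is assumed here rather than verified.

```python
import re

# Hypothetical helper, not part of pypdf: match every
# "startxref <offset> %%EOF" trailer found in the raw bytes.
_TRAILER_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def find_startxref_offsets(data: bytes) -> list[int]:
    """Return the xref offsets of all trailers in file order."""
    return [int(match.group(1)) for match in _TRAILER_RE.finditer(data)]


# Synthetic stand-in (assumption): a valid trailer followed by extra bytes
# and a second %%EOF that has no startxref in front of it.
sample = (
    b"%PDF-1.4\n...objects...\n"
    b"startxref\n116\n%%EOF\n"
    b"trailing bytes appended by some tool\n%%EOF\n"
)

offsets = find_startxref_offsets(sample)
print(offsets)
```

For a real file, `data` would come from `open(theFile, "rb").read()`. A non-empty result together with the error above would suggest that a valid `startxref` exists earlier in the file than the last `%%EOF`.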

Labels

PdfReader: The PdfReader component is affected
is-robustness-issue: From a user's perspective, this is about robustness
