Assert fails when getting the mediabox property for certain PDFs #2991

Open
@Paethon

Description

Hi

We process a large number of PDFs, and occasionally the following assertion fails on specific files when we access the mediabox property of a page:

assert len(arr) == 4

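Here is the content of the page. Note that /CropBox and /MediaBox each hold eight numbers (the same four values twice) instead of the expected four: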

{'/Contents': [IndirectObject(34, 0, 131870331607200)],
 '/CropBox': [0, 0, 595, 841, 0, 0, 595, 841],
 '/MediaBox': [0, 0, 595, 841, 0, 0, 595, 841],
 '/Parent': IndirectObject(1, 0, 131870331607200),
 '/Resources': IndirectObject(5, 0, 131870331607200),
 '/Type': '/Page'}

Is this "just" a malformed PDF? It opens without problems in a wide range of PDF readers. Unfortunately, I can't share the PDF, since it contains sensitive customer information.
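
For anyone who hits the same assertion, here is a minimal workaround sketch rather than a fix in the library. It assumes the page behaves like a mapping (as in the dump above) and that the box array is directly readable, not inherited from a parent node or stored behind an indirect reference. The helper name `tolerant_box` is hypothetical and not part of any library API. It accepts an over-long array only when every extra group of four numbers repeats the first group, which is the pattern seen here.

```python
from typing import Any, Mapping, Tuple

Box = Tuple[float, float, float, float]


def tolerant_box(page: Mapping[str, Any], key: str = "/MediaBox") -> Box:
    """Return (llx, lly, urx, ury) for `key`, tolerating a repeated 4-number array.

    Hypothetical helper: `page` is assumed to be dict-like and the value
    under `key` is assumed to be a plain sequence of numbers.
    """
    values = [float(v) for v in page[key]]
    if len(values) < 4 or len(values) % 4 != 0:
        raise ValueError(f"{key} has {len(values)} entries; expected a multiple of 4")

    first = values[:4]
    # Accept longer arrays only if each further group of four repeats the first.
    for start in range(4, len(values), 4):
        if values[start:start + 4] != first:
            raise ValueError(f"{key} contains conflicting rectangles: {values}")
    return (first[0], first[1], first[2], first[3])


# Usage with the values from the page dump above.
page = {
    "/CropBox": [0, 0, 595, 841, 0, 0, 595, 841],
    "/MediaBox": [0, 0, 595, 841, 0, 0, 595, 841],
    "/Type": "/Page",
}
media_box = tolerant_box(page)
crop_box = tolerant_box(page, "/CropBox")
```

The helper only reads the values and does not write anything back to the page, so the original object stays untouched. Arrays whose extra entries differ from the first four are rejected instead of being truncated silently.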

Labels

is-robustness-issue: From a users perspective, this is about robustness
needs-example-code: The issue needs a minimal and complete (e.g. all imports) example showing the problem
needs-pdf: The issue needs a PDF file to show the problem
