ROB: Handle /Pages node without /Kids during flattening#3825
Open
gaoflow wants to merge 1 commit into
Open
Conversation
A page tree node typed as /Pages but missing the /Kids entry (for example a malformed document advertising /Count 0 with no children) caused _flatten to raise a bare KeyError: '/Kids' while iterating the kids. Treat a missing /Kids as an empty array so such files report 0 pages instead of crashing. Closes py-pdf#3811
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3825 +/- ##
=======================================
Coverage 97.73% 97.73%
=======================================
Files 55 55
Lines 10417 10418 +1
Branches 1931 1931
=======================================
+ Hits 10181 10182 +1
Misses 130 130
Partials 106 106 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
stefan6419846
requested changes
Jun 2, 2026
| inherit[attr] = pages[attr] | ||
| pages_reference = getattr(pages, "indirect_reference", object()) | ||
| for page in cast(ArrayObject, pages[PagesAttributes.KIDS]): | ||
| # A malformed /Pages node may be missing /Kids (for example a page |
Collaborator
There was a problem hiding this comment.
I do not think that this comment contributes enough to be useful here.
| # A malformed /Pages node may be missing /Kids (for example a page | ||
| # tree advertising "/Count 0" without any children). Treat it as | ||
| # having no kids instead of raising a bare KeyError here (#3811). | ||
| kids = ( |
Collaborator
There was a problem hiding this comment.
This can be written in a simpler way:
Suggested change
| kids = ( | |
| kids = pages.get(PagesAttributes.KIDS, ArrayObject()) |
| if PagesAttributes.KIDS in pages | ||
| else ArrayObject() | ||
| ) | ||
| for page in cast(ArrayObject, kids): |
Collaborator
There was a problem hiding this comment.
While we are at it (and although not required here), I would recommend replacing this cast and increase the resilience here.
What I mean is that we should use an empty ArrayObject if the kids are a NullObject and raise a proper exception if we see anything different from an ArrayObject for the iteration.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Reading a PDF whose page tree has a
/Pagesnode typed/Type /Pagesbut without a/Kidsentry (e.g. a malformed document advertising/Count 0and no children) raises a bareKeyError: '/Kids'instead of being handled gracefully. This is the issue reported in #3811.Cause
In
_flatten, the node type is decided like this:The existing fallback only treats a node as a single page when
/Typeis missing. When/Typeis explicitly/Pagesbut/Kidsis absent, the code still enters the/Pagesbranch and iteratespages[PagesAttributes.KIDS], which raisesKeyError.Fix
Treat a missing
/Kidsas an empty array, so a/Pagescontainer with no children simply contributes no pages.len(reader.pages)then returns0for such a document instead of crashing. Documents that do have/Kidsare unaffected (the key is still looked up exactly as before, preserving indirect-reference resolution).Reproduction
Tests
Added
test_flatten__pages_without_kids, which removes/Kidsfrom a real document's/Pagesnode, sets/Count 0, and assertslen(reader.pages) == 0. It fails withKeyError: '/Kids'onmainand passes with this change. Existing multi-page documents still report the correct page counts.Closes #3811