ROB: Handle /Pages node without /Kids during flattening by gaoflow · Pull Request #3825 · py-pdf/pypdf

gaoflow · 2026-06-02T06:10:10Z

Summary

Reading a PDF whose page tree contains a node with /Type /Pages but no /Kids entry (e.g. a malformed document advertising /Count 0 and no children) raises a bare KeyError: '/Kids' instead of being handled gracefully. This is the issue reported in #3811.

Cause

In _flatten, the node type is decided like this:

if PagesAttributes.TYPE in pages:
    t = cast(str, pages[PagesAttributes.TYPE])
elif PagesAttributes.KIDS not in pages:   # only reclassifies when /Type is absent
    t = "/Page"
else:
    t = "/Pages"

The existing fallback only treats a node as a single page when /Type is missing. When /Type is explicitly /Pages but /Kids is absent, the code still enters the /Pages branch and iterates pages[PagesAttributes.KIDS], which raises KeyError.
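The failure path can be illustrated without pypdf at all. The following is a minimal sketch that assumes plain dicts as stand-ins for PDF dictionary objects; `classify` and `malformed` are hypothetical names for illustration, not part of pypdf.

```python
from typing import Any, Dict, Optional


def classify(node: Dict[str, Any]) -> str:
    """Mirror the type decision quoted above, using a plain dict as the node."""
    if "/Type" in node:
        return str(node["/Type"])
    if "/Kids" not in node:
        # Only reached when /Type is absent.
        return "/Page"
    return "/Pages"


# Stand-in for the malformed object: << /Type /Pages /Count 0 >> (no /Kids)
malformed = {"/Type": "/Pages", "/Count": 0}
node_type = classify(malformed)

# The node is classified as a container, so the kids are looked up directly.
error: Optional[KeyError] = None
try:
    kids = malformed["/Kids"]
except KeyError as exc:
    error = exc
```

Because `/Type` is present, the `/Kids` check is never consulted, and the direct lookup is what fails.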

Fix

Treat a missing /Kids as an empty array, so a /Pages container with no children simply contributes no pages. len(reader.pages) then returns 0 for such a document instead of crashing. Documents that do have /Kids are unaffected (the key is still looked up exactly as before, preserving indirect-reference resolution).
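As a rough illustration of the intended behaviour, the sketch below flattens a page tree built from plain dicts and treats a missing /Kids as an empty list. It is a simplified model under stated assumptions, not the pypdf implementation: `Node`, `node_type`, and `flatten` are hypothetical names, and indirect references and inherited attributes are ignored.

```python
from typing import Any, Dict, List, Optional

Node = Dict[str, Any]  # plain-dict stand-in for a PDF dictionary object


def node_type(node: Node) -> str:
    """Same decision as in the Cause section, on plain dicts."""
    if "/Type" in node:
        return str(node["/Type"])
    return "/Page" if "/Kids" not in node else "/Pages"


def flatten(node: Node, out: Optional[List[Node]] = None) -> List[Node]:
    """Collect leaf pages depth-first; a container without /Kids adds nothing."""
    if out is None:
        out = []
    if node_type(node) == "/Pages":
        # Missing /Kids is treated as an empty array instead of raising KeyError.
        for kid in node.get("/Kids", []):
            flatten(kid, out)
    else:
        out.append(node)
    return out


empty_tree = {"/Type": "/Pages", "/Count": 0}
normal_tree = {
    "/Type": "/Pages",
    "/Kids": [
        {"/Type": "/Page"},
        {"/Type": "/Pages", "/Kids": [{"/Type": "/Page"}]},
    ],
}
```

In this model `flatten(empty_tree)` yields no pages, while trees that do carry /Kids are traversed as before.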

Reproduction

from pypdf import PdfReader
# /Pages object: << /Type /Pages /Count 0 >>  (no /Kids)
reader = PdfReader("file.pdf")
print(len(reader.pages))   # before: KeyError: '/Kids'; after: 0

Tests

Added test_flatten__pages_without_kids, which removes /Kids from a real document's /Pages node, sets /Count 0, and asserts len(reader.pages) == 0. It fails with KeyError: '/Kids' on main and passes with this change. Existing multi-page documents still report the correct page counts.

Closes #3811

A page tree node typed as /Pages but missing the /Kids entry (for example a malformed document advertising /Count 0 with no children) caused _flatten to raise a bare KeyError: '/Kids' while iterating the kids. Treat a missing /Kids as an empty array so such files report 0 pages instead of crashing.

Closes py-pdf#3811

codecov · 2026-06-02T06:18:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.73%. Comparing base (52545c5) to head (5c04709).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3825   +/-   ##
=======================================
  Coverage   97.73%   97.73%           
=======================================
  Files          55       55           
  Lines       10417    10418    +1     
  Branches     1931     1931           
=======================================
+ Hits        10181    10182    +1     
  Misses        130      130           
  Partials      106      106


stefan6419846 · 2026-06-02T08:13:02Z

                    inherit[attr] = pages[attr]
            pages_reference = getattr(pages, "indirect_reference", object())
-            for page in cast(ArrayObject, pages[PagesAttributes.KIDS]):
+            # A malformed /Pages node may be missing /Kids (for example a page


I do not think that this comment contributes enough to be useful here.

stefan6419846 · 2026-06-02T08:14:06Z

+            # A malformed /Pages node may be missing /Kids (for example a page
+            # tree advertising "/Count 0" without any children). Treat it as
+            # having no kids instead of raising a bare KeyError here (#3811).
+            kids = (


This can be written in a simpler way:

Suggested change

-            kids = (
+            kids = pages.get(PagesAttributes.KIDS, ArrayObject())

stefan6419846 · 2026-06-02T08:16:29Z

+                if PagesAttributes.KIDS in pages
+                else ArrayObject()
+            )
+            for page in cast(ArrayObject, kids):


While we are at it (and although not required here), I would recommend replacing this cast and increasing the resilience here.

What I mean is that we should use an empty ArrayObject if the kids are a NullObject and raise a proper exception if we see anything different from an ArrayObject for the iteration.
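One possible reading of this suggestion, sketched with stand-ins rather than pypdf types: `NullStandIn` plays the role of a PDF null object, a Python `list` plays the role of an array object, and `resolve_kids` is a hypothetical helper. `TypeError` is used only for illustration; the thread does not say which exception type would be appropriate in pypdf.

```python
from typing import Any, Dict, List


class NullStandIn:
    """Hypothetical stand-in for a PDF null object."""


def resolve_kids(node: Dict[str, Any]) -> List[Any]:
    """Return the kids to iterate over, validating the type instead of casting."""
    kids = node.get("/Kids")
    if kids is None or isinstance(kids, NullStandIn):
        # Missing or null /Kids: behave as if the array were empty.
        return []
    if not isinstance(kids, list):
        # Anything else is reported explicitly rather than failing mid-iteration.
        raise TypeError(
            f"Expected an array for /Kids, got {type(kids).__name__}"
        )
    return kids
```

The point of the explicit check is that a cast only silences the type checker, whereas this version fails with a descriptive message when /Kids holds an unexpected object.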

stefan6419846 requested changes Jun 2, 2026

gaoflow wants to merge 1 commit into py-pdf:main from gaoflow:fix-3811-pages-without-kids
