ROB: Handle missing font bounding boxes gracefully #3600
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##             main    #3600   +/-   ##
=======================================
  Coverage   97.35%   97.35%
=======================================
  Files          55       55
  Lines        9815     9816    +1
  Branches     1791     1792    +1
=======================================
+ Hits         9555     9556    +1
  Misses        153      153
  Partials      107      107
Some PDFs (e.g., Google Docs exports with emojis) have fonts without valid bounding boxes in their font descriptors. pypdf now gracefully uses the default bbox when /FontBBox is missing instead of crashing with KeyError: 'bbox'.
Adds test_extract_text_with_missing_font_bbox() which tests that PDFs with fonts lacking valid bounding boxes don't cause KeyError. Includes the missing_font_bbox.pdf test file from Google Docs that originally triggered the issue.
Force-pushed from 218bf2a to 5acf8a7.
stefan6419846 left a comment:
Thanks for the PR. I have left some comments, which hopefully should be rather straightforward to solve.
Fixes a crash when extracting text from PDFs with fonts missing bounding boxes.
Problem
pypdf crashed with KeyError: 'bbox' when encountering fonts without a /FontBBox entry in their descriptor. This commonly occurs with PDFs exported from Google Docs containing emoji characters.
Solution
Modified _parse_font_descriptor() in _font.py to check if the bbox key exists before accessing it.
Changes
Handle missing /FontBBox gracefully in font descriptor parsing
Closes #3599
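For readers who want to see the shape of the fix, here is a minimal sketch of the "fall back to a default bbox" pattern described above. It is not the code from this PR: parse_font_descriptor, DEFAULT_BBOX and the all-zero default value are hypothetical names and values chosen for illustration, and a plain dict stands in for the PDF font descriptor object.

```python
from typing import Any

# Hypothetical default for illustration; the default pypdf actually uses may differ.
DEFAULT_BBOX = (0.0, 0.0, 0.0, 0.0)


def parse_font_descriptor(descriptor: dict[str, Any]) -> dict[str, Any]:
    """Illustrative stand-in, not pypdf's _parse_font_descriptor()."""
    parsed: dict[str, Any] = {"name": descriptor.get("/FontName", "Unknown")}

    # .get() returns None instead of raising KeyError when the entry is absent.
    bbox = descriptor.get("/FontBBox")

    # Accept only a four-number box; anything else falls back to the default.
    if isinstance(bbox, (list, tuple)) and len(bbox) == 4:
        parsed["bbox"] = tuple(float(value) for value in bbox)
    else:
        parsed["bbox"] = DEFAULT_BBOX
    return parsed


# A descriptor without /FontBBox no longer raises; it gets the default box.
print(parse_font_descriptor({"/FontName": "/Emoji"}))
```

The point of the sketch is that the "bbox" key is always populated, so later code that reads it unconditionally cannot hit a KeyError, whether the entry was missing or malformed.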