question #90

zoraniy · 2025-12-16T22:06:28Z

Hey, I was digging through the repo and noticed the recent switch from PyPDF2 to pypdf. I’m curious: did this actually fix real-world PDF parsing issues, or did it just trade one set of edge-case bugs for another? Specifically, how does the project handle messy PDFs with broken text order or weird encodings now?

Answered by Binivert

Binivert · 2025-12-16T22:07:35Z

Yeah, that’s a fair concern. The move to pypdf wasn’t just cosmetic; it was mainly about stability and long-term maintenance, since PyPDF2 is deprecated and was increasingly flaky with newer PDFs. That said, no PDF library fully “solves” broken text order or bad encodings; those problems live inside the PDFs themselves. What this project does better now is fail more predictably and give cleaner extraction in common cases, instead of silently mangling text. For truly messy PDFs, the limitation is acknowledged rather than hidden, and the modular setup makes it easier to improve or swap out the PDF layer again if something better comes along.
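To make "fail predictably" and "swappable PDF layer" a bit more concrete, here is a rough sketch of what that pattern can look like. It is not the project's actual code: `ExtractionError`, `ExtractionResult`, `pypdf_backend`, and `extract` are hypothetical names, and the replacement-character check is just one assumed heuristic for spotting encoding trouble. The only real pypdf calls used are `PdfReader`, `reader.pages`, `page.extract_text()`, and `pypdf.errors.PdfReadError`.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


class ExtractionError(Exception):
    """Raised when a PDF cannot be read at all (hypothetical name)."""


@dataclass
class ExtractionResult:
    pages: List[str]                                  # one string per page
    warnings: List[str] = field(default_factory=list)  # problems we noticed


def pypdf_backend(path: str) -> Iterable[str]:
    """Yield page texts using pypdf. Imported lazily so other backends
    can be used without pypdf installed."""
    from pypdf import PdfReader
    from pypdf.errors import PdfReadError

    try:
        reader = PdfReader(path)
        for page in reader.pages:
            # Fall back to "" so callers always get a string per page.
            yield page.extract_text() or ""
    except PdfReadError as exc:
        # Convert the library-specific error into one the caller knows.
        raise ExtractionError(f"could not read {path}: {exc}") from exc


def extract(
    path: str,
    backend: Callable[[str], Iterable[str]] = pypdf_backend,
) -> ExtractionResult:
    """Extract text and report suspicious pages instead of hiding them."""
    pages: List[str] = []
    warnings: List[str] = []
    for number, text in enumerate(backend(path), start=1):
        if not text.strip():
            warnings.append(f"page {number}: no extractable text")
        elif "\ufffd" in text:
            # U+FFFD usually means a character could not be decoded.
            warnings.append(f"page {number}: possible encoding problem")
        pages.append(text)
    return ExtractionResult(pages=pages, warnings=warnings)
```

Swapping the PDF layer then just means passing a different `backend` callable. Note that nothing here repairs broken text order; it only surfaces the pages where extraction looks unreliable.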
