fix: manage PDFium backend resource lifecycles to avoid SIGSEGV/SIGTRAP crashes #3180
Merged
pypdfium2's to_pil() shares native buffer memory for RGBA/RGBX/L formats via frombuffer(). The chained render().to_pil().resize() pattern allowed the PdfBitmap to reach refcount 0 mid-expression, causing GC to invoke FPDFBitmap_Destroy and free the native buffer while PIL still held a dangling pointer to it, resulting in non-deterministic SIGSEGV crashes in concurrent scenarios.

Fix: store the bitmap explicitly, copy the PIL image to detach it from the shared native buffer, then close the bitmap under the lock before proceeding with the resize on the independent copy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
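The bitmap fix described in the commit message above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: `render_detached`, `pdfium_lock`, and the parameter names are hypothetical. The only library calls assumed are pypdfium2's `PdfPage.render()`, `PdfBitmap.to_pil()`, and `PdfBitmap.close()`, plus Pillow's `Image.copy()` and `Image.resize()`. Running the resize outside the lock is also an assumption of this sketch.

```python
import threading

# Hypothetical module-level lock standing in for the backend's PDFium lock.
pdfium_lock = threading.Lock()


def render_detached(page, size, scale=1.0, lock=pdfium_lock):
    """Render `page` and return a PIL image that owns its own pixel data.

    `page` is expected to behave like a pypdfium2 PdfPage.
    """
    with lock:
        # Keep an explicit reference so the bitmap cannot be collected
        # while PIL still reads from its native buffer.
        bitmap = page.render(scale=scale)
        try:
            shared = bitmap.to_pil()  # may alias the native buffer
            detached = shared.copy()  # independent copy of the pixels
        finally:
            bitmap.close()  # free the native buffer while holding the lock
    # The resize works on the detached copy (here, outside the lock).
    return detached.resize(size)
```

The point of the explicit `bitmap` variable is that the native buffer's lifetime no longer depends on when a temporary in a chained expression is collected.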
…live-page tracking

Introduces ManagedPdfiumDocumentBackend / ManagedPdfiumPageBackend base classes that both PDF backends now inherit from.

Key changes:
- Live pages are tracked in a set on the document; document unload waits for all pages to be released before tearing down native handles.
- Page and document unload now call explicit .close() on native PDFium objects under the lock, rather than just nulling Python references. This makes teardown deterministic rather than relying on GC finalizers, which can fire from any thread without the lock.
- text_page is explicitly closed before _ppage to respect the PDFium parent/child handle hierarchy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Condition, Lock, _live_pages set, _closing flag, and owner back-ref on pages were remnants of the Group-3b pipeline defensive shutdown that was not included here. The pipeline always unloads page backends before calling document.unload(), so _close_live_pages() was always a no-op and notify_all() had zero waiters.

Reduced ManagedPdfiumDocumentBackend/ManagedPdfiumPageBackend to just a _closed guard and the abstract _close_native_* dispatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
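The reduced design described above (a `_closed` guard plus an abstract close hook, with the text page closed before its parent page) might look roughly like the sketch below. Only the class name `ManagedPdfiumPageBackend` and the `_closed`, `text_page`, and `_ppage` attribute names come from the commit messages. The hook name `_close_native_handles`, the `ExamplePageBackend` subclass, and `pdfium_lock` are hypothetical stand-ins, and the real implementation may differ.

```python
import threading
from abc import ABC, abstractmethod

# Hypothetical lock standing in for the backend's PDFium lock.
pdfium_lock = threading.Lock()


class ManagedPdfiumPageBackend(ABC):
    """Sketch of a base class with an idempotent close guard."""

    def __init__(self):
        self._closed = False

    def unload(self):
        # The guard makes repeated unload() calls harmless.
        if self._closed:
            return
        self._closed = True
        # Close native objects explicitly and under the lock instead of
        # leaving them to GC finalizers.
        with pdfium_lock:
            self._close_native_handles()

    @abstractmethod
    def _close_native_handles(self):
        """Hypothetical hook: each subclass closes its own native objects."""


class ExamplePageBackend(ManagedPdfiumPageBackend):
    def __init__(self, ppage, text_page):
        super().__init__()
        self._ppage = ppage
        self.text_page = text_page

    def _close_native_handles(self):
        # Child handle first, then the parent page it was created from.
        if self.text_page is not None:
            self.text_page.close()
            self.text_page = None
        if self._ppage is not None:
            self._ppage.close()
            self._ppage = None
```

Keeping the guard in the base class means each backend only has to describe which handles it owns and in what order they are closed.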
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: b3f4e66
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: 79b1894
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: b389c82
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: 5e3510f

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
PeterStaar-IBM
approved these changes
Mar 24, 2026
Summary
This PR fixes PDFium-backed resource lifecycle management to prevent leaked native handles and make cleanup deterministic across our PDF backends.
It introduces shared managed lifecycle helpers for PDFium document/page backends, updates both the pypdfium2 and docling-parse backends to use them, and explicitly closes rendered PdfBitmap instances after copying images into PIL.

Changes
- ManagedPdfiumDocumentBackend and ManagedPdfiumPageBackend as shared lifecycle wrappers
- pypdfium2 backend cleanup to explicitly close:
- docling-parse backend cleanup to explicitly close:
- PdfBitmap render results after converting them to PIL images
- uv.lock

Why
The previous lifecycle handling could leave native PDFium resources open longer than intended and relied on implicit cleanup by GC finalizers rather than explicit close() calls. This change makes ownership and teardown clearer, safer, and consistent across both PDF backends.
Notes
supersedes #3172