refactor: don't import unstructured-inference via partition.pdf#4284

Open
artdent wants to merge 1 commit into Unstructured-IO:main from artdent:pdf-no-ocr

artdent commented Mar 16, 2026

This introduces a new extra, "pdf-no-ocr", that depends only on the PDF libraries that don't require running an image model. This avoids the heavyweight dependency on unstructured-inference.

There's a new test target that ensures PDFs can be opened with just a dependency on the slimmer target. I only moved over one complete test file, but with more work, test_pdf.py could be split into image and non-image tests.
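For illustration only, here is a minimal sketch of how one might check that importing a module does not transitively load another package. It is not the test added in this PR. The helper name `import_pulls_in` is hypothetical, and the commented usage assumes the module path `unstructured.partition.pdf` and the import name `unstructured_inference`. The probe runs in a fresh interpreter so that modules already loaded by the test runner cannot mask the result.

```python
import subprocess
import sys

# Script executed in a clean child interpreter. It imports the target module,
# then reports whether the forbidden package (or any of its submodules)
# ended up in sys.modules as a side effect.
_PROBE = """
import importlib, sys
target, forbidden = sys.argv[1], sys.argv[2]
importlib.import_module(target)
hit = any(n == forbidden or n.startswith(forbidden + ".") for n in sys.modules)
print("yes" if hit else "no")
"""


def import_pulls_in(target: str, forbidden: str) -> bool:
    """Return True if importing `target` also loads `forbidden` (hypothetical helper)."""
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, target, forbidden],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if `target` cannot be imported
    )
    # The probe prints its verdict last, after anything the import itself prints.
    return result.stdout.strip().splitlines()[-1] == "yes"


# Assumed usage, in an environment installed with only the proposed extra:
#     assert not import_pulls_in("unstructured.partition.pdf", "unstructured_inference")
```

Checking `sys.modules` in a subprocess catches lazy or indirect imports that a static scan of import statements could miss, at the cost of one interpreter start per check.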

Fixes #2128.


Development

Successfully merging this pull request may close these issues.

feat/Allow PDF partitioning without unstructured_inference
