bug/Partition-PDF-empty-elements

**Describe the bug**
`partition_pdf` with `strategy='fast'` returns an empty list of elements for a PDF that has embedded text, so OCR should not be needed. Other libraries such as PyMuPDF return the text from the same file instantly.

**Reproduction**
```
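from unstructured.partition.pdf import partition_pdf
import pymupdf

fname = 'file.PDF'

# The 'fast' strategy returns no elements for this file
elements = partition_pdf(filename=fname, strategy='fast')
print(elements)
# Output: []

# PyMuPDF extracts the embedded text from the same file
with pymupdf.open(fname) as doc:
    text = chr(12).join([page.get_text() for page in doc])  # chr(12) = form feed between pages
print(text)
# Output: ...many pages of text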
```
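
To confirm that a given file really has a text layer on each page, a per-page check along the following lines may help. This is only a sketch: `nonblank_pages` and `embedded_text_pages` are hypothetical helper names, not part of `unstructured` or PyMuPDF, and it assumes PyMuPDF is installed.

```python
def nonblank_pages(page_texts):
    """Return 0-based indices of pages whose extracted text is not just whitespace."""
    return [i for i, text in enumerate(page_texts) if text.strip()]


def embedded_text_pages(fname):
    """Hypothetical helper: list the pages of a PDF that carry embedded text."""
    import pymupdf  # imported lazily so nonblank_pages has no dependencies

    with pymupdf.open(fname) as doc:
        return nonblank_pages([page.get_text() for page in doc])
```

If `embedded_text_pages('file.PDF')` lists most or all pages, the file has embedded text and OCR should not be required to extract it.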

**Expected behavior**
`partition_pdf` should return chunks of text without running OCR when the PDF has embedded text.


**Environment Info**
Output of `python scripts/collect_env.py`:

```
OS version:  Linux-5.14.0-427.26.1.el9_4.x86_64-x86_64-with-glibc2.34
Python version:  3.12.8
unstructured version:  0.16.15
unstructured-inference version:  0.8.1
pytesseract is not installed
Torch version:  2.5.1
Detectron2 is not installed
PaddleOCR is not installed
Libmagic version: file-5.39
magic file from /etc/magic:/usr/share/misc/magic
Traceback (most recent call last):
...
FileNotFoundError: [Errno 2] No such file or directory: 'libreoffice'

```


bug/Partition-PDF-empty-elements #3885
