Version 0.0.71 breaks ocr_only pdf recognition on some documents

**Describe the bug**
Starting with version 0.0.71 ocr_only pdf processing of some documents yields an empty result. Exactly the same code works fine with version 0.0.70.

**To Reproduce**
1. `docker run --platform linux/x86_64 -p 8000:8000 -d --rm --name unstructured-api downloads.unstructured.io/unstructured-io/unstructured-api:0.0.71`
2. 
```
curl -X 'POST' \
  'http://127.0.0.1:8000/general/v0/general' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@4.pdf' \
  -F 'strategy=ocr_only' \
  -F 'languages=deu'
```
3. Now pull version 0.0.70 and try the same. In the first case the result is empty, in the second the result is correct and non-empty.

- Filetype: PDF
- Any additional API parameters: -

**Environment:**
 - self-hosted API
 - any client yields the same result

**Additional context**
Attaching the problematic PDF.
[4.pdf](https://github.com/user-attachments/files/18424620/4.pdf)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Version 0.0.71 breaks ocr_only pdf recognition on some documents #485

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Version 0.0.71 breaks ocr_only pdf recognition on some documents #485

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions