Open
Description
Describe the bug
Starting with version 0.0.71 ocr_only pdf processing of some documents yields an empty result. Exactly the same code works fine with version 0.0.70.
To Reproduce
docker run --platform linux/x86_64 -p 8000:8000 -d --rm --name unstructured-api downloads.unstructured.io/unstructured-io/unstructured-api:0.0.71
curl -X 'POST' \
'http://127.0.0.1:8000/general/v0/general' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F '[email protected]' \
-F 'strategy=ocr_only' \
-F 'languages=deu'
- Now pull version 0.0.70 and try the same. In the first case the result is empty, in the second the result is correct and non-empty.
- Filetype: PDF
- Any additional API parameters: -
Environment:
- self-hosted API
- any client yields the same result
Additional context
Attaching the problematic PDF.
4.pdf
Metadata
Metadata
Assignees
Labels
No labels