
bug/poor partition output from ocr_only strategy with TIFF image file #3027

Open
@yuming-long

Description


Describe the bug
When partitioning the TIFF image layout-parser-paper-combined.tiff with the ocr_only strategy, the output is poor compared to the hi_res strategy, and text is extracted only from the first page (out of 2 pages in total).
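To confirm the page count of the input independently of any partitioner, the minimal sketch below (standard library only) walks the TIFF IFD chain. In baseline TIFF, each top-level IFD usually corresponds to one page. The function name `count_tiff_pages` is hypothetical, and the sketch assumes a classic (non-BigTIFF) file and ignores SubIFDs.

```python
import struct


def count_tiff_pages(data: bytes) -> int:
    """Count top-level IFDs (pages) in a classic TIFF byte string.

    Hypothetical helper: follows the baseline TIFF 6.0 IFD linked list.
    BigTIFF (magic 43) and SubIFDs are not handled.
    """
    # Bytes 0-1: byte-order mark, "II" = little-endian, "MM" = big-endian.
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")

    # Bytes 2-3: magic number 42; bytes 4-7: offset of the first IFD.
    (magic,) = struct.unpack(endian + "H", data[2:4])
    if magic != 42:
        raise ValueError("unsupported TIFF variant (BigTIFF?)")
    (offset,) = struct.unpack(endian + "I", data[4:8])

    pages = 0
    seen = set()
    while offset != 0:
        # Guard against loops and offsets that point past the end of the file.
        if offset in seen or offset + 2 > len(data):
            raise ValueError("corrupt IFD chain")
        seen.add(offset)
        # Each IFD: 2-byte entry count, 12 bytes per entry, 4-byte next-IFD offset.
        (n_entries,) = struct.unpack(endian + "H", data[offset:offset + 2])
        next_pos = offset + 2 + 12 * n_entries
        if next_pos + 4 > len(data):
            raise ValueError("truncated IFD")
        (offset,) = struct.unpack(endian + "I", data[next_pos:next_pos + 4])
        pages += 1
    return pages
```

Passing the bytes of layout-parser-paper-combined.tiff (read with `open(path, "rb").read()`) should report the page count stated in this issue (2), which makes it easy to tell whether a partition result is missing pages.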

To Reproduce

curl -X 'POST' 'https://api.unstructured.io/general/v0/general' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F 'files=@layout-parser-paper-combined.tiff' -H 'unstructured-api-key: <api_key>' -F 'strategy=ocr_only' | jq -C . | less -R
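To quantify the missing-page symptom, the response can be saved to a file (for example, by replacing `| jq -C . | less -R` with `-o ocr_only.json`) and inspected with a small script. This sketch assumes the response is a JSON array of elements whose `metadata.page_number` field records the source page. The helper names `pages_in_response` and `missing_pages` are hypothetical.

```python
import json
from collections import Counter


def pages_in_response(raw_json: str) -> Counter:
    """Count elements per metadata.page_number (hypothetical helper).

    Assumes the response is a JSON array of element objects; elements
    without a page number are counted under the key None.
    """
    elements = json.loads(raw_json)
    return Counter((el.get("metadata") or {}).get("page_number") for el in elements)


def missing_pages(raw_json: str, expected_pages: int) -> list[int]:
    """Return the 1-based page numbers that produced no elements at all."""
    seen = pages_in_response(raw_json)
    return [p for p in range(1, expected_pages + 1) if p not in seen]
```

Running the same check on a `strategy=hi_res` response gives a baseline for comparison; per this report, `missing_pages(open("ocr_only.json").read(), 2)` would be expected to list page 2.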

Labels

bug: Something isn't working
image: Issues related to partitioning image formats like PNG, TIFF, etc.
ocr: Related to optical character recognition (OCR).