DOCX doesn't recognize text within the floating textbox #3441

Open
@felixchen464atrc

Description

Describe the bug
When partitioning a DOCX file, text inside a floating text box is not recognized.

To Reproduce
Partition the attached file:
test-text-box (1).docx

Expected behavior
Text in the floating text box is recognized and returned as elements.

Actual behavior
No elements are returned for the text inside the floating text box.

Screenshots
image

Environment Info
Self-hosted unstructured
unstructured-api: 0.0.75
dependency: https://github.com/Unstructured-IO/unstructured-api/blob/main/requirements/base.txt

Additional context
I currently use the Python client to make the request with the following parameters:

extract_image_block_types=["Image"],
output_format=shared.OutputFormat.APPLICATION_JSON,
encoding="utf-8",
coordinates=True,
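
To help confirm that the text really is present in the file and is only being skipped during partitioning, here is a minimal sketch that reads the DOCX archive directly with the Python standard library. It assumes the text box content is stored in `w:txbxContent` elements inside `word/document.xml`, which is where WordprocessingML normally places text box paragraphs. The function name `textbox_paragraphs` is hypothetical and is not part of unstructured or its client.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def textbox_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the text of each non-empty paragraph inside a w:txbxContent element.

    Hypothetical helper for diagnosis only; not an unstructured API.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    paragraphs = []
    for txbx in root.iter(f"{{{W_NS}}}txbxContent"):
        for p in txbx.iter(f"{{{W_NS}}}p"):
            # Join all text runs (w:t) that belong to this paragraph.
            text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
            if text:
                paragraphs.append(text)
    # Note: Word often stores a text box twice (a DrawingML version and a
    # VML fallback), so the same paragraph may appear more than once.
    return paragraphs
```

Calling it as `textbox_paragraphs(open("test-text-box (1).docx", "rb").read())` should list the text box paragraphs if the assumption above holds for this file. If it does, the content is in the document and the gap is in how the partitioner walks the XML.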

Metadata

Labels

docx (related to Microsoft Word (.docx) file format), enhancement (new feature or request)
