feat/docx-content-control-fields #3553

Open
@homanp

Is your feature request related to a problem? Please describe.
Currently, content inside content control fields in .docx files isn't processed by Unstructured. These fields are commonly used to add valuable metadata, such as headers, to a document.

Describe the solution you'd like
Unstructured should process content in content control fields the same way it processes other elements. Perhaps a new element type would be a good fit.

Describe alternatives you've considered
Unstructured currently relies on a third-party package, python-docx (imported as docx), to parse .docx content. This library does not support content control fields. Other third-party libraries do, such as docx-form: https://pypi.org/project/docx-form/
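
For illustration, here is a minimal sketch of how content control text could be read without any third-party dependency. It assumes content controls are stored as `w:sdt` elements (structured document tags) in `word/document.xml`, with the optional `w:alias` and `w:tag` under `w:sdtPr` and the visible text under `w:sdtContent`. The function name `extract_content_controls` is hypothetical; this is not how Unstructured or python-docx work today, only one possible starting point.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def extract_content_controls(docx_file):
    """Return one dict per content control (w:sdt) in the main document part.

    `docx_file` may be a path or a binary file-like object.
    Hypothetical helper: only word/document.xml is read, so controls in
    headers, footers, or footnotes are not covered by this sketch.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    controls = []
    # iter() also visits nested controls; an outer control's text then
    # includes the text of any control nested inside it.
    for sdt in root.iter(f"{{{W_NS}}}sdt"):
        props = sdt.find("w:sdtPr", NS)
        content = sdt.find("w:sdtContent", NS)

        tag = alias = None
        if props is not None:
            tag_el = props.find("w:tag", NS)
            alias_el = props.find("w:alias", NS)
            if tag_el is not None:
                tag = tag_el.get(f"{{{W_NS}}}val")
            if alias_el is not None:
                alias = alias_el.get(f"{{{W_NS}}}val")

        text = ""
        if content is not None:
            # Concatenate every text run (w:t) inside the control.
            text = "".join(t.text or "" for t in content.iter(f"{{{W_NS}}}t"))

        controls.append({"tag": tag, "alias": alias, "text": text})
    return controls
```

Calling `extract_content_controls("example.docx")` (a placeholder path) would return a list of dicts with `tag`, `alias`, and `text` keys, which could then be mapped to whatever element type is chosen for content controls.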

Labels

docx: Related to Microsoft Word (.docx) file format
enhancement: New feature or request
