Open
Description
Is your feature request related to a problem? Please describe.
Currently, content in .docx files that sits inside content control fields isn't processed by Unstructured. These fields are commonly used to add valuable metadata, such as headers, to a document.
Describe the solution you'd like
Unstructured should process content in content control fields the same way it processes other elements. Perhaps a new element type would be good.
Describe alternatives you've considered
Unstructured currently relies on a third-party package called docx (python-docx) to handle parsing of content. This library does not support content control fields. Other third-party libraries do, such as: https://pypi.org/project/docx-form/
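
To make the request concrete, here is a rough sketch of reading content controls straight from the file with only the standard library. It assumes the fields are stored as `w:sdt` (structured document tag) elements in `word/document.xml`, which is how WordprocessingML represents content controls. The function name `extract_content_controls` and the returned dict shape are hypothetical, not part of Unstructured, docx, or docx-form.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def _val(parent, child_name):
    """Return the w:val attribute of a child element, or None if absent."""
    if parent is None:
        return None
    child = parent.find(f"w:{child_name}", NS)
    return child.get(f"{{{W_NS}}}val") if child is not None else None


def extract_content_controls(docx_file):
    """Hypothetical helper: list content controls in the main document part.

    docx_file may be a path or a binary file-like object.
    Only word/document.xml is read; header and footer parts are not.
    Nested controls are reported separately, and an outer control's
    text also includes the text of any control nested inside it.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    controls = []
    for sdt in root.iter(f"{{{W_NS}}}sdt"):
        props = sdt.find("w:sdtPr", NS)
        content = sdt.find("w:sdtContent", NS)
        # Concatenate every text run inside the control's content.
        text = ""
        if content is not None:
            text = "".join(t.text or "" for t in content.iter(f"{{{W_NS}}}t"))
        controls.append(
            {"tag": _val(props, "tag"), "alias": _val(props, "alias"), "text": text}
        )
    return controls
```

Each returned dict could then be mapped to an existing or new element type, with `tag` and `alias` carried along as metadata. How that mapping should look is an open design question for the maintainers.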