I am working with partition_docx(), and the document I am processing has the 'subject' document property inserted in several places.
It appears that partition_docx() does not pick up the text of this property, so it is dropped from the output text.
You can easily recreate this:
- create a new MS Word document
- set the subject property via File -> Info -> Properties -> Advanced Properties
- add some text to the document, then insert the subject property into the body via Insert -> Quick Parts -> Field
- parse the document via partition_docx()
- the subject property text is omitted from the output of partition_docx()
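
To help narrow this down, here is a rough diagnostic sketch that reads the .docx directly with the Python standard library and lists the text held inside simple field elements. It rests on an assumption I have not confirmed for this document: that Word stored the inserted subject field as a `w:fldSimple` element in word/document.xml. Word can also write fields as complex fields or as content controls, which this sketch would not find. The function name `simple_field_texts` is made up for illustration and is not part of unstructured.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def simple_field_texts(docx_file):
    """Return (instruction, displayed text) pairs for each w:fldSimple.

    Assumption: the field was saved as a simple field. Complex fields
    (w:fldChar / w:instrText) and content controls (w:sdt) are ignored.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    results = []
    for fld in root.iter(f"{{{W}}}fldSimple"):
        # The field instruction, e.g. " SUBJECT ", is stored in w:instr
        instr = fld.get(f"{{{W}}}instr", "").strip()
        # The displayed field value is the text of the runs nested inside
        text = "".join(t.text or "" for t in fld.iter(f"{{{W}}}t"))
        results.append((instr, text))
    return results
```

Calling `simple_field_texts("example.docx")` (a placeholder file name) on the document from the steps above, and comparing the result with the text of the elements returned by partition_docx(), should show whether the missing text lives in simple fields. If the list comes back empty, the field is probably stored in one of the other forms.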