Describe the bug
I'm trying to partition emails. In some cases, partitioning fails with KeyError: 'multipart/mixed':
>>> from unstructured.partition.email import partition_email
>>> partition_email("testcase.txt")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/unstructured/partition/email.py", line 73, in partition_email
    return list(_EmailPartitioner.iter_elements(ctx=ctx))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/unstructured/partition/email.py", line 333, in _iter_elements
    yield from _AttachmentPartitioner.iter_elements(attachment, self._ctx)
  File "/app/unstructured/partition/email.py", line 388, in _iter_elements
    file = io.BytesIO(self._file_bytes)
                      ^^^^^^^^^^^^^^^^
  File "/app/unstructured/utils.py", line 154, in __get__
    value = self._fget(obj)
            ^^^^^^^^^^^^^^^
  File "/app/unstructured/partition/email.py", line 423, in _file_bytes
    content = self._attachment.get_content()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/email/message.py", line 1124, in get_content
    return content_manager.get_content(self, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/email/contentmanager.py", line 25, in get_content
    raise KeyError(content_type)
KeyError: 'multipart/mixed'
To Reproduce
The easiest way I have found to reproduce this is to partition an email that is PGP-signed but not encrypted. I have attached an anonymized example.
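For readers without the attachment, the failure at the bottom of the traceback can probably be reproduced with the standard library alone. This is a minimal sketch, not the attached testcase: the message below is hand-written and assumes the usual PGP/MIME layout, where a multipart/signed container holds the signed content (here a multipart/mixed part) followed by the signature. It also assumes the partitioner sees attachments roughly the way `EmailMessage.iter_attachments()` yields them, which I have not verified against the unstructured source.

```python
from email import message_from_string, policy

# Hand-written stand-in for a PGP-signed (not encrypted) email.
# Addresses and the signature body are placeholders, not real data.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: signed test
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="sig"

--sig
Content-Type: multipart/mixed; boundary="inner"

--inner
Content-Type: text/plain; charset="utf-8"

Hello Bob.

--inner--

--sig
Content-Type: application/pgp-signature; name="signature.asc"
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
(placeholder, not a real signature)
-----END PGP SIGNATURE-----

--sig--
"""

msg = message_from_string(RAW, policy=policy.default)

# Collect the content types for which get_content() raises KeyError.
failed = []
for part in msg.iter_attachments():
    try:
        part.get_content()
    except KeyError as exc:
        failed.append(exc.args[0])

print(failed)
```

The signed multipart/mixed part is yielded as if it were an attachment, and the default content manager has no handler for multipart types, which matches the last two frames of the traceback.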
Expected behavior
I expect the email to be parsed and partitioned into its parts, including the parts nested inside the signed multipart/mixed container.
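One possible way to get that behavior, sketched with the standard library only: when an attachment turns out to be a multipart container, descend into it and call `get_content()` only on the leaf parts. `iter_leaf_parts` is a hypothetical helper name of my own, not something from unstructured, and I am not suggesting this is how the fix should be implemented.

```python
from email.message import EmailMessage
from typing import Iterator


def iter_leaf_parts(part: EmailMessage) -> Iterator[EmailMessage]:
    """Yield every non-multipart part nested inside `part` (hypothetical helper)."""
    # walk() visits the part itself and then all of its descendants.
    for sub in part.walk():
        if not sub.is_multipart():
            yield sub


# Build a small multipart/mixed container: a text body plus a binary attachment.
container = EmailMessage()
container.set_content("Hello Bob.")
container.add_attachment(
    b"\x00\x01",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

# get_content() works on each leaf, whereas it raises KeyError on the container.
leaf_types = [leaf.get_content_type() for leaf in iter_leaf_parts(container)]
leaf_contents = [leaf.get_content() for leaf in iter_leaf_parts(container)]
print(leaf_types)
```

With this approach the text body and the binary attachment are each reachable on their own, so nothing calls `get_content()` on a multipart part.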
Environment Info
Docker image downloads.unstructured.io/unstructured-io/unstructured:0.16.20; also reproduced with :0.15.14 and :latest