Description
Describe the bug
I have code that takes uploaded files and passes them to the langchain UnstructuredLoader, which (as the error log below shows) calls Unstructured's partition function. When the uploaded file is a zip file, I use Python's built-in zipfile module to open each member as a file-like object. I've tried several different text files with the same result. In all cases I pass a file-like object to Unstructured.
- Uploading the text file directly: success
- Uploading a zip file containing PDF, DOCX, PNG etc.: success
- Uploading a zip file containing the working text file: fails with the following traceback:
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/partition/auto.py", line 292, in partition
2025-09-23 16:50:09 elements = partition(filename=filename, file=file, **partitioning_kwargs)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/partition/common/metadata.py", line 162, in wrapper
2025-09-23 16:50:09 elements = func(*args, **kwargs)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/chunking/dispatch.py", line 74, in wrapper
2025-09-23 16:50:09 elements = func(*args, **kwargs)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/partition/text.py", line 81, in partition_text
2025-09-23 16:50:09 encoding, file_text = read_txt_file(file=file, encoding=encoding)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/file_utils/encoding.py", line 146, in read_txt_file
2025-09-23 16:50:09 formatted_encoding, file_text = detect_file_encoding(file=file)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/file_utils/encoding.py", line 70, in detect_file_encoding
2025-09-23 16:50:09 byte_data = convert_to_bytes(file)
2025-09-23 16:50:09 ^^^^^^^^^^^^^^^^^^^^^^
2025-09-23 16:50:09 File "/shabti/.venv/lib/python3.12/site-packages/unstructured/partition/common/common.py", line 386, in convert_to_bytes
2025-09-23 16:50:09 raise ValueError("Invalid file-like object type")
2025-09-23 16:50:09 ValueError: Invalid file-like object type
To Reproduce
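import zipfile

from langchain_unstructured import UnstructuredLoader

# `file` is the uploaded zip archive (a path or a binary file-like object).
with zipfile.ZipFile(file) as my_zip:
    for info in my_zip.infolist():
        loader = UnstructuredLoader(
            file=my_zip.open(info),  # file-like object returned by ZipFile.open()
            strategy="auto",
            chunking_strategy="by_title",
            metadata_filename=info.filename,
        )
        pages = loader.load()  # raises ValueError when the member is a text file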
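For comparison, here is a minimal sketch of a possible workaround, assuming the failure comes from the type of object that `ZipFile.open()` returns (a `zipfile.ZipExtFile`) rather than from the file contents. It reads each member fully and re-wraps the bytes in an `io.BytesIO`. The helper name `iter_zip_members_as_bytesio` is hypothetical, and I have not confirmed that this avoids the error in every Unstructured version.

```python
import io
import zipfile


def iter_zip_members_as_bytesio(zip_source):
    """Yield (filename, BytesIO) pairs for each regular file in a zip archive.

    Hypothetical helper: `zip_source` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as my_zip:
        for info in my_zip.infolist():
            if info.is_dir():
                continue  # directory entries have no content to load
            # ZipFile.open() returns a zipfile.ZipExtFile. Reading it fully and
            # wrapping the bytes yields a plain, seekable in-memory buffer.
            with my_zip.open(info) as member:
                yield info.filename, io.BytesIO(member.read())
```

Each yielded buffer could then be passed as `file=` to UnstructuredLoader, with the yielded name as `metadata_filename`. This loads every member into memory, so it is only suitable for archives of modest size.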
Expected behavior
The text file inside the zip loads successfully, just as it does when uploaded directly.
Environment Info
Please run python scripts/collect_env.py and paste the output here.
I can't find where this collect_env.py is in my installation.
I built a Docker image based on astral/uv:python3.12-trixie-slim with unstructured[all-docs]>=0.18.14 in my Python dependencies. I have installed all the recommended system dependencies except libmagic, which I am also having some issues with.