Describe the bug
When parsing and chunking a 7 MB .xls file, the Unstructured server's memory usage balloons far beyond the input size, and the pod crashes once it passes its 10 GB limit.
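
While reproducing, server-side memory growth can be watched from outside the pod. Below is a minimal sketch (not part of the original report) that polls kubectl top pod; it assumes metrics-server is available in the cluster, and POD_NAME/NAMESPACE are placeholders to substitute for your deployment.

import subprocess
import time

POD_NAME = "unstructured-api-0"  # hypothetical pod name; substitute your own
NAMESPACE = "default"            # assumption: adjust to your namespace

# Poll `kubectl top pod` once per second for up to 10 minutes and print
# the reported CPU/memory so the growth curve can be captured.
for _ in range(600):
    result = subprocess.run(
        ["kubectl", "top", "pod", POD_NAME, "-n", NAMESPACE, "--no-headers"],
        capture_output=True,
        text=True,
    )
    print(result.stdout.strip() or result.stderr.strip())
    time.sleep(1)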
To Reproduce
Input file -
import time

from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared
# NOTE: import paths may vary across unstructured-client versions; these
# assume a recent release where the strategy enums live under models.shared.
from unstructured_client.models.errors import SDKError
from unstructured_client.models.shared import ChunkingStrategy, Strategy as PartitionStrategy

# Client construction was omitted from the original snippet; placeholder value.
unstructured_client = UnstructuredClient(server_url="YOUR_SERVER_URL")

file_path = "path/to/input.xls"  # the 7 MB .xls input

with open(file_path, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=file_path,
    )

req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
        chunking_strategy=ChunkingStrategy.BY_TITLE,
        strategy=PartitionStrategy.HI_RES,
        multipage_sections=False,
    )
)

try:
    start = time.time()
    print("File name:", file_path)
    partitioned_data = unstructured_client.general.partition(req)
    print("Time taken in seconds:", time.time() - start)

    # Count Table elements that carry an HTML rendering in their metadata.
    # Using .get avoids a KeyError when text_as_html is absent.
    tables = 0
    for element in partitioned_data.elements:
        if element["type"] == "Table" and element["metadata"].get("text_as_html") is not None:
            tables += 1
    print("Total table count:", tables)
except SDKError as sdk_error:
    raise sdk_error
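
If the blow-up is tied to sheet size, one workaround sketch (not part of the original report) is to split the workbook and send each sheet as its own small request. This assumes pandas plus a legacy .xls engine such as xlrd are installed; iter_sheet_csvs is a hypothetical helper name.

import pandas as pd  # assumption: pandas + xlrd (for legacy .xls) installed

def iter_sheet_csvs(file_path: str):
    """Yield (sheet_name, csv_bytes) so each partition request stays small."""
    workbook = pd.ExcelFile(file_path)
    for sheet_name in workbook.sheet_names:
        frame = workbook.parse(sheet_name)
        yield sheet_name, frame.to_csv(index=False).encode("utf-8")

for sheet_name, csv_bytes in iter_sheet_csvs("path/to/input.xls"):
    files = shared.Files(content=csv_bytes, file_name=f"{sheet_name}.csv")
    # ...build and send a PartitionRequest per sheet, as in the snippet above...

It may also be worth re-running the same file without chunking, or with a different strategy, to narrow whether the memory growth comes from partitioning or from the by_title chunking step.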
Expected behavior
Partitioning and chunking a 7 MB spreadsheet should complete with memory usage well within the pod's 10 GB limit, rather than growing without bound until the server crashes.
Additional context
Unstructured Pod Config
Resources:
  CPU: 4
  Memory: 10000 MiB (~10 GB)