Description
When processing a PDF file with the hi_res strategy in unstructured-api, an error occurs during HTML table generation (raised from unstructured-inference):
2024-07-24T08:49:18.887448624Z File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/partition/pdf_image/ocr.py", line 284, in supplement_element_with_table_extraction
2024-07-24T08:49:18.887488006Z text_as_html = "" if tatr_cells == "" else cells_to_html(tatr_cells)
2024-07-24T08:49:18.887503751Z ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-24T08:49:18.887511928Z File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 704, in cells_to_html
2024-07-24T08:49:18.887519618Z cells = sorted(fill_cells(cells), key=lambda k: (min(k["row_nums"]), min(k["column_nums"])))
2024-07-24T08:49:18.887527508Z ^^^^^^^^^^^^^^^^^
2024-07-24T08:49:18.887534601Z File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 667, in fill_cells
2024-07-24T08:49:18.887542331Z table_rows_no = max({row for cell in cells for row in cell["row_nums"]})
2024-07-24T08:49:18.887549813Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-24T08:49:18.887557089Z ValueError: max() arg is an empty sequence
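The traceback suggests that `fill_cells` received a cell list with no row indices at all. The caller only short-circuits when `tatr_cells == ""`, so an empty list would still reach `cells_to_html`. Below is a minimal sketch of that failure mode and one possible guard. The helper names `max_row_index`, `safe_max_row_index`, and `reproduce` are hypothetical and not part of unstructured-inference; only the set comprehension is copied from the traceback. The `-1` default is an assumption for illustration, not the library's behavior.

```python
from typing import Any, Dict, List


def max_row_index(cells: List[Dict[str, Any]]) -> int:
    """Hypothetical helper mirroring the failing expression in the traceback."""
    # Raises ValueError when `cells` is empty or every "row_nums" is empty.
    return max({row for cell in cells for row in cell["row_nums"]})


def safe_max_row_index(cells: List[Dict[str, Any]], default: int = -1) -> int:
    """Hypothetical guard: return `default` instead of raising on no rows."""
    # max() accepts a `default` keyword that is returned for an empty iterable.
    return max({row for cell in cells for row in cell["row_nums"]}, default=default)


def reproduce() -> str:
    """Trigger the error with an empty cell list and return its message."""
    try:
        max_row_index([])
    except ValueError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(reproduce())
    print(safe_max_row_index([]))
```

Whether the right fix is a `default` inside `fill_cells` or an earlier emptiness check in the caller (for example, treating an empty list the same as `""`) is a design decision for the maintainers. The sketch only shows that either approach would avoid the exception.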
Environment:
Unstructured API 0.0.72 deployed on a remote machine
Library versions in the local deployment:
unstructured==0.14.6
unstructured-client==0.18.0
OS
Ubuntu 22.04.02 LTS