bug/error on HTML table generation #369

Open
@pawel-kmiecik

Description

When processing a PDF file with the hi_res strategy in unstructured-api, an error occurs during HTML table generation (raised from unstructured-inference):

2024-07-24T08:49:18.887448624Z   File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/partition/pdf_image/ocr.py", line 284, in supplement_element_with_table_extraction
2024-07-24T08:49:18.887488006Z     text_as_html = "" if tatr_cells == "" else cells_to_html(tatr_cells)
2024-07-24T08:49:18.887503751Z                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-24T08:49:18.887511928Z   File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 704, in cells_to_html
2024-07-24T08:49:18.887519618Z     cells = sorted(fill_cells(cells), key=lambda k: (min(k["row_nums"]), min(k["column_nums"])))
2024-07-24T08:49:18.887527508Z                    ^^^^^^^^^^^^^^^^^
2024-07-24T08:49:18.887534601Z   File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 667, in fill_cells
2024-07-24T08:49:18.887542331Z     table_rows_no = max({row for cell in cells for row in cell["row_nums"]})
2024-07-24T08:49:18.887549813Z                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-24T08:49:18.887557089Z ValueError: max() arg is an empty sequence
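
The failing expression from the traceback can be reproduced in isolation. This is a minimal sketch, assuming the table model returned no cells for the page: an empty list is not equal to "", so it would pass the check in ocr.py and reach cells_to_html. Whether that is what actually happened here is unconfirmed. The helper names below are hypothetical and are not part of unstructured-inference; only the max(...) expression is copied from the traceback.

```python
def max_row_index(cells):
    # Same expression as tables.py line 667 in the traceback above.
    return max({row for cell in cells for row in cell["row_nums"]})


def max_row_index_guarded(cells):
    # Hypothetical guard: `default` makes max() return -1 instead of raising
    # when no cell carries any row number.
    return max({row for cell in cells for row in cell["row_nums"]}, default=-1)


if __name__ == "__main__":
    empty_cells = []  # assumption: no cells were detected for the table
    try:
        max_row_index(empty_cells)
    except ValueError as exc:
        print(f"reproduced: {exc}")
    print(max_row_index_guarded(empty_cells))
```

The same ValueError is raised when cells are present but every "row_nums" list is empty, so a caller-side check such as `if not tatr_cells` would not cover every case on its own.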

Environment:

Unstructured API 0.0.72, deployed on a remote machine
Local deployment library versions:
unstructured==0.14.6
unstructured-client==0.18.0
OS:
Ubuntu 22.04.02 LTS
