Skip to content

Commit ed5f2c2

Browse files
authored
fix: handling for empty table recognition (#317)
### Summary Closes #306. Addresses the condition where `recognize` returns an empty list in the `run_prediction` method on `UnstructuredTableTransformerModel`. ### Testing `pytest test_unstructured_inference/models/test_tables.py`
1 parent 2f3ec9e commit ed5f2c2

File tree

5 files changed

+31
-4
lines changed

5 files changed

+31
-4
lines changed

.github/workflows/ci.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ on:
77
branches: [ main ]
88

99
env:
10-
PYTHON_VERSION: 3.8
10+
PYTHON_VERSION: 3.9
1111

1212
jobs:
1313
setup:
@@ -107,7 +107,7 @@ jobs:
107107
test_ingest:
108108
strategy:
109109
matrix:
110-
python-version: ["3.8","3.9","3.10"]
110+
python-version: ["3.9","3.10"]
111111
runs-on: ubuntu-latest
112112
env:
113113
NLTK_DATA: ${{ github.workspace }}/nltk_data

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,8 @@
1+
## 0.7.23
2+
3+
* fix: added handling in `UnstructuredTableTransformerModel` for if `recognize` returns an empty
4+
list in `run_prediction`.
5+
16
## 0.7.22
27

38
* fix: add logic to handle computation of intersections betwen 2 `Rectangle`s when a `Rectangle` has `None` value in its coordinates

test_unstructured_inference/models/test_tables.py

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -932,6 +932,21 @@ def test_table_prediction_output_format(
932932
assert expectation in result
933933

934934

935+
def test_table_prediction_runs_with_empty_recognize(
936+
table_transformer,
937+
example_image,
938+
mocker,
939+
mocked_ocr_tokens,
940+
):
941+
mocker.patch.object(tables, "recognize", return_value=[])
942+
mocker.patch.object(
943+
tables.UnstructuredTableTransformerModel,
944+
"get_structure",
945+
return_value=None,
946+
)
947+
assert table_transformer.run_prediction(example_image, ocr_tokens=mocked_ocr_tokens) == ""
948+
949+
935950
def test_table_prediction_with_ocr_tokens(table_transformer, example_image, mocked_ocr_tokens):
936951
prediction = table_transformer.predict(example_image, ocr_tokens=mocked_ocr_tokens)
937952
assert '<table><thead><th rowspan="2">' in prediction
Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
__version__ = "0.7.22" # pragma: no cover
1+
__version__ = "0.7.23" # pragma: no cover

unstructured_inference/models/tables.py

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -96,7 +96,14 @@ def run_prediction(
9696
outputs_structure = self.get_structure(x, pad_for_structure_detection)
9797
if ocr_tokens is None:
9898
raise ValueError("Cannot predict table structure with no OCR tokens")
99-
prediction = recognize(outputs_structure, x, tokens=ocr_tokens)[0]
99+
100+
recognized_table = recognize(outputs_structure, x, tokens=ocr_tokens)
101+
if len(recognized_table) > 0:
102+
prediction = recognized_table[0]
103+
# NOTE(robinson) - This means that the table was not recognized
104+
else:
105+
return ""
106+
100107
if result_format == "html":
101108
# Convert cells to HTML
102109
prediction = cells_to_html(prediction) or ""

0 commit comments

Comments
 (0)