Description
Bug
I have a bank statement PDF that contains a table of transactions, and on the same page six paragraphs of text overlap the table. The conversion never finishes. When I cancel the process, the traceback shows:
...
docling/pipeline/base_pipeline.py", line 45, in execute
conv_res = self._build_document(conv_res)
docling/pipeline/base_pipeline.py", line 163, in _build_document
for p in pipeline_pages: # Must exhaust!
docling/pipeline/base_pipeline.py", line 127, in _apply_on_pages
yield from page_batch
docling/models/page_assemble_model.py", line 68, in __call__
for page in page_batch:
docling/models/table_structure_model.py", line 257, in __call__
tf_output = self.tf_predictor.multi_table_predict(
docling_ibm_models/tableformer/data_management/tf_predictor.py", line 485, in multi_table_predict
tf_responses, predict_details = self.predict(
docling_ibm_models/tableformer/data_management/tf_predictor.py", line 815, in predict
matching_details = self._post_processor.process(
docling_ibm_models/tableformer/data_management/matching_post_processor.py", line 1353, in process
aligned_table_cells2 = self._align_table_cells_to_pdf(
docling_ibm_models/tableformer/data_management/matching_post_processor.py", line 559, in _align_table_cells_to_pdf
x1s.append(found_cell["bbox"][0])
Steps to reproduce
Convert a PDF containing a table (with headers and vertical lines separating the columns) where text paragraphs overlap all of the table's columns.
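To reproduce without waiting indefinitely, the conversion can be run in a child interpreter with a time limit. The following is only a sketch: `run_with_timeout`, `CONVERT_SCRIPT`, the 300-second limit, and the `statement.pdf` path are illustrative names and values that do not come from Docling, and the child script assumes the default `DocumentConverter` settings.

```python
import subprocess
import sys
from typing import List, Optional

# Child script (assumption: default DocumentConverter settings).
# It converts the PDF given as the first argument and prints the output size.
CONVERT_SCRIPT = """
import sys
from docling.document_converter import DocumentConverter

result = DocumentConverter().convert(sys.argv[1])
print(len(result.document.export_to_markdown()))
"""


def run_with_timeout(script: str, args: List[str], timeout_s: float) -> Optional[int]:
    """Run `script` in a separate Python process.

    Returns the child's exit code, or None if it was still running after
    `timeout_s` seconds (the child is killed in that case).
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", script, *args],
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None
    return completed.returncode
```

Calling `run_with_timeout(CONVERT_SCRIPT, ["statement.pdf"], 300)` with the affected file should return `None` if the hang reproduces; a non-zero exit code means the child failed for another reason (for example, Docling is not installed in that environment).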
Docling version
2.28.4
Python version
3.10.15
