Content is lost during table conversion.

When converting a PDF document that contains tables, I noticed that the content of one cell is missing from the output.
Here is my PDF document: [pdf_10.pdf](https://github.com/user-attachments/files/20497784/pdf_10.pdf). The cell containing "护发类" ("hair care") is lost. For reference, I'm using the following parameters:

  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@pdf_10.pdf;type=application/pdf' \
  -F 'from_formats=pdf' \
  -F 'to_formats=md' \
  -F 'ocr_engine=rapidocr' \
  -F 'force_ocr=true' \
  -F 'table_mode=accurate' \
  -F 'pdf_backend=dlparse_v4'
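
To narrow this down, a small check like the one below could confirm which expected cell texts are absent from the Markdown output. This is only a minimal sketch: `missing_cells` is a hypothetical helper, not part of docling, and the sample table is invented for illustration (only "护发类" comes from the actual PDF).

```python
def missing_cells(markdown: str, expected_cells: list[str]) -> list[str]:
    """Return the expected texts that appear in no Markdown table cell.

    Assumption: tables are plain pipe tables; escaped pipes inside
    cells are not handled.
    """
    table_cells = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue  # not a table row
        # Drop the outer pipes, then split the row into cells.
        for cell in stripped.strip("|").split("|"):
            table_cells.append(cell.strip())
    # Substring match, since a cell may hold more than the expected text.
    return [
        text for text in expected_cells
        if not any(text in cell for cell in table_cells)
    ]


# Hypothetical converted output in which the "护发类" cell was dropped.
sample_md = "| 类别 | 产品 |\n|---|---|\n| 洗发类 | A |\n|  | B |\n"
print(missing_cells(sample_md, ["洗发类", "护发类"]))
```

Running the same check against the real converted Markdown, with the cell texts read off the PDF by hand, would show whether "护发类" is the only cell affected.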

Tracing through the code, I found that it uses [tablemodel04_rs.py](https://github.com/docling-project/docling-ibm-models/blob/main/docling_ibm_models/tableformer/models/table04_rs/tablemodel04_rs.py) to process the tables. I suspect the problem lies in the model. Would it be possible to fix this, for example by using a different model?

Thanks in advance!

Content is lost during table conversion. #109