Content is lost during table conversion. #109

@skye77777

Description

When converting a PDF document that contains tables, I noticed that the content of one cell is missing.
Here is my PDF document: pdf_10.pdf. The cell containing "护发类" ("hair care") is lost in the output. For reference, I'm using the following parameters:

-H 'accept: application/json'
-H 'Content-Type: multipart/form-data'
-F 'files=@pdf_10.pdf;type=application/pdf'
-F 'from_formats=pdf'
-F 'to_formats=md'
-F 'ocr_engine=rapidocr'
-F 'force_ocr=true'
-F 'table_mode=accurate'
-F 'pdf_backend=dlparse_v4'
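
For anyone reproducing this, a small check along the following lines might help confirm whether a given cell survived the conversion. This is only a minimal sketch: it assumes the Markdown output is already available as a string, the helper names `markdown_table_cells` and `cell_is_present` are hypothetical (not part of any library), and the sample table is made up for illustration rather than taken from the real conversion result.

```python
def markdown_table_cells(markdown: str) -> list[str]:
    """Collect the text of every cell from pipe-delimited Markdown table rows."""
    cells = []
    for line in markdown.splitlines():
        line = line.strip()
        # Only consider lines that look like table rows: | a | b |
        if not (line.startswith("|") and line.endswith("|")):
            continue
        parts = [part.strip() for part in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |---|:---:|
        if all(part and set(part) <= set("-:") for part in parts):
            continue
        cells.extend(parts)
    return cells


def cell_is_present(markdown: str, expected: str) -> bool:
    """Return True if any table cell contains the expected text."""
    return any(expected in cell for cell in markdown_table_cells(markdown))


# Hypothetical sample output (NOT the real result for pdf_10.pdf):
# the first cell of the data row is empty, as if its content had been dropped.
sample = """
| Category | Product |
|----------|---------|
|          | Shampoo |
"""

if __name__ == "__main__":
    print("护发类 present:", cell_is_present(sample, "护发类"))
```

Running the same check on the outputs produced with different `table_mode` or `pdf_backend` values would show whether the cell is lost in every configuration or only in this one.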

Tracing through the code, I found that it uses tablemodel04_rs.py (url) to process the tables. I suspect the problem lies with the model. Would it be possible to fix this, for example by using a different model?

Thanks in advance!
