corrupted text #2

@xudesheng

Use this PDF as an example: https://arxiv.org/pdf/2407.01481

"efficient" and "difficult" are both corrupted in the same way: the "ffi" ligature is extracted as the bytes EF 81 8E (hex), which is the UTF-8 encoding of U+F04E, a Private Use Area code point.

I don't see the same issue if I use the pdfium library to grab the text. "ffi" is just one example.

speed-up and ecient use of resources is essential
One of the more dicult aspects of High Performance Computing
