-
Check this PDF as an example. A similar feature:
-
--force-ocr has an important use case: fixing broken text encodings, or documents whose entire font has been rendered as vectors ("render text as curves"). Code cannot tell whether a vector represents text, but that does matter to users. For your use case you should use … If someone wanted to pull out the relevant code from Inkscape and integrate it with ocrmypdf, I'd consider it.
-
Inkscape has the ability to convert all of a document's typography to smooth vectors. It would be great if OCRmyPDF could take that as a base and paste the OCR results into it, very similar to Adobe ClearScan.
What we get:
Use cases:
-When a document uses copyrighted or unknown typography, copied text often pastes as gibberish and squares. --redo-ocr won't touch it, because it recognizes the text as something else and refuses to go further, and --force-ocr converts it into images, which is not ideal. If we could paste the vector results instead of images, it would improve the product overall.
Downsides:
-Inkscape converts the typography into shapes (areas), which is inherently much more costly than plain lines/outlines, but that would change if Inkscape added a feature to make its output more optimal for us.
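For anyone comparing the two flags mentioned under use cases, here is a minimal sketch of how they could be invoked from Python. Only `--redo-ocr` and `--force-ocr` come from OCRmyPDF itself; the helper names `build_command` and `run_ocr` and the mode labels are hypothetical, made up for illustration.

```python
import shutil
import subprocess

# The flags on the right are the OCRmyPDF options discussed in this thread.
# The keys on the left are hypothetical labels used only by this sketch.
OCR_MODE_FLAGS = {
    "redo": "--redo-ocr",    # refuses to go further on the fonts described above
    "force": "--force-ocr",  # works, but converts the content into images
}


def build_command(input_pdf, output_pdf, mode):
    """Return the ocrmypdf argument list for the given mode (hypothetical helper)."""
    if mode not in OCR_MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["ocrmypdf", OCR_MODE_FLAGS[mode], input_pdf, output_pdf]


def run_ocr(input_pdf, output_pdf, mode):
    """Run ocrmypdf if it is on PATH and return its exit code (hypothetical helper)."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    return subprocess.run(build_command(input_pdf, output_pdf, mode)).returncode
```

Calling `run_ocr("in.pdf", "out.pdf", "force")` would reproduce the rasterized result described above; neither mode keeps the text as vectors, which is the gap this request is about.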