Open
Description
Hi, we have noticed that in Japanese text, Tesseract sometimes doubles or even triples a character, producing characters that are not in the original text. Our guess is that when several candidate characters look very similar, Tesseract puts all of them in the output instead of choosing one. Is there a way to avoid this in the output?
Example:
Text in image: エンジンコンポーネント
Text read by Tesseract: エンジンコンポボーネント
As you can see, the character ポ is transcribed as two characters, ポボ,
possibly because both candidates have similarly high confidence scores and Tesseract does not pick one but outputs both. Is there a way to avoid this error?
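
To make the problem easier to reproduce and measure, here is a minimal sketch that locates the extra characters by comparing the expected text with the OCR output. It assumes the ground-truth text is known for the test image. It uses only Python's standard `difflib` and does not call Tesseract itself. The function name `find_inserted_chars` is hypothetical, chosen for illustration.

```python
from difflib import SequenceMatcher


def find_inserted_chars(expected: str, ocr_output: str) -> list[tuple[int, str]]:
    """Return (index in ocr_output, inserted text) for every span that
    appears in the OCR output but not in the expected text."""
    matcher = SequenceMatcher(None, expected, ocr_output)
    inserted = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" means ocr_output[j1:j2] has no counterpart in expected.
        if tag == "insert":
            inserted.append((j1, ocr_output[j1:j2]))
    return inserted


expected = "エンジンコンポーネント"       # text in the image
ocr_output = "エンジンコンポボーネント"   # text read by Tesseract

print(find_inserted_chars(expected, ocr_output))  # [(7, 'ボ')]
```

For the example above this reports a single extra ボ at index 7 of the OCR output, directly after the correct ポ. This only diagnoses where the duplicates appear; it does not fix the recognition itself.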