Tesseract adds similar characters in Japanese text (ambiguity management?)

Hi, we have noticed that in Japanese text, Tesseract sometimes doubles or even triples a character, producing characters that are not in the original text. Our guess is that Tesseract applies some handling for characters that look very similar and puts all of the candidates in the output instead of choosing one. Is there a way to avoid this problem in the output?

Example: 
text in image=  エンジンコンポーネント
text read by tesseract= エンジンコンポボーネント

As you can see, the character ポ is output as two characters, ポボ, maybe because both candidates have similarly high confidence scores and Tesseract does not decide which one to use but puts both in the text. Is there a way to avoid this error?
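While waiting for an answer on the Tesseract side, one possible stopgap is to flag suspicious pairs in the OCR output. The sketch below is only a post-processing idea, not a Tesseract feature: it assumes the duplicates are always adjacent kana that differ only in their dakuten/handakuten mark (as with ポボ above). The helper names `base_kana` and `find_voicing_doublets` are hypothetical. Note that legitimate words can also contain such pairs (for example はば), so the hits should be reviewed rather than deleted blindly.

```python
import unicodedata

# Combining dakuten (U+3099) and handakuten (U+309A), as produced by NFD.
_VOICING_MARKS = {"\u3099", "\u309a"}


def base_kana(ch: str) -> str:
    """Return ch with any combining (han)dakuten removed, e.g. ポ -> ホ."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if c not in _VOICING_MARKS)


def find_voicing_doublets(text: str) -> list[tuple[int, str]]:
    """List (index, pair) for adjacent, different characters sharing a base.

    Assumption: an OCR duplicate shows up as two neighbouring characters
    that differ only by their voicing mark.
    """
    # Work on precomposed characters so each kana is a single code point.
    text = unicodedata.normalize("NFC", text)
    hits = []
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a != b and base_kana(a) == base_kana(b):
            hits.append((i, a + b))
    return hits


if __name__ == "__main__":
    print(find_voicing_doublets("エンジンコンポボーネント"))
```

The function only reports positions; deciding which of the two characters to keep would still need a dictionary check or manual review.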




Tesseract adds similar characters in Japanese text (ambiguity management?) #1063
