Dear Reader,
I ran a comparison using random text.
- Random text, in order to test the raw engine performance rather than the dictionaries
- Foreign names, possibly transliterated, sometimes look much like random text: "sviyazhsk", "kozhva", "jizzax", ...
Here is the original random text:
original text
Here is the generated image (font: GaramondNo8):
Result:
Version | Filename | Levenshtein distance |
---|---|---|
- | abbyy11-English.txt | 5 |
- | abbyy11-GermanLuxembourg.txt | 2 |
- | orig.txt | 0 |
v3.04.01 | tess3-eng.txt | 1273 |
v3.04.01 | tess3-engWithoutDict.txt | 763 |
v4.0.0-beta.2-556-g607e | tess4-eng.txt | 222 |
v4.0.0-beta.2-556-g607e | tess4-engWithoutDict.txt | 215 |
v4.0.0-beta.2-556-g607e | tess4-scriptLatin.txt | 62 |
v4.1.0 | tess4-scriptLatin.txt | 62 |
v4.0.0-beta.2-556-g607e | tess4-scriptLatinWithoutDict.txt | 58 |
v4.0.0-beta.2-556-g607e | tess4-scriptLatinWithoutDict.txt, ą replaced by q manually | 45 |
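
For readers who want to reproduce this kind of measurement: the Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one text into the other. The following is only a minimal sketch, assuming a character-level distance with unit costs; the function name `levenshtein` is made up for illustration and is not necessarily the tool that produced the numbers in the table.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insert, delete and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop ca
                current[j - 1] + 1,      # add cb
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


# A spurious space inserted by the OCR engine costs one edit:
print(levenshtein("itsan", "its an"))  # → 1
```

To compare files, read `orig.txt` and an OCR output as UTF-8 strings and pass them to the function. The two-row approach keeps memory proportional to the shorter text, but the running time is still quadratic, so very long texts will be slow in pure Python.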
The ABBYY language "GermanLuxembourg" has no "full dictionary". I don't know exactly what that means, but the results are better than with "English", because with English "itsan" is recognized as "its an".
engWithoutDict was created using
combine_tessdata -u ...
rm *-dawg
combine_tessdata ...
Kind regards,
Jochen