Synthetical comparison with Abbyy

Dear Reader,

I ran a comparison using random text, for two reasons:
 * Random text tests the raw engine performance rather than the dictionaries.
 * Foreign names, possibly transcribed, sometimes look just as random: "sviyazhsk", "kozhva", "jizzax", ...
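
For anyone who wants to build a similar test set, here is a minimal sketch of one way to generate dictionary-free text. It is only an assumption about how such text could be produced; how orig.txt was actually created is not stated here. The function names, word lengths, and the uniform letter distribution are all hypothetical choices.

```python
import random
import string


def random_words(n_words, min_len=3, max_len=10, seed=0):
    """Return n_words pseudo-words made of uniformly random lowercase letters.

    Hypothetical helper: the length bounds and the uniform distribution
    are assumptions, not the method used for orig.txt.
    """
    rng = random.Random(seed)  # fixed seed so the text is reproducible
    words = []
    for _ in range(n_words):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        letters = (rng.choice(string.ascii_lowercase) for _ in range(length))
        words.append("".join(letters))
    return words


def as_lines(words, per_line=10):
    """Join the words into lines of per_line words each."""
    chunks = (words[i:i + per_line] for i in range(0, len(words), per_line))
    return "\n".join(" ".join(chunk) for chunk in chunks)


sample_text = as_lines(random_words(50, seed=1))
```

The resulting string can be written to a text file and rendered to an image with any tool that supports the desired font.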

Here is the original random text:
[original text](https://digi.ub.uni-heidelberg.de/diglitData/v/orig.txt)

Here is the generated image (font: GaramondNo8):
![Image of "Scan"](https://digi.ub.uni-heidelberg.de/diglitData/v/orig001.tif)

Result:

Filename | Levenshtein distance
---------|--------------------
abbyy11-English.txt | 5
abbyy11-GermanLuxembourg.txt | 2
orig.txt | 0
v3.04.01 tess3-eng.txt | 1273
v3.04.01 tess3-engWithoutDict.txt | 763
v4.0.0-beta.2-556-g607e tess4-eng.txt | 222
v4.0.0-beta.2-556-g607e tess4-engWithoutDict.txt | 215
v4.0.0-beta.2-556-g607e tess4-scriptLatin.txt | 62
v4.1.0 ______________ tess4-scriptLatin.txt | 62
v4.0.0-beta.2-556-g607e tess4-scriptLatinWithoutDict.txt | 58
v4.0.0-beta.2-556-g607e tess4-scriptLatinWithoutDict.txt, ą replaced by q manually | 45



The Abbyy language "GermanLuxembourg" has no "full dictionary". I don't know exactly what this means, but the results are better than with "English", because with English "itsan" would be recognized as "its an".
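
For readers who want to reproduce the numbers in the table, here is a minimal sketch of a character-level Levenshtein distance. It is not necessarily the tool used for the table above, and whether whitespace or line endings were normalized before comparing is not stated, so exact values may differ. The function name `levenshtein` is just an illustrative choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string in the outer loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute or match
            ))
        previous = current
    return previous[-1]


# A dictionary splitting "itsan" into "its an" costs one edit (the space).
print(levenshtein("itsan", "its an"))  # 1
```

To compare files, read both with the same encoding (for example `open(path, encoding="utf-8").read()`) and pass the two strings to the function. Python strings are sequences of code points, so a confusion such as "ą" for "q" counts as one substitution.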

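engWithoutDict was made using

```
# unpack the traineddata file into its components
combine_tessdata -u ...
# delete the dictionary (DAWG) components
rm *-dawg
# recombine the remaining components
combine_tessdata ...
```

Kind regards,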
Jochen
