I've compared these frak models:
ocrd: https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/frak2021/tessdata_best/frak2021-0.905.traineddata from ocrd resmgr
ubma: https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/frak2021/tessdata_fast/frak2021_1.069.traineddata from https://ocr-bw.bib.uni-mannheim.de/faq/
size & md5sum:
-rw-rw-r-- 1 jb jb 3421140 Mär 27 2021 ocrd--frak2021-0.905.traineddata
234e8bb819042f615576bd01aa2419fd ocrd--frak2021-0.905.traineddata
-rw-rw-r-- 1 jb jb 5060763 Dez 9 2021 ubma--frak2021_1.069.traineddata
9405b1603db21cb066e4e7614a405dd4 ubma--frak2021_1.069.traineddata
content after combine_tessdata -u x.traineddata aa:
jb@nuc:~/models$ LC_ALL=C ls -lh ocrd ubma
ocrd:
total 3.3M
-rw-rw-r-- 1 jb jb 3.3M Dec 21 12:18 aa.lstm
-rw-rw-r-- 1 jb jb 2.8K Dec 21 12:18 aa.lstm-recoder
-rw-rw-r-- 1 jb jb 22K Dec 21 12:18 aa.lstm-unicharset
-rw-rw-r-- 1 jb jb 30 Dec 21 12:18 aa.version
-rw-rw-r-- 1 jb jb 345 Dec 21 12:18 extr.log
ubma:
total 4.9M
-rw-rw-r-- 1 jb jb 432K Dec 21 12:18 aa.lstm
-rw-rw-r-- 1 jb jb 6.3K Dec 21 12:18 aa.lstm-number-dawg
-rw-rw-r-- 1 jb jb 4.5K Dec 21 12:18 aa.lstm-punc-dawg
-rw-rw-r-- 1 jb jb 2.8K Dec 21 12:18 aa.lstm-recoder
-rw-rw-r-- 1 jb jb 22K Dec 21 12:18 aa.lstm-unicharset
-rw-rw-r-- 1 jb jb 4.4M Dec 21 12:18 aa.lstm-word-dawg
-rw-rw-r-- 1 jb jb 30 Dec 21 12:18 aa.version
-rw-rw-r-- 1 jb jb 553 Dec 21 12:18 extr.log
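The two listings above can also be compared programmatically. This is a rough sketch, assuming both models were already unpacked with combine_tessdata -u into the directories ocrd and ubma using the prefix aa, as shown; the function names are hypothetical.

```python
from pathlib import Path


def component_sizes(directory, prefix="aa."):
    """Map each unpacked component suffix (e.g. 'lstm-word-dawg') to its size in bytes."""
    sizes = {}
    for entry in Path(directory).iterdir():
        # Only files carrying the unpack prefix count as components;
        # this skips unrelated files such as extr.log.
        if entry.is_file() and entry.name.startswith(prefix):
            sizes[entry.name[len(prefix):]] = entry.stat().st_size
    return sizes


def compare_components(dir_a, dir_b, prefix="aa."):
    """Return (only_in_a, only_in_b, shared) for two unpacked model directories.

    `shared` maps a component suffix to a (size_in_a, size_in_b) tuple.
    """
    a = component_sizes(dir_a, prefix)
    b = component_sizes(dir_b, prefix)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    shared = {name: (a[name], b[name]) for name in sorted(set(a) & set(b))}
    return only_a, only_b, shared
```

Calling `compare_components("ocrd", "ubma")` on the directories listed above should report the three dawg components as present only in ubma, along with the differing sizes of the shared lstm component.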
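The ubma model includes an .lstm-word-dawg (plus number and punctuation dawgs); the ocrd model contains no dawg files at all.
The ocrd LSTM component is 3.3M, while the ubma one is only 432K; going by the URLs, ocrd is a tessdata_best model and ubma a tessdata_fast model.
Shouldn't ocrd use the ubma file for Fraktur/Gothic?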