Download Latin.unicharset along with radical-stroke.txt #219
Shreeshrii wants to merge 2 commits into tesseract-ocr:master from Shreeshrii:PR6
All unicharset files for scripts are potentially needed, starting with I usually get the required ones to satisfy the error message(s), but still don't know what happens if they are missing.
I added only the Latin and Inherited unicharsets to this list because they are required in almost all cases, even though a missing one does not stop processing the way a missing radical-stroke.txt does. We could add another optional variable, SCRIPT_UNICHARSET, and download that file when the variable is non-blank.
I think some characters, e.g. Arabic accents, get dropped from the unicharset generated by unicharset_extractor. That was the reason I built Inherited.unicharset.
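The optional-variable idea above could look roughly like the following shell sketch. The function names `unicharset_url` and `fetch_script_unicharset` are hypothetical; only the base URL pattern is taken from the wget line in this PR, and it is an assumption that every script's file follows the same `<Script>.unicharset` naming.

```shell
#!/bin/sh
# Hypothetical helpers; only the URL pattern comes from the wget line in this PR.
LANGDATA_URL='https://github.com/tesseract-ocr/langdata_lstm/raw/master'

# Print the download URL for one script's unicharset (assumed naming scheme).
unicharset_url() {
  printf '%s/%s.unicharset\n' "$LANGDATA_URL" "$1"
}

# Download <script>.unicharset into <data_dir>; do nothing when <script> is blank.
fetch_script_unicharset() {
  script="$1"
  data_dir="$2"
  if [ -z "$script" ]; then
    return 0
  fi
  wget -O "$data_dir/$script.unicharset" "$(unicharset_url "$script")"
}
```

A recipe could then call `fetch_script_unicharset "$SCRIPT_UNICHARSET" "$DATA_DIR"`, which is a no-op whenever the variable is left empty.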
$(DATA_DIR)/radical-stroke.txt:
	# wget -O $(DATA_DIR)/Inherited.unicharset 'https://github.com/tesseract-ocr/langdata_lstm/raw/master/Inherited.unicharset'
	wget -O $(DATA_DIR)/Latin.unicharset 'https://github.com/tesseract-ocr/langdata_lstm/raw/master/Latin.unicharset'
I'd put that in a separate Makefile target.
Inherited.unicharset is NOT in the langdata_lstm repo. I created it by copying the lines marked Inherited from other unicharsets. However, the coordinates for the same character differ between unicharsets, so I am not sure which ones should be used.
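One way to see which characters are affected by the differing coordinates is sketched below. It assumes the usual unicharset line layout in which the first field is the character, the third is the comma-separated metrics ("coordinates") and the fourth is the script name; the function name `inherited_conflicts` is hypothetical. It only reports conflicts and does not decide which values are correct.

```shell
#!/bin/sh
# Print each character tagged Inherited whose metrics field differs
# between the unicharset files given as arguments.
# Assumed layout: $1 = character, $3 = metrics, $4 = script name.
inherited_conflicts() {
  awk '
    $4 == "Inherited" {
      if (!($1 in metrics)) {
        metrics[$1] = $3          # first metrics seen for this character
      } else if (metrics[$1] != $3) {
        conflict[$1] = 1          # same character, different metrics
      }
    }
    END { for (ch in conflict) print ch }
  ' "$@" | LC_ALL=C sort
}
```

For example, `inherited_conflicts Arabic.unicharset Devanagari.unicharset` would list the shared Inherited characters whose metrics disagree between those two files (the file names here are only illustrative).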
Hi, how can I get the Inherited.unicharset?
A list of all required
Thanks for the suggestions @stweil and the hint to get the list of required unicharsets from $(OUTPUT_DIR)/unicharset. I am having a hard time putting it together in a separate Makefile target using the list. I would appreciate it if you could make the required change. Here is what I have tried so far:
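As an editorial illustration only, and not the attempt referred to above, the hint about reading the required scripts from $(OUTPUT_DIR)/unicharset might be sketched in shell as follows. It assumes the first line of the file is an entry count and that the script name is the fourth field when a comma-separated metrics field is present, otherwise the third; `list_scripts` is a hypothetical name.

```shell
#!/bin/sh
# List the distinct script names used in a unicharset file.
# Assumptions: line 1 is the entry count; the script name is field 4
# when field 3 holds comma-separated metrics, otherwise field 3.
list_scripts() {
  awk '
    NR > 1 && NF >= 3 {
      s = $3
      if (s ~ /,/) s = $4
      print s
    }
  ' "$1" | LC_ALL=C sort -u
}
```

Each name printed by `list_scripts "$OUTPUT_DIR/unicharset"` would then correspond to one `<Script>.unicharset` file to download, assuming langdata_lstm provides a file for that script.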
@kba Could you please have a look at the change request and maybe come up with a proposal?
I added A simpler way may be to ask the user to specify a script and download that one.
I have tried that in the new Makefile-font2model.
Included as part of #230.
Need another PR to add Inherited.unicharset after tesseract-ocr/langdata_lstm#41 is merged.