OCR language
By default, Paperwork uses Tesseract for OCR. If Tesseract is unavailable, it falls back to Cuneiform.
To get good results, OCR tools need to know the language of the documents.
The languages listed in Paperwork's settings dialog are those understood by the automatically selected OCR tool (Tesseract or Cuneiform). If your language is not in the list, the OCR tool does not have the data required to read it.
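To see whether Tesseract itself has data for your language, you can query it from a terminal. The sketch below is a minimal, hypothetical helper (the function names `has_lang` and `tesseract_has_lang` are illustrative, not part of Paperwork). It assumes `tesseract --list-langs` is available (Tesseract 3.02 or later) and prints one language code per line after a header line; older releases print this list on stderr, newer ones on stdout, so both are captured.

```shell
#!/bin/sh
# Sketch: check whether Tesseract has data for one language code.
# Assumption: `tesseract --list-langs` prints one code per line after a
# header line (stderr on older releases, stdout on newer ones).

# has_lang CODE: succeed if CODE appears as a whole line on stdin.
has_lang() {
  grep -qxF "$1"
}

# tesseract_has_lang CODE: returns 0 if the data is installed,
# 1 if it is missing, 2 if tesseract is not installed at all.
tesseract_has_lang() {
  command -v tesseract >/dev/null 2>&1 || return 2
  tesseract --list-langs 2>&1 | has_lang "$1"
}
```

For example, `tesseract_has_lang fra && echo "French OCR data installed"`. Matching whole lines (`grep -x`) avoids false positives such as `fra_vert` matching `fra`.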
Note that Paperwork also automatically uses any available spellchecker (aspell, ispell, myspell, etc.) to improve detection of the page orientation. This means your spellchecker must have the dictionary for your language installed. Warning: if no spellchecker is installed, or if it lacks the required dictionary, Paperwork silently falls back to detecting the orientation without spellchecking; no error dialog is displayed.
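Because a missing dictionary produces no error, it can be worth checking what is installed. The sketch below is a hypothetical helper, not how Paperwork itself looks up dictionaries. It assumes aspell lists its dictionaries with `aspell dicts`, and that myspell/hunspell dictionaries are `<name>.dic` files under the Debian-style directories shown; other distributions may use different paths. Spellchecker dictionaries are usually named with two-letter codes (e.g. `fr`, `fr_FR`), unlike Tesseract's three-letter codes (e.g. `fra`).

```shell
#!/bin/sh
# Sketch: list installed spellchecker dictionaries and look for a language.
# Assumptions: `aspell dicts` prints one dictionary name per line, and
# myspell/hunspell dictionaries are <name>.dic files in the directories
# below (Debian-style paths; other distributions may differ).

list_dicts() {
  if command -v aspell >/dev/null 2>&1; then
    aspell dicts
  fi
  for dir in /usr/share/hunspell /usr/share/myspell/dicts; do
    for f in "$dir"/*.dic; do
      [ -e "$f" ] || continue  # unmatched glob: directory missing or empty
      basename "$f" .dic
    done
  done
}

# has_dict LANG: succeed if a name on stdin is LANG or starts with LANG_ or
# LANG- (e.g. "fr" matches "fr", "fr_FR" and "fr-classic", but not "fro").
has_dict() {
  grep -qE "^$1([_-]|\$)"
}
```

For example, `list_dicts | has_dict fr && echo "French dictionary found"`.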
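Replace <lang> with the package suffix for your language. It is usually a three-letter code for Tesseract (e.g. fra) and a two-letter code for the spellchecker (e.g. fr).
Debian / Ubuntu:
# OCR (Tesseract)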
$ sudo apt-get install tesseract-ocr tesseract-ocr-<lang>
# Spell checking (myspell)
$ sudo apt-get install myspell myspell-<lang>
Fedora:
# OCR (Tesseract)
$ sudo yum install tesseract tesseract-langpack-<lang>
# Spell checking (aspell)
$ sudo yum install aspell aspell-<lang>