diff --git a/doc/unicharset_extractor.1.asc b/doc/unicharset_extractor.1.asc
index 80b68913ec..be5dd41f5b 100644
--- a/doc/unicharset_extractor.1.asc
+++ b/doc/unicharset_extractor.1.asc
@@ -1,3 +1,7 @@
+:doctype: manpage
+:man manual: User Commands
+:man source: tesseract
+
 UNICHARSET_EXTRACTOR(1)
 =======================
 
@@ -7,34 +11,34 @@ unicharset_extractor - Reads box or plain text files to extract the unicharset.
 
 SYNOPSIS
 --------
-*unicharset_extractor* [--output_unicharset filename] [--norm_mode mode] box_or_text_file [...]
+*unicharset_extractor* [*--output_unicharset* _filename_] [*--norm_mode* _mode_] _box_or_text_file_ [...]
 
-Where mode means:
+Where _mode_ means:
  1=combine graphemes (use for Latin and other simple scripts)
  2=split graphemes (use for Indic/Khmer/Myanmar)
  3=pure unicode (use for Arabic/Hebrew/Thai/Tibetan)
 
 DESCRIPTION
 -----------
-Tesseract needs to know the set of possible characters it can output.
-To generate the unicharset data file, use the unicharset_extractor
+*tesseract* needs to know the set of possible characters it can output.
+To generate the _unicharset_ data file, use the *unicharset_extractor*
 program on training pages bounding box files or a plain text file:
 
     unicharset_extractor fontfile_1.box fontfile_2.box ...
 
-The unicharset will be put into the file './unicharset' if no output filename is provided.
+The unicharset will be put into the file _./unicharset_ if no output filename is provided.
 
-*NOTE* Use the appropriate norm_mode based on the language.
+*NOTE*: Use the appropriate *norm_mode* based on the language.
 
 SEE ALSO
 --------
-tesseract(1), unicharset(5)
+*tesseract*(1), *unicharset*(5)
 
-
+https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html[]
 
 HISTORY
 -------
-unicharset_extractor first appeared in Tesseract 2.00.
+*unicharset_extractor* first appeared in Tesseract 2.00.
 
 COPYING
 -------