- Install Tesseract from https://github.com/tesseract-ocr/tesseract/releases
- Install Git for Windows
- Install WinGet: https://learn.microsoft.com/en-us/windows/package-manager/winget/download
- Use WinGet to install make and wget:
  winget install ezwinports.make
  winget install wget
- Clone the tesstrain repository:
  git clone https://github.com/tesseract-ocr/tesstrain.git
- Inside tesstrain, create a directory named Data
- Inside Data, create a subdirectory named 1830PalmyraEdition-ground-truth
- Copy the OCR training data (the *.tif line images and their matching *.gt.txt transcriptions) into that subdirectory
- Create a directory named tessdata next to tesstrain (a sibling of it, not inside it)
- Copy eng.traineddata into tessdata
- Open Git Bash
- cd into the tesstrain/Data directory
- Run ./trainocr.sh
The models are trained and the resulting .traineddata file is copied to the correct folder.
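The contents of trainocr.sh are not included here. As a rough illustration only, the helper functions below sketch what a script like it might do with the layout from the steps above. This is not the author's script: the function names (prepare_layout, check_ground_truth, train_model) are hypothetical, copying the finished model into the sibling tessdata folder is an assumed destination, and the tesstrain make variables used (MODEL_NAME, START_MODEL, TESSDATA, DATA_DIR) should be checked against the tesstrain README for your version. Fine-tuning from START_MODEL=eng also needs a float model, such as eng.traineddata from the tessdata_best repository; the integer tessdata_fast models cannot be used as a training base.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching a trainocr.sh-style workflow.
# NOT the author's trainocr.sh; function names and the copy destination are assumptions.
# Assumed layout, all under one parent folder ROOT:
#   ROOT/tesstrain/Data/1830PalmyraEdition-ground-truth/   *.tif + *.gt.txt
#   ROOT/tessdata/eng.traineddata

MODEL_NAME="1830PalmyraEdition"

# Create the folders and copy the ground truth and the base model into place.
# Usage: prepare_layout ROOT GT_SOURCE_DIR ENG_TRAINEDDATA
prepare_layout() {
  local root="$1" gt_src="$2" eng_model="$3"
  local gt_dir="$root/tesstrain/Data/$MODEL_NAME-ground-truth"
  mkdir -p "$gt_dir" "$root/tessdata" || return 1
  # Fails if either glob matches nothing, since cp then sees a literal pattern.
  cp "$gt_src"/*.tif "$gt_src"/*.gt.txt "$gt_dir"/ || return 1
  cp "$eng_model" "$root/tessdata/" || return 1
}

# Check that every line image has a matching .gt.txt transcription.
# Usage: check_ground_truth GT_DIR   (returns non-zero if anything is missing)
check_ground_truth() {
  local gt_dir="$1" tif status=0
  for tif in "$gt_dir"/*.tif; do
    # An unmatched glob leaves the literal pattern, which does not exist.
    if [[ ! -e "$tif" ]]; then
      echo "no .tif files in $gt_dir" >&2
      return 1
    fi
    if [[ ! -f "${tif%.tif}.gt.txt" ]]; then
      echo "missing ${tif%.tif}.gt.txt" >&2
      status=1
    fi
  done
  return "$status"
}

# Run tesstrain's training target, then copy the result next to eng.traineddata.
# Usage: train_model ROOT
train_model() {
  local root="$1"
  # make -C runs inside tesstrain, so ../tessdata is the sibling folder
  # and DATA_DIR=Data points at tesstrain/Data.
  make -C "$root/tesstrain" training \
    MODEL_NAME="$MODEL_NAME" \
    START_MODEL=eng \
    TESSDATA=../tessdata \
    DATA_DIR=Data || return 1
  # Assumed output path: tesstrain writes Data/<MODEL_NAME>.traineddata by default.
  cp "$root/tesstrain/Data/$MODEL_NAME.traineddata" "$root/tessdata/" || return 1
}
```

One way to use it is to save the functions to a file, `source` it in Git Bash, then call prepare_layout, check_ground_truth and train_model in that order, passing the folder that contains tesstrain as ROOT. Checking the ground truth first catches missing transcriptions before a long training run starts.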