This repo is heavily inspired by Gabriel Garcia's Tesseract Tutorial.
I recreated it to get a better understanding of how tesstrain works. I've also included training.sh to help with the training.
While building this project, I trained a model to recognize the Minecraft font on top of the English model, which is why MODEL_NAME in training.sh is set to mc.
Make sure you clone tesseract and tesstrain before running the code.
If you want to train a model on top of an existing one, e.g. the English model, you need to place that model from the tessdata repository into tesseract/tessdata.
For the training text, you can either use langdata from Tesseract's langdata repository, or use generate-training-text.py, which takes a list of symbols and generates random text from them.
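To illustrate the idea of generating random training text from a list of symbols, here is a minimal sketch. It is not the code in generate-training-text.py; the function name generate_training_text and its parameters (num_lines, words_per_line, max_word_len, seed) are hypothetical and chosen only for this example.

```python
import random


def generate_training_text(symbols, num_lines=100, words_per_line=8,
                           max_word_len=8, seed=None):
    """Return random lines of text built only from the given symbols.

    Hypothetical helper for illustration; the repo's own script may
    work differently.
    """
    rng = random.Random(seed)  # a seed makes the output reproducible
    lines = []
    for _ in range(num_lines):
        words = []
        for _ in range(words_per_line):
            # Each "word" is a random-length run of random symbols.
            length = rng.randint(1, max_word_len)
            words.append("".join(rng.choice(symbols) for _ in range(length)))
        lines.append(" ".join(words))
    return "\n".join(lines)


if __name__ == "__main__":
    # Example symbol set; replace it with the characters your font covers.
    symbols = list("abcdefghijklmnopqrstuvwxyz0123456789")
    print(generate_training_text(symbols, num_lines=3, seed=0))
```

Random words like these give every symbol roughly equal coverage, which suits font-focused training, but they carry no language statistics. Real text from langdata may be the better choice if word context matters.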
Run generate-ground-truth.py and follow the instructions given in the terminal. The script uses the training_text file in the data folder.
If you're on Windows, you may need to use WSL.
Use the training.sh file and adjust it to your needs. The most important variable to change here is the MODEL_NAME.
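As a rough guide to which settings matter, the sketch below assembles a tesstrain "make training" command from a few values. MODEL_NAME, START_MODEL, TESSDATA and MAX_ITERATIONS are variables of the tesstrain Makefile; whether training.sh sets exactly these is an assumption, and the helper build_training_command as well as the example values (eng, ../tesseract/tessdata, 10000) are hypothetical.

```python
def build_training_command(model_name, start_model=None, tessdata=None,
                           max_iterations=None):
    """Build the argument list for tesstrain's `make training` target.

    Hypothetical helper for illustration; it only builds the command
    and does not run it.
    """
    cmd = ["make", "training", f"MODEL_NAME={model_name}"]
    if start_model is not None:
        # Base model to fine-tune from, e.g. "eng" for English.
        cmd.append(f"START_MODEL={start_model}")
    if tessdata is not None:
        # Folder that contains the base model's .traineddata file.
        cmd.append(f"TESSDATA={tessdata}")
    if max_iterations is not None:
        cmd.append(f"MAX_ITERATIONS={max_iterations}")
    return cmd


if __name__ == "__main__":
    # Example values only; adjust them to your own setup.
    cmd = build_training_command("mc", start_model="eng",
                                 tessdata="../tesseract/tessdata",
                                 max_iterations=10000)
    print(" ".join(cmd))
```

The printed command is meant to be run from inside the tesstrain folder. Compare it with training.sh before relying on it, since the script in this repo may set additional or different variables.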