This is a handwritten text recognition (HTR) pipeline that operates on scanned pages and applies the following operations:
- Detect words
- Read words
- Download the zipped model weights
- Unzip
- Copy the files (reader.onnx, reader.json, detector.onnx) into the folder htr_pipeline/models
- Go to the root level of the repository (where setup.py is located)
- Execute pip install .
- Additionally install matplotlib for plotting: pip install matplotlib
- Go to scripts/
- Run python demo.py
- The output should look like the plot shown above
- Additionally install gradio: pip install gradio
- Go to the root directory of the repository
- Run python scripts/gradio_demo.py
- Open the URL shown in the output
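If a demo fails to start, one possible cause is that the model files were not copied into place. The following is a minimal sketch for checking this, assuming it is run from the root of the repository and that the folder layout matches the installation steps. The helper name `missing_model_files` is made up for this example and is not part of the package.

```python
from pathlib import Path

# File names and target folder as listed in the installation steps.
REQUIRED_MODEL_FILES = ("reader.onnx", "reader.json", "detector.onnx")


def missing_model_files(repo_root="."):
    """Return the required model files that are not found in htr_pipeline/models.

    Hypothetical helper for illustration, not part of htr_pipeline.
    """
    models_dir = Path(repo_root) / "htr_pipeline" / "models"
    return [name for name in REQUIRED_MODEL_FILES if not (models_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files are in place.")
```

An empty result only means the files exist at the expected location; it does not verify that the weights themselves are valid.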
Import the function read_page to detect and read text.
import cv2
from htr_pipeline import read_page, DetectorConfig, LineClusteringConfig
# read image
img = cv2.imread('data/sample_1.png', cv2.IMREAD_GRAYSCALE)
# detect and read text
read_lines = read_page(img,
                       DetectorConfig(height=200, enlarge=1),
                       line_clustering_config=LineClusteringConfig(min_words_per_line=2))
# output text
for read_line in read_lines:
    print(' '.join(read_word.text for read_word in read_line))

If needed, further configurations can be made by passing instances of these classes:
- DetectorConfig: configure the detector; the height should be roughly 50px per text line
- LineClusteringConfig: configure the line clustering algorithm
- ReaderConfig: configure the reader
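The rule of thumb of roughly 50px per text line can be turned into a small calculation. This is a sketch only: `suggest_detector_height` is a hypothetical helper, not part of the package, and it assumes that the height setting refers to the whole page, so the value scales with the number of text lines on it.

```python
def suggest_detector_height(num_text_lines: int, px_per_line: int = 50) -> int:
    """Return a rough detector height for a page with the given number of text lines.

    Hypothetical helper; the 50px default comes from the rule of thumb above.
    """
    if num_text_lines < 1:
        raise ValueError("num_text_lines must be at least 1")
    return num_text_lines * px_per_line


# A page with about 4 lines of handwriting gives 200,
# which matches the height used in the example call to read_page.
print(suggest_detector_height(4))  # prints 200
```

The result is a starting point; the line count is an estimate, so trying a few nearby values may be worthwhile.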
- Better documentation of all the features (e.g., how to use a dictionary); for now, please have a look at the demo scripts to learn about the features of this package
- Add special characters like ".", "?", ...
- Optionally, read the whole line instead of single words
- Improve inference speed

