Description
Environment
- Tesseract Version: 4.1.1 and latest master
- Platform:
Linux gentoo-x230 5.6.18-grsec #2 SMP Tue Jul 7 18:17:17 CEST 2020 x86_64 Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz GenuineIntel GNU/Linux
Current Behavior:
On large images, Tesseract fails like this:
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Image too large: (2559, 37192)
Error during processing.
This is the image in question (large image!):
https://archive.org/download/manualzz-id-765154/765154_jp2.zip/765154_jp2/765154_0017.jp2
Expected Behavior:
Tesseract should process the image without erroring out.
Comments
I don't know exactly where this limitation comes from. I see that the specific error comes from the Otsu thresholding code, but I am not sure whether the limit of 2^15 - 1 (INT16_MAX) is also a Leptonica maximum size limit.
Perhaps this is not considered a problem and the bug can be closed, but as it stands I am not sure what the best practice would be for running OCR on the image linked above. Perhaps the limit could be raised somewhat, to 2^16?
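To make the numbers concrete, here is a minimal Python sketch of the size check as I understand it from the error message, plus one possible workaround: splitting the page into overlapping horizontal strips that each fit under the limit. This is only an assumption about how one might work around the problem, not something Tesseract does itself. The function names and the 200-row overlap are made up for illustration, and cropping the image and merging the OCR results are left out.

```python
INT16_MAX = 2**15 - 1  # 32767, the limit the error message appears to enforce


def exceeds_limit(width, height, limit=INT16_MAX):
    """Return True if either dimension is larger than the limit."""
    return width > limit or height > limit


def strip_ranges(height, max_height=INT16_MAX, overlap=200):
    """Split [0, height) into (top, bottom) row ranges no taller than max_height.

    Consecutive strips share `overlap` rows (a hypothetical value) so that a
    text line shorter than the overlap and cut at one boundary should appear
    whole in the neighbouring strip.
    """
    if not 0 <= overlap < max_height:
        raise ValueError("overlap must be non-negative and smaller than max_height")
    ranges = []
    top = 0
    while True:
        bottom = min(top + max_height, height)
        ranges.append((top, bottom))
        if bottom == height:
            return ranges
        # Step back by `overlap` rows so adjacent strips share some content.
        top = bottom - overlap


# Dimensions taken from the "Image too large: (2559, 37192)" message above.
print(exceeds_limit(2559, 37192))  # True
print(strip_ranges(37192))         # [(0, 32767), (32567, 37192)]
```

Each strip could then be cropped out with any image tool and passed to Tesseract separately. Text inside the overlap region would be recognized twice and would need deduplicating afterwards.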