Maximum supported image size #3184

Open
@MerlijnWajer

Environment

  • Tesseract Version: 4.1.1 and latest master
  • Platform: Linux gentoo-x230 5.6.18-grsec #2 SMP Tue Jul 7 18:17:17 CEST 2020 x86_64 Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz GenuineIntel GNU/Linux

Current Behavior:

On large images, Tesseract fails like this:

Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Image too large: (2559, 37192)
Error during processing.

This is the image in question (large image!):
https://archive.org/download/manualzz-id-765154/765154_jp2.zip/765154_jp2/765154_0017.jp2

Expected Behavior:

Tesseract would process the image without erroring out.

Comments

I don't know exactly where this limitation comes from. The specific error is raised in the Otsu thresholding code, but I am not sure whether the limit of 2^15 - 1 (INT16_MAX, i.e. 32767) is also a maximum size limit in Leptonica.

Perhaps this is not considered a problem and the bug can be closed, but as it stands I am not sure what the best practice would be for OCRing the image linked above. Perhaps the limit could be raised somewhat, to 2^16?
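
Until the limit is clarified, one possible workaround is to split a tall page into overlapping horizontal strips that each stay within the limit, and OCR the strips separately. The sketch below only does the arithmetic. It assumes the check rejects any dimension greater than INT16_MAX (32767), which I have not confirmed against the source. The helper names `exceeds_limit` and `strip_bounds` and the 100-pixel overlap are made up for illustration and are not part of Tesseract or Leptonica.

```python
# Hypothetical helpers: a sketch of the strip arithmetic, not Tesseract API.
INT16_MAX = 2**15 - 1  # 32767, the assumed per-dimension limit


def exceeds_limit(width, height, limit=INT16_MAX):
    """Return True if either dimension is above the assumed limit."""
    return width > limit or height > limit


def strip_bounds(height, max_height=INT16_MAX, overlap=100):
    """Return (top, bottom) row ranges covering [0, height).

    Each strip is at most max_height rows tall. Consecutive strips share
    `overlap` rows so that a text line cut by one strip boundary is fully
    contained in the neighbouring strip.
    """
    if height <= 0:
        raise ValueError("height must be positive")
    if not 0 <= overlap < max_height:
        raise ValueError("overlap must be in [0, max_height)")
    bounds = []
    top = 0
    while True:
        bottom = min(top + max_height, height)
        bounds.append((top, bottom))
        if bottom == height:
            break
        # Step back by the overlap before starting the next strip.
        top = bottom - overlap
    return bounds


if __name__ == "__main__":
    width, height = 2559, 37192  # dimensions from the error message above
    if exceeds_limit(width, height):
        for top, bottom in strip_bounds(height):
            print(f"crop rows {top}..{bottom} ({bottom - top} rows)")
```

For the image above this yields two strips. Each strip would then be cropped with an image tool of choice and passed to Tesseract on its own, and text recognized twice in the overlap region would have to be de-duplicated afterwards, which this sketch does not attempt.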
