This repository was archived by the owner on Jun 14, 2018. It is now read-only.

Description
Hi,
We are using pyocr to detect labels that contain only alphanumeric characters.
How can I restrict recognition to a specific list of characters?
I tried the following.
In libtesseract/__init__.py:
    if "label" in builder.tesseract_configs:
        tesseract_raw.set_is_label(handle, True)
and in tesseract_raw.py:
    def set_is_label(handle, mode):
        global g_libtesseract
        assert(g_libtesseract)
        if mode:
            # wl = b"0123456789ABCDEFGHIJKLMNOPRSTUVYZXW"
            wl = b"0123456789ABNOPRSTUVYZXW"
        else:
            wl = b""
        g_libtesseract.TessBaseAPISetVariable(
            ctypes.c_void_p(handle),
            b"tessedit_char_whitelist",
            wl
        )
But this did not work.
Is there a simpler way to do it, for example:
    tool.image_to_string(
        Image.open("tmp.png"),
        lang="eng",
        tessedit_char_whitelist="0123456789ABNOPRSTUVYZXW",
        builder=pyocr.builders.LineBoxBuilder()
    )
Thanks
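
One possible fallback, if the whitelist variable turns out not to be honored, is to filter the recognized text after OCR. This does not constrain recognition itself, so it is a workaround rather than an answer to the question above. A minimal sketch, assuming the label text is available as a plain string; the names `ALLOWED` and `filter_label` are hypothetical and not part of pyocr:

```python
# Hypothetical post-processing helper, not part of pyocr or Tesseract.
# The character set is the one from the whitelist attempt above.
ALLOWED = frozenset("0123456789ABNOPRSTUVYZXW")


def filter_label(text, allowed=ALLOWED):
    """Uppercase the OCR output and drop every character not in `allowed`."""
    return "".join(ch for ch in text.upper() if ch in allowed)
```

With `LineBoxBuilder`, this could be applied to the text of each returned line box. Characters that the engine misreads as other allowed characters are not corrected by this filter.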