TOCR-Pdf-Img-to-Txt

Uses Tesseract OCR to convert images to text, and improves recognition quality by preprocessing the images with OpenCV and other image-processing libraries.
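Binarization (turning a grayscale image into pure black and white) is one common preprocessing step before OCR. This README does not spell out the project's exact pipeline, so the following is only a sketch of one such step, Otsu's thresholding. It is written in plain Python so the idea is visible, and it assumes the image has already been flattened into a list of 8-bit grayscale values. The function names otsu_threshold and binarize are hypothetical. With OpenCV the same step is usually a single call: cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU).

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a sequence of 8-bit grayscale values (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg = 0  # number of pixels with value <= t
    sum_bg = 0     # sum of those pixel values
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor); Otsu picks the t that maximizes it.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

For example, otsu_threshold([10] * 50 + [200] * 50) returns 10, so binarize maps the dark pixels to 0 and the bright ones to 255, matching the rule cv2.THRESH_BINARY applies.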

Setup -

1. Install Tesseract and replace the Tesseract executable path in the code with the path on your machine.
2. Run the code.
3. Provide the path of the folder that holds the images.
4. Depending on your needs, you can get either a text file from the images or a PDF containing the recognized text.
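The setup steps above can be outlined in code. The project's own code is not shown here, so this is a hypothetical sketch: the set of image extensions, the function names, and the ocr callable are all assumptions. The OCR step is passed in as a function so the folder handling stands on its own without Tesseract installed.

```python
import os

# File extensions treated as images (an assumption; adjust as needed).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def list_images(folder):
    """Return the sorted paths of image files directly inside folder."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
        and os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )


def folder_to_text(folder, ocr, out_path):
    """Run ocr(path) -> str on every image in folder and write one combined text file.

    ocr is a hypothetical callable supplied by the caller; it could, for
    instance, wrap pytesseract.image_to_string on a preprocessed image.
    Returns the number of images processed.
    """
    pages = [ocr(path) for path in list_images(folder)]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(pages))  # blank line between the text of consecutive images
    return len(pages)
```

If the code uses pytesseract (an assumption), the "replace the executable file path" step corresponds to setting pytesseract.pytesseract.tesseract_cmd to the location of tesseract on your machine, and ocr could be lambda p: pytesseract.image_to_string(Image.open(p)) with Image from Pillow. For PDF output, pytesseract.image_to_pdf_or_hocr(image, extension="pdf") returns the bytes of a PDF with a text layer.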

Disclaimer: Always keep in mind that software-based OCR is often less accurate than other ways of digitizing a document. This tool is an alternative for when you need to get an entire document into Word or PDF and want to save time, or when you want to query the document after converting it to text.
