TOCR-Pdf-Img-to-Txt

Uses Tesseract OCR to extract text from images and PDFs, and improves recognition quality by preprocessing the images with OpenCV and other image libraries.

Setup -

  1. Install Tesseract and replace the executable file path in the code with the path on your machine.

  2. Run the code.

  3. Provide the path of the folder that holds the images.

  4. Depending on your need, you can get a text file from the images or a PDF with text.
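
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: it assumes the `pytesseract` wrapper and `opencv-python` are installed, and the names `TESSERACT_PATH`, `IMAGE_EXTENSIONS`, `list_images`, `preprocess` and `ocr_folder` are hypothetical. The preprocessing shown (grayscale plus Otsu thresholding) is one common choice and may differ from what the project actually does.

```python
from pathlib import Path

# Hypothetical values for illustration; replace the path with your own install.
TESSERACT_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def list_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def preprocess(image_path):
    """Load an image, convert it to grayscale and binarize it (Otsu)."""
    import cv2  # opencv-python, imported here so list_images works without it

    image = cv2.imread(str(image_path))
    if image is None:
        raise ValueError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def ocr_folder(folder, make_pdf=False):
    """Write a .txt (or a searchable .pdf) next to each image in `folder`."""
    import pytesseract

    # Step 1: point pytesseract at the Tesseract executable.
    pytesseract.pytesseract.tesseract_cmd = TESSERACT_PATH

    for image_path in list_images(folder):
        binary = preprocess(image_path)
        if make_pdf:
            pdf_bytes = pytesseract.image_to_pdf_or_hocr(binary, extension="pdf")
            image_path.with_suffix(".pdf").write_bytes(pdf_bytes)
        else:
            text = pytesseract.image_to_string(binary)
            image_path.with_suffix(".txt").write_text(text, encoding="utf-8")
```

Calling `ocr_folder("path/to/images")` would write one text file per image, and `ocr_folder("path/to/images", make_pdf=True)` would write one PDF per image instead.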

Disclaimer: Software-based OCR is often not as accurate as other methods, so check the output before relying on it. This tool is an alternative for when you would otherwise have to retype an entire document into Word or PDF and want to save time, or when you want to search the document after converting it to txt.