subratred/OpticalCharacterRecognition

 
 


OpticalCharacterRecognition

Optical Character Recognition (OCR) in Python using Tesseract, an open-source OCR engine.

How to use:

This repository contains two Python scripts, one for each use case:

  • For performing OCR on images
  • For performing OCR on PDFs

Run the script that matches your use case.
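As a rough illustration of the image use case, here is a minimal sketch, not the repository's actual script. The function names `detect_use_case` and `ocr_image`, the list of image extensions, and the default `lang="eng"` are assumptions made for this example. It relies only on `pytesseract.image_to_string` and Pillow's `Image.open`.

```python
from pathlib import Path

# Hypothetical list of extensions treated as images in this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def detect_use_case(path):
    """Return "pdf" or "image" depending on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or path}")


def ocr_image(image_path, lang="eng"):
    """Return the text Tesseract recognises in a single image file."""
    # Imported lazily so this module loads even without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)
```

Calling `ocr_image("scan.png")` (a placeholder file name) would return the recognised text as a string, provided the Tesseract binary is installed and on the `PATH`.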

Dependencies required:

  • Tesseract Core Library
  • PyTesseract (Python wrapper for Tesseract Core)
  • Pillow (For Image Processing)
  • ImageMagick
  • Wand (Python binding for ImageMagick)
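Before running either script, it can help to confirm that the dependencies listed above are present. The following is a small sketch using only the standard library. The function name `missing_dependencies` is hypothetical, and looking for a `magick` or `convert` executable is only a heuristic for ImageMagick, since Wand itself loads the ImageMagick shared library rather than the command-line tool.

```python
import importlib.util
import shutil

# Importable module name -> name used in the dependency list above.
PYTHON_MODULES = {"pytesseract": "PyTesseract", "PIL": "Pillow", "wand": "Wand"}


def missing_dependencies(find_module=importlib.util.find_spec,
                         find_binary=shutil.which):
    """Return the names of dependencies that could not be located."""
    missing = [label for module, label in PYTHON_MODULES.items()
               if find_module(module) is None]
    # pytesseract shells out to the `tesseract` executable.
    if find_binary("tesseract") is None:
        missing.append("Tesseract")
    # Heuristic only: ImageMagick 7 ships `magick`, older versions `convert`.
    if find_binary("magick") is None and find_binary("convert") is None:
        missing.append("ImageMagick")
    return missing
```

Printing `missing_dependencies()` gives an empty list when everything was found. The two lookup functions are parameters so the check can be exercised without touching the real system.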

Tesseract is written in C++, and since version 4 its default recognition engine is built on an LSTM network. For further reading and an installation guide, see this very helpful blog post, which explains the essentials. I have also extended the approach to PDFs to make it more useful for real-world use cases.
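One way the PDF extension could work with the dependencies listed above is to render each page to an image with Wand and pass it to Tesseract. This is a sketch under that assumption, not necessarily how the repository's script does it. The names `ocr_pdf` and `join_pages` and the 300 DPI default are illustrative. ImageMagick typically needs Ghostscript installed to read PDFs.

```python
import io


def join_pages(page_texts):
    """Join per-page OCR results into one string, separated by blank lines."""
    return "\n\n".join(text.strip() for text in page_texts)


def ocr_pdf(pdf_path, resolution=300, lang="eng"):
    """Return a list with the recognised text of each page of a PDF."""
    # Imported lazily so this module loads even without the OCR stack installed.
    from PIL import Image as PILImage
    from wand.image import Image as WandImage
    import pytesseract

    page_texts = []
    # `resolution` is the rasterisation DPI; higher values usually help OCR.
    with WandImage(filename=str(pdf_path), resolution=resolution) as pdf:
        for page in pdf.sequence:
            # Convert each page to PNG bytes that Pillow can open.
            with WandImage(image=page) as page_img:
                png_bytes = page_img.make_blob("png")
            with PILImage.open(io.BytesIO(png_bytes)) as pil_page:
                page_texts.append(
                    pytesseract.image_to_string(pil_page, lang=lang))
    return page_texts
```

For example, `join_pages(ocr_pdf("document.pdf"))` (a placeholder file name) would give the text of the whole document as a single string.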

