OpticalCharacterRecognition

Optical Character Recognition (OCR) in Python using Tesseract, an open-source OCR library.

How to use:

This repository contains two Python scripts, one for each use case:

For performing OCR on images
For performing OCR on PDFs
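For the image use case, the core call might look like the following minimal sketch. It is not the repository's script. It assumes pytesseract, Pillow and the Tesseract binary are installed, and the function names `build_tesseract_config` and `ocr_image` are hypothetical.

```python
"""Minimal image OCR sketch (assumes pytesseract, Pillow and Tesseract are installed)."""


def build_tesseract_config(oem: int = 1, psm: int = 3) -> str:
    """Build a Tesseract config string (hypothetical helper).

    --oem 1 selects the LSTM engine only (Tesseract 4 and later);
    --psm 3 is fully automatic page segmentation, Tesseract's default mode.
    """
    return f"--oem {oem} --psm {psm}"


def ocr_image(path: str, lang: str = "eng") -> str:
    """Return the text Tesseract recognizes in the image at `path`."""
    # Imported lazily so the helper above is usable without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        # Optional preprocessing: convert to 8-bit grayscale ("L" mode).
        gray = img.convert("L")
    return pytesseract.image_to_string(gray, lang=lang, config=build_tesseract_config())
```

For example, `print(ocr_image("scan.png"))` would print the recognized text of a hypothetical `scan.png`. The grayscale conversion is an optional preprocessing choice, not something pytesseract requires.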

Run the script that matches your use case.

Dependencies required:

Tesseract Core Library
PyTesseract (Python wrapper for Tesseract Core)
Pillow (for image processing)
ImageMagick
Wand (Python binding for ImageMagick)
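A rough way to confirm these dependencies are present before running either script is sketched below. It is a heuristic, not part of the repository: it only checks that the Python packages can be located and that a `tesseract` command and an ImageMagick command are on `PATH`. Wand actually loads the MagickWand shared library, so a clean result does not guarantee that Wand will import. The name `missing_dependencies` is hypothetical.

```python
"""Heuristic check for the OCR dependencies listed above (a sketch)."""
import importlib.util
import shutil
from typing import List

# Distribution name -> import name (Pillow imports as PIL, Wand as wand).
PYTHON_MODULES = {"pytesseract": "pytesseract", "Pillow": "PIL", "Wand": "wand"}


def missing_dependencies() -> List[str]:
    """Return the names of dependencies that could not be found."""
    missing = [
        name
        for name, module in PYTHON_MODULES.items()
        # find_spec only locates the module; it does not import it.
        if importlib.util.find_spec(module) is None
    ]
    # The Tesseract core library ships with a `tesseract` command-line program.
    if shutil.which("tesseract") is None:
        missing.append("Tesseract")
    # ImageMagick 7 installs `magick`; ImageMagick 6 installs `convert`.
    if shutil.which("magick") is None and shutil.which("convert") is None:
        missing.append("ImageMagick")
    return missing
```

Calling `missing_dependencies()` returns an empty list when everything was found.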

Tesseract was originally written in C++ and uses an LSTM network behind the scenes. For further reading and an installation guide, see this blog post, which explains the essentials. I have also extended the approach to PDFs to make it more useful for real-world use cases.
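One way the PDF extension could work is sketched below, assuming (the original does not confirm this) that each page is rasterized with Wand and the resulting image is passed to pytesseract. The function names, the 300 DPI default and the page-marker format are illustrative choices. ImageMagick delegates PDF rendering to Ghostscript, so Ghostscript must be installed as well.

```python
"""PDF OCR sketch: rasterize pages with Wand, then OCR each page with pytesseract.

Assumes Wand, ImageMagick (with Ghostscript for PDF input), Pillow,
pytesseract and the Tesseract binary are installed.
"""
import io
from typing import Iterable, List


def join_pages(texts: Iterable[str]) -> str:
    """Join per-page OCR output, marking where each page starts."""
    return "\n".join(
        f"--- page {number} ---\n{text.strip()}"
        for number, text in enumerate(texts, start=1)
    )


def ocr_pdf(pdf_path: str, resolution: int = 300, lang: str = "eng") -> str:
    """Return the OCR text of every page in `pdf_path`."""
    # Lazy imports: importing wand fails if ImageMagick is not installed.
    import pytesseract
    from PIL import Image
    from wand.image import Image as WandImage

    texts: List[str] = []
    # `resolution` (DPI) is applied while reading, i.e. when the PDF is rasterized.
    with WandImage(filename=pdf_path, resolution=resolution) as pdf:
        for page in pdf.sequence:
            # Copy the single page into its own image and encode it as PNG.
            with WandImage(image=page) as page_image:
                png_bytes = page_image.make_blob("png")
            with Image.open(io.BytesIO(png_bytes)) as img:
                texts.append(pytesseract.image_to_string(img, lang=lang))
    return join_pages(texts)
```

Raising `resolution` tends to help with small print at the cost of memory and processing time per page.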

subratred/OpticalCharacterRecognition

Folders and files

Latest commit

History

Repository files navigation

OpticalCharacterRecognition

How to use :

Dependencies required :

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages