OCR and Keyword Search Web Application

This web application performs Optical Character Recognition (OCR) on uploaded images containing Hindi and English text, and lets you search the extracted text for a keyword.

Setup

Install the required dependencies: pip install -r requirements.txt (this file lists the key libraries, including transformers, gradio, pillow, tesseract, and pytesseract).
Install Tesseract OCR: on Windows, download and install it from https://github.com/UB-Mannheim/tesseract/wiki
Update the Tesseract path in the script. This was not needed when deploying to a Hugging Face Space, but I had to set it when running the app locally on my machine.
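
The path update in the last step can be sketched as follows. This is a minimal illustration, not the code in app.py: the helper names find_tesseract and configure_pytesseract are hypothetical, and the Windows path is only the usual default of the UB-Mannheim installer, so adjust it if Tesseract was installed elsewhere. The one pytesseract-specific piece is the pytesseract.pytesseract.tesseract_cmd attribute, which tells the wrapper where the executable lives.

```python
import os
import shutil

# Assumption: the usual default install location of the UB-Mannheim Windows build.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(extra_candidates=(WINDOWS_DEFAULT,)):
    """Return a path to the tesseract executable, or None if it is not found."""
    # Prefer whatever is already on PATH (typical on Linux hosts).
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to explicit candidate locations.
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the located binary and return the path used."""
    import pytesseract  # imported lazily so find_tesseract works without it

    path = find_tesseract()
    if path:
        pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_pytesseract() once at startup, before any OCR call, should be enough. When the executable is already on PATH, the lookup returns it directly and no hard-coded path is needed, which would be consistent with the path update being unnecessary on the Hugging Face Space.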

Running Locally

To run the application locally: python app.py

Deployment

To deploy on Hugging Face Spaces:

Create a new Space on Hugging Face.
While creating the Space, set the Space SDK to Gradio.
Upload the app.py file, and create requirements.txt and packages.txt for the Python libraries and system packages respectively.

Usage

Upload an image containing Hindi and English text.
Enter a keyword to search within the extracted text.
The application will display the extracted text and search results.

Note: OCR accuracy depends on image quality. Blurry or hazy text may be read incorrectly.
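
The usage flow above can be sketched roughly as below. This is an illustration under stated assumptions, not the contents of app.py: it assumes Tesseract (through pytesseract) is the OCR backend with the Hindi and English language data installed, and the function names extract_text, search_keyword, ocr_and_search, and build_demo are hypothetical. The keyword search here is a simple case-insensitive, line-by-line substring match.

```python
import unicodedata


def _norm(s):
    # Normalize Unicode so composed and decomposed Devanagari forms compare equal,
    # and casefold so English matching ignores case.
    return unicodedata.normalize("NFC", s).casefold()


def search_keyword(text, keyword):
    """Return (line_number, line) pairs for lines that contain the keyword."""
    needle = _norm(keyword.strip())
    if not needle:
        return []
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in _norm(line)
    ]


def extract_text(image):
    """Run Tesseract on a PIL image using both Hindi and English models."""
    import pytesseract  # lazy import: only needed when OCR actually runs

    return pytesseract.image_to_string(image, lang="hin+eng")


def ocr_and_search(image, keyword):
    """Return the extracted text and a readable summary of keyword matches."""
    text = extract_text(image)
    matches = search_keyword(text, keyword)
    if not matches:
        return text, "No matches found."
    summary = "\n".join(f"Line {number}: {line}" for number, line in matches)
    return text, summary


def build_demo():
    """Wire the two inputs and two outputs into a Gradio interface."""
    import gradio as gr  # lazy import so the search helper is usable on its own

    return gr.Interface(
        fn=ocr_and_search,
        inputs=[gr.Image(type="pil"), gr.Textbox(label="Keyword")],
        outputs=[gr.Textbox(label="Extracted text"), gr.Textbox(label="Search results")],
    )
```

With this layout, build_demo().launch() would start the interface. Keeping search_keyword free of OCR and UI imports makes it easy to check on plain strings before involving images.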
