Edge AI–Powered OCR for Text Extraction and Form Verification

An Edge AI–driven OCR system that extracts text from both printed and handwritten documents, auto-fills digital forms, and verifies the extracted data, all while running fully offline.
Optimized for edge and low-resource devices, it stays usable in rural or low-connectivity areas where cloud-based OCR is not an option.

Full Walkthrough of our solution

Problem Statement

Manual text extraction and verification from documents such as ID cards, certificates, and forms is slow, error-prone, and labor-intensive.
Traditional OCR systems often rely on cloud APIs, limiting accessibility in offline or rural setups.
Our solution replaces this with an AI-powered pipeline capable of intelligently detecting, recognizing, mapping, and verifying text fields locally, ensuring high accuracy, privacy, and speed — even on entry-level hardware.

Key Features

1. Intelligent Text Extraction

Upload any image or PDF for automated OCR extraction within seconds.
Uses CRAFT for text detection and TrOCR for text recognition (printed + handwritten).
A custom classifier decides dynamically which TrOCR model (printed or handwritten) to use per region.
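The per-region routing described above can be sketched as a small dispatcher. This is a minimal illustration, not the project's actual code: `Region`, `recognize_regions`, and the toy classifier and recognizers are hypothetical stand-ins for the CRAFT output, the custom classifier, and the two TrOCR models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    """A detected text region. `pixels` stands in for the cropped image data."""
    box: Tuple[int, int, int, int]
    pixels: object


def recognize_regions(
    regions: List[Region],
    classify: Callable[[Region], str],
    recognizers: Dict[str, Callable[[Region], str]],
) -> List[dict]:
    """Route each region to the recognizer chosen by the classifier."""
    results = []
    for region in regions:
        style = classify(region)          # expected: "printed" or "handwritten"
        recognizer = recognizers[style]   # pick the matching model
        results.append(
            {"box": region.box, "style": style, "text": recognizer(region)}
        )
    return results


# Hypothetical stubs so the sketch runs without any model weights.
def toy_classifier(region: Region) -> str:
    return "handwritten" if region.pixels == "cursive" else "printed"


toy_recognizers = {
    "printed": lambda region: "PRINTED TEXT",
    "handwritten": lambda region: "handwritten text",
}

regions = [Region((0, 0, 100, 20), "block"), Region((0, 30, 100, 50), "cursive")]
results = recognize_regions(regions, toy_classifier, toy_recognizers)
```

Passing the classifier and recognizers in as callables keeps the routing logic testable without loading either model.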

2. Multilingual and Model-Aware

Automatic language detection ensures the right TrOCR or multilingual model is used.
Supports English, Arabic, and Korean, among other scripts.

3. Data Verification

Field-by-field comparison between extracted data and submitted values.
Each field receives:
- match result (true / false)
- confidence score (0–1 scale)
- status flag (MATCH / MISMATCH) for quick review.
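A field-by-field comparison of this kind could look roughly like the sketch below, which uses Python's `difflib.SequenceMatcher` (the similarity tool listed in the Tech Stack). The function name `verify_fields`, the 0.95 threshold, and the lowercase/strip normalization are assumptions for illustration; the project's real scoring rules may differ.

```python
from difflib import SequenceMatcher


def verify_fields(submitted: dict, extracted: dict, threshold: float = 0.95) -> dict:
    """Compare submitted values with OCR-extracted values, field by field.

    `threshold` is a hypothetical cut-off: similarity at or above it counts
    as a match.
    """
    results = {}
    for name, submitted_value in submitted.items():
        extracted_value = extracted.get(name, "")
        # Normalize lightly so case and surrounding whitespace do not matter.
        a = str(submitted_value).strip().lower()
        b = str(extracted_value).strip().lower()
        ratio = SequenceMatcher(None, a, b).ratio()
        match = ratio >= threshold
        results[name] = {
            "submitted_value": submitted_value,
            "extracted_value": extracted_value,
            "match": match,
            "confidence": round(ratio, 2),
            "status": "MATCH" if match else "MISMATCH",
        }
    return {"fields_verified": len(results), "verification_results": results}


report = verify_fields(
    {"Name": "John Doe", "Email": "john@email.com"},
    {"Name": "John Doe", "Email": "john@gmail.com"},
)
```

With these inputs, `Name` is an exact match, while `Email` scores high but stays below the assumed threshold and is flagged for review.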

4. Annotated Visual Feedback

Bounding boxes drawn on recognized text regions.
Each box labeled with recognized text and confidence level.
Returned as a base64-encoded annotated image for instant display.
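For display in a browser, a base64 string is typically wrapped in a data URI. The helper below is a small sketch of that step; `to_data_uri` is a hypothetical name, and the PNG MIME type is an assumption, since the image format is not stated here.

```python
import base64


def to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URI usable in an <img src="..."> tag."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# The 8-byte PNG signature stands in for a real annotated image.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature)
```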

5. Multi-Page and Live Input Support

Handles multi-page PDFs in sequence.
Supports real-time OCR from camera feeds.

6. Offline Intelligent Mapping

Uses Gemma2:2B via Ollama for local, offline mapping from OCR text → structured fields (e.g. Name, DOB, ID No.).
Preserves privacy: no external API calls are made.
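One way to perform this mapping is to call Ollama's local REST endpoint (`/api/generate` on port 11434 by default) and ask the model for a JSON reply. The sketch below uses only the standard library. The prompt wording and the helper names `build_payload`, `parse_fields`, and `map_fields` are assumptions, not the project's actual implementation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(ocr_text: str, fields: list, model: str = "gemma2:2b") -> dict:
    """Build a non-streaming generate request that asks for JSON output."""
    prompt = (
        "Extract the following fields from the OCR text and reply with a JSON "
        f"object whose keys are exactly {fields}. Use null for missing fields.\n\n"
        f"OCR text:\n{ocr_text}"
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_fields(raw_response: str, fields: list) -> dict:
    """Keep only the requested keys; tolerate malformed model output."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {field: data.get(field) for field in fields}


def map_fields(ocr_text: str, fields: list) -> dict:
    """Send the OCR text to the local model (requires `ollama serve` to be running)."""
    body = json.dumps(build_payload(ocr_text, fields)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return parse_fields(reply.get("response", ""), fields)


payload = build_payload("Name: Jane Smith", ["Name", "DOB"])
parsed = parse_fields('{"Name": "Jane Smith", "Extra": 1}', ["Name", "DOB"])
```

Filtering the reply down to the requested keys guards against the model inventing extra fields or returning text that is not valid JSON.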

7. Edge & Offline Optimized

Runs on standard laptops or edge devices without a GPU.
Fully Dockerized for easy deployment and portability.

Tech Stack

Backend

Component	Description
Framework	FastAPI (Python 3.10+)
OCR Detection	CRAFT – Text detection network
OCR Recognition	TrOCR – Transformer-based OCR (Printed + Handwritten)
LLM Parser	Gemma2 (via Ollama) – Offline intelligent field extraction
Verification	SequenceMatcher – String similarity for accuracy scoring

Frontend

Component	Description
Interface	HTML + CSS + JS
Visualization	Confidence overlay and annotation viewer
Hosting	Served statically via the FastAPI `/demo/index.html` endpoint or any web server

DevOps & Deployment

Tool	Purpose
Docker	Full backend containerization
DockerHub	Image repository for deployment
Jenkins	CI/CD automation for build + push pipeline
Render / Vercel	Hosting for backend (FastAPI) and UI

System Architecture

Getting Started

Prerequisites

Python 3.10+
Docker (optional) for container deployment
Ollama installed locally (for Gemma2)
Git

Installation

git clone https://github.com/gouravanirudh05/OCR_Recognition
cd OCR_Recognition
python3 -m venv venv
source venv/bin/activate   
pip install -r requirements.txt

Running the App

Option A – Local (with Ollama running)

ollama serve &                   # runs in the foreground otherwise; a second terminal also works
ollama pull gemma2:2b
uvicorn app.main:app --reload

Option B – Docker

docker build -t ocr_recognition:latest .
docker run -p 8000:8000 ocr_recognition:latest

Access at: http://127.0.0.1:8000

API Documentation

1. OCR Extraction API

POST /ocr_form

Field	Type	Description
`file`	File	Image or PDF
`fields`	String	Comma-separated field names
`lang`	String	Optional (default: auto)

curl -X POST "http://localhost:8000/ocr_form" \
  -F "file=@document.pdf" \
  -F "fields=Name,Date,Signature" \
  -F "lang=english"

Response Example:

{
  "pages": 3,
  "requested_fields": ["Name", "Date", "Signature"],
  "recognized_texts": ["all", "text", "from", "all", "pages"],
  "structured_output": {
    "Name": "Jane Smith",
    "Date": "2024-12-13",
    "Signature": "Jane Smith"
  },
  "annotated_images": [
    {
      "page": 0,
      "image": "base64_encoded_string"
    },
    {
      "page": 1,
      "image": "base64_encoded_string"
    }
  ],
  "detections": [
    {
      "page": 0,
      "detections": [
        {
          "box": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
          "text": "detected text",
          "confidence": 0.92
        }
      ]
    }
  ]
}
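A client could unpack a response of this shape as sketched below: decode the annotated images and collect detections that fall under a confidence cut-off. The function name `summarize_ocr_response` and the 0.5 default cut-off are hypothetical; the sample dictionary is made-up data that only mirrors the structure shown above.

```python
import base64


def summarize_ocr_response(response: dict, min_confidence: float = 0.5) -> dict:
    """Decode annotated images and list detections below `min_confidence`."""
    images = {
        entry["page"]: base64.b64decode(entry["image"])
        for entry in response.get("annotated_images", [])
    }
    low_confidence = []
    for page in response.get("detections", []):
        for detection in page.get("detections", []):
            if detection["confidence"] < min_confidence:
                low_confidence.append(
                    (page["page"], detection["text"], detection["confidence"])
                )
    return {
        "fields": response.get("structured_output", {}),
        "images": images,
        "low_confidence": low_confidence,
    }


# Made-up sample that follows the response layout.
sample = {
    "structured_output": {"Name": "Jane Smith"},
    "annotated_images": [
        {"page": 0, "image": base64.b64encode(b"page-0-bytes").decode("ascii")}
    ],
    "detections": [
        {
            "page": 0,
            "detections": [
                {"box": [[0, 0], [9, 0], [9, 9], [0, 9]], "text": "Jane", "confidence": 0.92},
                {"box": [[0, 20], [9, 20], [9, 29], [0, 29]], "text": "Sm1th", "confidence": 0.41},
            ],
        }
    ],
}
summary = summarize_ocr_response(sample)
```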

2. Data Verification API

POST /verify_form

Field	Type	Description
`file`	File	Input document
`form_data`	JSON	Fields submitted by user
`lang`	String	Optional language

curl -X POST "http://localhost:8000/verify_form" \
  -F "file=@signed_form.pdf" \
  -F 'form_data={"Name":"John Doe","Email":"john@email.com"}' \
  -F "lang=english"

Response Example:

{
  "fields_verified": 2,
  "verification_results": {
    "Name": {
      "submitted_value": "John Doe",
      "extracted_value": "John Doe",
      "match": true,
      "confidence": 1.0,
      "status": "MATCH"
    },
    "Email": {
      "submitted_value": "john@email.com",
      "extracted_value": "john@gmail.com",
      "match": false,
      "confidence": 0.82,
      "status": "MISMATCH"
    }
  }
}

Challenges & Solutions

Challenge	Solution
Printed vs Handwritten OCR	Integrated a classifier to choose between TrOCR models dynamically.
Blurred / Tilted Images	Added adaptive preprocessing (contrast enhancement, deskew).
Offline Execution	Embedded Ollama for local LLM inference.
Low-end Hardware Support	Reduced dependency footprint, optimized model loading.
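How model loading was optimized is not spelled out here. One common approach on low-memory devices is to load each model lazily and cache it, so weights are read only when a request first needs them. The sketch below shows that pattern with `functools.lru_cache`; `get_model` and `_load_from_disk` are hypothetical names, and the loader is a stub rather than a real TrOCR load.

```python
from functools import lru_cache

# Counts how often the (stubbed) expensive load runs, to show the caching effect.
LOAD_COUNT = {"printed": 0, "handwritten": 0}


def _load_from_disk(kind: str) -> dict:
    """Stand-in for an expensive model load (e.g. reading weights from disk)."""
    LOAD_COUNT[kind] += 1
    return {"kind": kind}


@lru_cache(maxsize=None)
def get_model(kind: str) -> dict:
    """Load a model on first use and reuse the same instance afterwards."""
    return _load_from_disk(kind)


first = get_model("printed")
second = get_model("printed")  # served from the cache, no second load
```

With this pattern, a document containing only printed text never pays the memory cost of the handwritten model.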

CI/CD Pipeline and MOSIP Alignment

Continuous Integration / Continuous Deployment (CI/CD)

The entire system is integrated into a Jenkins-based CI/CD pipeline:

A code push to GitHub triggers Jenkins.
Jenkinsfile automates:
- Building Docker image (ocr_recognition:latest).
- Running linting & unit tests.
- Logging in to DockerHub.
- Pushing the latest image (gouravanirudh/ocr_recognition:latest).
The deployment stage automatically updates the backend on Render or any edge node.

This ensures:

Consistent builds across all environments.
Rapid deployment and rollback support.
Easy scaling to multiple offline or semi-connected nodes.

Alignment with MOSIP Goals

This solution directly aligns with MOSIP’s mission of enabling inclusive, secure, and privacy-preserving digital identity ecosystems, by:

Ensuring offline-first, edge-ready infrastructure for identity and document processing.
Supporting low-resource environments — crucial for rural digital adoption.
Providing open, interoperable AI modules deployable across MOSIP-integrated systems.
Upholding data sovereignty & privacy, as all OCR + LLM inference happens locally without cloud dependence.

In essence, this project embodies MOSIP’s core philosophy: “Digital inclusion through open, secure, and accessible technology.”

Contributors

Gourav Anirudh
Pramatha Rao
