
gouravanirudh05/MosipDecode-OCR


Edge AI–Powered OCR for Text Extraction and Form Verification

An Edge AI–driven OCR system that accurately extracts text from both printed and handwritten documents, auto-fills digital forms, and verifies the extracted data — all while running fully offline.
Because it is optimized for edge and low-resource devices, it remains usable in rural or low-connectivity areas, making reliable document automation possible almost anywhere.


Full Walkthrough of our solution

Watch the video



Problem Statement

Manual text extraction and verification from documents such as ID cards, certificates, and forms is slow, error-prone, and labor-intensive.
Traditional OCR systems often rely on cloud APIs, limiting accessibility in offline or rural setups.
Our solution replaces this with an AI-powered pipeline that detects, recognizes, maps, and verifies text fields locally, delivering high accuracy, privacy, and speed even on entry-level hardware.


Key Features

1. Intelligent Text Extraction

  • Upload any image or PDF for automated OCR extraction within seconds.
  • Uses CRAFT for text detection and TrOCR for text recognition (printed + handwritten).
  • A custom classifier decides dynamically which TrOCR model (printed or handwritten) to use per region.
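The per-region routing described above can be sketched as a small dispatcher. This is a hypothetical illustration, not the project's code: `is_handwritten`, `printed_model` and `handwritten_model` are placeholder callables standing in for the custom classifier and the two TrOCR models (in practice perhaps wrappers around the Hugging Face checkpoints `microsoft/trocr-base-printed` and `microsoft/trocr-base-handwritten`, which is an assumption).

```python
from dataclasses import dataclass
from typing import Any, Callable, List

# Hypothetical aliases: a region is whatever crop the CRAFT stage produces
# (e.g. a PIL image); the dispatcher treats it as opaque.
Region = Any
Recognizer = Callable[[Region], str]


@dataclass
class RegionResult:
    text: str
    model: str  # which recognizer handled the region: "printed" or "handwritten"


def recognize_regions(
    regions: List[Region],
    is_handwritten: Callable[[Region], bool],  # stand-in for the custom classifier
    printed_model: Recognizer,                 # stand-in for the printed TrOCR model
    handwritten_model: Recognizer,             # stand-in for the handwritten TrOCR model
) -> List[RegionResult]:
    """Route each detected region to the TrOCR variant chosen by the classifier."""
    results = []
    for region in regions:
        if is_handwritten(region):
            results.append(RegionResult(handwritten_model(region), "handwritten"))
        else:
            results.append(RegionResult(printed_model(region), "printed"))
    return results


# Usage with stub callables in place of real models:
regions = [{"kind": "printed", "text": "NAME"}, {"kind": "handwritten", "text": "Jane"}]
out = recognize_regions(
    regions,
    is_handwritten=lambda r: r["kind"] == "handwritten",
    printed_model=lambda r: r["text"],
    handwritten_model=lambda r: r["text"],
)
print([(r.model, r.text) for r in out])  # [('printed', 'NAME'), ('handwritten', 'Jane')]
```

Keeping the classifier and recognizers as injected callables means either model can be swapped or mocked without touching the routing logic.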

2. Multilingual and Model-Aware

  • Automatic language detection ensures the right TrOCR or multilingual model is used.
  • Supports English, Arabic, Korean, and other scripts covered by the underlying models.
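The README does not say how automatic language detection works. One lightweight heuristic, swapped in here purely for illustration, is to count the Unicode script of the letters in a text sample and map the dominant script to a model key. `SCRIPT_TO_LANG`, `detect_script` and `resolve_lang` are hypothetical names, and where the sample text comes from (for example a first recognition pass) is an assumption.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from the first word of a Unicode character name to a model key.
SCRIPT_TO_LANG = {
    "LATIN": "english",
    "ARABIC": "arabic",
    "HANGUL": "korean",
}


def detect_script(text: str, default: str = "english") -> str:
    """Guess a language key from the dominant Unicode script in `text`."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation and whitespace
        # unicodedata.name() returns e.g. "HANGUL SYLLABLE AN"; the first word names the script.
        script = unicodedata.name(ch, "UNKNOWN").split()[0]
        if script in SCRIPT_TO_LANG:
            counts[SCRIPT_TO_LANG[script]] += 1
    return counts.most_common(1)[0][0] if counts else default


def resolve_lang(lang: str, sample_text: str) -> str:
    """Honour an explicit `lang` value; fall back to script detection for "auto"."""
    lang = lang.strip().lower()
    return detect_script(sample_text) if lang == "auto" else lang


print(resolve_lang("auto", "안녕하세요"))  # korean
```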

3. Data Verification

  • Field-by-field comparison between extracted data and submitted values.
  • Each field receives:
    • a match flag (true / false)
    • a confidence score (0–1 scale)
    • a status label (MATCH / MISMATCH) for quick review.
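Since the tech stack lists `difflib.SequenceMatcher` for scoring, the per-field check can be sketched as below. The normalization step, the 0.95 threshold and the helper names are hypothetical choices; the project's exact scoring and cut-off are not documented, so scores from this sketch need not match the response example further down.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.95  # hypothetical cut-off; the project's value is not documented


def normalize(value: str) -> str:
    """Case-fold and collapse whitespace so trivial OCR noise does not count."""
    return " ".join(value.casefold().split())


def verify_field(submitted: str, extracted: str, threshold: float = MATCH_THRESHOLD) -> dict:
    """Score one field with SequenceMatcher.ratio() and label it MATCH / MISMATCH."""
    score = SequenceMatcher(None, normalize(submitted), normalize(extracted)).ratio()
    is_match = score >= threshold
    return {
        "submitted_value": submitted,
        "extracted_value": extracted,
        "match": is_match,
        "confidence": round(score, 2),
        "status": "MATCH" if is_match else "MISMATCH",
    }


def verify_fields(form_data: dict, extracted: dict) -> dict:
    """Compare every submitted field; a field missing from the OCR output scores 0."""
    results = {
        name: verify_field(value, extracted.get(name, ""))
        for name, value in form_data.items()
    }
    return {"fields_verified": len(results), "verification_results": results}
```

With these settings, "john@email.com" against "john@gmail.com" scores about 0.93 and is reported as a MISMATCH, while "John Doe" against "john  doe" normalizes to an exact match.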

4. Annotated Visual Feedback

  • Bounding boxes drawn on recognized text regions.
  • Each box labeled with recognized text and confidence level.
  • Returned as a base64-encoded annotated image for instant display.

5. Multi-Page and Live Input Support

  • Handles multi-page PDFs in sequence.
  • Supports real-time OCR from camera feeds.

6. Offline Intelligent Mapping

  • Uses Gemma2:2B via Ollama for local, offline mapping from OCR text → structured fields (e.g. Name, DOB, ID No.).
  • Preserves privacy: no external API calls are made.
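A minimal sketch of the offline mapping step, assuming Ollama is listening on its default port 11434 and `gemma2:2b` has been pulled. It calls Ollama's `/api/generate` endpoint with `stream` disabled and `format` set to `"json"`; the prompt wording and the helper names `build_prompt`, `parse_mapping` and `map_fields` are hypothetical, not taken from the project.

```python
import json
import urllib.request
from typing import Dict, List

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(fields: List[str], ocr_texts: List[str]) -> str:
    """Hypothetical prompt: ask for a JSON object keyed by the requested fields."""
    return (
        "Extract these fields from the OCR text below and reply with a JSON object "
        f"whose keys are exactly: {', '.join(fields)}. "
        'Use "" for any field that is not present.\n\nOCR text:\n'
        + "\n".join(ocr_texts)
    )


def parse_mapping(raw: str, fields: List[str]) -> Dict[str, str]:
    """Keep only the requested keys and tolerate malformed model output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for field in fields:
        value = data.get(field)
        result[field] = "" if value is None else str(value).strip()
    return result


def map_fields(fields: List[str], ocr_texts: List[str], model: str = "gemma2:2b") -> Dict[str, str]:
    """Send OCR text to the local model and return a {field: value} mapping."""
    payload = {
        "model": model,
        "prompt": build_prompt(fields, ocr_texts),
        "stream": False,   # one JSON reply instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Ollama puts the generated text in the "response" field.
    return parse_mapping(body.get("response", ""), fields)
```

Validating the model output against the requested field list keeps a small local model from leaking extra or missing keys into the structured output.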

7. Edge & Offline Optimized

  • Runs on standard laptops or edge devices without a GPU.
  • Fully Dockerized for easy deployment and portability.

Tech Stack

Backend

  • Framework: FastAPI (Python 3.10+)
  • OCR detection: CRAFT, a text detection network
  • OCR recognition: TrOCR, Transformer-based OCR (printed + handwritten)
  • LLM parser: Gemma2 via Ollama, for offline intelligent field extraction
  • Verification: SequenceMatcher, string similarity for accuracy scoring

Frontend

  • Interface: HTML + CSS + JS
  • Visualization: confidence overlay and annotation viewer
  • Hosting: served statically via the FastAPI /demo/index.html endpoint, or from any web server

DevOps & Deployment

  • Docker: full backend containerization
  • DockerHub: image repository for deployment
  • Jenkins: CI/CD automation for the build + push pipeline
  • Render / Vercel: hosting for the backend (FastAPI) and UI

System Architecture

[Figure: system architecture diagram]

Getting Started

Prerequisites

  • Python 3.10+
  • Docker (optional) for container deployment
  • Ollama installed locally (for Gemma2)
  • Git

Installation

git clone https://github.com/gouravanirudh05/OCR_Recognition
cd OCR_Recognition
python3 -m venv venv
source venv/bin/activate   
pip install -r requirements.txt

Running the App

Option A – Local (with Ollama running)

ollama serve                    # runs in the foreground; use a second terminal for the next commands
ollama pull gemma2:2b           # one-time model download
uvicorn app.main:app --reload

Option B – Docker

docker build -t ocr_recognition:latest .
docker run -p 8000:8000 ocr_recognition:latest

Access at: http://127.0.0.1:8000


API Documentation

1. OCR Extraction API

POST /ocr_form

Request fields (multipart/form-data):

  • file (File): image or PDF
  • fields (String): comma-separated field names
  • lang (String): optional (default: auto)

curl -X POST "http://localhost:8000/ocr_form" \
  -F "file=@document.pdf" \
  -F "fields=Name,Date,Signature" \
  -F "lang=english"
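For scripted use, the same request can be sent from Python with only the standard library. This sketch assumes the server from "Running the App" is on `localhost:8000`; `encode_multipart` and `ocr_form` are hypothetical helper names, and the multipart body mirrors what curl's `-F` options produce.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str], file_field: str, file_path: Path) -> Tuple[bytes, str]:
    """Build a multipart/form-data body equivalent to curl's -F options."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():  # plain text fields such as "fields" and "lang"
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    ctype = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    parts += [  # the uploaded document itself
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{file_path.name}"'.encode(),
        f"Content-Type: {ctype}".encode(),
        b"",
        file_path.read_bytes(),
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def ocr_form(path: str, fields: str, lang: str = "auto", base_url: str = "http://localhost:8000") -> dict:
    """POST a document to /ocr_form and return the parsed JSON response."""
    body, content_type = encode_multipart({"fields": fields, "lang": lang}, "file", Path(path))
    request = urllib.request.Request(
        f"{base_url}/ocr_form", data=body, headers={"Content-Type": content_type}
    )  # supplying data makes this a POST
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Usage: `result = ocr_form("document.pdf", "Name,Date,Signature", lang="english")`.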

Response Example:

{
  "pages": 3,
  "requested_fields": ["Name", "Date", "Signature"],
  "recognized_texts": ["all", "text", "from", "all", "pages"],
  "structured_output": {
    "Name": "Jane Smith",
    "Date": "2024-12-13",
    "Signature": "Jane Smith"
  },
  "annotated_images": [
    {
      "page": 0,
      "image": "base64_encoded_string"
    },
    {
      "page": 1,
      "image": "base64_encoded_string"
    }
  ],
  "detections": [
    {
      "page": 0,
      "detections": [
        {
          "box": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
          "text": "detected text",
          "confidence": 0.92
        }
      ]
    }
  ]
}
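A hedged sketch of consuming this response on the client side: it decodes each `annotated_images[].image` string with `base64.b64decode`, picks a file extension from the image's magic bytes (the README does not state the image format), and lists detections under a review threshold. The function names and the 0.5 default threshold are illustrative assumptions.

```python
import base64
from pathlib import Path
from typing import List, Tuple


def _extension(raw: bytes) -> str:
    """Guess a file extension from magic bytes, since the format is not documented."""
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if raw.startswith(b"\xff\xd8"):
        return ".jpg"
    return ".bin"


def save_annotated_images(response: dict, out_dir: str = "annotated") -> List[Path]:
    """Decode every annotated page in an /ocr_form response and write it to disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for item in response.get("annotated_images", []):
        data = item["image"]
        if data.startswith("data:") and "," in data:  # tolerate a data-URL prefix
            data = data.split(",", 1)[1]
        raw = base64.b64decode(data)
        path = out / f"page_{item['page']}{_extension(raw)}"
        path.write_bytes(raw)
        written.append(path)
    return written


def low_confidence_detections(response: dict, min_conf: float = 0.5) -> List[Tuple[int, str, float]]:
    """Return (page, text, confidence) for detections below a review threshold."""
    return [
        (page["page"], det["text"], det["confidence"])
        for page in response.get("detections", [])
        for det in page["detections"]
        if det["confidence"] < min_conf
    ]
```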

2. Data Verification API

POST /verify_form

Request fields (multipart/form-data):

  • file (File): input document
  • form_data (JSON string): the field values submitted by the user, as a JSON object
  • lang (String): optional language

curl -X POST "http://localhost:8000/verify_form" \
  -F "file=@signed_form.pdf" \
  -F 'form_data={"Name":"John Doe","Email":"john@email.com"}' \
  -F "lang=english"

Response Example:

{
  "fields_verified": 2,
  "verification_results": {
    "Name": {
      "submitted_value": "John Doe",
      "extracted_value": "John Doe",
      "match": true,
      "confidence": 1.0,
      "status": "MATCH"
    },
    "Email": {
      "submitted_value": "john@email.com",
      "extracted_value": "john@gmail.com",
      "match": false,
      "confidence": 0.82,
      "status": "MISMATCH"
    }
  }
}
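Clients will usually surface only the fields that need a human look. A hypothetical helper, assuming the response shape shown above, that flags any field whose `status` is not `MATCH` or whose `confidence` falls below a chosen cut-off (0.9 here is an arbitrary example value):

```python
from typing import List, Tuple


def fields_needing_review(response: dict, min_confidence: float = 0.9) -> List[Tuple[str, str, str, float]]:
    """Return (field, submitted, extracted, confidence) for mismatched or low-score fields."""
    flagged = []
    for name, result in response.get("verification_results", {}).items():
        if result["status"] != "MATCH" or result["confidence"] < min_confidence:
            flagged.append(
                (name, result["submitted_value"], result["extracted_value"], result["confidence"])
            )
    return flagged
```

Applied to the example response above, it flags only `Email`.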

Challenges & Solutions

  • Printed vs handwritten OCR: integrated a classifier to choose between TrOCR models dynamically.
  • Blurred / tilted images: added adaptive preprocessing (contrast enhancement, deskew).
  • Offline execution: embedded Ollama for local LLM inference.
  • Low-end hardware support: reduced the dependency footprint and optimized model loading.
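The README does not describe how contrast enhancement is implemented. As an illustration of the idea only, here is a linear percentile contrast stretch over a grayscale image held as nested lists; production code would more likely use NumPy/OpenCV (for example `cv2.createCLAHE`), and the function names and percentile defaults are assumptions.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 pixel intensities


def percentile(values: List[int], q: float) -> int:
    """Simple rank-based percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, round(q / 100 * (len(ordered) - 1))))
    return ordered[idx]


def stretch_contrast(img: Gray, low_q: float = 1.0, high_q: float = 99.0) -> Gray:
    """Linearly map the [low_q, high_q] percentile range onto 0-255, clipping outliers."""
    flat = [p for row in img for p in row]
    lo, hi = percentile(flat, low_q), percentile(flat, high_q)
    if hi <= lo:  # uniform image: nothing to stretch
        return [row[:] for row in img]
    scale = 255 / (hi - lo)
    return [[min(255, max(0, round((p - lo) * scale))) for p in row] for row in img]


print(stretch_contrast([[100, 110], [120, 130]], 0, 100))  # [[0, 85], [170, 255]]
```

Using percentiles rather than the raw minimum and maximum keeps a few specks of noise from dominating the stretch on faded scans.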

CI/CD Pipeline and MOSIP Alignment

Continuous Integration / Continuous Deployment (CI/CD)

The entire system is integrated into a Jenkins-based CI/CD pipeline:

  1. A code push to GitHub triggers Jenkins.
  2. The Jenkinsfile automates:
    • Building Docker image (ocr_recognition:latest).
    • Running linting & unit tests.
    • Logging in to DockerHub.
    • Pushing the latest image (gouravanirudh/ocr_recognition:latest).
  3. A deployment stage then updates the backend on Render or on any edge node.

This ensures:

  • Consistent builds across all environments.
  • Rapid deployment and rollback support.
  • Easy scaling to multiple offline or semi-connected nodes.

Alignment with MOSIP Goals

This solution directly aligns with MOSIP’s mission of enabling inclusive, secure, and privacy-preserving digital identity ecosystems, by:

  • Ensuring offline-first, edge-ready infrastructure for identity and document processing.
  • Supporting low-resource environments — crucial for rural digital adoption.
  • Providing open, interoperable AI modules deployable across MOSIP-integrated systems.
  • Upholding data sovereignty & privacy, as all OCR + LLM inference happens locally without cloud dependence.

In essence, this project embodies MOSIP’s core philosophy: “Digital inclusion through open, secure, and accessible technology.”


Contributors

  • Gourav Anirudh
  • Pramatha Rao

About

Secured 2nd runner-up at the MOSIP Decode Hackathon, IIT Madras.
