Extract Alt Text and Handwritten Text from Images Using ML Model #9

Open
@rithulkamesh

Description

We're building a smarter, memory-efficient document processing pipeline with the following goals:

  • Extract alt text and handwritten content from images in documents using lightweight ML models.
  • Prefer small vision-language models such as Janus-1.3B, Mistral-small, or Gemma 3B, quantized where possible to reduce memory use.
  • Integrate QLoRA fine-tuning if a suitable dataset is available.
  • Explore reasoning-based summarization pipelines:
    • Generate a full summary from PDFs or scanned docs using AI, aiming to lose no information from the source (0-loss goal).
    • Pass the entire summary directly to a lightweight model as its context.
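
To make the "quantized where possible" goal concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. It assumes a nominal 1.3 billion parameters (taken from the name Janus-1.3B) and ignores activations, the KV cache, and per-block quantization metadata, so real usage will be higher. The helper `weight_memory_gib` is a hypothetical name used for illustration only.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in GiB.

    Ignores activations, KV cache, and quantization metadata
    (scales, zero points), so treat the result as a lower bound.
    """
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: nominal parameter count implied by the name "Janus-1.3B".
N_PARAMS = 1.3e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gib(N_PARAMS, bits):.2f} GiB")
# 16-bit: 2.42 GiB
#  8-bit: 1.21 GiB
#  4-bit: 0.61 GiB
```

Under these assumptions, 4-bit weights take roughly a quarter of the 16-bit footprint, which is the main reason 4-bit quantization (and QLoRA on top of it) is attractive for this pipeline.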

Pipeline Goals:

  • Alt text extraction using image captioning models.
  • Handwriting extraction via OCR or handwriting recognition (HWR), e.g., TrOCR or PaddleOCR.
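
One possible shape for these two pipeline goals is sketched below: a small pipeline that takes a captioning backend and a handwriting backend as plain callables and processes images one at a time, so only a single image is held in memory per step. All names here (`ImagePipeline`, `ImageResult`, `ImageToText`, and the stub backends) are hypothetical and not part of any existing code in this repository. The real backends would wrap a vision-language model and an OCR/HWR model such as TrOCR or PaddleOCR.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical backend signature: raw image bytes in, extracted text out.
ImageToText = Callable[[bytes], str]


@dataclass
class ImageResult:
    image_id: str
    alt_text: str
    handwriting: str


@dataclass
class ImagePipeline:
    captioner: ImageToText   # would wrap an image captioning model
    recognizer: ImageToText  # would wrap an OCR/HWR model

    def run(self, images: Iterable[Tuple[str, bytes]]) -> Iterator[ImageResult]:
        # Yield results lazily so callers can stream images from disk
        # instead of loading a whole document's images at once.
        for image_id, data in images:
            yield ImageResult(
                image_id=image_id,
                alt_text=self.captioner(data).strip(),
                handwriting=self.recognizer(data).strip(),
            )


# Stub backends so the sketch runs without any ML dependencies.
def fake_captioner(data: bytes) -> str:
    return f"image of {len(data)} bytes"


def fake_recognizer(data: bytes) -> str:
    return ""  # an empty string stands for "no handwriting found"


pipeline = ImagePipeline(captioner=fake_captioner, recognizer=fake_recognizer)
for result in pipeline.run([("page1-img0", b"1234")]):
    print(result)
```

Keeping the backends behind a plain callable makes it easy to swap models (for example, a quantized captioner) without touching the pipeline itself.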

Projects

Status

Backlog