Open
Description
We're building a smarter, memory-efficient document processing pipeline with the following goals:
- Extract alt text and handwritten content from images in documents using lightweight ML models.
- Use lighter vision-language models like Janus-1.3B, Mistral-small, or Gemma 3B (quantized where possible).
- Integrate QLoRA fine-tuning if a suitable dataset is available.
- Explore reasoning-based summarization pipelines:
  - Generate a full summary from PDFs or scanned docs using AI, with the goal of losing no information from the source (0-loss goal).
  - Pass entire summaries directly into lightweight models for context handling.
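One possible shape for the summarization step is a chunk-then-merge pass, sketched below. This is only an illustration under stated assumptions: `summarize` is a hypothetical callable standing in for whichever model ends up being used, `max_words` is a rough word-count proxy for that model's context budget, and `split_into_chunks` and `summarize_document` are names invented for this sketch, not existing project code.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_document(
    text: str,
    summarize: Callable[[str], str],  # hypothetical: wraps the chosen model
    max_words: int = 400,             # assumed stand-in for the context budget
) -> str:
    """Summarize each chunk, then merge the partial summaries.

    If the merged summary still exceeds the budget, the same procedure is
    applied to it again, but only while each pass actually shortens the text.
    """
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        return ""

    partial_summaries = [summarize(chunk) for chunk in chunks]
    combined = "\n".join(partial_summaries)

    too_long = len(combined.split()) > max_words
    shrank = len(combined.split()) < len(text.split())
    if len(chunks) > 1 and too_long and shrank:
        return summarize_document(combined, summarize, max_words)
    return combined
```

The result is sized to fit a single prompt for a lightweight model. Each pass discards detail, so how close this gets to the 0-loss goal depends entirely on the summarizer that is plugged in.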
Pipeline Goals:
- Alt text extraction using image captioning models.
- Handwriting extraction via OCR/HWR (e.g., TrOCR or PaddleOCR).
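The two pipeline goals above could be wired together roughly as follows. This is a minimal sketch, not the planned implementation: `ImageRegion`, `RegionResult`, and `process_regions` are hypothetical names, the `is_handwritten` flag is assumed to come from an earlier detection step, and `caption` and `recognize` are placeholder callables that would wrap an image captioning model and an OCR/HWR engine such as TrOCR or PaddleOCR.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ImageRegion:
    """An image extracted from a document (hypothetical structure)."""
    image_id: str
    data: bytes
    is_handwritten: bool  # assumed to be set by an upstream classifier


@dataclass
class RegionResult:
    """Text produced for one image: alt text or transcribed handwriting."""
    image_id: str
    alt_text: Optional[str] = None
    handwriting: Optional[str] = None


def process_regions(
    regions: List[ImageRegion],
    caption: Callable[[bytes], str],    # placeholder for a captioning model
    recognize: Callable[[bytes], str],  # placeholder for an OCR/HWR engine
) -> List[RegionResult]:
    """Route handwritten regions to HWR and all other images to captioning."""
    results = []
    for region in regions:
        if region.is_handwritten:
            results.append(
                RegionResult(region.image_id, handwriting=recognize(region.data))
            )
        else:
            results.append(
                RegionResult(region.image_id, alt_text=caption(region.data))
            )
    return results
```

Passing the models in as callables keeps the routing logic independent of any one library, so a quantized model can be swapped in without touching the pipeline code.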