
Add Option to Skip OCR #4

@rawora-rg

Description


Hi,

First of all, thank you for this amazing tool; it's been incredibly helpful!

I have a feature request regarding the OCR functionality. It would be great to have a .env setting that lets users skip OCR for PDFs that have already been OCRed. For example, if I feed the tool a folder of PDFs that already contain an OCR text layer, I'd like it to only rename and tag those files, without checking them or running OCR on them a second time, so that processing is faster.
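To illustrate the idea, here is a minimal sketch of how such a setting could gate the OCR step. It is written in Python purely for illustration and assumes a hypothetical `SKIP_OCR` variable (e.g. a line `SKIP_OCR=true` in the .env file) that has already been loaded into the process environment; neither the variable nor the `plan_steps` helper is an existing option or function of this tool.

```python
import os

# Hypothetical setting name; the tool does not currently define SKIP_OCR.
SKIP_OCR_ENV_VAR = "SKIP_OCR"

# Values treated as "enabled" (case-insensitive, surrounding spaces ignored).
_TRUTHY = {"1", "true", "yes", "on"}


def skip_ocr_enabled(env=None):
    """Return True if the hypothetical SKIP_OCR variable is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get(SKIP_OCR_ENV_VAR, "").strip().lower() in _TRUTHY


def plan_steps(already_has_text, env=None):
    """Return the processing steps for one PDF (hypothetical helper).

    OCR is skipped when the user disables it via SKIP_OCR, or when the caller
    reports that the PDF already has a text layer. Renaming and tagging
    always run.
    """
    steps = []
    if not skip_ocr_enabled(env) and not already_has_text:
        steps.append("ocr")
    steps.extend(["rename", "tag"])
    return steps
```

With `SKIP_OCR=true`, `plan_steps(False, {"SKIP_OCR": "true"})` yields only `["rename", "tag"]`, while with the variable unset a PDF without a text layer still gets `["ocr", "rename", "tag"]`. Passing the environment as a parameter keeps the check easy to test without touching the real process environment.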

This would save time and resources, especially for users handling large volumes of pre-OCRed documents. Is this something that could be implemented?

I noticed that your source code seems to check whether a PDF has already been OCRed, but during execution this check appears to be skipped.

Perhaps adding an option to completely disable OCR processing could help address this and provide more flexibility for users handling pre-OCRed documents.

Thanks for considering this request, and let me know if I can provide more details.

Metadata

Assignees: none
Labels: enhancement (New feature or request)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests
