Description
Hi,
First of all, thank you for this amazing tool. It’s been incredibly helpful!
I have a feature request regarding the OCR functionality. It would be great to have a .env setting that lets users skip OCR processing for PDFs that already contain an OCR text layer. For example, if I feed the software a folder of PDFs that have already been OCRed, I’d like the tool to only rename and tag those files, without checking or reprocessing them with OCR a second time, so that the run finishes faster.
I hope this would save time and resources, especially for users handling large volumes of pre-OCRed documents. Is this something that could be implemented?
I noticed that your source code seems to check for OCR, but during execution, this check appears to be skipped.

Perhaps adding an option to completely disable OCR processing could help address this and provide more flexibility for users handling pre-OCRed documents.
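To make the request more concrete, here is a minimal Python sketch of the behaviour I have in mind. The setting names `DISABLE_OCR` and `SKIP_OCR_IF_TEXT_PRESENT`, the `min_chars` threshold, and the function `needs_ocr` are all hypothetical and not taken from the project's code. The sketch also assumes that any existing text layer has already been extracted from the PDF by whatever means the tool uses.

```python
import os
from typing import Mapping, Optional

# Hypothetical setting names, invented for this sketch.
DISABLE_OCR_VAR = "DISABLE_OCR"
SKIP_IF_TEXT_VAR = "SKIP_OCR_IF_TEXT_PRESENT"

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean flag from the environment (or from a supplied mapping)."""
    source = os.environ if env is None else env
    return source.get(name, "").strip().lower() in _TRUTHY


def needs_ocr(
    embedded_text: str,
    env: Optional[Mapping[str, str]] = None,
    min_chars: int = 20,
) -> bool:
    """Decide whether OCR should run for one PDF.

    embedded_text is the text layer already extracted from the PDF
    (an empty string if there is none). min_chars is an assumed
    threshold below which the text layer is treated as missing.
    """
    # Option 1: OCR is switched off completely.
    if env_flag(DISABLE_OCR_VAR, env):
        return False

    # Option 2: skip OCR only for PDFs that already carry a text layer.
    has_text_layer = len(embedded_text.strip()) >= min_chars
    if env_flag(SKIP_IF_TEXT_VAR, env) and has_text_layer:
        return False

    # Default: run OCR as before.
    return True
```

With something like `SKIP_OCR_IF_TEXT_PRESENT=true` in the .env file, a pre-OCRed PDF would go straight to renaming and tagging, while a scanned PDF without a text layer would still be OCRed. Setting `DISABLE_OCR=true` would skip OCR for every file.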
Thanks for considering this request, and let me know if I can provide more details.