Add Option to Skip OCR #4

Hi,

First of all, thank you for this amazing tool. It’s been incredibly helpful!

I have a feature request regarding the OCR functionality. It would be great to have a `.env` setting that lets users skip OCR processing for PDFs that have already undergone OCR. For example, if I feed the software a folder of PDFs with OCR already applied, I’d like the tool to only rename and tag those files, without checking or reprocessing them with OCR a second time, so that it runs faster.

I hope this would save time and resources, especially for users handling large volumes of pre-OCRed documents. Is this something that could be implemented?

I noticed that your source code [seems to check for OCR](https://github.com/ptmrio/autorename-pdf/blob/e42eb0e24686c94f430ca21eac59b52b738e0ca0/pdf_processor.py#L75), but during execution, this check appears to be skipped.
![image](https://github.com/user-attachments/assets/e0f845e5-07a1-41d2-b078-4c09d90c8c59)

Perhaps adding an option to completely disable OCR processing could help address this and provide more flexibility for users handling pre-OCRed documents.
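
To illustrate what I have in mind, here is a minimal sketch of how such a switch could work. Everything in it is an assumption for illustration only: the setting name `SKIP_OCR`, the helper names `ocr_disabled` and `needs_ocr`, and the `min_chars` threshold are hypothetical and not taken from the autorename-pdf code. It assumes the `.env` values have already been loaded into the process environment.

```python
import os

# Values that count as "on" for the hypothetical SKIP_OCR setting.
TRUTHY = {"1", "true", "yes", "on"}


def ocr_disabled(env=os.environ):
    """Return True when the (hypothetical) SKIP_OCR setting is switched on."""
    return env.get("SKIP_OCR", "").strip().lower() in TRUTHY


def needs_ocr(embedded_text, env=os.environ, min_chars=20):
    """Decide whether a PDF should be sent through OCR.

    embedded_text: text already extracted from the PDF's text layer
    min_chars: assumed threshold for "this PDF already has usable text"
    """
    if ocr_disabled(env):
        # The user opted out, so OCR is never run.
        return False
    # Otherwise only run OCR when the text layer is missing or nearly empty.
    return len(embedded_text.strip()) < min_chars
```

With `SKIP_OCR=true` in the `.env` file, `needs_ocr` would always return `False`, so the tool could go straight to renaming and tagging. Without the setting, it would fall back to checking whether the PDF already contains text.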

Thanks for considering this request, and let me know if I can provide more details.
