Run OCR only on embedded PDF images (using Poppler pdfimages or ImageMapping) instead of whole PDF pages converted to PNG? #84

@T-Dane

Description

Requesting a mode for PDF OCR that runs Tesseract OCR only on the images embedded in a PDF, instead of capturing and OCRing each whole page.

Many of my professors use PowerPoint slides converted to PDF. The text in those slides is already real text, but the screen-grabs they include are not, and those could benefit from OCR.

I believe this would save time for others as well, since many PDF documents are not purely images but a mix of text and images.
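
To make the request concrete, here is a rough sketch of the idea in Python, not a proposal for how this project should implement it. It assumes the `pdfimages` (Poppler) and `tesseract` command-line tools are installed and on the PATH. The function names and the `run` parameter are hypothetical and exist only for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def pdfimages_cmd(pdf_path, prefix):
    # -png asks pdfimages to write the extracted images as <prefix>-NNN.png
    return ["pdfimages", "-png", str(pdf_path), str(prefix)]


def tesseract_cmd(image_path):
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file
    return ["tesseract", str(image_path), "stdout"]


def ocr_embedded_images(pdf_path, run=subprocess.run):
    """Return {image file name: recognized text} for each embedded image.

    `run` is injectable (hypothetical design choice) so the flow can be
    exercised without the real tools installed.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "img"
        # Step 1: extract only the embedded images; page text is left alone
        run(pdfimages_cmd(pdf_path, prefix), check=True)
        # Step 2: OCR each extracted image on its own
        for image in sorted(Path(tmp).glob("img-*.png")):
            done = run(tesseract_cmd(image), check=True,
                       capture_output=True, text=True)
            results[image.name] = done.stdout.strip()
    return results
```

Calling `ocr_embedded_images("slides.pdf")` would return the recognized text per extracted image. This sketch does not map the text back to its position on the page, and `pdfimages` may also emit masks or decorative images that would probably need filtering (for example by size) before OCR.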

Labels: enhancement (New feature or request)