Minor ocr_interface.py Error Handling Improvement #3276

Open
@AscendingGrass

In the ocr_interface.py file, it would be nice if the code handled errors raised by importlib.import_module(module_name) in the get_instance(...) function:

@staticmethod
@functools.lru_cache(maxsize=None)
def get_instance(ocr_agent_module: str) -> "OCRAgent":
    module_name, class_name = ocr_agent_module.rsplit(".", 1)
    if module_name in OCR_AGENT_MODULES_WHITELIST:
        module = importlib.import_module(module_name)
        loaded_class = getattr(module, class_name)
        return loaded_class()
    else:
        raise ValueError(
            f"Environment variable OCR_AGENT module name {module_name}, must be set to a"
            f" whitelisted module part of {OCR_AGENT_MODULES_WHITELIST}.",
        )

I was so confused when I kept getting this error from the get_agent(...) function:

ValueError: Environment variable OCR_AGENT must be set to an existing OCR agent module, not unstructured.partition.utils.ocr_models.tesseract_ocr.OCRAgentTesseract.

After hours of digging, it turned out I just hadn't installed pandas lol 🗿
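
To make the request concrete, here is a rough sketch of the kind of handling I have in mind. It is not the library's actual code: `load_agent` and the whitelist contents are hypothetical stand-ins (I used `json` only so the snippet runs without unstructured installed), and I left out the `lru_cache` decorator for brevity. The idea is just to catch the `ImportError` from `importlib.import_module` and re-raise it with a message that names the missing dependency, instead of letting it turn into the generic `ValueError` shown above.

```python
import importlib

# Hypothetical stand-in for OCR_AGENT_MODULES_WHITELIST; "json" is here only so
# the sketch runs anywhere, and the second entry simulates a missing dependency.
OCR_AGENT_MODULES_WHITELIST = {"json", "a_module_that_is_not_installed"}


def load_agent(ocr_agent_module: str):
    """Resolve 'package.module.ClassName' to an instance, keeping import failures visible."""
    module_name, class_name = ocr_agent_module.rsplit(".", 1)
    if module_name not in OCR_AGENT_MODULES_WHITELIST:
        raise ValueError(
            f"OCR agent module {module_name!r} is not in the whitelist "
            f"{sorted(OCR_AGENT_MODULES_WHITELIST)}."
        )
    try:
        module = importlib.import_module(module_name)
    except ImportError as e:
        # e.name is the module that failed to import (for example "pandas");
        # it can be None for some ImportErrors, so the original error is chained too.
        raise ImportError(
            f"Could not import OCR agent module {module_name!r}: "
            f"missing dependency {e.name!r}. Is it installed?"
        ) from e
    try:
        loaded_class = getattr(module, class_name)
    except AttributeError as e:
        raise ValueError(
            f"Module {module_name!r} has no class {class_name!r}."
        ) from e
    return loaded_class()
```

With something like this, a missing package would surface as an `ImportError` that mentions the package by name, and the original traceback would still be attached through `from e`.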

Labels: enhancement, ocr
