TypeError: get_model() got an unexpected keyword argument 'ocr_languages' when using strategy=hi_res #329

Open
@samqi

Description

System info: Python 3.10
Environment details: Google Colab

Calling partition_pdf on a PDF with the hi_res strategy raises an error:
elements = partition_pdf("myPDFfile.pdf", strategy="hi_res")

Error output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-58-69a6eb81afc1> in <cell line: 1>()
----> 1 elements = partition_pdf("myPDFfile.pdf", strategy="hi_res")

5 frames
/usr/local/lib/python3.10/dist-packages/unstructured_inference/inference/layout.py in process_file_with_model(filename, model_name, is_image, fixed_layouts, pdf_image_dpi, **kwargs)
    375     model_name."""
    376 
--> 377     model = get_model(model_name, **kwargs)
    378     if isinstance(model, UnstructuredObjectDetectionModel):
    379         detection_model = model

TypeError: get_model() got an unexpected keyword argument 'ocr_languages'

What is this ocr_languages argument? Because of this error I am unable to use table preservation with unstructured at all. I would appreciate any assistance.
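The traceback shows process_file_with_model forwarding its **kwargs straight into get_model, so the error means the installed get_model does not declare an ocr_languages parameter. One way to confirm that kind of mismatch is to inspect a function's signature before calling it. The sketch below is only illustrative: accepts_kwarg, get_model_old and get_model_new are hypothetical names made up for this example, and the two stand-in functions are not the real signatures from unstructured_inference.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins, NOT the library's real signatures.
def get_model_old(model_name=None):
    return model_name


def get_model_new(model_name=None, **kwargs):
    return model_name


print(accepts_kwarg(get_model_old, "ocr_languages"))
print(accepts_kwarg(get_model_new, "ocr_languages"))
```

With the library installed, the same helper could be pointed at the real get_model named in the traceback to see whether the installed version accepts ocr_languages.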

P.S. I have already tried installing some older versions of unstructured and unstructured_inference, as mentioned in another GitHub repo issue, but it made no difference for me.
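Since this looks like it could be a version mismatch between the two packages (an assumption, not something confirmed in this report), it may help to record exactly which versions are active in the Colab runtime after any reinstall. A minimal sketch using only the standard library follows; installed_versions is a hypothetical helper name, and the distribution names are assumed to be the PyPI names unstructured and unstructured-inference.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names are assumed to match the PyPI package names.
for name, ver in installed_versions(["unstructured", "unstructured-inference"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

In a notebook, a package that was already imported keeps its old code after pip install, so restarting the runtime before re-running this check is usually needed.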

I followed the blog post, but got stuck from there onwards despite consulting all relevant docs.
