Unstructured not working in deployment environment #4096

@Kushagra0409

I am trying to run unstructured 0.18.11 in a Dockerized environment. I installed the following pip packages:

PyMuPDF==1.26.3
unstructured==0.18.11
langchain-openai==0.1.8
pytesseract==0.3.13
pdfminer.six==20250506
pi-heif==1.0.0
unstructured-inference==1.0.5
pdf2image==1.17.0
unstructured-pytesseract==0.3.15

And the apt-get packages:

apt-get install -y tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd libtesseract-dev poppler-utils libmagic-dev

But I still can't get any chunks; the output comes back empty. The same code works in my local environment, and I can't figure out which dependencies are missing in the cloud deployment.

Could someone please help me identify what is missing here: apt-get packages, pip packages, or environment variables? I'm running in an Ubuntu container and I use Tesseract OCR only.
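
In case it helps anyone reproduce this, here is a rough diagnostic sketch that could be run inside the container to see which pieces are visible to Python. It is only a starting point, not a fix. The import names in `MODULES` are my assumption of how the pip packages above are exposed (for example `fitz` for PyMuPDF and `pi_heif` for pi-heif), and `BINARIES` assumes that `tesseract-ocr` provides `tesseract` and that `poppler-utils` provides `pdftoppm` and `pdfinfo`. The helper names `check_binaries` and `check_modules` are made up for this example.

```python
import importlib.util
import shutil

# Assumed import names for the pip packages listed above (not verified
# against each package; adjust if an import name differs).
MODULES = [
    "unstructured",
    "unstructured_inference",
    "unstructured_pytesseract",
    "pytesseract",
    "pdf2image",
    "pdfminer",
    "fitz",      # PyMuPDF
    "pi_heif",
]

# Executables assumed to come from tesseract-ocr and poppler-utils.
BINARIES = ["tesseract", "pdftoppm", "pdfinfo"]


def check_binaries(names):
    """Map each executable name to its resolved path on PATH, or None."""
    return {name: shutil.which(name) for name in names}


def check_modules(names):
    """Map each top-level module name to True if it can be located."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Treat lookup errors as "not importable".
            found[name] = False
    return found


if __name__ == "__main__":
    for name, path in check_binaries(BINARIES).items():
        print(f"binary {name}: {path or 'NOT FOUND on PATH'}")
    for name, ok in check_modules(MODULES).items():
        print(f"module {name}: {'ok' if ok else 'NOT FOUND'}")
```

Running the same script locally and in the container and comparing the two outputs should at least narrow down whether the difference is a missing binary, a missing Python package, or something else such as configuration.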
