Closed
Description
I am trying to run unstructured 0.18.11 in a Dockerised environment. I installed the following pip packages:
PyMuPDF==1.26.3
unstructured==0.18.11
langchain-openai==0.1.8
pytesseract==0.3.13
pdfminer.six==20250506
pi-heif==1.0.0
unstructured-inference==1.0.5
pdf2image==1.17.0
unstructured-pytesseract==0.3.15
And the following apt-get packages:
apt-get install -y tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd libtesseract-dev poppler-utils libmagic-dev
But I still can't get any chunks; the output comes back empty. The same setup works in my local environment, but I can't work out which dependencies are missing in the cloud container.
Could someone please help me identify what is missing, whether apt-get packages, pip packages, or environment variables? I'm running in an Ubuntu container and only use Tesseract for OCR.
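To narrow this down, a small diagnostic like the one below could be run both locally and inside the container, and the two outputs compared. It is only a sketch using the Python standard library: it checks whether the binaries installed by the apt-get line are on PATH, which versions of the pip packages listed above are actually installed, and whether `TESSDATA_PREFIX` is set. The choice of binaries to check (`tesseract`, `pdftoppm`, `pdfinfo`) and the helper names are my own assumptions, not something taken from the unstructured docs.

```python
import os
import shutil
from importlib import metadata

# Binaries assumed to come from tesseract-ocr and poppler-utils (assumption).
REQUIRED_BINARIES = ["tesseract", "pdftoppm", "pdfinfo"]

# Distribution names copied from the pip list in this issue.
REQUIRED_PACKAGES = [
    "PyMuPDF",
    "unstructured",
    "langchain-openai",
    "pytesseract",
    "pdfminer.six",
    "pi-heif",
    "unstructured-inference",
    "pdf2image",
    "unstructured-pytesseract",
]


def check_binaries(names):
    """Map each binary name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def check_packages(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report():
    """Print a summary that can be diffed between local and container runs."""
    for name, path in check_binaries(REQUIRED_BINARIES).items():
        print(f"binary  {name}: {path or 'MISSING'}")
    for name, version in check_packages(REQUIRED_PACKAGES).items():
        print(f"package {name}: {version or 'MISSING'}")
    # Tesseract uses TESSDATA_PREFIX to locate its language data when set.
    print(f"env     TESSDATA_PREFIX: {os.environ.get('TESSDATA_PREFIX', '(not set)')}")


if __name__ == "__main__":
    report()
```

Any line that prints MISSING in the container but not locally, or any version that differs between the two, would be the first thing to look at.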