Description
Uploading a PDF fails with the error "Failed to upload document. Please upload an unstructured text document."
I can confirm that my API key is valid and has credits.
chain-server logs:
2024-10-22 03:52:48 WARNING:unstructured:PDF text extraction failed, skip text extraction...
2024-10-22 03:52:48 INFO:unstructured:Processing entire page OCR with tesseract...
2024-10-22 03:53:00 ERROR:example:Failed to ingest document due to exception
2024-10-22 03:53:00 **********************************************************************
2024-10-22 03:53:00 Resource punkt_tab not found.
2024-10-22 03:53:00 Please use the NLTK Downloader to obtain the resource:
2024-10-22 03:53:00
2024-10-22 03:53:00 >>> import nltk
2024-10-22 03:53:00 >>> nltk.download('punkt_tab')
2024-10-22 03:53:00
2024-10-22 03:53:00 For more information see: https://www.nltk.org/data.html
2024-10-22 03:53:00
2024-10-22 03:53:00 Attempted to load tokenizers/punkt_tab/english/
2024-10-22 03:53:00
2024-10-22 03:53:00 Searched in:
2024-10-22 03:53:00 - '/tmp-data/nltk_data/'
2024-10-22 03:53:00 - '/root/nltk_data'
2024-10-22 03:53:00 - '/usr/nltk_data'
2024-10-22 03:53:00 - '/usr/share/nltk_data'
2024-10-22 03:53:00 - '/usr/lib/nltk_data'
2024-10-22 03:53:00 - '/usr/share/nltk_data'
2024-10-22 03:53:00 - '/usr/local/share/nltk_data'
2024-10-22 03:53:00 - '/usr/lib/nltk_data'
2024-10-22 03:53:00 - '/usr/local/lib/nltk_data'
2024-10-22 03:53:00 **********************************************************************
2024-10-22 03:53:00
2024-10-22 03:53:00 ERROR:RetrievalAugmentedGeneration.common.server:Error from POST /documents endpoint. Ingestion of file: /tmp/gradio/92f4570d0bbd4d801f7fbbae0ad13db83f59b1f6518c47156f6cbb0b605472d7/Justin_Silva_Resume.pdf failed with error: Failed to upload document. Please upload an unstructured text document.
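
The traceback above points at a missing NLTK resource (punkt_tab) rather than at the API key or the PDF itself, and the log suggests `nltk.download('punkt_tab')` as the fix. Below is a minimal sketch of how that download could be done inside the chain-server container before retrying the upload. The helper name `ensure_nltk_resource` and its parameters are hypothetical and not part of the project. It also assumes the container has network access and that `/tmp-data/nltk_data` (the first path NLTK searched in the log) is writable.

```python
from typing import Callable, Optional


def ensure_nltk_resource(
    resource_path: str = "tokenizers/punkt_tab",
    package: str = "punkt_tab",
    download_dir: Optional[str] = None,
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[..., bool]] = None,
) -> bool:
    """Return True if the NLTK resource is available, downloading it if needed.

    `find` and `download` default to nltk.data.find and nltk.download; they are
    injectable only so the helper can be exercised without NLTK or a network.
    """
    if find is None or download is None:
        import nltk  # imported lazily; only needed when no stubs are passed

        find = find or nltk.data.find
        download = download or nltk.download

    try:
        # nltk.data.find raises LookupError when the resource is not installed.
        find(resource_path)
        return True
    except LookupError:
        # Assumption: download_dir is one of the directories NLTK searches,
        # e.g. "/tmp-data/nltk_data" from the log above.
        return bool(download(package, download_dir=download_dir))
```

Calling `ensure_nltk_resource(download_dir="/tmp-data/nltk_data")` once at container start (or from a Python shell inside the container) should make the tokenizer data available for the next ingestion attempt, provided the download succeeds.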