-
Hi, I am trying to use the Unstructured file converter to read a local PDF file and convert it into a Haystack Document. I originally had a free tier from Unstructured and used Docker to run Unstructured locally. This worked fine until I used up the free allowance. Now Unstructured no longer supports the free tier, so I'm using the hosted version. I believe the call below is the correct one to make; for the API key, I am using my personal key ID (not the free tier ID!), which I got after creating an Unstructured account. I thought all I needed was the API key, but the code below errors out:
When I run this and provide the API key, the error I get is:

Converting files to Haystack Documents: 0it [00:00, ?it/s]INFO: Failed to process a request due to connection error - [Errno 11001] getaddrinfo failed. Attempting retry number 1 after sleep.

I then thought I needed to provide the API URL in addition to the API key, but I've tried both the Unstructured Workflow Endpoint and Unstructured Partition Endpoint URLs (from my Unstructured account) in turn, and the code still errors out. In one case I got:

Converting files to Haystack Documents: 0it [00:00, ?it/s]INFO: HTTP Request: GET https://api.unstructuredapp.io/general/docs "HTTP/1.1 200 OK"

In the other case, I got:

Converting files to Haystack Documents: 0it [00:00, ?it/s]INFO: HTTP Request: GET https://platform.unstructuredapp.io/general/docs "HTTP/1.1 404 Not Found"

I would really appreciate your help resolving this so that I can use Unstructured. Thanks in advance. Sanjay
-
Hey @sanjayc2, sorry to hear you have been having issues using the hosted version of Unstructured. I'm not too familiar with the self-hosted version, but I did want to point you towards this PR deepset-ai/haystack-core-integrations#1416 to show that our Also, to better help with debugging your code, could you share the full code snippet you run? So what
-
Hi Sebastian @sjrl, Thanks for getting back to me so quickly. The path is the path to the pdf file on my computer, i.e., path = "../KUBIENT,INC_07_02_2020-EX-10.14-MASTER SERVICES AGREEMENT_Part1.pdf". I must have misused the word "hosted", but I was just trying to use the api key (not running docker), as the Haystack instructions in https://docs.haystack.deepset.ai/reference/integrations-unstructured state:

#make sure to either set the environment variable UNSTRUCTURED_API_KEY

I ran the above docker run command on my machine (which started up a docker container with an Unstructured image, as seen in my docker desktop) and then re-ran the following code snippet, and entered my personal api key at the prompt:
I still get the "getaddrinfo failed" error. FWIW, I am using unstructured-client 0.29.0. I googled the error, and it looks like it arises when it cannot map the hostname to the IP address. To test if the DNS mapping was working, I ran the below in the python interpreter:
which returned:

[(<AddressFamily.AF_INET6: 23>, 0, 0, '', ('::1', 8000, 0, 0)), (<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 8000))]

and then if I run the below: it returns:

[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('127.0.0.1', 8000))]

As I had noted earlier, when I ran the Python code specifying the API key and the Unstructured Partition Endpoint, I got:

ERROR: Server responded with 404 - {"detail":"Not Found"}

This 404 {"detail":"Not Found"} error is exactly what I see when I click on the link to port 8000 provided in Docker Desktop (to verify that the port is working). So maybe the issue is with self-hosted Unstructured on my Windows machine. If it helps, my C:\Windows\System32\drivers\etc\hosts file (which looks to be unchanged since 8/24/2024) looks like this:

# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
# localhost name resolution is handled within DNS itself.

FWIW, I have been able to run OpenSearch in my self-hosted environment (running Haystack's recommended docker command and then the Python code to create the OS datastore and run a custom query) in the last few weeks without a problem. I am not sure how the integration with Unstructured is different from OpenSearch. Hope this helps. Sanjay
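For reference, the kind of DNS check described in this comment can be sketched as below. This is a minimal sketch, not the exact commands that were run: `resolve_host` is a hypothetical helper name, and the two URLs in the usage part are simply the local port and the hosted endpoint mentioned in this thread. It only uses the standard library's `socket.getaddrinfo`, which is the call that raises the "getaddrinfo failed" error when a hostname cannot be mapped to an IP address.

```python
import socket
from urllib.parse import urlparse


def resolve_host(url: str) -> list[str]:
    """Return the IP addresses the URL's hostname resolves to, or [] on failure.

    Hypothetical helper for illustration; not part of Haystack or Unstructured.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        # No scheme or no hostname, so there is nothing to look up.
        return []
    # Fall back to the scheme's default port when the URL does not give one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed; on Windows this surfaces as Errno 11001.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for url in (
        "http://localhost:8000",
        "https://api.unstructuredapp.io/general/v0/general",
    ):
        print(url, "->", resolve_host(url) or "could not resolve")
```

If the hosted URL resolves here but the converter still reports a getaddrinfo failure, the converter is probably being given a different hostname than the one tested, which is worth checking before looking at the hosts file.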
-
Hi Sebastian, I ran docker and the converter with the url. That does not work either. I get the following:
I included my api_key, but I still get the 503 error. I believe the Unstructured website says that the api_key is needed from now on. Also, I think the free tier is no longer available. I should add that I don't mind using a paid service (I already gave them my credit card number), but it should work! Do you think I will have any success asking this question on the Unstructured Slack community?
-
Ok, I tried it with just that URL (after running the docker run command). It still gives me the same 503 error. Also, per your request, when I run the docker command, it starts up an unstructured-api docker container. I've attached the screenshot. And when I go to localhost:8000, it says {"detail":"Not Found"} there. To be clear, I do not need to run Unstructured locally inside a docker container (in case I did not set Unstructured up correctly on my machine). If I have to use the API key, that's ok too. (By the way, I don't think the free Unstructured will work for me anymore. I ran it on my computer a couple of months ago and used up all the free usage Unstructured allowed, so it stopped running. I don't have the details.)
-
I created a new API key and, without running Unstructured locally in a docker container, ran the command below (the URL is my partition endpoint). It then worked... once! I have not been able to get it to work again.
converter = UnstructuredFileConverter(
    api_url="https://api.unstructuredapp.io/general/v0/general",
    api_key=Secret.from_env_var("UNSTRUCTURED_API_KEY", strict=False),
    progress_bar=True,
)
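Since the call above worked once and then stopped, it may help to rule out configuration problems before constructing the converter. The following is a minimal sketch under stated assumptions, not a confirmed fix: `check_unstructured_settings` and `make_converter` are hypothetical helper names, and the import path for `UnstructuredFileConverter` is assumed to be the one shipped with the Haystack Unstructured integration. Because the key is read with `strict=False`, an unset `UNSTRUCTURED_API_KEY` would not raise when the secret is created, so the sketch checks for it explicitly.

```python
import os
from urllib.parse import urlparse


def check_unstructured_settings(api_url: str, env_var: str = "UNSTRUCTURED_API_KEY") -> list[str]:
    """Return a list of problems found; an empty list means the settings look plausible.

    Hypothetical pre-flight helper; it does not contact the Unstructured service.
    """
    problems = []
    parsed = urlparse(api_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        problems.append(f"api_url {api_url!r} is not a full http(s) URL")
    if not os.environ.get(env_var):
        problems.append(f"environment variable {env_var} is not set or is empty")
    return problems


def make_converter(api_url: str):
    """Build the converter with the same arguments used in this thread."""
    # Imported lazily so the check above can run without Haystack installed.
    # Assumption: these are the import paths provided by Haystack and its
    # Unstructured integration package.
    from haystack.utils import Secret
    from haystack_integrations.components.converters.unstructured import (
        UnstructuredFileConverter,
    )

    return UnstructuredFileConverter(
        api_url=api_url,
        api_key=Secret.from_env_var("UNSTRUCTURED_API_KEY", strict=False),
        progress_bar=True,
    )
```

A possible usage: call `check_unstructured_settings("https://api.unstructuredapp.io/general/v0/general")` first, print any problems it returns, and only call `make_converter` with the same URL when the list is empty.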