Anyway to invoke LLMs hosted behind an endpoint? #343
-
I know there are some libraries like llama-cpp-python that let you host a model behind a server exposing the same API spec as OpenAI. You can use the OpenAI connector in guidance; you would just need to change the URL the OpenAI client points at. I have done this before with the vanilla openai client, though I'm not sure how to do it in guidance.
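For reference, a minimal sketch of that approach with the vanilla openai client (pre-1.0 style); the URL, port, and model name are placeholders for wherever your OpenAI-compatible server is listening:

```python
import openai

# Point the client at a local OpenAI-compatible server instead of api.openai.com.
# The URL/port below are placeholders; llama-cpp-python prints the actual address
# when its server starts.
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "sk-no-key-required"  # local servers usually ignore the key, but the client wants one

response = openai.ChatCompletion.create(
    model="local-model",  # typically ignored or mapped by the local server
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response["choices"][0]["message"]["content"])
```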
-
Hi @rohangpatil, if your endpoint is hosted behind an OpenAI-compatible API, as @alexmadey-oc suggests, you can set the endpoint when you call the OpenAI model's constructor:
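Something along these lines (a sketch only; the exact keyword for the endpoint is an assumption and can differ between guidance versions, so check the constructor signature in the version you have installed):

```python
import guidance

# Sketch only: the endpoint argument (api_base) is an assumption; check the
# signature of guidance.llms.OpenAI in your installed version.
llm = guidance.llms.OpenAI(
    "gpt-3.5-turbo",
    api_base="http://localhost:8000/v1",  # your OpenAI-compatible endpoint
    api_key="sk-no-key-required",         # most local servers ignore the key
)
guidance.llm = llm  # optionally make it the default for subsequent programs
```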
-
Not only should the
-
Is there a way to provide an LLM that invokes an arbitrary API endpoint (not limited to OpenAI), such as the Hugging Face Inference API?
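For illustration, this is the kind of plain REST endpoint being asked about; a minimal sketch against the Hugging Face Inference API (the model id and token below are placeholders):

```python
import requests

# Call the Hugging Face Inference API directly over REST.
API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer hf_your_token_here"}

resp = requests.post(API_URL, headers=headers, json={"inputs": "Once upon a time"})
resp.raise_for_status()
print(resp.json())  # e.g. [{"generated_text": "..."}]
```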
-
Note that this discussion is a bit older (before
-
I managed to make it work with llama.cpp. The llama-cpp-python bindings are full of bugs and wouldn't load on GPU for me. You need to start the llama.cpp server and also the Python api_like_oai.py script included in the llama.cpp repo's examples/server directory. That gives you a copy of the OpenAI API running on your machine on port 8081 (default). Then load a Python program that calls the API.
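A minimal sketch of such a program (assuming the pre-1.0 openai client, which reads OPENAI_API_BASE from the environment, that the proxy serves the /v1 paths, and using the handlebars-style guidance syntax from that era) could look like this:

```python
import os

# Point the openai client library (and anything built on it, such as guidance's
# OpenAI wrapper) at the local api_like_oai.py proxy before importing it.
os.environ["OPENAI_API_BASE"] = "http://localhost:8081/v1"  # drop the /v1 if your proxy version does not serve it
os.environ["OPENAI_API_KEY"] = "sk-no-key-required"         # the local proxy ignores the key

import guidance

llm = guidance.llms.OpenAI("gpt-3.5-turbo")
program = guidance(
    "{{#user~}}Write one sentence about llamas.{{~/user}}"
    "{{#assistant~}}{{gen 'answer' max_tokens=64}}{{~/assistant}}",
    llm=llm,
)
print(program()["answer"])
```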
-
Hello,
Very interesting project; controlling LLM output is really crucial for production use cases. That said, is there a way to invoke LLMs that are hosted remotely behind REST APIs?