Anyway to invoke LLMs hosted behind an endpoint? #343
-
I know there are some libraries like llama-cpp-python that let you host a model behind a server exposing the same API spec as OpenAI. You can use the OpenAI connector in guidance; you would just need to change the URL the OpenAI client points at. I have done this before with the vanilla openai client, though I'm not sure how to do it in guidance.
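For reference, a minimal sketch of that approach with the vanilla openai client (pre-1.0 style); the URL, port, and model name are placeholders for wherever your OpenAI-compatible server is listening:

```python
import openai

# Point the client at a local OpenAI-compatible server instead of api.openai.com.
# The URL/port below are placeholders; llama-cpp-python prints the actual address
# when its server starts.
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "sk-no-key-required"  # local servers usually ignore the key, but the client wants one

response = openai.ChatCompletion.create(
    model="local-model",  # typically ignored or mapped by the local server
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response["choices"][0]["message"]["content"])
```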
-
Hi @rohangpatil, if your endpoint is hosted behind an OpenAI-compatible API, as @alexmadey-oc suggests, you can set the endpoint when you call the OpenAI model's constructor:
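Something along these lines (a sketch only; the exact keyword for the endpoint is an assumption and can differ between guidance versions, so check the constructor signature in the version you have installed):

```python
import guidance

# Sketch only: the endpoint argument (api_base) is an assumption; check the
# signature of guidance.llms.OpenAI in your installed version.
llm = guidance.llms.OpenAI(
    "gpt-3.5-turbo",
    api_base="http://localhost:8000/v1",  # your OpenAI-compatible endpoint
    api_key="sk-no-key-required",         # most local servers ignore the key
)
guidance.llm = llm  # optionally make it the default for subsequent programs
```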
-
Not only should the
-
Is there a way to provide an LLM that invokes an arbitrary API endpoint (not limited to OpenAI), such as the Hugging Face Inference API?
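For illustration, this is the kind of plain REST endpoint being asked about; a minimal sketch against the Hugging Face Inference API (the model id and token below are placeholders):

```python
import requests

# Call the Hugging Face Inference API directly over REST.
API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer hf_your_token_here"}

resp = requests.post(API_URL, headers=headers, json={"inputs": "Once upon a time"})
resp.raise_for_status()
print(resp.json())  # e.g. [{"generated_text": "..."}]
```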
-
Note that this discussion is a bit older (before
-
I managed to make it work with llama.cpp. The llama-cpp-python bindings are full of bugs and wouldn't load on GPU for me. You need to start the llama.cpp server and also the Python api_like_oai.py script included in the llama.cpp repo's examples/server directory. That gives you a copy of the OpenAI API running on your machine on port 8081 (default). Then load a Python program that calls the API.
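A minimal sketch of such a program (assuming the pre-1.0 openai client, which reads OPENAI_API_BASE from the environment, that the proxy serves the /v1 paths, and using the handlebars-style guidance syntax from that era) could look like this:

```python
import os

# Point the openai client library (and anything built on it, such as guidance's
# OpenAI wrapper) at the local api_like_oai.py proxy before importing it.
os.environ["OPENAI_API_BASE"] = "http://localhost:8081/v1"  # drop the /v1 if your proxy version does not serve it
os.environ["OPENAI_API_KEY"] = "sk-no-key-required"         # the local proxy ignores the key

import guidance

llm = guidance.llms.OpenAI("gpt-3.5-turbo")
program = guidance(
    "{{#user~}}Write one sentence about llamas.{{~/user}}"
    "{{#assistant~}}{{gen 'answer' max_tokens=64}}{{~/assistant}}",
    llm=llm,
)
print(program()["answer"])
```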
-
Hello,
Very interesting project; controlling LLM output is really crucial for production use cases. That said, is there a way to invoke LLMs that are hosted remotely behind REST APIs?