The user can open the http url printed by the following command in a browser.

- Please check the http url for the detailed api usage!!!

```shell
lmdeploy serve api_server ./workspace --server_name 0.0.0.0 --server_port ${server_port} --instance_num 64 --tp 1
```

We provide some RESTful APIs. Three of them are in OpenAI format.
- /v1/chat/completions
- /v1/models
- /v1/completions
However, we recommend that users try our own api, `/v1/chat/interactive`, which provides more arguments for users to modify and performs comparatively better.

Note: if you want to launch multiple requests, you'd better set different `session_id` values for both the `/v1/chat/completions` and `/v1/chat/interactive` apis. Otherwise, they will be set to random values.
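One simple way to follow this advice is to hand each request a fresh positive `session_id` from a per-process counter. This is only a sketch; `new_session_id` is a hypothetical helper of our own, not part of lmdeploy:

```python
import itertools

# Hypothetical helper (not part of lmdeploy): issue unique, positive
# session ids so concurrent requests never collide on the same session.
_session_ids = itertools.count(1)

def new_session_id() -> int:
    """Return a fresh session id; never -1, the 'unset' default."""
    return next(_session_ids)

# One distinct session id per outstanding request.
payloads = [
    {"prompt": f"question {i}", "session_id": new_session_id()}
    for i in range(3)
]
```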
We have integrated the client-side functionalities of these services into the `APIClient` class. Below are some examples demonstrating how to invoke the `api_server` service on the client side.

If you want to use the `/v1/chat/completions` endpoint, you can try the following code:

```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
model_name = api_client.available_models[0]
messages = [{"role": "user", "content": "Say this is a test!"}]
for item in api_client.chat_completions_v1(model=model_name, messages=messages):
    print(item)
```

If you want to use the `/v1/completions` endpoint, you can try:
```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
model_name = api_client.available_models[0]
for item in api_client.completions_v1(model=model_name, prompt='hi'):
    print(item)
```

LMDeploy supports maintaining session histories on the server for the `/v1/chat/interactive` api. The feature is disabled by default.
- In interactive mode, the chat history is kept on the server. For a multi-round conversation, you should set `interactive_mode = True` and pass the same `session_id` (it can't be -1, which is the default value) to `/v1/chat/interactive` for every request.
- In normal mode, no chat history is kept on the server.
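The practical difference between the two modes is what the client must send each round. The following sketch contrasts them; the field names mirror the curl examples in this document, and joining a history into one prompt string is our own assumption, not a prescribed format:

```python
def interactive_round(prompt: str, session_id: int) -> dict:
    # Interactive mode: the server keeps the history, so every round
    # sends only the new prompt plus the same session_id.
    return {"prompt": prompt, "session_id": session_id, "interactive_mode": True}

def normal_round(history: list, prompt: str) -> dict:
    # Normal mode: nothing is kept server-side, so the client resends
    # the whole conversation itself (here naively joined with newlines;
    # the real prompt format is up to you).
    return {"prompt": "\n".join(history + [prompt]), "interactive_mode": False}
```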
The interactive mode can be controlled by the `interactive_mode` boolean parameter. The following is an example of normal mode. If you want to experience the interactive mode, simply pass in `interactive_mode=True`.
```python
from lmdeploy.serve.openai.api_client import APIClient
api_client = APIClient('http://{server_ip}:{server_port}')
for item in api_client.generate(prompt='hi'):
    print(item)
```

You may use `openapi-generator-cli` to convert `http://{server_ip}:{server_port}/openapi.json` to a java/rust/golang client. Here is an example:
```shell
$ docker run -it --rm -v ${PWD}:/local openapitools/openapi-generator-cli generate -i /local/openapi.json -g rust -o /local/rust
$ ls rust/*
rust/Cargo.toml    rust/git_push.sh    rust/README.md

rust/docs:
ChatCompletionRequest.md  EmbeddingsRequest.md  HttpValidationError.md  LocationInner.md  Prompt.md
DefaultApi.md             GenerateRequest.md    Input.md                Messages.md       ValidationError.md

rust/src:
apis  lib.rs  models
```

cURL is a tool for observing the output of the api.

List Models:
```shell
curl http://{server_ip}:{server_port}/v1/models
```

Interactive Chat:

```shell
curl http://{server_ip}:{server_port}/v1/chat/interactive \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello! How are you?",
    "session_id": 1,
    "interactive_mode": true
  }'
```

Chat Completions:
```shell
curl http://{server_ip}:{server_port}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "internlm-chat-7b",
    "messages": [{"role": "user", "content": "Hello! How are you?"}]
  }'
```

Text Completions:
```shell
curl http://{server_ip}:{server_port}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama",
    "prompt": "two steps to build a house:"
  }'
```

There is a client script for the restful api server.

```shell
# restful_api_url is what is printed by api_server, e.g. http://localhost:23333
lmdeploy serve api_client api_server_url
```

You can also test the restful api through the webui.
```shell
# api_server_url is what is printed by api_server, e.g. http://localhost:23333
# server_ip and server_port here are for the gradio ui
# example: lmdeploy serve gradio http://localhost:23333 --server_name localhost --server_port 6006
lmdeploy serve gradio api_server_url --server_name ${gradio_ui_ip} --server_port ${gradio_ui_port}
```
- When the user got `"finish_reason":"length"`, it means the session is too long to be continued. The session length can be modified by passing `--session_len` to api_server.
- When an OOM error appears on the server side, please reduce `instance_num` when launching the service.
- When a request with the same `session_id` to `/v1/chat/interactive` gets an empty return value and a negative `tokens`, please consider setting `interactive_mode=false` to restart the session.
- The `/v1/chat/interactive` api disables engaging in multiple rounds of conversation by default. The input argument `prompt` consists of either a single string or an entire chat history.
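The `"finish_reason":"length"` condition above can be checked on each response chunk, as sketched below; the dict shape assumes the OpenAI-style `choices`/`finish_reason` layout returned by the completion endpoints:

```python
def session_exhausted(chunk: dict) -> bool:
    """True when generation stopped because the session hit its length
    limit (the '"finish_reason":"length"' case described above)."""
    return any(
        choice.get("finish_reason") == "length"
        for choice in chunk.get("choices", [])
    )
```

When this returns True, restarting the session (or relaunching api_server with a larger `--session_len`) is the way forward.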