docs/lemonade/server_spec.md
We are also actively investigating and developing [additional endpoints](#additional-endpoints).
### OpenAI-Compatible Endpoints
- POST `/api/v0/chat/completions` - Chat Completions (messages -> completion)
- POST `/api/v0/completions` - Text Completions (prompt -> completion)
- POST `/api/v0/responses` - Responses (prompt|messages -> response or stream of events)
- GET `/api/v0/models` - List models available locally
### Additional Endpoints
### `POST /api/v0/chat/completions`

Chat Completions API. You provide a list of messages and receive a completion. This API will also load the model if it is not already loaded.

#### Parameters

| Parameter | Required | Description | Status |
|-----------|----------|-------------|--------|
|`stop`| No | Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. Can be a string or an array of strings. | <sub></sub> |
|`logprobs`| No | Include log probabilities of the output tokens. If true, returns the log probability of each output token. Defaults to false. | <sub></sub> |
|`temperature`| No | What sampling temperature to use. | <sub></sub> |
|`tools`| No | A list of tools the model may call. Only available when `stream` is set to `False`. | <sub></sub> |
|`max_tokens`| No | An upper bound for the number of tokens that can be generated for a completion. Mutually exclusive with `max_completion_tokens`. This value is now deprecated by OpenAI in favor of `max_completion_tokens`. | <sub></sub> |
|`max_completion_tokens`| No | An upper bound for the number of tokens that can be generated for a completion. Mutually exclusive with `max_tokens`. | <sub></sub> |
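As an illustrative sketch (not part of the spec), a chat completion request body that offers the model a tool might be assembled as below. The tool definition follows the OpenAI function-calling schema, which this server is assumed to mirror; `build_chat_request` is an invented helper name. Note that, per the table, `tools` is only honored when `stream` is `False`.

```python
import json

# Hypothetical helper for assembling a POST /api/v0/chat/completions body.
def build_chat_request(model, messages, tools=None, max_completion_tokens=None):
    body = {"model": model, "messages": messages, "stream": False}
    if tools:
        # Per the table above, tools are only available when stream is False.
        body["tools"] = tools
    if max_completion_tokens is not None:
        # `max_tokens` is deprecated; prefer `max_completion_tokens`.
        body["max_completion_tokens"] = max_completion_tokens
    return body

# An example tool definition (OpenAI-style schema, assumed here).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = build_chat_request(
    model="Llama-3.2-1B-Instruct-Hybrid",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[weather_tool],
    max_completion_tokens=128,
)
print(json.dumps(request_body, indent=2))
```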
The following format is used for both streaming and non-streaming responses:

### `POST /api/v0/responses`

Responses API. You provide an input and receive a response. This API will also load the model if it is not already loaded.
#### Parameters
| Parameter | Required | Description | Status |
|-----------|----------|-------------|--------|
|`input`| Yes | A list of dictionaries or a string input for the model to respond to. | <sub></sub> |
|`model`| Yes | The model to use for the response. | <sub></sub> |
|`max_output_tokens`| No | The maximum number of output tokens to generate. | <sub></sub> |
|`temperature`| No | What sampling temperature to use. | <sub></sub> |
|`stream`| No | If true, tokens will be sent as they are generated. If false, the response will be sent as a single message once complete. Defaults to false. | <sub></sub> |
> Note: The value for `model` is either a [Lemonade Server model name](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_models.md), or a checkpoint that has been pre-loaded using the [load endpoint](#get-apiv0load-status).
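A hypothetical helper for assembling a `/api/v0/responses` request body, using only the parameters from the table above (`build_responses_request` is an invented name, not part of the spec):

```python
import json

# Per the spec, `input` may be a plain string or a list of message dicts,
# and `model` is a Lemonade Server model name or a pre-loaded checkpoint.
def build_responses_request(model, user_input, stream=False,
                            max_output_tokens=None, temperature=None):
    body = {"model": model, "input": user_input, "stream": stream}
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens
    if temperature is not None:
        body["temperature"] = temperature
    return body

# String input...
req1 = build_responses_request(
    "Llama-3.2-1B-Instruct-Hybrid",
    "What is the population of Paris?",
)
# ...or a list of dictionaries.
req2 = build_responses_request(
    "Llama-3.2-1B-Instruct-Hybrid",
    [{"role": "user", "content": "What is the population of Paris?"}],
    stream=True,
    max_output_tokens=64,
)
print(json.dumps(req2))
```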
#### Streaming Events
The Responses API uses semantic events for streaming. Each event is typed with a predefined schema, so you can listen for the events you care about. Our initial implementation supports only the following event types:

- `response.created`
- `response.output_text.delta`
- `response.completed`
For a full list of event types, see the [API reference for streaming](https://platform.openai.com/docs/api-reference/responses-streaming).
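As a sketch of how a client might consume these events, ignoring types it does not handle (the event dicts below are illustrative stand-ins for the server's streamed payloads; the `delta` field name follows the OpenAI Responses streaming schema and is an assumption here):

```python
def collect_text(events):
    """Concatenate text deltas, skipping any other event types."""
    chunks = []
    for event in events:
        if event.get("type") == "response.output_text.delta":
            chunks.append(event["delta"])
    return "".join(chunks)

# Illustrative stand-ins for a streamed response.
sample_events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Paris has "},
    {"type": "response.output_text.delta", "delta": "2.2 million people."},
    {"type": "response.completed"},
]
print(collect_text(sample_events))  # → Paris has 2.2 million people.
```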
Example request:

```bash
curl -X POST http://localhost:8000/api/v0/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.2-1B-Instruct-Hybrid",
    "input": "What is the population of Paris?",
    "stream": false
  }'
```
#### Response format
For non-streaming responses:
```json
{
  "id": "0",
  "created_at": 1746225832.0,
  "model": "Llama-3.2-1B-Instruct-Hybrid",
  "object": "response",
  "output": [{
    "id": "0",
    "content": [{
      "annotations": [],
      "text": "Paris has a population of approximately 2.2 million people in the city proper."
    }]
  }]
}
```
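For instance, the generated text in the sample payload above can be pulled out with a few lines of Python; the indexing simply mirrors the JSON structure shown (`output -> content -> text`):

```python
import json

# The non-streaming sample payload from the spec.
response_json = """
{
  "id": "0",
  "created_at": 1746225832.0,
  "model": "Llama-3.2-1B-Instruct-Hybrid",
  "object": "response",
  "output": [{
    "id": "0",
    "content": [{
      "annotations": [],
      "text": "Paris has a population of approximately 2.2 million people in the city proper."
    }]
  }]
}
"""

response = json.loads(response_json)
# Index into the first output item's first content entry.
text = response["output"][0]["content"][0]["text"]
print(text)
```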
For streaming responses, the API returns a series of events. Refer to [OpenAI streaming guide](https://platform.openai.com/docs/guides/streaming-responses?api-mode=responses) for details.
### `GET /api/v0/models`

Returns a list of key models available on the server in an OpenAI-compatible format. Each model object is also expanded with the `checkpoint` and `recipe` fields, which may be used to load a model using the `load` endpoint.
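A sketch of how a client might pick out the `checkpoint` and `recipe` fields for a model. The payload below is an assumed OpenAI-style model list per the description above; the `checkpoint` and `recipe` values shown are purely illustrative, and `find_model` is an invented helper:

```python
# Illustrative stand-in for a GET /api/v0/models response body.
sample_models = {
    "object": "list",
    "data": [
        {
            "id": "Llama-3.2-1B-Instruct-Hybrid",
            "object": "model",
            "checkpoint": "example/llama-3.2-1b-instruct-hybrid",  # illustrative value
            "recipe": "example-recipe",                            # illustrative value
        },
    ],
}

def find_model(models, model_id):
    """Return the (checkpoint, recipe) pair for a model id, if present."""
    for entry in models["data"]:
        if entry["id"] == model_id:
            return entry["checkpoint"], entry["recipe"]
    return None

result = find_model(sample_models, "Llama-3.2-1B-Instruct-Hybrid")
print(result)
```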
| App | Guide | Demo |
|-----|-------|------|
|[Open WebUI](https://github.com/open-webui/open-webui)|[How to chat with Lemonade LLMs in Open WebUI](https://ryzenai.docs.amd.com/en/latest/llm/server_interface.html#open-webui-demo)|[Watch Demo](https://www.youtube.com/watch?v=PXNTDZREJ_A)|
|[Continue](https://www.continue.dev/)|[How to use Lemonade LLMs as a coding assistant in Continue](continue.md)|[Watch Demo](https://youtu.be/bP_MZnDpbUc?si=hRhLbLEV6V_OGlUt)|
|[Microsoft AI Toolkit](https://learn.microsoft.com/en-us/windows/ai/toolkit/)|[Experimenting with Lemonade LLMs in VS Code using Microsoft's AI Toolkit](ai-toolkit.md)|[Watch Demo](https://youtu.be/JecpotOZ6qo?si=WxWVQhUBCJQgE6vX)|
|[GAIA](https://github.com/amd/gaia)|[An application for running LLMs locally, includes a ChatBot, YouTube Agent, and more](https://github.com/amd/gaia?tab=readme-ov-file#getting-started-guide)|[Watch Demo](https://youtu.be/_PORHv_-atI?si=EYQjmrRQ6Zy2H0ek)|
|[CodeGPT](https://codegpt.co/)|[How to use Lemonade LLMs as a coding assistant in CodeGPT](codeGPT.md)|_coming soon_|
|[MindCraft](mindcraft.md)|[How to use Lemonade LLMs as a Minecraft agent](mindcraft.md)|_coming soon_|
|[wut](https://github.com/shobrook/wut)|[Terminal assistant that uses Lemonade LLMs to explain errors](wut.md)|_coming soon_|
|[AnythingLLM](https://anythingllm.com/)|[Running agents locally with Lemonade and AnythingLLM](anythingLLM.md)|_coming soon_|
|[lm-eval](https://github.com/EleutherAI/lm-evaluation-harness)|[A unified framework to test generative language models on a large number of different evaluation tasks](lm-eval.md)|_coming soon_|
|[PEEL](https://github.com/lemonade-apps/peel)|[Using Local LLMs in Windows PowerShell](https://github.com/lemonade-apps/peel?tab=readme-ov-file#installation)|_coming soon_|
## 📦 Looking for Installation Help?
To set up Lemonade Server, check out the [Lemonade_Server_Installer.exe guide](lemonade_server_exe.md) for installation instructions and the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the functionality. For more information about 🍋 Lemonade SDK, see the [Lemonade SDK README](https://github.com/onnx/turnkeyml/tree/main/docs/lemonade/).