Commit db73728

Release v6.2.2 (#319)
## Server Improvements

v6.2.2 is focused on improvements for 🍋 Lemonade Server users and app developers.

- LLMs can be downloaded after Lemonade Server installation using `lemonade-server pull MODEL_NAME` (@jeremyfowers)
  - Users can find a list of supported LLMs [here](https://github.com/aigdat/turnkeyml/blob/main/docs/lemonade/server_models.md).
  - Developers can programmatically access a list of supported LLMs [here](https://github.com/aigdat/genai/blob/main/src/lemonade_server/server_models.json).
- New supported models:
  - 6 new AWQ OGA+CPU models, enabling performant and lightweight support for non-STX systems (@iswaryaalex)
  - `Mistral-7B-v0.3-Instruct-Hybrid` and `Llama-3.1-8B-Instruct-Hybrid` (@jeremyfowers)
- Stop any running Lemonade Server process using `lemonade-server stop` (@danielholanda)
- Lemonade Server works in offline mode (@ramkrishna2910)

## Additional Improvements

- A new Lemonade Server integration guide, for using Mindcraft to add LLM-controlled NPCs to Minecraft (@itomek)
- Add response length to bench statistics and improve the Quark installation process (@amd-pworfolk)

## New Contributors

@itomek made their first contribution!

Co-authored-by: Victoria Godsoe <victoria.godsoe@amd.com>
Co-authored-by: Daniel Holanda <holand.daniel@gmail.com>
Co-authored-by: Ramakrishnan Sivakumar <ramkrishna2910@gmail.com>
Co-authored-by: amd-pworfolk <166068376+amd-pworfolk@users.noreply.github.com>
Co-authored-by: Iswarya Alex <47045679+iswaryaalex@users.noreply.github.com>
Co-authored-by: Tomasz Iniewicz <itomek@users.noreply.github.com>
Signed-off-by: Jeremy Fowers <80718789+jeremyfowers@users.noreply.github.com>
1 parent e8f0488 commit db73728

File tree

22 files changed: +1221 −183 lines changed


.github/workflows/test_server.yml

Lines changed: 6 additions & 2 deletions

@@ -36,10 +36,14 @@ jobs:
           python -m pip install --upgrade pip
           python -m pip check
           pip install -e .[llm-oga-cpu]
-          lemonade-install --model Qwen2.5-0.5B-Instruct-CPU
-      - name: Run server tests
+          lemonade-server-dev pull Qwen2.5-0.5B-Instruct-CPU
+      - name: Run server tests (network online mode)
         shell: bash -el {0}
         run: |
           python test/lemonade/server.py
+      - name: Run server tests (offline mode)
+        shell: bash -el {0}
+        run: |
+          python test/lemonade/server.py --offline

docs/lemonade/server_integration.md

Lines changed: 13 additions & 0 deletions

@@ -70,6 +70,17 @@ https://github.com/onnx/turnkeyml/releases/download/v6.0.0/Lemonade_Server_Insta
 
 Please note that the Server Installer is only available on Windows. Apps that integrate with our server on a Linux machine must install Lemonade from source as described [here](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#from-source-code).
 
+### Installing Additional Models
+
+Lemonade Server installations always come with at least one LLM installed. If you want to install additional models on behalf of your users, the following tools are available:
+
+- Discovering which LLMs are available:
+  - [A human-readable list of supported models](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_models.md).
+  - [A JSON file with the list of supported models](https://github.com/onnx/turnkeyml/tree/main/src/lemonade_server/server_models.json) is included in every Lemonade Server installation.
+- Installing LLMs:
+  - [The `pull` endpoint in the server](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md#get-apiv0pull-).
+  - `lemonade-server pull MODEL` on the command line interface.
+
 ## Stand-Alone Server Integration
 
 Some apps might prefer to be responsible for installing and managing Lemonade Server on behalf of the user. This part of the guide includes steps for installing and running Lemonade Server so that your users don't have to install Lemonade Server separately.

@@ -94,6 +105,8 @@ lemonade-server --port 8123
 
 You can also run the server as a background process using a subprocess or any preferred method.
 
+To stop the server, you may use the `lemonade-server stop` command, or simply terminate the process you created by keeping track of its PID. Please do not run the `lemonade-server stop` command if your application has not started the server, as the server may be used by other applications.
+
 ### Silent Installation
 
 Silent installation runs `Lemonade_Server_Installer.exe` without a GUI and automatically accepts all prompts.

docs/lemonade/server_models.md

Lines changed: 116 additions & 9 deletions

@@ -6,17 +6,22 @@ This document provides the models we recommend for use with Lemonade Server. Cli
 ## Naming Convention
 The format of each Lemonade name is a combination of the name in the base checkpoint and the backend where the model will run. So, if the base checkpoint is `meta-llama/Llama-3.2-1B-Instruct`, and it has been optimized to run on Hybrid, the resulting name is `Llama-3.2-1B-Instruct-Hybrid`.
 
-## Supported Models
-<details>
-<summary>Qwen2.5-0.5B-Instruct-CPU</summary>
+## Installing Additional Models
 
-| Key | Value |
-| --- | ----- |
-| Checkpoint | [amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx](https://huggingface.co/amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx) |
-| Recipe | oga-cpu |
-| Reasoning | False |
+Once you've installed Lemonade Server, you can install any model on this list using the `lemonade-server pull` command.
 
-</details>
+Example:
+
+```bash
+lemonade-server pull Qwen2.5-0.5B-Instruct-CPU
+```
+
+> Note: `lemonade-server` is a utility that is added to your PATH when you install Lemonade Server with the GUI installer.
+> If you are using Lemonade Server from a Python environment, use the `lemonade-server-dev pull` command instead.
+
+## Supported Models
+
+### Hybrid
 
 <details>
 <summary>Llama-3.2-1B-Instruct-Hybrid</summary>

@@ -84,3 +89,105 @@ The format of each Lemonade name is a combination of the name in the base checkp
 
 </details>
 
+<details>
+<summary>Mistral-7B-v0.3-Instruct-Hybrid</summary>
+
+| Key | Value |
+| --- | ----- |
+| Checkpoint | [amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid](https://huggingface.co/amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid) |
+| Recipe | oga-hybrid |
+| Reasoning | False |
+
+</details>
+
+<details>
+<summary>Llama-3.1-8B-Instruct-Hybrid</summary>
+
+| Key | Value |
+| --- | ----- |
+| Checkpoint | [amd/Llama-3.1-8B-Instruct-awq-asym-uint4-g128-lmhead-onnx-hybrid](https://huggingface.co/amd/Llama-3.1-8B-Instruct-awq-asym-uint4-g128-lmhead-onnx-hybrid) |
+| Recipe | oga-hybrid |
+| Reasoning | False |
+
+</details>
+
+### CPU
+
+<details>
+<summary>Qwen2.5-0.5B-Instruct-CPU</summary>
+
+| Key | Value |
+| --- | ----- |
+| Checkpoint | [amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx](https://huggingface.co/amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx) |
+| Recipe | oga-cpu |
+| Reasoning | False |
+
+</details>
+
+<details>
+<summary>Llama-3.2-1B-Instruct-CPU</summary>
+
+| Key | Value |
+| --- | ----- |
+| Checkpoint | [amd/Llama-3.2-1B-Instruct-awq-uint4-float16-cpu-onnx](https://huggingface.co/amd/Llama-3.2-1B-Instruct-awq-uint4-float16-cpu-onnx) |
+| Recipe | oga-cpu |
+| Reasoning | False |
+
+</details>
+
+<details>
+<summary>Llama-3.2-3B-Instruct-CPU</summary>
+
+| Key | Value |
+| --- | ----- |
+| Checkpoint | [amd/Llama-3.2-3B-Instruct-awq-uint4-float16-cpu-onnx](https://huggingface.co/amd/Llama-3.2-3B-Instruct-awq-uint4-float16-cpu-onnx) |
+| Recipe | oga-cpu |
+| Reasoning | False |
+
+</details>
+
+<details>
+<summary>Phi-3-Mini-Instruct-CPU</summary>
+
+| Key | Value |
+| --- | ----- |
+| Checkpoint | [amd/Phi-3-mini-4k-instruct_int4_float16_onnx_cpu](https://huggingface.co/amd/Phi-3-mini-4k-instruct_int4_float16_onnx_cpu) |
+| Recipe | oga-cpu |
+| Reasoning | False |
+
+</details>
+
+<details>
+<summary>Qwen-1.5-7B-Chat-CPU</summary>
+
+| Key | Value |
+| --- | ----- |
+| Checkpoint | [amd/Qwen1.5-7B-Chat_uint4_asym_g128_float16_onnx_cpu](https://huggingface.co/amd/Qwen1.5-7B-Chat_uint4_asym_g128_float16_onnx_cpu) |
+| Recipe | oga-cpu |
+| Reasoning | False |
+
+</details>
+
+<details>
+<summary>DeepSeek-R1-Distill-Llama-8B-CPU</summary>
+
+| Key | Value |
+| --- | ----- |
+| Checkpoint | [amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-cpu](https://huggingface.co/amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-cpu) |
+| Recipe | oga-cpu |
+| Reasoning | True |
+
+</details>
+
+<details>
+<summary>DeepSeek-R1-Distill-Qwen-7B-CPU</summary>
+
+| Key | Value |
+| --- | ----- |
+| Checkpoint | [amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-cpu](https://huggingface.co/amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-cpu) |
+| Recipe | oga-cpu |
+| Reasoning | True |
+
+</details>
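The naming convention documented in this file (the model name from the base checkpoint, plus a backend suffix) is mechanical enough to sketch. This helper is illustrative only and not part of Lemonade; note that for quantized checkpoints the served name is often simpler than the checkpoint name, so this only captures the convention for the straightforward case.

```python
def lemonade_name(checkpoint: str, backend: str) -> str:
    """Derive a Lemonade Server model name from a base checkpoint ID.

    Per the naming convention: take the model-name portion of the
    checkpoint (after the "org/" prefix) and append the backend suffix,
    e.g. "Hybrid" or "CPU".
    """
    base = checkpoint.split("/")[-1]  # drop the "org/" prefix
    return f"{base}-{backend}"
```

For example, `lemonade_name("meta-llama/Llama-3.2-1B-Instruct", "Hybrid")` yields `Llama-3.2-1B-Instruct-Hybrid`, matching the convention described under "Naming Convention".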

docs/lemonade/server_spec.md

Lines changed: 33 additions & 2 deletions

@@ -23,6 +23,7 @@ They focus on enabling client applications by extending existing cloud-focused A
 - Unload models to save memory space.
 
 The additional endpoints under development are:
+- POST `/api/v0/pull` - Install a model
 - POST `/api/v0/load` - Load a model
 - POST `/api/v0/unload` - Unload a model
 - POST `/api/v0/params` - Set generation parameters

@@ -250,9 +251,40 @@ curl http://localhost:8000/api/v0/models
 
 ## Additional Endpoints
 
+### `GET /api/v0/pull` <sub>![Status](https://img.shields.io/badge/status-fully_available-green)</sub>
+
+Install a model by downloading it and registering it with Lemonade Server.
+
+#### Parameters
+
+| Parameter | Required | Description |
+|-----------|----------|-------------|
+| `model_name` | Yes | [Lemonade Server model name](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_models.md) to load. |
+
+Example request:
+
+```bash
+curl http://localhost:8000/api/v0/pull \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model_name": "Qwen2.5-0.5B-Instruct-CPU"
+  }'
+```
+
+Response format:
+
+```json
+{
+  "status": "success",
+  "message": "Installed model: Qwen2.5-0.5B-Instruct-CPU"
+}
+```
+
+In case of an error, the status will be `error` and the message will contain the error message.
+
 ### `GET /api/v0/load` <sub>![Status](https://img.shields.io/badge/status-fully_available-green)</sub>
 
-Explicitly load a model into memory. This is useful to ensure that the model is loaded before you make a request.
+Explicitly load a model into memory. This is useful to ensure that the model is loaded before you make a request. Installs the model if necessary.
 
 #### Parameters

@@ -321,7 +353,6 @@ Response format:
 
 In case of an error, the status will be `error` and the message will contain the error message.
 
-
 ### `POST /api/v0/unload` <sub>![Status](https://img.shields.io/badge/status-partially_available-red)</sub>
 
 Explicitly unload a model from memory. This is useful to free up memory while still leaving the server process running (which takes minimal resources but a few seconds to start).
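The new `pull` endpoint documented in this diff can be driven from any HTTP client. A minimal stdlib-only Python sketch, assuming a server at `localhost:8000` and the JSON request/response shapes shown in the spec above:

```python
import json
from urllib import request


def pull_model(model_name, base_url="http://localhost:8000"):
    """Ask a running Lemonade Server to download and register a model.

    Sends the {"model_name": ...} body documented in the spec and
    returns the decoded JSON response.
    """
    body = json.dumps({"model_name": model_name}).encode("utf-8")
    req = request.Request(
        f"{base_url}/api/v0/pull",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def pull_succeeded(response):
    """Per the spec, "status" is "success" on install, "error" otherwise."""
    return response.get("status") == "success"
```

Calling `pull_model("Qwen2.5-0.5B-Instruct-CPU")` against a running server would mirror the curl example in the spec; `pull_succeeded` applies the documented success/error convention to the decoded response.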

examples/lemonade/server/README.md

Lines changed: 1 addition & 0 deletions

@@ -12,6 +12,7 @@ This allows the same application to leverage local LLMs instead of relying on Op
 | [Continue](https://www.continue.dev/) | [How to use Lemonade LLMs as a coding assistant in Continue](continue.md) | _coming soon_ |
 | [Microsoft AI Toolkit](https://learn.microsoft.com/en-us/windows/ai/toolkit/) | [Experimenting with Lemonade LLMs in VS Code using Microsoft's AI Toolkit](ai-toolkit.md) | _coming soon_ |
 | [CodeGPT](https://codegpt.co/) | [How to use Lemonade LLMs as a coding assistant in CodeGPT](codeGPT.md) | _coming soon_ |
+| [MindCraft](mindcraft.md) | [How to use Lemonade LLMs as a Minecraft agent](mindcraft.md) | _coming soon_ |
 | [wut](https://github.com/shobrook/wut) | [Terminal assistant that uses Lemonade LLMs to explain errors](wut.md) | _coming soon_ |
 | [AnythingLLM](https://anythingllm.com/) | [Running agents locally with Lemonade and AnythingLLM](anythingLLM.md) | _coming soon_ |
 | [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) | [A unified framework to test generative language models on a large number of different evaluation tasks.](lm-eval.md) | _coming soon_

examples/lemonade/server/ai-toolkit.md

Lines changed: 1 addition & 1 deletion

@@ -37,7 +37,7 @@ The AI Toolkit now supports "Bring Your Own Model" functionality, allowing you t
    http://localhost:8000/api/v0/chat/completions
    ```
 5. When prompted to "Enter the exact model name as in the API" select a model (e.g., `Phi-3-Mini-Instruct-Hybrid`)
-   - Note: You can get a list of all models available by running `curl http://localhost:8000/api/v0/models`
+   - Note: You can get a list of all models available [here](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_models.md).
 6. Select the same name as the display model name.
 7. Skip the HTTP authentication step by pressing "Enter".
