docs/develop/rust/wasinn/llm_inference.md
# Llama 2 inference
WasmEdge now supports running the llama2 series of models in Rust. We will use [this example project](https://github.com/second-state/LlamaEdge/tree/main/chat) to show how to make AI inferences with the llama2 model in WasmEdge and Rust.
WasmEdge now supports the following models:
1. Llama-2-7B-Chat
1. Llama-2-13B-Chat
1. CodeLlama-13B-Instruct
1. Mistral-7B-Instruct-v0.1
1. Mistral-7B-Instruct-v0.2
1. MistralLite-7B
1. OpenChat-3.5-0106
1. OpenChat-3.5-1210
1. OpenChat-3.5
1. Wizard-Vicuna-13B-Uncensored-GGUF
1. TinyLlama-1.1B-Chat-v1.0
1. Baichuan2-13B-Chat
1. OpenHermes-2.5-Mistral-7B
1. Dolphin-2.2-Yi-34B
1. Dolphin-2.6-Mistral-7B
1. Samantha-1.2-Mistral-7B
1. Samantha-1.11-CodeLlama-34B
1. WizardCoder-Python-7B-V1.0
1. Zephyr-7B-Alpha
1. WizardLM-13B-V1.0-Uncensored
1. Orca-2-13B
1. Neural-Chat-7B-v3-1
1. Yi-34B-Chat
1. Starling-LM-7B-alpha
1. DeepSeek-Coder-6.7B
1. DeepSeek-LLM-7B-Chat
1. SOLAR-10.7B-Instruct-v1.0
1. Mixtral-8x7B-Instruct-v0.1
1. Nous-Hermes-2-Mixtral-8x7B-DPO
1. Nous-Hermes-2-Mixtral-8x7B-SFT
And more. Please check [the supported models](https://github.com/second-state/LlamaEdge/blob/main/models.md) for details.
## Prerequisite
Besides the [regular WasmEdge and Rust requirements](../../rust/setup.md), please make sure that you have the [Wasi-NN plugin with ggml installed](../../../start/install.md#wasi-nn-plug-in-with-ggml-backend).
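As a convenience, here is a sketch of one way to install the plugin with the WasmEdge installer script. The `wasi_nn-ggml` plugin name follows the install guide linked above; verify against it before running:

```bash
# Install WasmEdge together with the ggml backend for the Wasi-NN plugin
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | \
  bash -s -- --plugins wasi_nn-ggml
```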
## Quick start
Because the example already includes a compiled WASM file built from the Rust code, we can use the WasmEdge CLI to execute the example directly. First, download the precompiled chat application as shown below.
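Here is a sketch of that step. The release URL and the `llama-chat.wasm` file name are assumptions; check the example project's README for the authoritative link:

```bash
# Fetch the prebuilt chat app (URL assumed; see the chat example's README)
curl -LO https://github.com/second-state/LlamaEdge/releases/latest/download/llama-chat.wasm
```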
Next, let's get the model. In this example, we are going to use the llama2 7b chat model in GGUF format. You can also use other kinds of llama2 models; see [here](https://github.com/second-state/llamaedge/blob/main/chat/README.md#get-model).
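A sketch of the download step follows. The Hugging Face URL is an assumption, so use the model links from the README above:

```bash
# Download the quantized llama2 7b chat model in GGUF format (several GB)
curl -LO https://huggingface.co/wasmedge/llama2/resolve/main/llama-2-7b-chat-q5_k_m.gguf
```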
Furthermore, the following command tells WasmEdge to print out logs and statistics of the model at runtime.
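A sketch of that command is below. The `--nn-preload` option maps the model file to the alias `default` using the GGML backend; the `llama-chat.wasm` app name and the `--log-stat` flag are assumptions based on the chat example's CLI:

```bash
# GGML:AUTO selects the ggml backend on the best available device;
# --log-stat (assumed flag) asks the app to print runtime statistics.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
  llama-chat.wasm --log-stat
```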
## Understand the code
The [main.rs](https://github.com/second-state/llamaedge/blob/main/chat/src/main.rs) contains the full Rust code to create an interactive chatbot using an LLM. The Rust program manages the user input, tracks the conversation history, transforms the text into the chat templates of llama2 and other models, and runs the inference operations using the WASI-NN standard API. The code logic for the chat interaction is somewhat complex, so in this section we will use the [simple example](https://github.com/second-state/llamaedge/tree/main/simple) to explain how to set up and perform one inference round trip. Here is how you use the simple example.
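Below is a sketch of a one-shot run. The `llama-simple.wasm` file name, the prompt, and the flags are assumptions based on the simple example's README; adjust them to the actual binary and options:

```bash
# Ask for a single completion; the output below continues this prompt.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
  llama-simple.wasm --prompt 'Robert Oppenheimer most important achievement is ' --ctx-size 4096
```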
```
output: in 1942, when he led the team that developed the first atomic bomb, which was dropped on Hiroshima, Japan in 1945.
```
First, let's parse the command line arguments to customize the chatbot's behavior using the `Command` struct. It extracts the following parameters: `prompt` (a prompt that guides the conversation), `model_alias` (the alias of the loaded model), and `ctx_size` (the size of the chat context).
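The parsing step is ordinary clap code. Here is a minimal sketch; the flag names and defaults are illustrative rather than the exact source, so see the simple example's `main.rs` for the real definitions:

```rust
use clap::{Arg, Command};

fn main() {
    // Illustrative only: the real example may use different flag names/defaults.
    let matches = Command::new("llama-simple")
        .arg(Arg::new("prompt").short('p').long("prompt").required(true))
        .arg(Arg::new("model_alias").short('m').long("model-alias").default_value("default"))
        .arg(Arg::new("ctx_size").short('c').long("ctx-size").default_value("4096"))
        .get_matches();

    let prompt = matches.get_one::<String>("prompt").unwrap().clone();
    let model_alias = matches.get_one::<String>("model_alias").unwrap().clone();
    let ctx_size: usize = matches.get_one::<String>("ctx_size").unwrap().parse().unwrap();

    println!("prompt={prompt}, alias={model_alias}, ctx={ctx_size}");
    // ... hand prompt/model_alias/ctx_size to the inference round trip below ...
}
```

With the arguments in hand, the inference round trip follows the WASI-NN pattern: load the preloaded model by its alias, create an execution context, set the prompt as the input tensor, compute, and read the output. A minimal sketch using the `wasmedge-wasi-nn` crate follows; the buffer size and error handling are simplified, so treat it as an approximation of the example's logic rather than the exact source:

```rust
use wasmedge_wasi_nn::{Error, ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn infer(model_alias: &str, prompt: &str) -> Result<String, Error> {
    // Look up the model that `--nn-preload` registered under this alias.
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache(model_alias)?;

    // One execution context handles one inference session.
    let mut ctx = graph.init_execution_context()?;

    // The ggml backend takes the prompt as a UTF-8 byte tensor at input index 0.
    let tensor_data = prompt.as_bytes().to_vec();
    ctx.set_input(0, TensorType::U8, &[1], &tensor_data)?;

    // Run the inference.
    ctx.compute()?;

    // Read the generated text back from output index 0.
    let mut output_buffer = vec![0u8; 4096];
    let output_size = ctx.get_output(0, &mut output_buffer)?;
    Ok(String::from_utf8_lossy(&output_buffer[..output_size]).into_owned())
}
```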
* If you're looking for multi-turn conversations with llama 2 models, please check out the above-mentioned chat example source code [here](https://github.com/second-state/llamaedge/tree/main/chat).
* If you want to construct OpenAI-compatible APIs for your own llama2 model, or for the Llama2 model itself, please check out the source code [for the API server](https://github.com/second-state/llamaedge/tree/main/api-server).
* To learn more, please check out [this article](https://medium.com/stackademic/fast-and-portable-llama2-inference-on-the-heterogeneous-edge-a62508e82359).