This repository was archived by the owner on Sep 12, 2024. It is now read-only.

Commit a545f3c

Merge pull request #27 from Atome-FE/feature/update-llama-rs

feat: upgrade to newest llama-rs so that we can support ggjt model

2 parents e7d1795 + 360a83b

File tree: 21 files changed, +512 −152 lines

.github/workflows/llama-build.yml

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ on:
   pull_request:
     branches:
       - master
+      - main
     types:
       - ready_for_review
       - review_requested

Cargo.lock

Lines changed: 23 additions & 13 deletions
Some generated files are not rendered by default.

README-zh-CN.md

Lines changed: 47 additions & 10 deletions
@@ -22,6 +22,8 @@ The LLaMA large language model running on Node.js.
 - [Install](#安装)
 - [Getting the weights](#模型获取)
 - [Model versioning](#模型版本)
+  - [llama.cpp](#llamacpp)
+  - [llama-rs](#llama-rs)
 - [Usage (llama.cpp backend)](#使用llamacpp后端)
 - [Inference](#推理)
 - [Tokenize](#分词)
@@ -87,13 +89,48 @@ llama-node calls llama-rs under the hood, and the model format it uses is derived from llama.cpp. …
 
 ### Model versioning
 
-The llama.cpp community currently has 3 format versions:
+#### llama.cpp
+
+The model types supported by llama.cpp can be found in the ggml.h source:
+
+```c
+enum ggml_type {
+    // explicitly numbered values are used in llama.cpp files
+    GGML_TYPE_F32 = 0,
+    GGML_TYPE_F16 = 1,
+    GGML_TYPE_Q4_0 = 2,
+    GGML_TYPE_Q4_1 = 3,
+    GGML_TYPE_Q4_2 = 4,
+    GGML_TYPE_Q4_3 = 5,
+    GGML_TYPE_Q8_0 = 6,
+    GGML_TYPE_I8,
+    GGML_TYPE_I16,
+    GGML_TYPE_I32,
+    GGML_TYPE_COUNT,
+};
+```
 
-- GGML: legacy format, the oldest ggml tensor file format.
-- GGMF: also a legacy format, newer than GGML but older than GGJT.
-- GGJT: the mmap-able format.
+#### llama-rs
+
+The model types supported by llama-rs can be found in the llama-rs ggml bindings:
+
+```rust
+pub enum Type {
+    /// Quantized 4-bit (type 0).
+    #[default]
+    Q4_0,
+    /// Quantized 4-bit (type 1); used by GPTQ.
+    Q4_1,
+    /// Integer 32-bit.
+    I32,
+    /// Float 16-bit.
+    F16,
+    /// Float 32-bit.
+    F32,
+}
+```
 
-The llama-rs backend currently supports only GGML/GGMF models; the llama.cpp backend supports only GGJT models.
+llama-rs also supports the legacy ggml/ggmf models.
 
 ---
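As an illustrative aside (not part of this commit): because the explicitly numbered `ggml_type` values above are the ones used in llama.cpp files, a caller can map a raw type id back to a readable name. The helper below is a hypothetical TypeScript sketch, not a llama-node API.

```typescript
// Hypothetical helper (not part of llama-node): map the explicitly numbered
// ggml_type values from the enum above back to readable names.
const GGML_TYPE_NAMES: string[] = [
    "F32",  // GGML_TYPE_F32  = 0
    "F16",  // GGML_TYPE_F16  = 1
    "Q4_0", // GGML_TYPE_Q4_0 = 2
    "Q4_1", // GGML_TYPE_Q4_1 = 3
    "Q4_2", // GGML_TYPE_Q4_2 = 4
    "Q4_3", // GGML_TYPE_Q4_3 = 5
    "Q8_0", // GGML_TYPE_Q8_0 = 6
];

export function ggmlTypeName(type: number): string {
    return GGML_TYPE_NAMES[type] ?? `unknown ggml_type (${type})`;
}
```

For example, `ggmlTypeName(3)` returns `"Q4_1"`, matching the `q4_1` suffix of the example model file used throughout this change.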

@@ -110,7 +147,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 
@@ -163,7 +200,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 
@@ -195,7 +232,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 
@@ -363,7 +400,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 
@@ -446,5 +483,5 @@ run();
 - [ ] More platforms and processor architectures (at the highest possible performance)
 - [ ] Improve the embedding API, with an option to configure the end token
 - [ ] Command-line tool
-- [ ] Update llama-rs to support more models https://github.com/rustformers/llama-rs/pull/85 https://github.com/rustformers/llama-rs/issues/75
+- [ ] Update llama-rs to support more models https://github.com/rustformers/llama-rs/pull/141
 - [ ] Support for more native inference backends (e.g. rwkv)!

README.md

Lines changed: 47 additions & 10 deletions
@@ -24,6 +24,8 @@ This project is in an early stage, the API for nodejs may change in the future, …
 - [Install](#install)
 - [Getting the weights](#getting-the-weights)
 - [Model versioning](#model-versioning)
+  - [llama.cpp](#llamacpp)
+  - [llama-rs](#llama-rs)
 - [Usage (llama.cpp backend)](#usage-llamacpp-backend)
 - [Inference](#inference)
 - [Tokenize](#tokenize)
@@ -89,13 +91,48 @@ The llama-node uses llama-rs under the hook and uses the model format derived fr…
 
 ### Model versioning
 
-There are now 3 versions from llama.cpp community:
+#### llama.cpp
+
+For llama.cpp, the supported model types can be found in the ggml.h source:
+
+```c
+enum ggml_type {
+    // explicitly numbered values are used in llama.cpp files
+    GGML_TYPE_F32 = 0,
+    GGML_TYPE_F16 = 1,
+    GGML_TYPE_Q4_0 = 2,
+    GGML_TYPE_Q4_1 = 3,
+    GGML_TYPE_Q4_2 = 4,
+    GGML_TYPE_Q4_3 = 5,
+    GGML_TYPE_Q8_0 = 6,
+    GGML_TYPE_I8,
+    GGML_TYPE_I16,
+    GGML_TYPE_I32,
+    GGML_TYPE_COUNT,
+};
+```
 
-- GGML: legacy format, oldest ggml tensor file format
-- GGMF: also legacy format, newer than GGML, older than GGJT
-- GGJT: mmap-able format
+#### llama-rs
+
+For llama-rs, the supported model types can be found in the llama-rs ggml bindings:
+
+```rust
+pub enum Type {
+    /// Quantized 4-bit (type 0).
+    #[default]
+    Q4_0,
+    /// Quantized 4-bit (type 1); used by GPTQ.
+    Q4_1,
+    /// Integer 32-bit.
+    I32,
+    /// Float 16-bit.
+    F16,
+    /// Float 32-bit.
+    F32,
+}
+```
 
-The llama-rs backend now only supports GGML/GGMF models, and llama.cpp backend only supports GGJT models.
+llama-rs also supports the legacy llama.cpp models.
 
 ---
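For illustration only (not part of this commit): the GGML/GGMF/GGJT container formats mentioned in this hunk differ in the 4-byte magic at the start of the model file. The magic values below are assumed from the llama.cpp / llama-rs sources ("ggml", "ggmf", "ggjt" read as a little-endian u32), and the helper itself is a hypothetical sketch, not a llama-node API.

```typescript
// Hypothetical helper (not part of llama-node): guess a model file's container
// format from its leading 4-byte magic.
import { closeSync, openSync, readSync } from "fs";

// Assumed magic values ("ggml", "ggmf", "ggjt") as little-endian u32.
const FORMAT_BY_MAGIC: Record<number, string> = {
    0x67676d6c: "GGML (unversioned legacy)",
    0x67676d66: "GGMF (versioned legacy)",
    0x67676a74: "GGJT (mmap-able)",
};

export function sniffModelFormat(file: string): string {
    const fd = openSync(file, "r");
    try {
        const magic = Buffer.alloc(4);
        readSync(fd, magic, 0, 4, 0); // read the first 4 bytes of the file
        return FORMAT_BY_MAGIC[magic.readUInt32LE(0)] ?? "unknown";
    } finally {
        closeSync(fd);
    }
}
```

A quick check such as `sniffModelFormat("./ggml-vicuna-7b-1.1-q4_1.bin")` reports which of the three formats a downloaded weight file uses before handing it to a backend.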

@@ -112,7 +149,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
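A minimal sketch of how the renamed model file would be loaded with the llama.cpp backend, for context only. Only `LLama`, `LLamaCpp`, `LoadConfig`, the import paths, and `llama.load(config)` appear in this diff; every `LoadConfig` field name below is an assumption about llama-node's API at the time and may not match the real type.

```typescript
import { LLama } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-1.1-q4_1.bin");
const llama = new LLama(LLamaCpp);

// All field names below are assumptions, not confirmed by this diff.
const config: LoadConfig = {
    path: model,          // assumed: path to the ggml weights
    enableLogging: true,  // assumed
    nCtx: 1024,           // assumed: context window size
    nParts: -1,           // assumed: auto-detect number of parts
    seed: 0,              // assumed
    f16Kv: false,         // assumed
    logitsAll: false,     // assumed
    vocabOnly: false,     // assumed
    useMlock: false,      // assumed
    embedding: false,     // assumed
    useMmap: true,        // assumed: mmap-able GGJT files benefit from this
};

llama.load(config);
```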

@@ -165,7 +202,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 
@@ -197,7 +234,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "./ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 
@@ -366,7 +403,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 
@@ -452,5 +489,5 @@ The following steps will allow you to compile the binary with best quality on yo…
 - [ ] more platforms and cross compile (performance related)
 - [ ] tweak embedding API, make end token configurable
 - [ ] cli and interactive
-- [ ] support more open source models as llama-rs planned https://github.com/rustformers/llama-rs/pull/85 https://github.com/rustformers/llama-rs/issues/75
+- [ ] support more open source models as llama-rs planned https://github.com/rustformers/llama-rs/pull/141
 - [ ] more backends (eg. rwkv) supports!

example/src/langchain/langchain.ts

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 
example/src/llama-cpp/embedding.ts

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 

example/src/llama-cpp/llama-cpp.ts

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 
@@ -22,7 +22,7 @@ const config: LoadConfig = {
 
 llama.load(config);
 
-const template = `How are you`;
+const template = `How are you?`;
 
 const prompt = `### Human:

example/src/llama-cpp/tokenize.ts

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ import { LLama } from "llama-node";
 import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
 import path from "path";
 
-const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-4bit-rev1.bin");
+const model = path.resolve(process.cwd(), "../ggml-vicuna-7b-1.1-q4_1.bin");
 
 const llama = new LLama(LLamaCpp);
 
