
Commit ebdada8

Merge pull request #3603 from alibaba/feature/sync
MNN:Sync: Sync 3.2.0
2 parents: bab3ebd + b66ec85

110 files changed (+10645, -2731 lines)
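
To browse the full change set locally, the merge commit can be inspected with git; a minimal sketch, assuming a clone of the alibaba/MNN repository that contains this merge:

```bash
# Per-file summary of the merge commit (110 files, +10645 / -2731 lines)
git show --stat ebdada8
# Full diff for a single file of interest
git show ebdada8 -- docs/transformers/llm.md
```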


docs/compile/engine.md

Lines changed: 1 addition & 1 deletion
@@ -126,7 +126,7 @@ mkdir build && cd build && cmake .. -DCMAKE_OSX_ARCHITECTURES=arm64 && make -j8
 
 - Build via script: run the script with the `MNN_ARM82` option enabled
 ```
-sh package_scripts/ios/buildiOS.sh "-DMNN_ARM82=true"
+sh package_scripts/ios/buildiOS.sh -DMNN_ARM82=ON
 ```
 
 ## HarmonyOS (Harmony)
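
The `MNN_ARM82` option can also be passed to a plain CMake build instead of the packaging script; a sketch based on the build line shown in the hunk header above:

```bash
# iOS arm64 build with ARMv8.2 (FP16) kernels enabled
mkdir build && cd build
cmake .. -DCMAKE_OSX_ARCHITECTURES=arm64 -DMNN_ARM82=ON
make -j8
```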

docs/start/demo.md

Lines changed: 7 additions & 6 deletions
@@ -20,9 +20,11 @@
 ### Image instance segmentation
 Code location: `demo/exec/segment.cpp`
 
-Download the deeplabv3 segmentation model and convert it to an MNN model
+Download the deeplabv3 segmentation model
 [https://storage.googleapis.com/download.tensorflow.org/models/tflite/gpu/deeplabv3_257_mv_gpu.tflite](https://storage.googleapis.com/download.tensorflow.org/models/tflite/gpu/deeplabv3_257_mv_gpu.tflite)
 
+Convert it to an MNN model with the [model converter](../tools/convert.md), passing --keepInputFormat=0 (this switches the input from NHWC to NC4HW4 layout)
+
 ```bash
 ./segment.out model.mnn input.png result.png
 ```
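
The conversion step added above might look like the following; a sketch that assumes MNNConvert's commonly documented flags (-f, --modelFile, --MNNModel, --bizCode), with illustrative file names:

```bash
# Convert the TFLite model to MNN; --keepInputFormat=0 converts the input
# from NHWC to the NC4HW4 layout expected by segment.out
./MNNConvert -f TFLITE \
    --modelFile deeplabv3_257_mv_gpu.tflite \
    --MNNModel model.mnn \
    --bizCode biz \
    --keepInputFormat=0
```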
@@ -95,14 +97,14 @@ flops_info: 568.792175M
 backend_info: 13
 expect 983
 output belong to class: 983
-$ python gpu_session_demo.py mobilenet_demo/mobilenet_v1.mnn mobilenet_demo/ILSVRC2012_val_00049999.JPEG
+$ python gpu_session_demo.py mobilenet_demo/mobilenet_v1.mnn mobilenet_demo/ILSVRC2012_val_00049999.JPEG
 Testing gpu model calling method
 
 Load Cache file error.
 MNN use high precision
 Can't Find type=3 backend, use 0 instead
 Can't Find type=3 backend, use 0 instead
-Run on backendtype: 13
+Run on backendtype: 13
 
 expect 983
 output belong to class: 983
@@ -127,7 +129,7 @@ output belong to class: 983
 #### mnist
 Train a model on the mnist dataset and test its accuracy; no downloads are required. Usage:
 ```bash
-$ pip install mnist
+$ pip install mnist
 $ python train_mnist.py
 train loss: 2.3346531
 train loss: 0.28027835
@@ -161,7 +163,7 @@ AttributeError: module 'MNN.nn' has no attribute 'FixModule'
 #### module_save
 Demonstrates saving and loading model weights
 ```bash
-$ python test_save.py
+$ python test_save.py
 0.0004
 10
 ```
@@ -225,4 +227,3 @@ sh ../tools/script/get_model.sh
 - [Video matting](https://github.com/DefTruth/RobustVideoMatting.lite.ai.toolkit)
 - [SuperGlue keypoint matching](https://github.com/Hanson0910/MNNSuperGlue)
 - [OCR](https://github.com/DayBreak-u/chineseocr_lite/tree/onnx/android_projects/OcrLiteAndroidMNN)
-- [Bert-VITS2-MNN](https://github.com/Voine/Bert-VITS2-MNN)

docs/transformers/llm.md

Lines changed: 19 additions & 5 deletions
@@ -73,34 +73,48 @@ python llmexport.py \
 - Use `--lm_quant_bit` to specify the quantization bit width of the lm_head layer weights; if not specified, the `--quant_bit` value is used
 
 ### Parameters
+Run `python llmexport.py -h` to see the parameters:
 ```
-usage: llmexport.py [-h] --path PATH [--type TYPE] [--lora_path LORA_PATH] [--dst_path DST_PATH] [--test TEST] [--export EXPORT]
-                    [--quant_bit QUANT_BIT] [--quant_block QUANT_BLOCK] [--lm_quant_bit LM_QUANT_BIT]
-                    [--mnnconvert MNNCONVERT]
+usage: llmexport.py [-h] --path PATH [--type TYPE] [--tokenizer_path TOKENIZER_PATH] [--lora_path LORA_PATH]
+                    [--gptq_path GPTQ_PATH] [--dst_path DST_PATH] [--verbose] [--test TEST] [--export EXPORT]
+                    [--onnx_slim] [--quant_bit QUANT_BIT] [--quant_block QUANT_BLOCK]
+                    [--lm_quant_bit LM_QUANT_BIT] [--mnnconvert MNNCONVERT] [--ppl] [--awq] [--sym] [--seperate_embed]
+                    [--lora_split]
 
 llm_exporter
 
-options:
+optional arguments:
   -h, --help            show this help message and exit
   --path PATH           path(`str` or `os.PathLike`):
                         Can be either:
                         - A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
                         - A path to a *directory* clone from repo like `../chatglm-6b`.
   --type TYPE           type(`str`, *optional*):
                         The pretrain llm model type.
+  --tokenizer_path TOKENIZER_PATH
+                        tokenizer path, defaut is `None` mean using `--path` value.
   --lora_path LORA_PATH
                         lora path, defaut is `None` mean not apply lora.
+  --gptq_path GPTQ_PATH
+                        gptq path, defaut is `None` mean not apply gptq.
   --dst_path DST_PATH   export onnx/mnn model to path, defaut is `./model`.
+  --verbose             Whether or not to print verbose.
   --test TEST           test model inference with query `TEST`.
   --export EXPORT       export model to an onnx/mnn model.
+  --onnx_slim           Whether or not to use onnx-slim.
   --quant_bit QUANT_BIT
                         mnn quant bit, 4 or 8, default is 4.
   --quant_block QUANT_BLOCK
-                        mnn quant block, default is 0 mean channle-wise.
+                        mnn quant block, 0 mean channle-wise, default is 128.
   --lm_quant_bit LM_QUANT_BIT
                         mnn lm_head quant bit, 4 or 8, default is `quant_bit`.
   --mnnconvert MNNCONVERT
                         local mnnconvert path, if invalid, using pymnn.
+  --ppl                 Whether or not to get all logits of input tokens.
+  --awq                 Whether or not to use awq quant.
+  --sym                 Whether or not to using symmetric quant (without zeropoint), defualt is False.
+  --seperate_embed      For lm and embed shared model, whether or not to sepearte embed to avoid quant, defualt is False, if True, embed weight will be seperate to embeddingbf16.bin.
+  --lora_split          Whether or not export lora split, defualt is False.
 ```
 
 ### Weight reading
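
A typical export using the parameters above might look like this; a sketch in which the model path is illustrative and `--export mnn` follows the onnx/mnn choice described in the help text:

```bash
# Export a local checkpoint to MNN with 4-bit weights and the default block size of 128
python llmexport.py \
    --path ../Qwen2-1.5B-Instruct \
    --export mnn \
    --quant_bit 4 \
    --quant_block 128
```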

include/MNN/MNNDefine.h

Lines changed: 2 additions & 2 deletions
@@ -75,7 +75,7 @@ MNN_ERROR("Check failed: %s ==> %s\n", #success, #log); \
 #define STR_IMP(x) #x
 #define STR(x) STR_IMP(x)
 #define MNN_VERSION_MAJOR 3
-#define MNN_VERSION_MINOR 1
-#define MNN_VERSION_PATCH 4
+#define MNN_VERSION_MINOR 2
+#define MNN_VERSION_PATCH 0
 #define MNN_VERSION STR(MNN_VERSION_MAJOR) "." STR(MNN_VERSION_MINOR) "." STR(MNN_VERSION_PATCH)
 #endif /* MNNDefine_h */

source/backend/cpu/CPUAttention.cpp

Lines changed: 11 additions & 4 deletions
@@ -99,14 +99,22 @@ static void pack_QK(char * pack_qk_dst, float * qk_src, int seq_len, int kv_seq_
 template <typename T>
 static void mask_QK(float * unpack_qk, int seq_len, int kv_seq_len, float mScale, float min_val, const Tensor* mask) {
     if (seq_len == 1 || mask == nullptr) {
-        for (int i = 0; i < seq_len * kv_seq_len; i++) {
+        for (int i = 0; i < kv_seq_len; i++) {
             unpack_qk[i] = unpack_qk[i] * mScale;
         }
     } else if (mask->getType() == halide_type_of<float>()) {
         // float mask
         T* fpmask_ptr = mask->host<T>();
-        for (int i = 0; i < seq_len * kv_seq_len; i++) {
-            unpack_qk[i] = unpack_qk[i] * mScale + fpmask_ptr[i];
+        int offset = kv_seq_len-seq_len;
+        for (int i=0; i<seq_len; ++i) {
+            auto unpack_qki = unpack_qk + i * kv_seq_len;
+            auto fpmask_ptri = fpmask_ptr + i * seq_len;
+            for (int j=0; j<offset; ++j) {
+                unpack_qki[j] = unpack_qki[j] * mScale;
+            }
+            for (int j=0; j<seq_len; ++j) {
+                unpack_qki[offset+j] = unpack_qki[offset+j] * mScale + fpmask_ptri[j];
+            }
         }
     } else {
         // int mask
@@ -192,7 +200,6 @@ ErrorCode CPUAttention::onExecute(const std::vector<Tensor*>& inputs, const std:
     int seq_len = query->length(1);
     if (inputs.size() > 3) {
         mask = inputs[3];
-        MNN_ASSERT(seq_len == mask->length(2));
     }
     int tileCount = UP_DIV(mNumHead, mThreadNum);
     int group_size = mNumHead / mKvNumHead;

source/backend/cpu/CPURaster.cpp

Lines changed: 1 addition & 1 deletion
@@ -594,7 +594,7 @@ ErrorCode CPURaster::onExecute(const std::vector<Tensor *> &____inputs, const st
     }
     auto core = static_cast<CPUBackend*>(backend())->functions();
     auto output = outputs[0];
-    auto bytes = CPUBackend::getBytes(backend(), output);
+    size_t bytes = (size_t)(CPUBackend::getBytes(backend(), output));
     auto outputEleSize = static_cast<CPUBackend*>(backend())->getTensorSize(output);
     auto threadNum = static_cast<CPUBackend*>(backend())->threadNumber();
     if (mSingleConvert.type > 0) {
