feat(service): support en language
tpoisonooo committed Jan 10, 2024
1 parent 00de0ba commit 791f388
Showing 6 changed files with 356 additions and 322 deletions.
1 change: 1 addition & 0 deletions .gitignore
pk/
badcase.txt
config.bak
config.ini
resource/prompt.txt
194 changes: 100 additions & 94 deletions README.md
# HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

<small> [简体中文](README_zh.md) | English </small>

[![GitHub license](https://img.shields.io/badge/license-GPL--3--Clause-brightgreen.svg)](./LICENSE)
![CI](https://img.shields.io/github/actions/workflow/status/internml/huixiangdou/lint.yml?branch=master)

"HuixiangDou" is a domain-specific knowledge assistant built on LLMs. Features:
1. Deals with complex scenarios such as group chats, answering user questions without flooding the chat with messages.
2. Proposes an algorithm pipeline for answering technical questions.
3. Low deployment cost: any LLM that satisfies 4 traits can answer most user questions; see the [technical report](./resource/HuixiangDou.pdf).

See [HuixiangDou inside](./huixiangdou-inside.md) for the scenarios where it is already running.

# 🔥 Run

We will take lmdeploy & mmpose as examples to explain how to deploy the knowledge assistant to Feishu group chat.

## STEP1. Establish Topic Feature Repository

```shell
# Download chat topics
mkdir repodir
git clone https://github.com/openmmlab/mmpose --depth=1 repodir/mmpose
git clone https://github.com/internlm/lmdeploy --depth=1 repodir/lmdeploy

# Establish feature repository
cd HuixiangDou && mkdir workdir # Create working directory
python3 -m pip install -r requirements.txt # Install dependencies
python3 service/feature_store.py repodir workdir # Save features from repodir to workdir
```

After running, HuixiangDou can distinguish which user topics should be handled and which chitchat should be rejected. Please edit [good_questions](./resource/good_questions.json) and [bad_questions](./resource/bad_questions.json), and try your own domain knowledge (medical, finance, electric power, etc.).

```shell
# Accept technical topics
process query: Does mmdeploy support mmtrack model conversion now?
process query: Are there any Chinese text to speech models?
# Reject chitchat
reject query: What to eat for lunch today?
reject query: How to make HuixiangDou?
```
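Under the hood, accept/reject amounts to a similarity check against the feature store. Below is a minimal sketch of the idea only — it is not `service/feature_store.py`; the embedding model, threshold, and the assumption that both JSON files are plain lists of question strings are illustrative.

```python
# Illustrative only: score a query against good/bad example questions.
# Model name and threshold below are assumptions, not the project's values.
import json
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("GanymedeNil/text2vec-large-chinese")
good = json.load(open("resource/good_questions.json"))
bad = json.load(open("resource/bad_questions.json"))

def should_answer(query: str, threshold: float = 0.5) -> bool:
    q = model.encode(query, convert_to_tensor=True)
    good_score = util.cos_sim(q, model.encode(good, convert_to_tensor=True)).max()
    bad_score = util.cos_sim(q, model.encode(bad, convert_to_tensor=True)).max()
    return bool(good_score > bad_score and good_score > threshold)

print(should_answer("Does mmdeploy support mmtrack model conversion now?"))  # likely True
print(should_answer("What to eat for lunch today?"))                          # likely False
```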

## STEP2. Run Basic Technical Assistant

**Configure free TOKEN**

HuixiangDou uses a search engine. Go to the [serper official website](https://serper.dev/api-key) to obtain a quota-limited WEB_SEARCH_TOKEN and fill it into `config.ini`.

```shell
# config.ini
..
[web_search]
x_api_key = "${YOUR-X-API-KEY}"
..
```
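To quickly verify the token works before wiring it into HuixiangDou, you can call serper.dev's public search endpoint directly; this is a standalone check, not how `config.ini` is consumed internally.

```python
# Standalone check of the serper.dev key; HuixiangDou reads the same key
# from the [web_search] section of config.ini.
import requests

resp = requests.post(
    "https://google.serper.dev/search",
    headers={"X-API-KEY": "${YOUR-X-API-KEY}", "Content-Type": "application/json"},
    json={"q": "lmdeploy kv cache quantization"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json().get("organic", [])[:3])  # top search hits
```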

**Test Q&A Effect**

Please ensure the GPU memory exceeds 20GB (e.g. a 3090 or better). If the memory is lower, adjust the configuration as described in the FAQ.

The first run automatically downloads the configured models internlm2-7B and text2vec-large-chinese; please ensure network connectivity.
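If the machine has limited connectivity, the models can also be fetched ahead of time with `huggingface_hub`. The repo ids and local paths below are assumptions for illustration — match them to whatever `config.ini` actually points to.

```python
# Pre-download models so the first run does not stall on networking.
# Repo ids and local_dir values are illustrative, not project settings.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="internlm/internlm2-7b", local_dir="models/internlm2-7b")
snapshot_download(repo_id="GanymedeNil/text2vec-large-chinese", local_dir="models/text2vec-large-chinese")
```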

* **Non-docker users**. If you do **not** use a docker environment, you can start all services at once.
```shell
# standalone
python3 main.py workdir --standalone
..
ErrorCode.SUCCESS, Could you please advise if there is any good optimization method for video stream detection flickering caused by frame skipping?
1. Frame rate control and frame skipping strategy are key to optimizing video stream detection performance, but you need to pay attention to the impact of frame skipping on detection results.
2. Multithreading processing and caching mechanism can improve detection efficiency, but you need to pay attention to the stability of detection results.
3. The use of sliding window method can reduce the impact of frame skipping and caching on detection results.
```
* **Docker users**. If you are using docker, HuixiangDou's Hybrid LLM Service needs to be deployed separately.
```shell
# Start LLM service
python3 service/llm_server_hybrid.py
```
Open a new terminal, configure the host IP in `config.ini`, and run:
```shell
# config.ini
[llm]
..
client_url = "http://10.140.24.142:39999/inference" # example
python3 main.py workdir
```
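To confirm the separated service is reachable from the machine running `main.py`, a minimal smoke-test sketch is shown below. The JSON field names are placeholders — the actual request schema is defined in `service/llm_server_hybrid.py`.

```python
# Smoke-test the Hybrid LLM Service from another host. The payload fields
# below are assumptions; check service/llm_server_hybrid.py for the real schema.
import requests

client_url = "http://10.140.24.142:39999/inference"  # same value as config.ini
resp = requests.post(client_url, json={"prompt": "hello"}, timeout=30)
print(resp.status_code, resp.text[:200])
```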
## STEP3. Integrate into Feishu [Optional]
Click [Create a Feishu Custom Robot](https://open.feishu.cn/document/client-docs/bot-v3/add-custom-bot) to get the callback WEBHOOK_URL and fill it into config.ini.
```shell
# config.ini
..
[frontend]
type = "lark"
webhook_url = "${YOUR-LARK-WEBHOOK-URL}"
```
Run the command below; when it finishes, the technical assistant's replies will be sent to the Feishu group chat.
```shell
python3 main.py workdir
```
<img src="./resource/figures/lark-example.png" width="400">
If you also need to read Feishu group messages, see [Feishu Developer Square - Add Application Capabilities - Robots](https://open.feishu.cn/app?lang=zh-CN).
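For reference, sending a message through the custom-bot webhook is a plain HTTP POST; the sketch below follows Feishu's public custom-bot message format (`msg_type`/`content`) and is not taken from HuixiangDou's own frontend code.

```python
# Send a text message to a Feishu/Lark group via the custom-bot webhook.
# WEBHOOK_URL is the value written into the [frontend] section of config.ini.
import requests

webhook_url = "${YOUR-LARK-WEBHOOK-URL}"
payload = {"msg_type": "text", "content": {"text": "reply from HuixiangDou"}}
resp = requests.post(webhook_url, json=payload, timeout=10)
print(resp.json())
```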
## STEP4. High Accuracy Method [Optional]
To further improve the assistant's responses, enable as many of the following features as possible.
1. Use a higher-accuracy local LLM
Adjust the `llm.local` model in config.ini to `internlm2-20B`.
This option has a significant effect, but requires more GPU memory.
2. Hybrid LLM Service
For LLM services that support the openai interface, HuixiangDou can take advantage of their Long Context ability.
Using Kimi as an example, here is a `config.ini` configuration example:
```shell
# config.ini
[llm.server]
..
remote_llm_max_text_length = 128000
remote_llm_model = "moonshot-v1-128k"
```
We also support the gpt API. Note that this feature will increase response time and operating costs.
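To sanity-check the remote credentials outside of HuixiangDou, a minimal sketch with the `openai` client follows; it works for any openai-compatible endpoint, and the base URL and model name shown here come from Moonshot's public API documentation rather than from HuixiangDou's code.

```python
# Verify the remote LLM credentials independently of HuixiangDou.
# base_url/model are Kimi (Moonshot) examples; adapt to your provider.
from openai import OpenAI

client = OpenAI(api_key="${YOUR-MOONSHOT-API-KEY}", base_url="https://api.moonshot.cn/v1")
reply = client.chat.completions.create(
    model="moonshot-v1-128k",
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)
```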
3. Repo search enhancement
This feature is suitable for handling difficult questions and requires basic development capabilities to adjust the prompt.
* Click [sourcegraph-account-access](https://sourcegraph.com/users/tpoisonooo/settings/tokens) to get a token
```shell
# open https://github.com/sourcegraph/src-cli#installation
curl -L https://sourcegraph.com/.api/src-cli/src_linux_amd64 -o /usr/local/bin/src && chmod +x /usr/local/bin/src
# Fill the token into config.ini
[sg_search]
..
src_access_token = "${YOUR_ACCESS_TOKEN}"
```

* Edit the repo's name and introduction; we take opencompass as an example
```shell
# config.ini
# add your repo here; we just take opencompass and lmdeploy as examples
[sg_search.opencompass]
github_repo_id = "open-compass/opencompass"
introduction = "用于评测大型语言模型(LLM.."
introduction = "Used for evaluating large language models (LLM) .."
```
* Use `python3 -m service.sg_search` as a unit test; the returned content should include opencompass source code and documentation
```shell
python3 service/sg_search.py
..
"filepath": "opencompass/datasets/longbench/longbench_trivia_qa.py",
"content": "from datasets import Dataset..
```
Run `main.py`; HuixiangDou will enable search enhancement when appropriate.
4. Tune Parameters
It is often unavoidable to adjust parameters with respect to business scenarios.
* Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./service/worker.py).
* Adjust the [number of search results](./service/worker.py) based on the maximum length supported by the model.
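As a rough illustration of the threshold sweep described in the tuning item above: the sketch below is not `test_intention_prompt.py`, and the sample format and `score_fn` are assumptions.

```python
# Hypothetical threshold sweep over labeled intent data. Sample format
# ({"text": ..., "label": 0/1}) and score_fn are assumptions; the project's
# real evaluation lives in tests/test_intention_prompt.py.
def best_threshold(samples, score_fn):
    """samples: list of {"text": str, "label": 0 or 1}; score_fn returns a float in [0, 1]."""
    scored = [(score_fn(s["text"]), s["label"]) for s in samples]

    def accuracy(t):
        return sum((score >= t) == bool(label) for score, label in scored) / len(scored)

    best = max((t / 10 for t in range(1, 10)), key=accuracy)
    print(f"threshold {best}: accuracy {accuracy(best):.2f}")
    return best
```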
# 🛠️ FAQ
1. How to access other IMs?
* WeChat. For Enterprise WeChat, see the [Enterprise WeChat Application Development Guide](https://developer.work.weixin.qq.com/document/path/90594); for personal WeChat, we have confirmed with the WeChat team that there is currently no API, so you will need to research it yourself.
* DingTalk. Refer to [DingTalk Open Platform - Custom Robot Access](https://open.dingtalk.com/document/robots/custom-robot-access).
2. What if the robot is too cold/too chatty?
* Fill the questions that should be answered in the real scenario into `resource/good_questions.json`, and the ones that should be rejected into `resource/bad_questions.json`.
* Adjust the topic content in `repodir` to ensure that the markdown documents in the knowledge base contain no scenario-irrelevant content.
Re-run `service/feature_store.py` to update the thresholds and the feature library.
3. Startup is normal, but it runs out of memory (OOM) during runtime?
Long-text inference with a transformers-based LLM requires more GPU memory; in that case, apply kv cache quantization to the model, e.g. following the [lmdeploy quantization description](https://github.com/InternLM/lmdeploy/blob/main/docs/en/kv_int8.md). Then use docker to deploy the Hybrid LLM Service independently.
4. How to integrate other local LLMs / what if the results are not ideal after integration?
* Open [hybrid llm service](./service/llm_server_hybrid.py), add a new LLM inference implementation.
* Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./service/worker.py).
5. What if the response is too slow or requests always fail?
* Refer to [hybrid llm service](./service/llm_server_hybrid.py) to add exponential backoff and retransmission.
* Replace local LLM with an inference framework such as [lmdeploy](https://github.com/internlm/lmdeploy), instead of the native huggingface/transformers.
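A minimal sketch of the exponential-backoff idea follows; it is a generic `requests` retry loop, the parameter values are arbitrary, and the real integration point is `service/llm_server_hybrid.py`.

```python
# Retry a flaky remote LLM call with exponential backoff. Base delay,
# retry count, URL and payload are placeholders for illustration.
import time
import requests

def call_with_backoff(url, payload, retries=5, base_delay=1.0):
    for attempt in range(retries):
        try:
            resp = requests.post(url, json=payload, timeout=30)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```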
6. What if the GPU memory is too low?
In this case the local LLM cannot run; only a remote LLM combined with text2vec can execute the pipeline. Make sure `config.ini` uses only the remote LLM and that the local LLM is disabled.
# 📝 Citation
```shell
@misc{2023HuixiangDou,
title={HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
author={HuixiangDou Contributors},
howpublished = {\url{https://github.com/internlm/huixiangdou}},
year={2023}
}
```