Commit af8cfb7

Optimize NLU samples and improve the training pipeline
1 parent 690e68b commit af8cfb7

13 files changed

+181
-13
lines changed

ReadMe.md

Lines changed: 28 additions & 7 deletions
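@@ -5,9 +5,13 @@
 **Rasa Chinese Chatbot Development Guide blog series:**

 - [Rasa Chinese Chatbot Development Guide (1): Getting Started](https://jiangdg.blog.csdn.net/article/details/104328946)
-- Rasa Chinese Chatbot Development Guide (2): NLU
+- [Rasa Chinese Chatbot Development Guide (2): NLU](https://jiangdg.blog.csdn.net/article/details/104328946)
 - Rasa Chinese Chatbot Development Guide (3): Core
-- Rasa Chinese Chatbot Development Guide (4): Rasa X
+- Rasa Chinese Chatbot Development Guide (4): Rasa X and Model Evaluation
+- Rasa Chinese Chatbot Development Guide (5): A Brief Look at the Mitie, spaCy, and CRF Entity Extractors
+- Rasa Chinese Chatbot Development Guide (6): A Brief Look at the Mitie, Sklearn, and Embedding Intent Classifiers
+
+**Note: This blog series is translated from the [official Rasa documentation](https://rasa.com/docs/rasa/), combined with the author's own understanding and hands-on project experience. It also expands on some of the technical points the documentation touches on, in order to better understand how Rasa works and the techniques involved. The companion GitHub project for the series is [ChitChatAssistant](https://github.com/jiangdongguo/ChitChatAssistant). `star`s and `issues` are welcome; let's discuss and learn together!**


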
@@ -57,11 +61,20 @@ pip install jieba

 # 2. Train the model

-Once all samples and configuration files are ready, the next step is to train the model. Open a terminal and run the command below, which trains the NLU and Core models together:
+Once all samples and configuration files are ready, the next step is to train the model. Open a terminal and run the command below, which trains the NLU and Core models together.
+
+- Using MITIE
+
 ```shell
 python -m rasa train --config configs/config.yml --domain configs/domain.yml --data data/
 ```

+- Using Supervised_Embedding
+
+```bash
+python -m rasa train --config configs/zh_jieba_supervised_embeddings_config.yml --domain configs/domain.yml --data data/
+```
+
 # 3. Run the services

 **(1) Start the Rasa server**
@@ -91,24 +104,32 @@ Python -m rasa run actions --port 5055 --actions actions --debug
 python server.py
 ```

-Once **Rasa Server**, **Action Server**, and **Server.py** are running, enter the following URL in a browser
+Once **Rasa Server**, **Action Server**, and **Server.py** are running, enter the following URL in a browser to test

 ` http://127.0.0.1:8088/ai?content="查询广州明天的天气"`

-The returned result is:
+The result when called from a terminal:
+
+![](https://img-blog.csdnimg.cn/20200227153932228.jpg?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0FuZHJFeHBlcnQ=,size_16,color_FFFFFF,t_70)
+

-![Getting Started 7](https://img-blog.csdnimg.cn/20200215154508497.png)

 # 4. Changelog



-**(1) V1.0.0.2020.02.15**
+**(1) V1.0.2020.02.15**

 - Created the project; model training succeeds;
 - The front end gets normal responses from the Rasa server;
 - Integrated the Tuling chitchat bot and the Seniverse weather API for easier testing;

+**(2) V1..1.2020.02.27**
+
+- Optimized the NLU samples; experimented with synonyms, regexes, and lookup tables;
+- Improved the supervised_embeddings pipeline; entity extraction and intent recognition are noticeably better, and training is much faster;
+- Finished writing `Rasa Chinese Chatbot Development Guide (2): NLU`;
+


 # 5. License
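
Reviewer note on the browser test in section 3 of the README: the sample URL embeds raw Chinese text and literal double quotes in the query string. A browser will usually percent-encode these on its own, but a script calling the endpoint has to do it explicitly. The following is a minimal sketch using only the Python standard library. It assumes the `/ai` endpoint on port 8088 and the `content` parameter exactly as they appear in the README; the helper name `build_query_url` is hypothetical. Whether `server.py` expects the surrounding quotes is not shown in this commit, so the sketch leaves them out.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL taken from the README example; adjust host and port to your deployment.
BASE_URL = "http://127.0.0.1:8088/ai"


def build_query_url(content: str, base_url: str = BASE_URL) -> str:
    """Return base_url with `content` percent-encoded (UTF-8) as a query parameter."""
    return f"{base_url}?{urlencode({'content': content})}"


if __name__ == "__main__":
    # "查询广州明天的天气" means "check tomorrow's weather in Guangzhou".
    url = build_query_url("查询广州明天的天气")
    print(url)
    # Round-trip check: decoding the query string gives back the original text.
    print(parse_qs(urlsplit(url).query)["content"][0])
```

The resulting string can be passed to any HTTP client; the round-trip print is only there to confirm that nothing is lost in encoding.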

actions/action.py

Lines changed: 33 additions & 0 deletions
@@ -15,6 +15,39 @@
 )


+class PhoneForm(FormAction):
+
+    def name(self) -> Text:
+        """Unique identifier of the form"""
+
+        return "phone_form"
+
+    @staticmethod
+    def required_slots(tracker: Tracker) -> List[Text]:
+        """A list of required slots that the form has to fill"""
+
+        return ["phone_number", "business"]
+
+    def submit(
+        self,
+        dispatcher: CollectingDispatcher,
+        tracker: Tracker,
+        domain: Dict[Text, Any],
+    ) -> List[Dict]:
+        """Define what the form has to do
+        after all required slots are filled"""
+        business = tracker.get_slot('business')
+        number = tracker.get_slot('phone_number')
+
+        if business == "机主":  # "account owner"
+            dispatcher.utter_message("您要查询的号码{}属于隔壁老蒋".format(number))  # "The number {} belongs to Old Jiang next door"
+        elif business == "余额":  # "balance"
+            dispatcher.utter_message("您要查询的号码{}账户余额为66666元".format(number))  # "The account balance of number {} is 66666 yuan"
+        else:
+            dispatcher.utter_message("暂不支持查询{}业务".format(business))  # "Querying the {} service is not supported yet"
+        return [Restarted()]
+
+
 class WeatherForm(FormAction):

     def name(self) -> Text:
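
Reviewer note on `PhoneForm.submit`: the branching compares the `business` slot against the canonical values `机主` (account owner) and `余额` (balance), so it relies on the synonym entries this commit adds to `data/nlu/nlu.md` to normalize surface forms such as `谁的` or `话费`. The sketch below pulls that logic into plain functions so it can be exercised without `rasa_sdk`. The names `SYNONYMS`, `normalize_business`, and `build_phone_reply` are hypothetical, and the dictionary only imitates what `EntitySynonymMapper` is expected to do at prediction time.

```python
from typing import Optional

# Surface form -> canonical value, copied from the synonym sections in data/nlu/nlu.md.
SYNONYMS = {
    "机主信息": "机主", "机主": "机主", "拥有者": "机主", "谁的": "机主", "属于谁": "机主",
    "余额": "余额", "话费": "余额", "话费余额": "余额", "账户余额": "余额",
}


def normalize_business(value: Optional[str]) -> Optional[str]:
    """Map an extracted value to its canonical form; unknown values pass through."""
    if value is None:
        return None
    return SYNONYMS.get(value, value)


def build_phone_reply(business: Optional[str], number: Optional[str]) -> str:
    """Mirror the branching in PhoneForm.submit and return the reply text."""
    business = normalize_business(business)
    if business == "机主":  # account owner
        return "您要查询的号码{}属于隔壁老蒋".format(number)
    if business == "余额":  # balance
        return "您要查询的号码{}账户余额为66666元".format(number)
    return "暂不支持查询{}业务".format(business)


if __name__ == "__main__":
    print(build_phone_reply("话费余额", "19860618422"))
    print(build_phone_reply("流量", "19860618422"))
```

Keeping the reply text separate from the `dispatcher` call is one way to unit-test the branches; the committed form works the same way as long as the synonym mapper has already normalized the slot value.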

configs/config.yml

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ pipeline:
 - name: "MitieNLP"
   model: "data/total_word_feature_extractor_zh.dat"
 - name: "JiebaTokenizer"
+  dictionary_path: "data/dict"
 - name: "MitieEntityExtractor"
 - name: "EntitySynonymMapper"
 - name: "RegexFeaturizer"

configs/domain.yml

Lines changed: 16 additions & 0 deletions
@@ -7,16 +7,23 @@ intents:
   - whoareyou
   - whattodo
   - request_weather
+  - request_phone_business

 slots:
   date-time:
     type: unfeaturized
   address:
     type: unfeaturized
+  phone_number:
+    type: unfeaturized
+  business:
+    type: unfeaturized

 entities:
   - date-time
   - address
+  - phone_number
+  - business

 actions:
   - utter_answer_affirm
@@ -28,10 +35,13 @@ actions:
   - utter_answer_whattodo
   - utter_ask_date-time
   - utter_ask_address
+  - utter_ask_phone_number
+  - utter_ask_business
   - action_default_fallback

 forms:
   - weather_form
+  - phone_form

 responses:
   utter_answer_affirm:
@@ -72,5 +82,11 @@ responses:
   utter_ask_address:
     - text: "请问您要查下哪里的天气?"  # "Which location's weather would you like to check?"

+  utter_ask_phone_number:
+    - text: "请问您要查的电话号码是多少?"  # "Which phone number would you like to look up?"
+
+  utter_ask_business:
+    - text: "请问您要查询什么业务呢?"  # "Which service would you like to query?"
+
   utter_default:
     - text: "没听懂,请换种说法吧~"  # "I didn't understand, please try rephrasing."

configs/zh_jieba_supervised_embeddings_config.yml

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+language: "zh"
+
+pipeline:
+- name: "JiebaTokenizer"
+  dictionary_path: "data/dict"
+- name: "RegexFeaturizer"
+- name: "CRFEntityExtractor"
+- name: "EntitySynonymMapper"
+- name: "CountVectorsFeaturizer"
+- name: "CountVectorsFeaturizer"
+  analyzer: "char_wb"
+  min_ngram: 1
+  max_ngram: 4
+- name: "EmbeddingIntentClassifier"
+
+policies:
+  - name: KerasPolicy
+    epochs: 500
+    max_history: 5
+  - name: FallbackPolicy
+    fallback_action_name: 'action_default_fallback'
+  - name: MemoizationPolicy
+    max_history: 5
+  - name: FormPolicy
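
Reviewer note on the second `CountVectorsFeaturizer` entry: with `analyzer: "char_wb"`, `min_ngram: 1`, and `max_ngram: 4`, the featurizer counts character n-grams of length 1 to 4 taken inside word boundaries, which should give the intent classifier some tolerance for words the tokenizer has not seen. The sketch below approximates how scikit-learn's `char_wb` analyzer enumerates those n-grams, padding each whitespace-separated token with one space on each side. It is a simplified illustration in plain Python, not Rasa's or scikit-learn's actual code; `char_wb_ngrams` is a hypothetical name, and the space-separated input stands in for jieba's token output.

```python
from typing import List


def char_wb_ngrams(text: str, min_n: int = 1, max_n: int = 4) -> List[str]:
    """List character n-grams taken inside word boundaries, with space padding."""
    ngrams = []
    for word in text.split():
        padded = f" {word} "  # mark the word boundary on both sides
        for n in range(min_n, max_n + 1):
            if n >= len(padded):
                # The padded word is no longer than n: keep it once and stop.
                ngrams.append(padded)
                break
            for start in range(len(padded) - n + 1):
                ngrams.append(padded[start:start + n])
    return ngrams


if __name__ == "__main__":
    # Assumed tokenizer output for a short query: "查" (check) and "余额" (balance).
    print(char_wb_ngrams("查 余额", min_n=1, max_n=2))
```

Because n-grams never cross the padding, a change in how the tokenizer splits a sentence changes the feature set, which is one reason the custom dictionary and this featurizer are worth reviewing together.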

data/dict/userdict.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+谁的 5 n
+属于谁 5 n
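
Reviewer note on `data/dict/userdict.txt`: each line follows jieba's user-dictionary format, `word [frequency] [part-of-speech tag]`, separated by spaces, so `谁的 5 n` registers `谁的` ("whose") as a noun with frequency 5. The entries presumably keep jieba from splitting phrases that the NLU data annotates as `business` entities. The sketch below parses that format with the standard library only, as a sanity check that could be run before training. `UserDictEntry` and `parse_userdict_line` are hypothetical names, and jieba itself is not imported.

```python
from typing import NamedTuple, Optional


class UserDictEntry(NamedTuple):
    word: str
    freq: Optional[int]  # None when the line omits the frequency
    tag: Optional[str]   # None when the line omits the part-of-speech tag


def parse_userdict_line(line: str) -> UserDictEntry:
    """Parse one `word [freq] [tag]` line; raise ValueError on malformed input."""
    parts = line.split()
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 fields, got {len(parts)}: {line!r}")
    word, freq, tag = parts[0], None, None
    for field in parts[1:]:
        if field.isdigit() and freq is None and tag is None:
            freq = int(field)  # frequency must come before the tag
        elif tag is None:
            tag = field
        else:
            raise ValueError(f"unexpected field {field!r} in line {line!r}")
    return UserDictEntry(word, freq, tag)


if __name__ == "__main__":
    for raw in ["谁的 5 n", "属于谁 5 n"]:
        print(parse_userdict_line(raw))
```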

data/lookup_tables/DataPackage.txt

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+腾讯视频流量包
+爱奇艺会员流量包
+网易免流包
+抖音免流包
+流量月包
+酷狗定向流量包

data/nlu/nlu.md

Lines changed: 44 additions & 1 deletion
@@ -114,4 +114,47 @@
 - [长沙](address)的天气
 - [深圳](address)[明天](date-time)的天气
 - 查下[今天](date-time)[上海](address)的天气
-- 帮我查查[佛山](address)[周六](date-time)的天气
+- 帮我查查[佛山](address)[周六](date-time)的天气
+
+## intent:request_phone_business
+- 查电话[19820618425](phone_number)
+- 我想知道电话号码为[19860612425](phone_number)
+-[11160222425](phone_number)
+- 查电话号码[19800222425](phone_number)
+- [机主](business)
+- 号码是[谁的](business)
+- 这个号码是[属于谁](business)
+- 谁是这个号码的[拥有者](business)
+- 查下[机主信息](business)
+- [机主](business)是谁
+- 我要查这个号码的[账户余额](business)
+- 帮我查[余额](business)
+-[话费](business)
+- 能告诉我现在的[话费余额](business)还剩多少
+- 我想查电话号码[19860618422](phone_number)[账户余额](business)
+- 我要查下[19822618425](phone_number)[机主](business)是谁
+- 你好!请帮我查询一下电话[12260618425](phone_number)[账户余额](business)
+- 查一下手机号码[19862228425](phone_number)[机主信息](business)
+- 帮我查个手机号[19860612222](phone_number)[余额](business)
+- [19860222425](phone_number)[谁的](business)
+
+
+## synonym:机主
+- 机主信息
+- 机主
+- 拥有者
+- 谁的
+- 属于谁
+
+## synonym:余额
+- 余额
+- 话费
+- 话费余额
+- 账户余额
+
+
+## regex:phone_number
+- ((\d{3,4}-)?\d{7,8})|(((\+86)|(86))?(1)\d{10})
+
+## lookup: mobile_data_package
+data/lookup_tables/DataPackage.txt
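
Reviewer note on the `phone_number` regex: as far as I know, in these pipelines the pattern is consumed by `RegexFeaturizer`, which turns matches into features for the entity extractor rather than extracting entities by itself, so it is worth checking what the pattern actually matches. The sketch below runs the committed pattern, unchanged, through Python's `re` module; the sample strings that do not appear in `nlu.md` are made up for illustration, and `is_phone_number` is a hypothetical helper. One thing it shows: because the landline alternative comes first, an unanchored search on an 11-digit mobile number stops after the first 8 digits, while a full match falls through to the mobile alternative.

```python
import re

# Pattern copied verbatim from the `## regex:phone_number` section of data/nlu/nlu.md.
PHONE_PATTERN = re.compile(r"((\d{3,4}-)?\d{7,8})|(((\+86)|(86))?(1)\d{10})")


def is_phone_number(text: str) -> bool:
    """True when the whole string matches the pattern."""
    return PHONE_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["19820618425", "+8613800138000", "020-12345678", "12345"]:
        print(sample, is_phone_number(sample))

    # Unanchored search tries the landline alternative first and matches only 8 digits.
    print(PHONE_PATTERN.search("查电话19820618425").group(0))
```

How much the partial match matters depends on how the featurizer applies the pattern to the tokenized text, which this commit does not show. Putting the mobile alternative first would be one possible way to make unanchored matches cover the full number.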

data/stories/stories.md

Lines changed: 6 additions & 0 deletions
@@ -67,4 +67,10 @@
 * request_weather
   - weather_form
   - form{"name": "weather_form"}
+  - form{"name": null}
+
+## happy path
+* request_phone_business
+  - phone_form
+  - form{"name": "phone_form"}
   - form{"name": null}

models/20200226-221429.tar.gz

7.59 MB
Binary file not shown.
