Commit 516f8d2

EN entity models and revert 20210211 to 20210205 (#6214)
* EN entity models and revert 20210211 to 20210205
* defaults update
1 parent 00783ab commit 516f8d2

2 files changed, with 34 additions and 14 deletions.

Orchestrator/docs/NLRModels.md

Lines changed: 15 additions & 7 deletions
@@ -16,7 +16,7 @@ It is a 6-layer pretrained [Transformer][7] model optimized for conversation.
 Its architecture is pretrained for example-based use ([KNN][3]),
 thus it can be used out of box. This is the default model used if none explicitly specified.
 
-### pretrained.20210211.microsoft.dte.00.06.unicoder_multilingual.onnx
+### pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx
 This is a high quality multilingual base model for intent detection. It's smaller and faster than its 12-layer alternative.
 It is a 6-layer pretrained pretrained [Transformer][7] model optimized for conversation.
 Its architecture is pretrained for example-based use ([KNN][3]), thus it can be used out of box. The model supports in total 100 languages (full list can be found at [XLMR Supported Languages][8]). 8 languages (EN, ES, DE, FR, IT, JA, PT, and ZH) are fine-tuned with additional data (performance can be found [here](#multilingual-intent-detection-models-evaluation)).
@@ -40,15 +40,18 @@ This is a high quality multilingual base model for intent detection.
 It is a 12-layer pretrained pretrained [Transformer][7] model optimized for conversation.
 Its architecture is pretrained for example-based use ([KNN][3]), thus it can be used out of box. The model supports in total 100 languages (full list can be found at [XLMR Supported Languages][8]). 8 languages (EN, ES, DE, FR, IT, JA, PT, and ZH) are fine-tuned with additional data (performance can be found [here](#multilingual-intent-detection-models-evaluation)).
 
-
-
 ## Experimental Models
 
 ### pretrained.20210205.microsoft.dte.00.12.bert_example_ner.en.onnx (experimental)
 This is a high quality EN-only base model for entity extraction.
 It is a 12-layer pretrained pretrained [Transformer][7] model optimized for conversation.
 Its architecture is pretrained for example-based use ([KNN][3]), thus it can be used out of box.
 
+### pretrained.20210218.microsoft.dte.00.12.bert_example_ner.en.onnx (experimental)
+This is a yet another high quality EN-only base model for entity extraction.
+It is a 12-layer pretrained pretrained [Transformer][7] model optimized for conversation.
+Its architecture is pretrained for example-based use ([KNN][3]), thus it can be used out of box.
+
 ### pretrained.20210105.microsoft.dte.00.12.bert_example_ner_multilingual.onnx (experimental)
 This is a high quality multilingual base model for entity extraction.
 It is a 12-layer pretrained pretrained [Transformer][7] model optimized for conversation.
@@ -64,7 +67,12 @@ This is a high quality EN-only base model for entity extraction. It's smaller an
 It is a 6-layer pretrained pretrained [Transformer][7] model optimized for conversation.
 Its architecture is pretrained for example-based use ([KNN][3]), thus it can be used out of box.
 
-### pretrained.20210211.microsoft.dte.00.06.bert_example_ner_multilingual.onnx (experimental)
+### pretrained.20210218.microsoft.dte.00.06.bert_example_ner.en.onnx (experimental)
+This is a high quality EN-only base model for entity extraction. It's smaller and faster than its 12-layer alternative.
+It is a 6-layer pretrained pretrained [Transformer][7] model optimized for conversation.
+Its architecture is pretrained for example-based use ([KNN][3]), thus it can be used out of box.
+
+### pretrained.20210205.microsoft.dte.00.06.bert_example_ner_multilingual.onnx (experimental)
 This is a high quality multilingual base model for entity extraction. It's smaller and faster than its 12-layer alternative.
 It is a 6-layer pretrained pretrained [Transformer][7] model optimized for conversation.
 Its architecture is pretrained for example-based use ([KNN][3]), thus it can be used out of box.
@@ -103,21 +111,21 @@ For a more quantitative comparison analysis of the different models see the foll
 
 | Model | Base Model | Layers | Encoding time per query | Disk Allocation |
 | ------------------------------------------------------------ | ---------- | ------ | ----------------------- | --------------- |
-| pretrained.20210211.microsoft.dte.00.06.unicoder_multilingual.onnx | Unicoder | 6 | ~ 16 ms | 896M |
+| pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx | Unicoder | 6 | ~ 16 ms | 896M |
 | pretrained.20201210.microsoft.dte.00.12.unicoder_multilingual.onnx | Unicoder | 12 | ~ 30 ms | 1.08G |
 
 - The following table shows how accurate is each model by training and testing on the same language, evaluated by **micro-average-accuracy** on an internal dataset.
 
 | Model | de-de | en-us | es-es | es-mx | fr-ca | fr-fr | it-it | ja-jp | pt-br | zh-cn |
 | ------------------------------------------------------------ | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
-| pretrained.20210211.microsoft.dte.00.06.unicoder_multilingual.onnx | 0.638 | 0.785 | 0.662 | 0.760 | 0.723 | 0.661 | 0.701 | 0.786 | 0.735 | 0.805 |
+| pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx | 0.638 | 0.785 | 0.662 | 0.760 | 0.723 | 0.661 | 0.701 | 0.786 | 0.735 | 0.805 |
 | pretrained.20201210.microsoft.dte.00.12.unicoder_multilingual.onnx | 0.642 | 0.764 | 0.646 | 0.754 | 0.722 | 0.636 | 0.689 | 0.789 | 0.725 | 0.809 |
 
 - The following table shows how accurate is each model by training on **en-us** and testing on the different languages, evaluated by **micro-average-accuracy** on an internal dataset.
 
 | Model | de-de | en-us | es-es | es-mx | fr-ca | fr-fr | it-it | ja-jp | pt-br | zh-cn |
 | ------------------------------------------------------------ | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
-| pretrained.20210211.microsoft.dte.00.06.unicoder_multilingual.onnx | 0.495 | 0.785 | 0.530 | 0.621 | 0.560 | 0.518 | 0.546 | 0.663 | 0.568 | 0.687 |
+| pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx | 0.495 | 0.785 | 0.530 | 0.621 | 0.560 | 0.518 | 0.546 | 0.663 | 0.568 | 0.687 |
 | pretrained.20201210.microsoft.dte.00.12.unicoder_multilingual.onnx | 0.499 | 0.764 | 0.529 | 0.604 | 0.562 | 0.515 | 0.547 | 0.646 | 0.555 | 0.681 |
 
 ### English Entity Extraction Models Evaluation

Orchestrator/v0.2/nlr_versions.json

Lines changed: 19 additions & 7 deletions
@@ -2,7 +2,7 @@
   "version": "0.2",
   "defaults": {
     "en_intent": "pretrained.20200924.microsoft.dte.00.06.en.onnx",
-    "multilingual_intent": "pretrained.20210211.microsoft.dte.00.06.unicoder_multilingual.onnx"
+    "multilingual_intent": "pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx"
   },
   "models": {
     "pretrained.20200924.microsoft.dte.00.03.en.onnx": {
@@ -29,6 +29,12 @@
       "description": "(experimental) Bot Framework SDK release 4.10 - English ONNX V1.4 12-layer per-token entity base model",
       "minSDKVersion": "4.10.0"
     },
+    "pretrained.20210218.microsoft.dte.00.12.bert_example_ner.en.onnx": {
+      "releaseDate": "02/18/2021",
+      "modelUri": "https://models.botframework.com/models/dte/onnx/pretrained.20210218.microsoft.dte.00.12.bert_example_ner.en.onnx.zip",
+      "description": "(experimental) Bot Framework SDK release 4.10 - English ONNX V1.4 12-layer per-token entity base model",
+      "minSDKVersion": "4.10.0"
+    },
     "pretrained.20201210.microsoft.dte.00.12.unicoder_multilingual.onnx": {
       "releaseDate": "12/10/2020",
       "modelUri": "https://models.botframework.com/models/dte/onnx/pretrained.20201210.microsoft.dte.00.12.unicoder_multilingual.onnx.zip",
@@ -53,15 +59,21 @@
       "description": "(experimental) Bot Framework SDK release 4.10 - English ONNX V1.4 6-layer per-token entity base model",
       "minSDKVersion": "4.10.0"
     },
-    "pretrained.20210211.microsoft.dte.00.06.unicoder_multilingual.onnx": {
-      "releaseDate": "02/11/2021",
-      "modelUri": "https://models.botframework.com/models/dte/onnx/pretrained.20210211.microsoft.dte.00.06.unicoder_multilingual.onnx.zip",
+    "pretrained.20210218.microsoft.dte.00.06.bert_example_ner.en.onnx": {
+      "releaseDate": "02/18/2021",
+      "modelUri": "https://models.botframework.com/models/dte/onnx/pretrained.20210218.microsoft.dte.00.06.bert_example_ner.en.onnx.zip",
+      "description": "(experimental) Bot Framework SDK release 4.10 - English ONNX V1.4 6-layer per-token entity base model",
+      "minSDKVersion": "4.10.0"
+    },
+    "pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx": {
+      "releaseDate": "02/05/2021",
+      "modelUri": "https://models.botframework.com/models/dte/onnx/pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx.zip",
       "description": "Bot Framework SDK release 4.10 - Multilingual ONNX V1.4 6-layer per-token intent base model",
       "minSDKVersion": "4.10.0"
     },
-    "pretrained.20210211.microsoft.dte.00.06.bert_example_ner_multilingual.onnx": {
-      "releaseDate": "02/11/2021",
-      "modelUri": "https://models.botframework.com/models/dte/onnx/pretrained.20210211.microsoft.dte.00.06.bert_example_ner_multilingual.onnx.zip",
+    "pretrained.20210205.microsoft.dte.00.06.bert_example_ner_multilingual.onnx": {
+      "releaseDate": "02/05/2021",
+      "modelUri": "https://models.botframework.com/models/dte/onnx/pretrained.20210205.microsoft.dte.00.06.bert_example_ner_multilingual.onnx.zip",
       "description": "(experimental) Bot Framework SDK release 4.10 - Multilingual ONNX V1.4 6-layer per-token entity base model",
       "minSDKVersion": "4.10.0"
     },
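
Reviewer note: this commit renames a model key under `models` and repoints `defaults.multilingual_intent` at the same time, so the two must stay in sync. The Python sketch below shows one way such a check could look. It is not part of the Orchestrator tooling; the function names `resolve_default` and `dangling_defaults` are hypothetical, and the embedded JSON is a trimmed excerpt that keeps only fields visible in the diff above.

```python
import json

# Trimmed, illustrative excerpt of nlr_versions.json (assumption: the real
# file has many more entries; only fields shown in the diff are kept here).
NLR_VERSIONS = json.loads("""
{
  "version": "0.2",
  "defaults": {
    "multilingual_intent": "pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx"
  },
  "models": {
    "pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx": {
      "releaseDate": "02/05/2021",
      "modelUri": "https://models.botframework.com/models/dte/onnx/pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx.zip",
      "minSDKVersion": "4.10.0"
    }
  }
}
""")


def resolve_default(versions, kind):
    """Return (model_id, entry) for a default kind such as 'multilingual_intent'.

    Hypothetical helper: raises KeyError if the default names a model that
    has no entry under "models".
    """
    model_id = versions["defaults"][kind]
    if model_id not in versions["models"]:
        raise KeyError(f"default {kind!r} points at unknown model {model_id!r}")
    return model_id, versions["models"][model_id]


def dangling_defaults(versions):
    """List default kinds whose model id is missing from "models"."""
    return sorted(
        kind
        for kind, model_id in versions["defaults"].items()
        if model_id not in versions["models"]
    )


if __name__ == "__main__":
    model_id, entry = resolve_default(NLR_VERSIONS, "multilingual_intent")
    print(model_id)
    print(entry["modelUri"])
    print(dangling_defaults(NLR_VERSIONS))
```

A check like `dangling_defaults` would flag the case where the `models` key is reverted to 20210205 but the default still names the removed 20210211 model.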
