Tencent AI Lab embedding corpus for Chinese words and phrases (first released in October 2018)
URL: https://ai.tencent.com/ailab/nlp/data/Tencent_AILab_ChineseEmbedding.tar.gz
==== Content ====
I. Description of the corpus
II. Contact
III. Citation
IV. Disclaimer
==== I. Description of the corpus ====
The pre-trained embeddings are in Tencent_AILab_ChineseEmbedding.txt.
>> Data structure
The first line gives the total number of embeddings and their dimensionality, separated by a space. Each subsequent line contains two fields separated by a space: <the Chinese word or phrase> and <the corresponding embedding vector>. Within each embedding vector, the values for the individual dimensions are also separated by spaces.
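As an illustration of the layout described above, the following is a minimal Python sketch of a reader for this format. It is not an official loader: the function name iter_embeddings and the tiny in-memory sample are hypothetical, and splitting from the right (so that the last <dimension> fields form the vector) is an assumption made so that an entry containing a space would still parse.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def iter_embeddings(handle: TextIO) -> Iterator[Tuple[str, List[float]]]:
    """Yield (word, vector) pairs from a text stream in the described format.

    Hypothetical helper for illustration only.
    """
    # Header line: "<number of embeddings> <dimension>"
    header = handle.readline().split()
    _count, dim = int(header[0]), int(header[1])

    for line in handle:
        line = line.rstrip()
        if not line:
            continue
        # Assumption: the vector occupies the last `dim` space-separated
        # fields, so split from the right and keep the rest as the word.
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got line: {line[:40]!r}")
        yield parts[0], [float(value) for value in parts[1:]]


# Made-up sample data in the same layout (not taken from the corpus).
sample = io.StringIO("2 3\n的 0.1 0.2 0.3\n中国 -0.4 0.5 0.6\n")
vectors = dict(iter_embeddings(sample))
```

To read the real file, the same function could be given a handle such as open("Tencent_AILab_ChineseEmbedding.txt", encoding="utf-8"); the UTF-8 encoding is an assumption, since it is not stated here. Because the function is a generator, entries can be consumed one at a time rather than loading the whole file into memory.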
==== II. Contact ====
Should you have any questions, please contact [email protected] (Tencent AI Lab natural language understanding team).
==== III. Citation ====
If you use or refer to our corpus, please support us by citing our paper:
@InProceedings{N18-2028,
  author =    "Song, Yan
               and Shi, Shuming
               and Li, Jing
               and Zhang, Haisong",
  title =     "Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings",
  booktitle = "Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)",
  year =      "2018",
  publisher = "Association for Computational Linguistics",
  pages =     "175--180",
  location =  "New Orleans, Louisiana",
  url =       "http://aclweb.org/anthology/N18-2028"
}
Many thanks!
==== IV. Disclaimer ====
This corpus is for research purposes only and is released under a Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/).