README.txt

Tencent AI Lab embedding corpus for Chinese words and phrases (first released in October 2018)
URL: https://ai.tencent.com/ailab/nlp/data/Tencent_AILab_ChineseEmbedding.tar.gz


==== Content ====

I. Description of the corpus
II. Contact
III. Citation
IV. Disclaimer


==== I. Description of the corpus ====

The pre-trained embeddings are in Tencent_AILab_ChineseEmbedding.txt.

>> Data structure

The first line gives the total number of embeddings and the embedding dimension, separated by a space. Each subsequent line contains two fields separated by a space: <the Chinese word or phrase> and <the corresponding embedding vector>. The values of each embedding vector are also separated by spaces, one value per dimension.
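As an illustration only, the layout above can be read with a short Python function. This is a minimal sketch and not an official loader. The function name `load_embeddings`, the UTF-8 encoding, and the toy two-entry sample with made-up values are all assumptions. The function takes the last <dimension> tokens of each line as the vector and everything before them as the word or phrase, so an entry that happened to contain a space would still parse.

```python
import io
from typing import Dict, List, TextIO, Tuple


def load_embeddings(stream: TextIO) -> Tuple[Dict[str, List[float]], int]:
    """Hypothetical reader for the text layout described above.

    Line 1: "<number of embeddings> <dimension>".
    Later lines: the word or phrase, then <dimension> space-separated floats.
    """
    header = stream.readline().split()
    count, dim = int(header[0]), int(header[1])

    vectors: Dict[str, List[float]] = {}
    parsed = 0
    for line_no, raw in enumerate(stream, start=2):
        line = raw.rstrip()  # drop trailing newline, "\r", or spaces
        if not line:
            continue  # tolerate blank lines
        # Split off exactly `dim` values from the right; whatever remains
        # on the left is the word or phrase (even if it contains a space).
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"line {line_no}: expected {dim} values")
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
        parsed += 1

    # Sanity check against the count announced in the header line.
    if parsed != count:
        raise ValueError(f"header announces {count} entries, parsed {parsed}")
    return vectors, dim


# Toy input in the same layout (values are invented, not from the corpus).
sample = io.StringIO("2 3\n腾讯 0.1 0.2 0.3\n人工智能 -0.5 0.0 1.5\n")
vecs, dim = load_embeddings(sample)
print(dim, vecs["人工智能"])  # prints: 3 [-0.5, 0.0, 1.5]
```

To read the released file, one could pass in a handle such as open("Tencent_AILab_ChineseEmbedding.txt", encoding="utf-8"), assuming the file is UTF-8 encoded. Because this sketch holds every vector in memory as Python lists, a dedicated library may be more practical for the full file; the layout described above appears to match the word2vec text format, which gensim reads with KeyedVectors.load_word2vec_format(path, binary=False).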


==== II. Contact ====

Should you have any questions, please contact nlu@tencent.com (Tencent AI Lab natural language understanding team).


==== III. Citation ====

If you use or refer to our corpus, please support us by citing our paper:

@InProceedings{N18-2028,
  author = 	"Song, Yan
		and Shi, Shuming
		and Li, Jing
		and Zhang, Haisong",
  title = 	"Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings",
  booktitle = 	"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"175--180",
  location = 	"New Orleans, Louisiana",
  url = 	"http://aclweb.org/anthology/N18-2028"
}

Many thanks!


==== IV. Disclaimer ====

This corpus is for research purposes only and is released under a Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/).