Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning
 
Interspeech 2021
Yi Shi* Congyi Wang** Yu Chen Bin Wang
 

*First Author **Corresponding Author

This is the official repo of the Interspeech 2021 paper: Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning. Here we provide the data and other useful links.

Links

Paper | Video | Slides | DataTest | DataTrain

Dataset:

The data is split into a training set and a test set. The training set was scraped from the internet. Its annotations were auto-labeled using traditional techniques and then corrected manually. The correctness of the training set is not guaranteed, so if you find incorrect labels, please report them in the issues section. The test set was created manually and covers some of the most difficult cases in real-life conversations, for robustness evaluation.

The format of training data file: aug<#local label>
The format of each data line: pinyin, space, position, space, sentence, newline (\n)
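
The line format above can be read with a few lines of Python. The following is a minimal sketch, not part of the released code: the names `PolyphoneSample`, `parse_line`, and `load_samples` are hypothetical, the position is assumed to be an integer, and the sample line in the usage note is invented for illustration. Whether the position is 0-based or 1-based is not stated here, so the value is kept exactly as stored.

```python
from typing import List, NamedTuple


class PolyphoneSample(NamedTuple):
    pinyin: str    # pronunciation label of the polyphonic character
    position: int  # position of that character, kept exactly as stored in the file
    sentence: str  # sentence containing the polyphonic character


def parse_line(line: str) -> PolyphoneSample:
    """Parse one record of the form: pinyin, space, position, space, sentence."""
    # Split on the first two spaces only, so any spaces inside the sentence survive.
    pinyin, position, sentence = line.rstrip("\n").split(" ", 2)
    # Assumption: the position field is an integer.
    return PolyphoneSample(pinyin, int(position), sentence)


def load_samples(path: str) -> List[PolyphoneSample]:
    """Read every non-empty line of a data file (assumed to be UTF-8 encoded)."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line) for line in f if line.strip()]
```

For example, `parse_line("zhong4 2 他很重要\n")` yields a sample with pinyin `zhong4`, position `2`, and sentence `他很重要`. This line is made up, and the tone-number pinyin style is an assumption; check a real line from the data files before relying on it.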

License:

This research was conducted by Xmov (魔珐科技, https://www.xmov.ai/about/). Use of the dataset is restricted to education and research purposes only.

If you would like to cite our paper:

@inproceedings{shi21semp,
  author={Shi,Yi and Wang,Congyi and Chen,Yu and Wang,Bin},
  title={Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning},
  year={2021},
  booktitle={Proc. Interspeech 2021},
  pages={4109--4113},
  doi={10.21437/Interspeech.2021-502}
}

For further questions, you are welcome to contact [email protected]