- Dartmouth LING 48 Final Project: Improving TTS for Shanghainese
- Yuanhao Chen yuanhao.chen.25@dartmouth.edu Spring 2023
The goal is to build a text-to-speech (TTS) system for Shanghainese from scratch that produces tone sandhi better than existing models, by paying special attention to the preprocessing of text.
See writeup/main.pdf.
pip install -r phonemisation/requirements.txt
pip install -r speech_synthesis/requirements.txt
pip install -r comparison_questionnaire/requirements.txt # for analysis of questionnaire results

See speech_synthesis/README.md.
- `phonemisation/`: contains the phonemisation module
  - See explanation of output in `phonemisation/__init__.py`
  - Usage: `python -m phonemisation "text to phonemise"`
  - Mechanism: Chinese sentence ⟶ (word segmentation) ⟶ Chinese words ⟶ (romanisation) ⟶ Shanghainese pinyin ⟶ (phonemisation) ⟶ Shanghainese phonemes
    - `jieba` is used for word segmentation
    - A Shanghainese dictionary I previously made is used for romanisation
      - Uses the `Qieyun` module to add the tone number `1` to syllables of 陰平 yinping/inbin tone; other tones are phonologically unmarked
    - The `romanisation_to_ipa` function in `romanisation.py` performs the phonemisation
- `make_metadata.py`: uses the `phonemisation` module to convert transcription into IPA and generate metadata for training
  - See below in `data/`
- `data/`: contains the dataset used for training
  - The transcriptions and audio files are adapted from this repo
    - Downsampled to 16 kHz for training
  - Currently, only `shh.dict.cn/` is used for training
  - The `*/metadata.txt` files are generated by `make_metadata.py`
- `training/`: Jupyter notebook for training the model
  - Intended to be uploaded and run in a Google Colab environment; needs to be modified for local use
  - Uses the `coqui-ai/TTS` repo, which contains an implementation of VITS
- `writeup/`: the write-up
- `speech_synthesis/`: contains the speech synthesis model
  - See `speech_synthesis/README.md` for more details
- `comparison_questionnaire/`: contains the questionnaire and audio files used to compare speech produced by this model, the Apple model, and a human speaker
  - `*-1.wav`: produced by this model
  - `*-2.wav`: produced by Apple VoiceOver (MacBook Pro 14-inch, 2021; macOS Ventura 13.0.1)
  - `*-3.wav`: spoken by myself
  - `stats.ipynb`: Jupyter notebook for analysing the questionnaire results
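
The three-stage mechanism described under `phonemisation/` (segmentation, romanisation, phonemisation) can be sketched roughly as follows. This is only an illustration and not the module's actual code: the lookup tables are tiny hypothetical stand-ins for the project's dictionary and the `Qieyun` tone lookup, the spellings and IPA values are approximate, a greedy longest-match segmenter replaces `jieba` so the sketch has no dependencies, and the output format is made up (the real one is documented in `phonemisation/__init__.py`).

```python
# Hypothetical toy tables; none of these entries come from the project's data.
ROMANISATION = {"上海": ["zaon", "he"], "人": ["gnin"], "天": ["thi"]}
YINPING = {"天"}  # characters assumed to carry the 陰平 tone
IPA = {"zaon": "zɑ̃", "he": "he", "gnin": "ɲin", "thi": "tʰi"}


def segment(sentence, vocabulary):
    """Greedy longest-match segmentation (a simple stand-in for jieba)."""
    max_len = max(len(word) for word in vocabulary)
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # Fall back to a single character if nothing longer matches.
            if candidate in vocabulary or size == 1:
                words.append(candidate)
                i += size
                break
    return words


def romanise(word):
    """Look up a word and mark yinping syllables with the tone number 1."""
    syllables = ROMANISATION.get(word, [])  # unknown words yield no syllables
    return [
        syllable + "1" if char in YINPING else syllable
        for char, syllable in zip(word, syllables)
    ]


def to_ipa(syllable):
    """Convert one romanised syllable to IPA, keeping a trailing tone number."""
    tone = "1" if syllable.endswith("1") else ""
    base = syllable[:-1] if tone else syllable
    return IPA[base] + tone


def phonemise(sentence):
    """Return one list of IPA syllables per segmented word."""
    return [
        [to_ipa(syllable) for syllable in romanise(word)]
        for word in segment(sentence, ROMANISATION)
    ]


if __name__ == "__main__":
    print(phonemise("上海人"))
```

Syllables stay grouped by word in this sketch, so the word boundaries found during segmentation remain visible to whatever consumes the output.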
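
When working with the questionnaire audio, the `*-1.wav` / `*-2.wav` / `*-3.wav` naming convention can be used to group recordings by source. The helper below is a minimal sketch of that idea; the function names and the example file names are hypothetical and do not come from `stats.ipynb`.

```python
from pathlib import PurePath

# Suffix-to-source mapping, following the naming convention listed above.
SOURCES = {"1": "this model", "2": "Apple VoiceOver", "3": "human speaker"}


def source_of(filename):
    """Return the source label for a file such as 'a-2.wav', or None."""
    stem = PurePath(filename).stem
    _, separator, suffix = stem.rpartition("-")
    # Without a '-' in the name, the file does not follow the convention.
    return SOURCES.get(suffix) if separator else None


def group_by_source(filenames):
    """Group file names by source, ignoring files that do not match."""
    groups = {label: [] for label in SOURCES.values()}
    for name in filenames:
        label = source_of(name)
        if label is not None:
            groups[label].append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical file names, used only to show the grouping.
    print(group_by_source(["a-1.wav", "a-2.wav", "a-3.wav", "stats.ipynb"]))
```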