ilya16/speech-synthesis-course

Speech Synthesis and Voice Cloning Course

This repository contains materials from my self-designed introductory course Speech Synthesis and Voice Cloning, delivered during the Independent Study Period 2025 (ISP'25) at Skoltech.

The course introduces students to modern deep learning–based speech synthesis technologies and provides hands-on experience in building personalized text-to-speech (TTS) models.

Contents

  • syllabus.pdf: the full course syllabus, as submitted with the ISP'25 course proposals (also available online).
  • lectures/: complete set of lecture slides (PDF). The structure is described below.
  • isp-tts: accompanying repository with a simple TTS model implementation and example notebooks used in lectures and project work.

Course Structure

Module 1: Speech Synthesis and Text-to-Speech Datasets

  1. (22.01) What is Speech Synthesis?
    Text-to-speech demos, brief history, TTS pipeline
  2. (23.01) Audio Representations and Text Preprocessing
    Waveforms and spectrograms, graphemes and phonemes, practice
  3. (24.01) Creating Your Own Speech Dataset
    TTS dataset checklist, useful tools, practice, start of the project work
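
To make the "waveforms and spectrograms" topic from lecture 2 concrete, here is a minimal sketch of how a magnitude spectrogram can be computed from a waveform. It is not taken from the lecture materials or the isp-tts repository: the function names, the frame length, and the hop size are assumptions chosen for illustration, and the naive DFT is written with the standard library only so it runs anywhere. In practice one would likely use an FFT-based audio library instead.

```python
import cmath
import math


def sine_wave(freq_hz, sample_rate, duration_s):
    """Generate a mono sine waveform as a list of float samples."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def hann(n):
    """Periodic Hann window of length n, used to taper each frame."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(samples, frame_len=64, hop=32):
    """Short-time Fourier transform magnitudes via a naive DFT.

    Returns a list of frames; each frame holds frame_len // 2 + 1 magnitudes
    (the non-negative frequency bins). frame_len and hop are illustrative.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    # Precompute the complex exponentials for every (bin, sample) pair.
    twiddle = [
        [cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
        for k in range(n_bins)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frame = [s * w for s, w in zip(chunk, window)]
        frames.append([abs(sum(x * t for x, t in zip(frame, row))) for row in twiddle])
    return frames


sample_rate = 8000
wave = sine_wave(1000, sample_rate, 0.05)  # a 1 kHz test tone
spec = magnitude_spectrogram(wave)

# Find the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(f"{len(spec)} frames x {len(spec[0])} bins, peak near {peak_hz:.0f} Hz")
```

Each row of `spec` describes one short slice of time and each column one frequency band, which is the time-frequency picture that TTS acoustic models typically predict (usually after an additional mel-scale and log compression step, omitted here).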

Module 2: Text-to-Speech Models

  1. (27.01) Text-to-Speech Models. Acoustic Models
    TTS architectures, code-along session
  2. (28.01) Model Training and Evaluation
    Training pipeline, evaluation metrics, code-along session
  3. (29.01) Vocoders. Voice Conversion
    Vocoder architectures, voice conversion models

Module 3: Project Work

  1. (30.01) Extra Topics and Project Work
    Expressive, emotional, multi-lingual and modern speech synthesis
  2. (31.01) Project Presentations and Wrap-up
    Presentations and fun

Project Work

Task:

Develop a working speech synthesis model trained or fine-tuned on your own voice data.

Steps:

  1. Collect and prepare a dataset
  2. Train or fine-tune a TTS model
  3. Present results and insights
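
As a rough illustration of step 1, the sketch below parses a small transcript manifest and holds out part of it for validation. The pipe-delimited `clip_id|transcript` layout, the clip names, the sentences, and the 20% validation fraction are all assumptions made for this example; the course does not prescribe a particular format, and the baseline in isp-tts may organize its data differently.

```python
import random


def parse_manifest(text):
    """Parse 'clip_id|transcript' lines into (clip_id, transcript) pairs."""
    pairs = []
    for line in text.strip().splitlines():
        clip_id, transcript = line.split("|", maxsplit=1)
        transcript = " ".join(transcript.split())  # collapse stray whitespace
        if transcript:  # skip clips whose transcript is empty
            pairs.append((clip_id.strip(), transcript))
    return pairs


def train_val_split(pairs, val_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out a fraction for validation."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical manifest: clip names and sentences are made up for illustration.
manifest = """
clip_0001|Hello, this is my voice.
clip_0002|Speech synthesis   is fun.
clip_0003|
clip_0004|The quick brown fox jumps over the lazy dog.
clip_0005|Training data should be clean.
clip_0006|Short sentences are fine too.
"""

pairs = parse_manifest(manifest)
train, val = train_val_split(pairs)
print(f"{len(pairs)} usable clips: {len(train)} train, {len(val)} validation")
```

Fixing the random seed keeps the split reproducible, so results from different training runs on the same recordings stay comparable.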

Specifics:

  1. Groups of 2–3 students: explore and prepare a solution together, but each student uses their own dataset
  2. A baseline solution is presented in class, but you are free to use any model or ready-made solution
  3. Peer voting for the most interesting projects

Evaluation:

Feedback is given on the individual steps and results; there is no formal grade.

Acknowledgements

All material was collected and prepared with care and love. References and attributions are included on individual slides.

This course was inspired by the following open courses:

  1. YSDA Speech Processing Course: https://github.com/yandexdataschool/speech_course
  2. HSE Deep Learning for Audio (DLA): https://github.com/markovka17/dla
  3. MIPT Deep Learning for Audio Course: https://github.com/severilov/DL-Audio-AIMasters-Course
