Speech Synthesis and Voice Cloning Course

This repository contains materials from my self-designed introductory course Speech Synthesis and Voice Cloning, delivered during the Independent Study Period 2025 (ISP'25) at Skoltech.

The course introduces students to modern deep learning–based speech synthesis technologies and provides hands-on experience in building personalized text-to-speech (TTS) models.

syllabus.pdf: the full course syllabus as submitted with the ISP'25 course proposals (also available online).
lectures/: complete set of lecture slides (PDF). The structure is described below.
isp-tts: accompanying repository with a simple TTS model implementation and example notebooks used in lectures and project work.

Course Structure

Module 1: Speech Synthesis and Text-to-Speech Datasets

(22.01) What is Speech Synthesis?
Text-to-speech demos, brief history, TTS pipeline
(23.01) Audio Representations and Text Preprocessing
Waveforms and spectrograms, graphemes and phonemes, practice
(24.01) Creating Your Own Speech Dataset
TTS dataset checklist, useful tools, practice, start of the project work
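The "waveforms and spectrograms" topic can be illustrated with a short sketch. It is not taken from the lecture slides: it computes a linear-magnitude spectrogram with a Hann-windowed short-time Fourier transform in NumPy, and the frame size (n_fft=1024) and hop (hop_length=256) are illustrative assumptions. TTS models usually consume log-mel spectrograms, which would add a mel filterbank and log compression on top of this.

```python
import numpy as np


def magnitude_spectrogram(samples: np.ndarray, n_fft: int = 1024, hop_length: int = 256) -> np.ndarray:
    """Return |STFT(samples)| with shape (n_fft // 2 + 1, n_frames).

    Frames are taken without padding, so trailing samples that do not fill a
    whole frame are dropped.
    """
    if len(samples) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop_length
    # Slice overlapping frames and apply the Hann window to each one.
    frames = np.stack([
        samples[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # Real FFT per frame, then transpose to (frequency, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T


if __name__ == "__main__":
    sr = 22050
    t = np.arange(sr) / sr                  # one second of sample times
    tone = np.sin(2 * np.pi * 440.0 * t)    # 440 Hz sine wave
    spec = magnitude_spectrogram(tone)
    print(spec.shape)                       # (513, 83)
    peak_hz = spec.mean(axis=1).argmax() * sr / 1024
    print(f"strongest bin at about {peak_hz:.1f} Hz")
```

For the one-second 440 Hz tone, the strongest frequency bin lies within one bin width (sr / n_fft, about 21.5 Hz) of 440 Hz.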

Module 2: Text-to-Speech Models

(27.01) Text-to-Speech Models. Acoustic Models
TTS architectures, code-along session
(28.01) Model Training and Evaluation
Training pipeline, evaluation metrics, code-along session
(29.01) Vocoders. Voice Conversion
Vocoder architectures, voice conversion models
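The syllabus does not list which evaluation metrics are covered; as one common example, subjective listening tests in TTS are often summarized as a mean opinion score (MOS) on a 1–5 scale. The sketch below is a hypothetical helper, not part of the course materials: it reports the mean with a 95% confidence interval using a normal approximation (z = 1.96). With only a handful of ratings, a Student's t critical value would give a wider and more appropriate interval.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings: list[float], z: float = 1.96) -> tuple[float, float]:
    """Return (MOS, half-width of the confidence interval) for 1-5 ratings.

    Uses the sample standard deviation and a normal approximation, so the
    interval is MOS +/- z * s / sqrt(n).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    score = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return score, half_width


if __name__ == "__main__":
    # Hypothetical ratings from five listeners for one synthesized clip.
    score, ci = mos_with_ci([4, 5, 3, 4, 4])
    print(f"MOS = {score:.2f} ± {ci:.2f}")  # MOS = 4.00 ± 0.62
```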

Module 3: Project Work

(30.01) Extra Topics and Project Work
Expressive, emotional, multilingual and modern speech synthesis
(31.01) Project Presentations and Wrap-up
Presentations and fun

Project Work

Task:

Develop a working speech synthesis model trained or fine-tuned on your own voice data.

Steps:

Collect and prepare a dataset
Train or fine-tune a TTS model
Present results and insights
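The first step, collecting and preparing a dataset, usually involves checking every recording before training. The sketch below is a hypothetical helper, not part of the course materials or the isp-tts repository: it uses Python's standard wave module to flag clips whose sample rate, channel count, duration or transcript fall outside illustrative thresholds (22,050 Hz mono, 1–15 s). These thresholds are assumptions and should be adjusted to the chosen model.

```python
import os
import tempfile
import wave
from dataclasses import dataclass, field


@dataclass
class ClipReport:
    path: str
    sample_rate: int
    channels: int
    duration_s: float
    problems: list = field(default_factory=list)


def check_clip(path: str, transcript: str, expected_sr: int = 22050,
               min_s: float = 1.0, max_s: float = 15.0) -> ClipReport:
    """Flag common dataset issues for one (WAV file, transcript) pair."""
    with wave.open(path, "rb") as wf:
        sr = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / sr
    report = ClipReport(path, sr, channels, duration)
    if sr != expected_sr:
        report.problems.append(f"sample rate {sr} Hz, expected {expected_sr} Hz")
    if channels != 1:
        report.problems.append(f"{channels} channels, expected mono")
    if not min_s <= duration <= max_s:
        report.problems.append(f"duration {duration:.2f} s outside [{min_s}, {max_s}] s")
    if not transcript.strip():
        report.problems.append("empty transcript")
    return report


def write_silence(path: str, sample_rate: int, seconds: float, channels: int = 1) -> None:
    """Write a 16-bit PCM WAV file of silence (only used for the demo below)."""
    n_frames = int(sample_rate * seconds)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * n_frames * channels)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.wav")
        bad = os.path.join(tmp, "bad.wav")
        write_silence(good, 22050, 2.0)
        write_silence(bad, 44100, 0.5, channels=2)
        print(check_clip(good, "Hello world.").problems)  # []
        for problem in check_clip(bad, "").problems:
            print(problem)
```

Running the same check over every clip before training makes it easy to drop or re-record problematic samples early.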

Specifics:

Groups of 2–3 students explore and prepare a solution together, but each student uses their own dataset
A baseline solution is presented in class, but you are free to use any models or ready-to-use solutions
Peer voting for the most interesting projects

Evaluation:

Feedback on the individual steps and results; no formal grade.

Acknowledgements

All material was collected and prepared with care and love. References and attributions are included on individual slides.

This course was inspired by the following open courses:

YSDA Speech Processing Course: https://github.com/yandexdataschool/speech_course
HSE Deep Learning for Audio (DLA): https://github.com/markovka17/dla
MIPT Deep Learning for Audio Course: https://github.com/severilov/DL-Audio-AIMasters-Course

ilya16/speech-synthesis-course

Folders and files

Latest commit

History

Repository files navigation

Speech Synthesis and Voice Cloning Course

Contents

Course Structure

Module 1: Speech Synthesis and Text-to-Speech Datasets

Module 2: Text-to-Speech models

Module 3: Project Work

Project Work

Task:

Steps:

Specifics:

Evaluation:

Acknowledgements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases