Hi everyone,
I recently published Building Speech AI: Speech Representation, Understanding & Synthesis. A Practitioner’s Guide.
The book is meant for practitioners building real speech systems. It covers audio representations, ASR, Whisper-style models, neural TTS, voice cloning ethics, deployment, evaluation, and audio language models. I also put together a companion code repo with runnable examples.
Landing page: https://prdeepakbabu.github.io/building-speech-ai
Code repo: https://github.com/prdeepakbabu/building-speech-ai
I’d be grateful for feedback from this community, especially on what examples or failure modes would be most useful to add next.
Thanks,
Deepak