This project uses the Whisper model (more detail at github.com/openai/whisper) to transcribe audio and builds two features on top of the transcript: live subtitles and live dubbing. All translation is done through the googletrans library, and Tkinter shows the subtitles on screen. To play the translated audio, a local Coqui TTS server is set up to receive text and return an audio file. VB-Audio Virtual Cable is used to turn the device's output into an input source.
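At its core, the pipeline is transcribe-then-translate. Below is a minimal sketch of that step, not the project's actual code: the model size, audio file name, and target language are placeholders, and it assumes googletrans 4.0.0rc1, where `translate()` is synchronous.

```python
import whisper
from googletrans import Translator

# Transcribe a short clip with Whisper ("base" and "clip.wav" are placeholders).
model = whisper.load_model("base")
text = model.transcribe("clip.wav")["text"]

# Translate the transcript with googletrans ("es" is a placeholder target).
translated = Translator().translate(text, dest="es").text
print(translated)
```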
- Transcribed audio will be translated into the desired target language and shown on screen with Tkinter.
Demo video: live-subtitles.mp4
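A minimal sketch of the Tkinter display idea: a borderless, always-on-top window whose label is updated with each new subtitle. The geometry, font, and colors here are assumptions, not the project's actual settings.

```python
import tkinter as tk

root = tk.Tk()
root.overrideredirect(True)        # no title bar or border
root.attributes("-topmost", True)  # stay above other windows
root.geometry("+100+800")          # placeholder position near the bottom

label = tk.Label(root, text="", font=("Arial", 24), fg="white", bg="black")
label.pack()

def show_subtitle(text: str) -> None:
    # Swap in the latest subtitle; the transcription loop would call this.
    label.config(text=text)

show_subtitle("Hola, ¿cómo estás?")
root.mainloop()
```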
- Use Whisper to transcribe your voice (or another audio source) and Coqui TTS to speak the translation. This works as a live vocal translator.
Make sure all the required libraries are installed, then run main.py with the correct flags.
If you use Windows, you can create a batch file like main.bat to activate a virtual environment and run the code.
Some flags are useful, such as --save_file, which stores the audio files temporarily and transcribes them one by one.
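That flag suggests a simple pattern: dump recorded chunks to temporary files, then transcribe them in order. The sketch below shows the idea only; the directory and file names are illustrative, not the project's actual layout.

```python
import glob
import whisper

model = whisper.load_model("base")  # model size is a placeholder

# Assume the recorder drops numbered chunks like tmp/chunk_000.wav.
for path in sorted(glob.glob("tmp/chunk_*.wav")):
    result = model.transcribe(path)
    print(path, "->", result["text"])
```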
An example command to run the live subtitles feature:

```
python main.py --save_file --subtitles
```

To run the live dubbing feature, first start the TTS server; all the installation and setup instructions can be found at https://github.com/coqui-ai/TTS:
```
tts-server --model_name "tts_models/en/vctk/vits" --use_cuda True
```

The model and the GPU option are up to you; this config uses the GPU and the English VITS model.
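Once the server is up, you can sanity-check it with a plain HTTP request. This is only a sketch: it assumes the Coqui demo server's defaults (port 5002 and the /api/tts endpoint), and the VCTK speaker p225 is an arbitrary placeholder choice.

```python
import requests

# Ask the local Coqui demo server to synthesize a test sentence.
# Port 5002 and /api/tts are the demo server's defaults; p225 is a VCTK speaker.
resp = requests.get(
    "http://localhost:5002/api/tts",
    params={"text": "Testing the dubbing pipeline", "speaker_id": "p225"},
)
resp.raise_for_status()

with open("tts_output.wav", "wb") as f:
    f.write(resp.content)  # the server responds with WAV audio
```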
After the TTS server is running on localhost, run this command to start the live dubbing feature:

```
python main.py --save_file --dubbing
```

Some enhancements that could be made:
- Use a better translation service such as DeepL for more natural translations (see the sketch after this list)
- Use the native win32api to render the on-screen text better
- Improve the threading and multiprocessing so everything runs efficiently
- Try different recording thresholds, subtitle display intervals, and so on
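For the DeepL idea above, the official deepl package could replace googletrans. A minimal sketch, assuming you have a DeepL API account; the auth key and target language are placeholders.

```python
import deepl

# The auth key is a placeholder; DeepL issues real keys with an API account.
translator = deepl.Translator("your-deepl-auth-key")
result = translator.translate_text("How are you doing?", target_lang="ES")
print(result.text)  # the translated sentence
```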
