AutoDub

An advanced AI-powered tool that automatically translates and dubs YouTube videos into different languages while dynamically adjusting video speed. This project combines state-of-the-art speech recognition, translation, and voice cloning technologies to create natural-sounding dubbed videos.

Features

  • Automatic Video Processing: Downloads YouTube videos using yt-dlp and extracts audio automatically
  • Speech Recognition: Uses Whisper AI for accurate speech-to-text transcription
  • Voice Separation: Splits original audio into vocal and instrumental tracks using Spleeter
  • Neural Translation: Supports high-quality translation through the DeepL API
  • Voice Cloning: Uses XTTS v2 for natural-sounding voice synthesis that matches the original speaker
  • Intelligent Video Speed Adjustment: Automatically adjusts video speed per speech segment to maintain lip-sync
  • Background Music Preservation: Maintains original background music and sound effects
  • Multi-language Support: Can translate and dub into multiple target languages

Prerequisites

  • Python 3.8+
  • CUDA-capable GPU (recommended for faster processing)
  • FFmpeg installed and added to system PATH
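
The prerequisites above can be checked before a first run. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it only verifies the Python version and that an `ffmpeg` executable is reachable on PATH. It does not test for a GPU.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            "Python %d.%d+ is required, found %d.%d"
            % (min_python + tuple(sys.version_info[:2]))
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

Running the file directly prints one line per missing prerequisite and prints nothing when both checks pass.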

Installation

  1. Clone the repository:
git clone https://github.com/frrobledo/AutoDub.git
cd AutoDub
  2. Install required packages:
pip install -r requirements.txt
  3. Install additional dependencies:
apt-get install ffmpeg  # for Debian-based systems

For other operating systems, refer to the FFmpeg installation guide.

  4. Set up API keys:
    • Create a DeepL API account and add your API key to the configuration
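
This README does not say where the project reads the DeepL key from. One common way to keep a key out of source control is an environment variable. The sketch below assumes that approach: the variable name `DEEPL_API_KEY` and the helper `load_deepl_key` are hypothetical and are not taken from the repository.

```python
import os


def load_deepl_key(env_var="DEEPL_API_KEY"):
    """Read the DeepL key from the environment (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your DeepL API key"
        )
    return key
```

Failing early with a clear message avoids starting a long download and transcription run that would only fail at the translation step.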

Project Structure

├── tools/
│   ├── audio_synthesis.py        # Voice cloning and audio processing
│   ├── transcriber.py            # Speech recognition and translation
│   ├── video_editing.py          # Video speed adjustment and editing
│   ├── video_downloader.py       # YouTube video downloading
│   ├── audio_splitter_ffmpeg.py  # Audio separation
│   └── logger.py                 # Logging utilities
├── main.py                       # Main execution script
└── README.md

Usage

  1. Run the main script:
python main.py
  2. Enter the YouTube URL when prompted.

  3. The script will automatically:

    • Download the video
    • Extract and transcribe the audio
    • Separate speech from background audio
    • Translate the speech
    • Clone the voice in the target language
    • Adjust video speed for lip-sync
    • Combine everything into the final video
  4. Find the output video in the final_output directory.

How It Works

  1. Video Processing:

    • Downloads YouTube video using yt-dlp
    • Extracts audio track
    • Separates vocals from background using Spleeter
  2. Speech Processing:

    • Transcribes speech using Whisper AI
    • Detects spoken language automatically
    • Translates text using DeepL API
  3. Voice Synthesis:

    • Clones original voice using XTTS v2
    • Generates speech in target language
    • Matches timing of original speech segments
  4. Video Adjustment:

    • Analyzes duration of original vs. translated speech
    • Adjusts video speed per segment for lip-sync
    • Preserves original background audio
    • Combines all elements into final video
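
The video adjustment step compares the length of each original speech segment with the length of its translated audio. A minimal sketch of that calculation follows. It is an illustration under stated assumptions, not the code in `tools/video_editing.py`: the `Segment` class, the `speed_factor` function, and the clamping bounds of 0.5 and 2.0 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float            # start of the original speech segment, in seconds
    end: float              # end of the original speech segment, in seconds
    dubbed_duration: float  # length of the synthesized audio, in seconds


def speed_factor(segment, min_factor=0.5, max_factor=2.0):
    """Playback speed that makes the video segment last as long as the dubbed audio.

    A factor below 1.0 slows the video down (the dubbed speech is longer);
    a factor above 1.0 speeds it up (the dubbed speech is shorter).
    """
    original = segment.end - segment.start
    if original <= 0 or segment.dubbed_duration <= 0:
        raise ValueError("segment durations must be positive")
    factor = original / segment.dubbed_duration
    # Clamp so that extreme mismatches do not produce unwatchable video.
    return max(min_factor, min(max_factor, factor))


segments = [
    Segment(0.0, 4.0, 5.0),  # dubbed audio is longer: slow the video down
    Segment(4.0, 6.0, 1.0),  # dubbed audio is shorter: speed the video up
    Segment(6.0, 7.0, 4.0),  # large mismatch: the factor gets clamped
]
factors = [speed_factor(s) for s in segments]
print(factors)  # [0.8, 2.0, 0.5]
```

When a factor is clamped, video and audio no longer have equal length for that segment, so a real pipeline would also need to stretch or pad the audio there.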

Configuration

The project creates several directories for processing:

  • downloads/: Downloaded YouTube videos
  • original_audios/: Extracted audio files
  • output_audio/: Processed audio segments
  • final_output/: Final dubbed videos
  • logs/: Processing logs
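
The project creates these directories itself, so no manual setup is needed. For a helper or test script that wants the same layout, a sketch could look like the following. The names `WORK_DIRS` and `ensure_work_dirs` are hypothetical, and placing the directories under a configurable root is an assumption.

```python
from pathlib import Path

# Directory names as listed in this README.
WORK_DIRS = ("downloads", "original_audios", "output_audio", "final_output", "logs")


def ensure_work_dirs(root="."):
    """Create the working directories under root if missing and return their paths."""
    paths = [Path(root) / name for name in WORK_DIRS]
    for path in paths:
        # exist_ok=True makes repeated calls safe.
        path.mkdir(parents=True, exist_ok=True)
    return paths
```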

Known Limitations

  • Video quality depends on the source YouTube video
  • For some languages, audio generation can produce artifacts and segments that play noticeably too slow or too fast
  • Processing time varies based on video length and hardware
  • Some languages may have better results than others

Contributing

Contributions are welcome! Please feel free to submit pull requests or create issues for bugs and feature requests.

Acknowledgments

Contact

For questions or support, please create an issue in the GitHub repository.
