WhisperX Replicate

A Cog-based deployment of WhisperX for German speech-to-text transcription using the faster-whisper-large-v3-turbo model.

Overview

This repository packages WhisperX as a Replicate-compatible model, enabling easy deployment and inference via Cog. It uses:

  • WhisperX (i4ds fork) for transcription with VAD (Voice Activity Detection)
  • faster-whisper-large-v3-turbo model for fast, accurate German transcription
  • Cog for containerization and deployment to Replicate
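
At a high level, predict.py wires these pieces together. The snippet below is only a minimal sketch of the usual WhisperX flow with the settings described in this README (German, float16, VAD thresholds, optional alignment); the actual predict.py may differ in detail.

import whisperx

device = "cuda"

# Load the CTranslate2 model from the local models/ directory, with VAD enabled.
model = whisperx.load_model(
    "models/faster-whisper-large-v3-turbo",
    device,
    compute_type="float16",                                  # see Notes below
    language="de",                                           # fixed to German
    vad_options={"vad_onset": 0.500, "vad_offset": 0.363},   # VAD thresholds
)

audio = whisperx.load_audio("your_audio.mp3")
result = model.transcribe(audio, batch_size=8)

# Optional word-level timestamps (align_output).
align_model, metadata = whisperx.load_align_model(language_code="de", device=device)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)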

Prerequisites

  • Cog installed
  • NVIDIA GPU with CUDA 12.1 support
  • Docker

Setup

1. Download the Model

First, download the model to your Hugging Face cache. You can use the helper script:

python get_models.py

This will download the model (i4ds/daily-brook-134) to your local Hugging Face cache.
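
If you prefer not to use the helper script, the same download can be done directly with huggingface_hub (a rough equivalent for illustration, not necessarily what get_models.py does):

from huggingface_hub import snapshot_download

# Downloads i4ds/daily-brook-134 into the local Hugging Face cache
# (typically ~/.cache/huggingface/hub) and returns the snapshot path.
local_path = snapshot_download(repo_id="i4ds/daily-brook-134")
print(local_path)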

2. Copy Model to Repository

Copy the cached model to the models/ directory:

./copy_models.sh

This creates the following structure:

models/
└── faster-whisper-large-v3-turbo/
    ├── config.json
    ├── tokenizer.json
    ├── vocabulary.json
    └── ...
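
The tree above is the layout that cog build expects. For illustration only, a rough Python equivalent of the copy step (not part of the repo) would be:

import shutil
from huggingface_hub import snapshot_download

# Resolve the cached snapshot (downloads it first if it is missing)...
src = snapshot_download(repo_id="i4ds/daily-brook-134")
# ...and copy it into the models/ directory used by the Cog image.
shutil.copytree(src, "models/faster-whisper-large-v3-turbo", dirs_exist_ok=True)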

3. Build the Cog Image

cog build

Usage

Run a Prediction

cog predict -i audio_file=@your_audio.mp3

Input Parameters

Parameter      Description                          Default
audio_file     Audio file to transcribe             (required)
language       Language (fixed to German)           de
batch_size     Parallelization for transcription    8
temperature    Sampling temperature                 0
vad_onset      VAD onset threshold                  0.500
vad_offset     VAD offset threshold                 0.363
align_output   Enable word-level timestamps         False
debug          Print timing/memory info             True
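
These parameters correspond to the predict() signature in predict.py. A trimmed-down sketch of how they map onto Cog inputs (the real file may differ in detail):

from cog import BaseModel, BasePredictor, Input, Path

class Output(BaseModel):
    segments: str
    detected_language: str

class Predictor(BasePredictor):
    def predict(
        self,
        audio_file: Path = Input(description="Audio file to transcribe"),
        language: str = Input(default="de", description="Language (fixed to German)"),
        batch_size: int = Input(default=8, description="Parallelization for transcription"),
        temperature: float = Input(default=0, description="Sampling temperature"),
        vad_onset: float = Input(default=0.500, description="VAD onset threshold"),
        vad_offset: float = Input(default=0.363, description="VAD offset threshold"),
        align_output: bool = Input(default=False, description="Enable word-level timestamps"),
        debug: bool = Input(default=True, description="Print timing/memory info"),
    ) -> Output:
        ...  # transcription happens here (see the sketch in Overview)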

Example

Transcribe a file and also return word-level timestamps:

cog predict -i audio_file=@your_audio.mp3 -i align_output=true

Output

The prediction returns:

  • segments: Transcription in SRT subtitle format
  • detected_language: The detected language code (e.g., de)
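
For illustration, a response might look roughly like this (the transcript text here is made up):

{
    "segments": "1\n00:00:00,000 --> 00:00:04,200\nGuten Tag und herzlich willkommen.\n",
    "detected_language": "de",
}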

File Structure

├── cog.yaml              # Cog configuration (CUDA, Python, dependencies)
├── predict.py            # Main prediction class for Cog
├── requirements.txt      # Python dependencies
├── copy_models.sh        # Script to copy the model from the HF cache
├── get_models.py         # Helper to download the model (Setup step 1)
├── get_vad_model_url.py  # Helper to resolve the VAD model URL
└── models/               # Local model directory
    └── faster-whisper-large-v3-turbo/

Deployment to Replicate

cog login
cog push r8.im/your-username/whisperx-german
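
Once pushed, the model can be called like any other Replicate model, for example via the Python client (the model name below is a placeholder; pin a version hash in production):

import replicate

output = replicate.run(
    "your-username/whisperx-german",
    input={"audio_file": open("your_audio.mp3", "rb"), "align_output": True},
)
print(output["segments"])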

Notes

  • The model is hardcoded to German (de) transcription
  • Uses float16 compute type for GPU efficiency
  • VAD is enabled by default for better handling of speech segments
