Fast, reliable SRT subtitle generator using Faster-Whisper (CTranslate2).
- β No Compatibility Issues: No PyTorch/NumPy version hell
- β 4x Faster on CPU: CTranslate2 optimized inference
- β Lower Memory: 2-4x less RAM than PyTorch Whisper
- β Same Quality: Uses official OpenAI Whisper models
- β Parallel Processing: Multi-threaded with persistent worker pools
- β Smart Chunking: Splits at natural silence boundaries
- β Just Works: Install and run, no patches needed
- β Python 3.11-3.13: Full support for latest Python
- Python 3.11+ (including Python 3.13 β )
- FFmpeg
# Quick setup with Makefile
make prepare-prod
source venv/bin/activate
# Or manual install
pip install -e .
# Verify installation
whisper-srt --help# Process single video (uses parallel processing by default)
whisper-srt video.mp4# For short videos or single-core CPUs
whisper-srt video.mp4 --no-chunking# Adjust chunk size and silence detection
whisper-srt video.mp4 \
--target-chunk-duration 180 \
--min-silence-duration 1.0 \
--silence-threshold -35dB \
--workers 6
# Custom chunking for better balance
whisper-srt video.mp4 \
--workers 4 \
--target-chunk-duration 240 \
--min-silence-duration 2.0# Catch short utterances ("Yeah", "Bye", "Hi")
whisper-srt video.mp4 \
--vad-threshold 0.025 \
--vad-min-speech-duration 50 \
--vad-min-silence-duration 250
# Maximum sensitivity (catch everything)
whisper-srt video.mp4 \
--vad-threshold 0.01 \
--vad-min-speech-duration 30# Enable duration adjustment (recommended)
whisper-srt video.mp4
# Disable duration adjustment (keep Whisper's original timing)
whisper-srt video.mp4 --no-adjust-durations
# Custom duration constraints
whisper-srt video.mp4 \
--min-duration 1.0 \
--max-duration 5.0 \
--chars-per-second 18# Process all videos in directory
whisper-srt-batch /path/to/videos -m small -l en
# Recursive search with all options
whisper-srt-batch /path/to/videos \
-m medium \
-l en \
--recursive \
--vad-threshold 0.025 \
--vad-min-silence-duration 250 \
--min-duration 0.7 \
--max-duration 7.0
# Process TV show season (automatically skips existing .srt files)
whisper-srt-batch /videos/GoodLuckCharlie/Season1 \
-m small \
-l en \
--recursive
# Re-process everything (overwrite existing)
whisper-srt-batch /videos -m small -l en --no-skip
For TV shows with rapid dialogue (like sitcoms):
whisper-srt video.mp4 \
-m small \
-l en \
--vad-threshold 0.025 \
--vad-min-speech-duration 50 \
--vad-min-silence-duration 250 \
--min-duration 0.7 \
--max-duration 7.0 \
--chars-per-second 20 \
--workers 4Results:
- ~90%+ dialogue coverage
- Perfect timing (0.7-7s range)
- Natural subtitle breaks
- Production ready
What each parameter does:
--vad-threshold 0.025- Very sensitive (catches soft dialogue)--vad-min-speech-duration 50- Catches short utterances ("Yeah!", "Hi")--vad-min-silence-duration 250- Splits on brief pauses (more natural breaks)--min-duration 0.7- Minimum subtitle display time--max-duration 7.0- Limits maximum subtitle time--chars-per-second 20- Comfortable reading speed--workers 4- Parallel processing with 4 workers
| Command | Purpose | Best For |
|---|---|---|
whisper-srt |
Process single video | All videos (parallel by default) |
whisper-srt-batch |
Batch process directory | Multiple files with model reuse |
whisper-srt-compare |
Quality validation | Verify against transcript |
# Compare generated SRT with original transcript
whisper-srt-compare output.srt original.txt| Option | Description | Default |
|---|---|---|
video_path |
Input video file | Required |
-o, --output |
Output SRT file | {video}.srt |
-m, --model |
Model size | small |
-l, --language |
Language code (e.g., 'en') | Auto-detect |
--device |
Device (cpu/cuda) | cpu |
--compute-type |
Compute type (int8/float16/float32) | Auto |
--workers |
Number of workers | cpu_count/2 |
| Option | Description | Default |
|---|---|---|
--no-chunking |
Disable parallel chunking | False |
| Option | Description | Default |
|---|---|---|
--target-chunk-duration |
Target chunk size (seconds) | 300.0 |
--min-silence-duration |
Min silence for chunk splitting | 2.0 |
--silence-threshold |
Silence detection threshold (dB) | -30dB |
| Option | Description | Default |
|---|---|---|
--no-vad |
Disable VAD | False |
--vad-threshold |
VAD sensitivity (0.0-1.0) | 0.05 |
--vad-min-speech-duration |
Min speech duration (ms) | 50 |
--vad-min-silence-duration |
Min silence for splitting (ms) | 500 |
| Option | Description | Default |
|---|---|---|
--no-adjust-durations |
Disable duration adjustment | False |
--min-duration |
Minimum subtitle duration (s) | 0.7 |
--max-duration |
Maximum subtitle duration (s) | 7.0 |
--chars-per-second |
Reading speed (characters/s) | 20.0 |
| Option | Description | Default |
|---|---|---|
--recursive |
Search subdirectories | False |
--no-skip |
Process even if SRT exists | False |
--enable-chunking |
Use chunking in batch mode | False |
make help # Show all commands
make prepare-prod # Setup environment
make run FILE=video.mp4 # Process video
make batch DIR=/videos # Batch process
make clean # Remove venv
make clean-output # Remove SRT files| Model | Speed | Quality | RAM | Best For |
|---|---|---|---|---|
| tiny | β‘β‘β‘β‘β‘ | ββ | 1GB | Quick tests |
| base | β‘β‘β‘β‘ | βββ | 1GB | Fast processing |
| small | β‘β‘β‘ | ββββ | 2GB | Good balance (default) |
| medium | β‘β‘ | βββββ | 5GB | High quality |
| large-v2 | β‘ | βββββ | 10GB | Best quality |
| large-v3 | β‘ | βββββ | 10GB | Latest & best |
99 languages supported. Common codes:
en- Englishes- Spanishfr- Frenchde- Germanzh- Chineseja- Japaneseko- Koreanru- Russian
1. FFmpeg not found
# macOS
brew install ffmpeg
# Ubuntu
sudo apt install ffmpeg2. CUDA out of memory
whisper-srt video.mp4 --device cpu
# or use smaller model
whisper-srt video.mp4 -m tiny3. No silence gaps detected (single huge chunk)
# Adjust silence detection to be more sensitive
whisper-srt video.mp4 \
--min-silence-duration 1.0 \
--silence-threshold -35dB
# Or use sequential mode for better performance
whisper-srt video.mp4 --no-chunking4. Missing short subtitles ("Yeah", "Hi", "Bye")
See VAD_TUNING_GUIDE.md for detailed tuning instructions:
whisper-srt video.mp4 \
--vad-threshold 0.025 \
--vad-min-speech-duration 50 \
--vad-min-silence-duration 2505. OpenMP library warning (macOS)
The script automatically handles this. You can ignore the warning.
6. Batch processing: All files skipped
# Files already have .srt - use --no-skip to reprocess
whisper-srt-batch /videos --no-skip -m smallFaster-Whisper vs Original Whisper:
- 4x faster on CPU
- 2-3x faster on GPU
- 2-4x less memory
- Same transcription quality
1-hour video benchmarks (approximate):
| Model | CPU Time (4 workers) | GPU Time |
|---|---|---|
| tiny | ~4 min | ~2 min |
| base | ~8 min | ~3 min |
| small | ~15 min | ~5 min |
| medium | ~30 min | ~8 min |
Note: Performance varies by hardware, video content, worker count, and VAD settings.
Batch Mode Optimization:
Processing 10 videos with batch mode:
- Sequential (10 separate calls): Load model 10 times
- Batch mode: Load model once, reuse for all 10 videos
- Speed improvement: 2-5x faster!
Example: Processing a TV Season (25 episodes, ~23min each)
whisper-srt-batch /videos/season1 -m small -l en --recursive- Total video time: ~9.5 hours
- Processing time: ~3.5 hours (with batch optimization)
- Model loads: 1 time (vs 25 times without batch)
- Time saved: ~30-45 minutes from model reuse alone!
from whisper_srt import Processor
# Single video processing
with Processor(model_size="small", device="cpu", num_workers=4) as processor:
output = processor.process_video(
video_path="video.mp4",
language="en",
enable_chunking=True,
vad_threshold=0.025,
vad_min_speech_duration=50,
vad_min_silence_duration=250,
adjust_durations=True,
min_duration=0.7,
max_duration=7.0,
chars_per_second=20.0,
)
print(f"Generated: {output}")
# Batch processing with processor reuse
with Processor(model_size="small", device="cpu", num_workers=1) as processor:
for video in video_list:
output = processor.process_video(
video_path=video,
language="en",
enable_chunking=False, # Sequential mode for batch
)
print(f"Processed: {output}")- VAD_TUNING_GUIDE.md - Fix missing subtitles by tuning VAD parameters
- MATCH_ALGORITHM.md - Detailed explanation of the matching algorithm
- CHANGELOG.md - Version history and migration guide
Contributions welcome! Please open an issue or pull request on GitHub.
MIT License - See LICENSE
- Faster-Whisper - CTranslate2 backend
- OpenAI Whisper - Original models
- FFmpeg - Audio extraction