MediaHeist is a modular, high-performance automation toolkit for downloading, processing, and summarizing audio-visual content, with a focus on YouTube videos and local media files. It provides a robust pipeline covering download, audio extraction, keyframe analysis, subtitle generation, and AI-powered summarization, all orchestrated via a Makefile and extensible Bash scripts.
- Flexible Input: Supports YouTube URLs, video IDs, local file paths, and batch lists.
- Automated Workflow: Downloads or copies videos, extracts audio and keyframes, generates subtitles, and writes Markdown summaries.
- Parallel & Batch Processing: Efficiently handles large-scale datasets with parallel job support.
- Robust Logging & Error Handling: Centralized logs for every run, strict error propagation, and clear process markers.
- Environment & Dependency Management: Uses .env for configuration, supports override via environment variables, and validates all required dependencies.
- Cross-Platform Packaging: Multiple binary packaging options (Go, makeself, SHC) for easy deployment.
- AI Integration: Supports both local LLM (Ollama) and Google Gemini API for transcript summarization.
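The parallel batch mode listed above could be driven by something like the following. This is a minimal sketch, not the Makefile's actual LIST handling (which is not shown here): run_batch is a hypothetical helper, and it assumes an xargs that supports -P (GNU and BSD both do) and list entries without spaces or quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from the MediaHeist Makefile.
# Usage: run_batch LIST_FILE MAX_JOBS COMMAND [ARGS...]
run_batch() {
  local list="$1" jobs="$2"
  shift 2
  # Drop blank lines and '#' comments, then hand one entry at a time
  # to COMMAND, keeping at most $jobs processes running.
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$list" |
    xargs -P "$jobs" -n 1 "$@"
}

# Example (assumes the Makefile accepts URL=... as in the usage section):
#   run_batch urls.txt "${MAX_JOBS:-4}" sh -c 'make all URL="$1"' _
```

Because jobs finish in any order, their output can interleave; a per-job log file avoids that.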
MediaHeist/
├── Makefile
├── build_binary.sh
├── scripts/
│ ├── audio.sh
│ ├── common.sh
│ ├── download.sh
│ ├── frames.sh
│ ├── pre_srt_summary.sh
│ └── transcribe.sh
├── cmd/
│ └── mediaheist/
│     └── main.go
├── summary/
├── logs/
└── .env
- System: macOS, Linux, or Windows (WSL recommended)
- Dependencies:
  - yt-dlp
  - ffmpeg, ffprobe
  - jq, curl
  - ImageMagick (for phash)
  - GNU parallel or xargs
  - Go (for binary build)
  - ollama (optional, for local LLM summarization)
  - whisper.cpp (for fallback speech-to-text)
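A dependency check along these lines can confirm the tools are on PATH before a run. It is only a sketch: check_deps is a hypothetical name, and the project's own validation may work differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every missing tool, then fail once.
check_deps() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing dependency: %s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_deps yt-dlp ffmpeg ffprobe jq curl || exit 1
```

Reporting all missing tools in one pass saves a fix-and-retry cycle per tool.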
Copy and edit .env as needed:
cp .env.example .env
# Edit .env to set:
# GEMINI_API_KEY, GEMINI_MODEL_ID, WHISPER_BIN, WHISPER_MODEL, etc.

# Process a single YouTube video
make download URL="https://youtu.be/xxxx"
make all URL="https://youtu.be/xxxx"

# Process a local file
make download URL="/path/to/video.mp4"
make all URL="/path/to/video.mp4"

# Process a batch list
make download LIST=urls.txt
make all LIST=urls.txt MAX_JOBS=8

# Build the binary
./build_binary.sh

- Download/Copy Video: Detects input type, downloads via yt-dlp or copies local file, and records mapping.
- Audio Extraction: Uses ffmpeg to produce a 16kHz mono MP3.
- Keyframe Extraction: Dynamically segments video, extracts keyframes, removes duplicates (based on phash).
- Subtitle Generation: Downloads YouTube CC subtitles (priority: zh-TW, zh, zh-CN, en); falls back to whisper.cpp if unavailable.
- Summarization: Feeds transcript to Gemini API or local LLM to generate a Markdown summary.
- Logging: All stages log to a timestamped file in logs/.
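Three of the stages above can be sketched in a few lines of Bash. None of this is taken from the actual scripts: the function names are hypothetical, the 11-character video ID rule is an assumption about YouTube IDs, and the ffmpeg call uses only standard options (-vn, -ac, -ar, -c:a).

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names and rules are assumptions.

# Classify an input as url, file, or id (assumed 11-character video ID).
classify_input() {
  local input="$1"
  case "$input" in
    http://*|https://*) echo url; return 0 ;;
  esac
  if [ -f "$input" ]; then
    echo file
  elif printf '%s' "$input" | grep -Eq '^[A-Za-z0-9_-]{11}$'; then
    echo id
  else
    echo unknown
    return 1
  fi
}

# Extract audio as a 16 kHz mono MP3: extract_audio INPUT OUTPUT.mp3
extract_audio() {
  ffmpeg -y -i "$1" -vn -ac 1 -ar 16000 -c:a libmp3lame "$2"
}

# Print the first language from the priority list found among "$@".
pick_sub_lang() {
  local want have
  for want in zh-TW zh zh-CN en; do
    for have in "$@"; do
      if [ "$have" = "$want" ]; then
        echo "$want"
        return 0
      fi
    done
  done
  return 1   # nothing usable: the caller would fall back to whisper.cpp
}
```

The outer loop in pick_sub_lang walks the priority list, so the result depends on the documented order and not on the order in which available languages are passed.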
- Go Binary: Use build_binary.sh to build a standalone binary for your platform.
- Makeself: (Recommended for easy distribution) Use build_package.sh (not shown above) for a self-extracting installer.
- SHC: Use build_shc_binary.sh to compile shell scripts into binaries.
All packaging scripts ensure scripts and dependencies are bundled, and maintain compatibility with the Makefile workflow.
Key variables (set in .env or exported):
- GEMINI_API_KEY, GEMINI_MODEL_ID: For Gemini summarization.
- WHISPER_BIN, WHISPER_MODEL: For speech-to-text fallback.
- MAX_JOBS: Controls parallel processing.
- YTDLP, FFMPEG: Tool overrides.
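One way to let exported variables take precedence over .env is to load the file as defaults only. The loader below is a sketch under that assumption: load_env is a hypothetical name, it handles plain KEY=VALUE lines only, and it relies on Bash indirect expansion.

```bash
#!/usr/bin/env bash
# Hypothetical loader: .env supplies defaults, the environment wins.
load_env() {
  local file="${1:-.env}" line key value
  [ -f "$file" ] || return 0
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;          # skip blank lines and comments
    esac
    key="${line%%=*}"
    value="${line#*=}"
    # Accept only KEY=VALUE lines whose KEY is a valid shell identifier.
    [[ "$key" != "$line" && "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    # Strip a leading and a trailing double quote, if present.
    value="${value#\"}"; value="${value%\"}"
    # ${!key+x} expands to "x" only when the variable is already set.
    if [ -z "${!key+x}" ]; then
      export "$key=$value"
    fi
  done < "$file"
}

# Example: MAX_JOBS=2 in .env is ignored when the caller ran
#   MAX_JOBS=8 make all LIST=urls.txt
```

Parsing the file line by line, without sourcing it, means a stray command in .env is never executed.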
- All scripts redirect output to both console and a central log file.
- Strict error handling (set -eEuo pipefail) throughout all scripts.
- Each processing stage produces .done marker files for workflow tracking.
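The logging and marker conventions above might look roughly like this. It is a sketch with assumed details: the LOG_FILE naming, the marker location, and the skip-if-marker-exists behavior are illustrative, and strict mode is left out so the helpers can be sourced safely.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; LOG_FILE and the marker layout are assumptions.
LOG_FILE="${LOG_FILE:-logs/run_$(date +%Y%m%d_%H%M%S).log}"

# Print a timestamped message to the console and append it to the log.
log() {
  mkdir -p "$(dirname "$LOG_FILE")"
  printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" | tee -a "$LOG_FILE"
}

# run_stage NAME WORKDIR COMMAND [ARGS...]
# Skip the stage if WORKDIR/NAME.done exists; create it only on success.
run_stage() {
  local name="$1" dir="$2" marker status=0
  shift 2
  marker="$dir/$name.done"
  if [ -e "$marker" ]; then
    log "skip  $name (marker found)"
    return 0
  fi
  log "start $name"
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    touch "$marker"
    log "done  $name"
  else
    log "FAIL  $name (exit $status)"
  fi
  return "$status"
}

# Example:
#   run_stage audio work/xxxx scripts/audio.sh work/xxxx/video.mp4
```

Writing the marker only after the command succeeds means an interrupted stage is retried on the next run.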
- Add new scripts to scripts/ and integrate with the Makefile.
- Override tool paths or parameters via .env or environment variables.
- Easily swap LLM models or endpoints in pre_srt_summary.sh.
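Swapping the summarization backend could be reduced to a single switch, as sketched below. This is not the contents of pre_srt_summary.sh: select_backend, summarize, OLLAMA_MODEL, OLLAMA_URL, the llama3 default, and the prompt text are all hypothetical. Only the two HTTP endpoints (Gemini generateContent and Ollama /api/generate) are public APIs.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; function and OLLAMA_* names are assumptions.

# Choose a backend: Gemini when an API key is configured, else Ollama.
select_backend() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then echo gemini; else echo ollama; fi
}

# summarize TRANSCRIPT_FILE -> prints the summary text to stdout
summarize() {
  local prompt
  prompt="Summarize this transcript as Markdown:
$(cat "$1")"
  case "$(select_backend)" in
    gemini)
      jq -n --arg t "$prompt" '{contents: [{parts: [{text: $t}]}]}' |
        curl -sS -X POST -H 'Content-Type: application/json' -d @- \
          "https://generativelanguage.googleapis.com/v1beta/models/${GEMINI_MODEL_ID}:generateContent?key=${GEMINI_API_KEY}" |
        jq -r '.candidates[0].content.parts[0].text'
      ;;
    ollama)
      jq -n --arg m "${OLLAMA_MODEL:-llama3}" --arg p "$prompt" \
        '{model: $m, prompt: $p, stream: false}' |
        curl -sS -d @- "${OLLAMA_URL:-http://localhost:11434}/api/generate" |
        jq -r '.response'
      ;;
  esac
}

# Example:
#   summarize transcript.srt > summary/xxxx.md
```

Building the request body with jq keeps quotes and newlines in the transcript correctly escaped as JSON.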
For questions, suggestions, or contributions, please open an issue or pull request on GitHub.
