This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
MediaScribe is a CLI toolkit for transcribing and summarizing audio, text, and video with reusable staged workflows.
Public project and CLI branding uses MediaScribe. The canonical Python package namespace is mediascribe.
Preferred CLI commands:
mediascribemediascribe-transcribermediascribe-text
- Local ASR:
faster-whisper+pyannote.audio - Cloud ASR: Azure Speech / Aliyun / iFlytek through a shared provider interface
- LLM Summary:
litellmrouting to local Ollama models or cloud LLM APIs - Video support:
yt-dlp+ffmpeg - CLI:
argparse - Python: >= 3.10
uv sync --extra local --extra devDependencies are split into layers (see pyproject.toml):
- Core (
uv sync):litellm+requests - Video (
uv sync --extra video):yt-dlp - Local (
uv sync --extra local):faster-whisper+pyannote.audio+ PyTorch - Dev (
uv sync --extra dev):pytest
Runtime prerequisites that are not installed by uv sync:
- Local ASR:
ffmpeg>= 4.4 inPATH, plusHF_TOKENforpyannote.audio - Local LLM summary: Ollama installed separately, Ollama running on
http://localhost:11434, and a local model pulled (default:qwen2.5:3b)
Typical local summary setup:
ollama pull qwen2.5:3b
# If Ollama is not already running in the background:
ollama serve# Audio: transcript + summary
uv run mediascribe audio.mp3 --asr azure
# Audio: transcript only
uv run mediascribe audio.mp3 --asr azure --no-summary
# Audio: transcription-focused entrypoint
uv run mediascribe-transcriber ./recordings --asr azure -o ./output
# Text: summarize existing notes or transcripts
uv run mediascribe-text ./notes
# Text: summarize with the default local Ollama model
uv run mediascribe-text ./notes --llm-model ollama/qwen2.5:3b --llm-api-base http://localhost:11434
# Video: subtitle-first summary
uv run mediascribe video "https://www.youtube.com/watch?v=aircAruvnKk"uv run pytest
uv run pytest tests/test_scanner.py -k "test_filter"mediascribe/
├── cli.py # Top-level mediascribe entrypoint
├── transcription_service.py # Audio transcription orchestration
├── audio_summary_service.py # Audio transcript summary helpers
├── text_summary_service.py # Generic text/file summary helpers
├── video_summary_service.py # Video orchestration
├── video_input_service.py # Local-vs-remote video resolution
├── subtitle_fetch_service.py # Subtitle-first video path
├── media_extract_service.py # Audio extraction/download path
├── formatter.py # Transcript formatting and writing
├── scanner.py # Audio input discovery
├── asr/ # ASR provider registry, config, adapters, providers
└── summary/ # Summary provider registry, config, adapters, providers
- Provider-based architecture: ASR and summary backends are isolated behind registries and typed configs.
- Subtitle-first video flow: video prefers subtitles, then falls back to audio extraction + ASR + summary.
- Reusable staged workflows: transcription, transcript summary, text summary, and video summary can be run independently.
- Default audio behavior:
mediascribesummarizes by default unless--no-summaryor--transcript-onlyis used. - Fail-forward batches: multi-file transcription logs individual failures and continues.
- Use type hints on all function signatures.
- Use
logging, notprint, for runtime output. - Keep configuration defaults in
mediascribe/config.py. - Read credentials from environment variables only.
- New providers should plug into the existing registry/discovery flow instead of branching CLI logic.
Transcript files use this format:
[HH:MM:SS] Speaker N: <text>
Summary output is Markdown and keeps source-reference metadata when available.