Video transcription and translation CLI for language learners.
Transcribe, refine, translate, and study — all in one pipeline.
Prerequisites: Python 3.12+, uv, ffmpeg (optional: mpv, Ollama)
git clone https://github.com/RizhongLin/PolyglotWhisperer.git
cd PolyglotWhisperer
uv sync --all-extras
# API keys (optional — only for cloud providers)
cp .env.example .env
spaCy language models download automatically on first use.
For local Postgres-backed development of pgw serve:
docker compose -f docker-compose.dev.yml up -d # Postgres on :5432
export PGW_DATABASE_URL=postgresql+psycopg://pgw:pgw@localhost:5432/pgw_dev
uv run pgw maintenance migrate   # alembic upgrade head, with legacy-stamp fallback
Without PGW_DATABASE_URL, the server falls back to a local SQLite file under the workspace dir — fine for single-user use.
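The fallback behavior can be sketched in a few lines. This is illustrative only: the function name and the `pgw.db` filename are assumptions, not pgw's actual internals.

```python
from pathlib import Path

def resolve_database_url(env: dict, workspace: Path) -> str:
    # PGW_DATABASE_URL wins; otherwise fall back to a SQLite file
    # under the workspace directory (filename is an assumption here).
    url = env.get("PGW_DATABASE_URL")
    if url:
        return url
    return f"sqlite:///{workspace / 'pgw.db'}"
```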
For full-stack production-style testing (pgw + Postgres in containers), the included docker-compose.yml boots both services together:
docker compose up -d # builds image, starts Postgres + pgw
docker compose logs -f pgw   # entrypoint runs `pgw maintenance migrate` before serving
The container's entrypoint (docker/entrypoint.sh) runs migrations automatically whenever PGW_DATABASE_URL is set, so you never need to invoke alembic by hand on the production path.
docker-compose.yml reads your local .env (gitignored) for runtime secrets and the admin bootstrap. The variables that matter:
# Auto-create the admin on first boot — survives `docker compose down -v`
PGW_ADMIN_EMAIL=admin@example.com
PGW_ADMIN_PASSWORD=change-me-strong-password
# CSRF + signed-URL key (REQUIRED in production; auto-random in dev)
PGW_SECRET_KEY=<run: python -c "import secrets; print(secrets.token_urlsafe(32))">
# Provider keys (anything LiteLLM/OpenAI-compatible reads)
GROQ_API_KEY=...
OPENAI_API_KEY=...
PGW_ADMIN_* is consumed by ensure_admin_from_env on every server start: if no users exist yet, the admin is created from those credentials. Idempotent — once an admin exists, the env vars are ignored.
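A minimal sketch of that idempotent bootstrap logic. The function body here is illustrative; the real ensure_admin_from_env hashes the password with Argon2 and persists the user through the ORM.

```python
def ensure_admin_from_env(users: list, env: dict) -> list:
    # If any user exists, the env credentials are ignored (idempotent).
    if users:
        return users
    email = env.get("PGW_ADMIN_EMAIL")
    password = env.get("PGW_ADMIN_PASSWORD")
    if email and password:
        # Real implementation hashes the password (Argon2) and
        # writes via SQLAlchemy; here we just record the account.
        users.append({"email": email, "role": "admin"})
    return users
```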
Rebuilding the image doesn't wipe your data — but docker compose down -v does. The -v flag removes named volumes, including pgw_pg_data, where the Postgres database lives. For routine rebuilds:
docker compose down # keeps the volume → admin persists
docker compose build pgw
docker compose up -d
Use -v only when you want a deliberately fresh database (e.g., testing the migration path on empty Postgres). With PGW_ADMIN_EMAIL/PGW_ADMIN_PASSWORD set in .env, even a down -v recovers cleanly — the next boot recreates the admin from env.
# Full pipeline: download → transcribe → (refine + translate combined)
pgw run video.mp4 -l fr --refine --translate en
# From a URL (requires yt-dlp)
pgw run "https://youtube.com/watch?v=..." -l fr --translate en
# Cloud backends (no GPU needed)
pgw run video.mp4 --backend api --llm-backend api --translate en
# Skip transcription — use existing subtitles from the video page
pgw run "https://youtube.com/watch?v=..." --subs --translate en
All pipeline output lands in pgw_workspace/<slug>/<timestamp>/ alongside a metadata.json.
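The `<slug>/<timestamp>` layout might be derived roughly like this. The slug rule (lowercase, non-alphanumerics collapsed to `-`) is an assumption; pgw's actual slugifier may differ.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

def workspace_dir(title: str, root: Path = Path("pgw_workspace")) -> Path:
    # Collapse anything that isn't a-z/0-9 into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # One directory per run, keyed by a UTC timestamp.
    ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return root / slug / ts
```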
| Command | Purpose |
|---|---|
| `pgw run` | Full pipeline: download, transcribe, refine, translate |
| `pgw transcribe` | Whisper transcription only (local or cloud API) |
| `pgw translate` | Translate existing subtitle files |
| `pgw vocab` | Vocabulary analysis (difficulty tiers, rare words) |
| `pgw export` | Export vocabulary as CSV for Anki/spreadsheet |
| `pgw play` | Play video with dual subtitles via mpv |
| `pgw serve` | Launch web player for a workspace (or library view) |
| `pgw clean` | Clear cached files (downloads, audio, transcriptions) |
| `pgw languages` | List all supported languages |
| `pgw worker connect` | Run as a remote worker against a pgw serve instance |
| Component | Local | Cloud API (default) |
|---|---|---|
| Transcription | stable-ts (MLX/CUDA/CPU) | OpenAI SDK → Groq, OpenAI, custom servers |
| Translation / Refinement | Ollama (via OpenAI SDK) | OpenAI SDK → DeepSeek, Groq, OpenAI, Claude, … |
Any OpenAI SDK-compatible server works too — set api_base, api_key, and api_model:
# pgw.toml
[whisper]
backend = "api"
api_base = "https://your-whisper-server/v1"
api_key = "sk-..."
api_model = "openai/whisper-1"
[llm]
backend = "api"
api_base = "https://your-llm-server/v1"
api_key = "sk-..."
api_model = "openai/meta-llama-3.1-8b-instruct"
# Per-run overrides
pgw run video.mp4 -l fr --backend api --whisper-model groq/whisper-large-v3-turbo
pgw run video.mp4 -l fr --llm-backend api --llm-model groq/openai/gpt-oss-120b
For local LLM, pull a model first: ollama pull qwen3:8b
Five layers, lowest to highest: packaged defaults → ~/.config/pgw/config.toml → ./pgw.toml → .env / env vars → CLI flags.
# pgw.toml
[whisper]
backend = "api"
language = "fr"
[llm]
backend = "api"
target_language = "en"
Env vars use the PGW_ prefix: PGW_WHISPER__BACKEND=api, PGW_LLM__API_MODEL=groq/.... See .env.example for all options.
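Conceptually, layer resolution is a last-writer-wins merge per section. A toy sketch, assuming shallow two-level config dicts (pgw's actual loader is Pydantic-based):

```python
def merge_layers(*layers: dict) -> dict:
    # Later layers win, mirroring the precedence order above:
    # packaged defaults first, CLI flags last.
    merged: dict = {}
    for layer in layers:
        for section, values in layer.items():
            merged.setdefault(section, {}).update(values)
    return merged

defaults = {"whisper": {"backend": "local", "language": "en"}}
project  = {"whisper": {"backend": "api"}}   # ./pgw.toml
env      = {"whisper": {"language": "fr"}}   # PGW_WHISPER__LANGUAGE=fr
print(merge_layers(defaults, project, env))
# {'whisper': {'backend': 'api', 'language': 'fr'}}
```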
pgw serve # library + studio + player (multi-page SPA)
pgw serve <workspace-dir>   # single-workspace player
The web UI is a React SPA built from frontend/ (Vite + TypeScript + TanStack Router/Query + Tailwind v4 + shadcn-style components). The bundle is shipped as static assets inside the Python wheel, so end users never need Node.
Pages:
- Library (`/library`) — workspace grid with thumbnails, language pair, difficulty, dates. Click any card to open the player.
- Studio (`/studio`) — paste a URL or drop a file, pick source + target language from dropdowns, choose where to run (auto / worker / server), hit Start. Live progress cards stream events from the backend; cancel any time, or close the tab and come back without losing state. Advanced flags (backends, models, chunk size, ffmpeg start/duration, refine, subs) are tucked behind a disclosure.
- Player (`/library/<slug>/<ts>`) — HTML5 / YouTube / Vimeo video via the `PlayerAdapter` switch (auto-falls back to HTML5 when an embed refuses), click-to-seek transcript with anticipate/linger windows, track switcher (bilingual / original / translation), vocab card with click-to-save flashcards, downloads card.
- Review (`/review`) — FSRS spaced-repetition queue: due cards stream one at a time, reveal answer, grade Again/Hard/Good/Easy, optional 🗑 Discard. Audio clip auto-plays per card when the workspace has a saved range.
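The player's anticipate/linger highlighting can be modelled as widening each cue's time window. A sketch with made-up default window sizes (the real values live in the frontend):

```python
def active_cue(cues, t, anticipate=0.25, linger=0.5):
    # Widen each cue window so highlighting starts slightly early
    # and doesn't flicker off at the exact end timestamp.
    for start, end, text in cues:
        if start - anticipate <= t <= end + linger:
            return text
    return None
```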
Backend is FastAPI + uvicorn serving JSON over /api/... and raw workspace files over /ws/<slug>/<ts>/<file>. Job state is persisted as append-only JSONL under <workspace>/.jobs/, so an in-flight job survives a browser refresh, and on restart the server marks orphaned jobs as interrupted rather than leaving them stuck.
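A toy model of replaying one of those `.jobs/` JSONL logs into a current status, including the interrupted-on-restart behaviour. Field names and status strings are assumptions, not pgw's wire format.

```python
import json

def replay_job_log(lines, server_restarted=False):
    # Fold the append-only event stream into the latest status.
    status = "pending"
    for line in lines:
        status = json.loads(line).get("status", status)
    # On restart, a job that never reached a terminal state is orphaned.
    if server_restarted and status == "running":
        status = "interrupted"
    return status
```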
Knobs:
- `PGW_SERVE_HOST` — bind address (default `127.0.0.1`; Docker sets `0.0.0.0`).
- `PGW_SERVE_MAX_JOBS` — concurrent pipeline workers (default `1`, keeps Whisper warm).
- `PGW_JOBS_RETENTION` — how many finished job logs to keep (default `200`).
- `PGW_DATABASE_URL` — DB connection string (default: SQLite under the workspace dir; production: `postgresql+psycopg://...`).
- `PGW_DB_POOL_SIZE` — Postgres connection pool size (default `5`).
- `PGW_ADMIN_EMAIL`, `PGW_ADMIN_PASSWORD` — non-interactive admin bootstrap on first start. Without them, the SPA's `/setup` flow handles it.
- `PGW_SECRET_KEY` — signs CSRF cookies and signed URLs. Required in production.
- `PGW_SPA_DIR` — override the built-in SPA bundle path (useful during Docker-based frontend dev with a host-mounted `dist/` volume).
- `PGW_DEV_BACKEND` — backend URL for the Vite dev server proxy (default `http://127.0.0.1:8321`; set when Docker hosts the backend).
When the DB has no users, pgw serve runs in bootstrap mode — the SPA serves /setup on first visit so you can create the admin. After that, /login is required for /api/* and /jobs/*. CSRF protection is double-submit cookie + X-CSRF-Token header on every state-changing request.
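The double-submit check boils down to comparing the cookie token with the echoed header. A sketch, assuming a `csrf_token` cookie name (the real middleware also verifies the cookie's signature against PGW_SECRET_KEY):

```python
import hmac

def csrf_ok(cookies: dict, headers: dict) -> bool:
    # Double-submit: the token set as a cookie must be echoed in the
    # X-CSRF-Token header on every state-changing request.
    cookie = cookies.get("csrf_token", "")
    header = headers.get("X-CSRF-Token", "")
    # compare_digest avoids leaking the token through timing differences.
    return bool(cookie) and hmac.compare_digest(cookie, header)
```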
pgw worker connect --server <url> --token <t> runs the pipeline on the user's machine using their own IP, GPU, and API keys. The remote pgw serve becomes a thin orchestrator + library surface; videos and big artifacts stay local. On the server side, manage tokens with POST /api/workers, GET /api/workers, DELETE /api/workers/{id}. In the Studio, select where to run each job — Auto (prefer connected worker), This machine (explicit worker), or Server (admin-only). When a worker disconnects, its in-flight jobs are marked interrupted and any open NDJSON stream reflects it.
Click a vocab word in the player transcript → "Save flashcard" pre-fills front/back/audio range. Cards land in /review ordered by FSRS due time; rate Again (1) / Hard (2) / Good (3) / Easy (4) and the algorithm reschedules. The 🗑 Discard button on each card removes it permanently.
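A deliberately simplified rescheduling step keyed by those four grades. Real FSRS models memory stability and retrievability; these multipliers are illustrative stand-ins, not FSRS parameters.

```python
def reschedule(interval_days: float, grade: str) -> float:
    # "Again" resets the card (due now); other grades scale the
    # current interval, with a one-day floor. Multipliers are made up.
    if grade == "again":
        return 0.0
    factor = {"hard": 1.2, "good": 2.5, "easy": 4.0}[grade]
    return max(1.0, interval_days * factor)
```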
When PGW_LLM__API_KEY is configured, every new card schedules a background LLM pass that adds: dictionary lemma, POS tag, polished definition, an example sentence pair, and (optionally) a mnemonic. Results land within seconds; the SPA polls until refine_status === 'done'. A per-(language, lemma, pos) cache means repeat surfaces are free.
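The per-(language, lemma, pos) cache behaves like memoisation on that key triple. A sketch using `functools.lru_cache`; the returned fields are placeholders, not LLM output.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def enrich(language: str, lemma: str, pos: str) -> dict:
    # Stand-in for the background LLM pass, which adds the lemma,
    # POS tag, definition, example pair, and optional mnemonic.
    return {"lemma": lemma, "pos": pos, "definition": f"definition of {lemma}"}

enrich("fr", "manger", "VERB")
enrich("fr", "manger", "VERB")   # repeat surface: served from the cache
print(enrich.cache_info().hits)  # → 1
```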
Bulk-create cards from existing vocab:
pgw maintenance backfill-flashcards --owner you@example.com [--language fr] [--limit 50]
Knobs (in .env):
- `PGW_FLASHCARD_REFINE=0` — disable LLM enrichment entirely (cards still save with the original `back`).
- `PGW_FLASHCARD_REFINE_MNEMONIC=1` — enable mnemonics (off by default; ~2× output tokens).
Each run generates a vocabulary.<lang>.json in the workspace. Difficulty tiers (A1–C2) are estimated from word frequency — approximations, not official CEFR levels.
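Frequency-to-tier estimation can be as simple as thresholding a Zipf-scale frequency (the 0–8 scale wordfreq produces, higher meaning more common). These cutoffs are invented for illustration and are not pgw's.

```python
def cefr_tier(zipf: float) -> str:
    # Walk the tiers from most to least common; the first floor
    # the word clears is its estimated tier.
    for tier, floor in [("A1", 5.5), ("A2", 4.8), ("B1", 4.0),
                        ("B2", 3.2), ("C1", 2.5)]:
        if zipf >= floor:
            return tier
    return "C2"
```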
pgw vocab <workspace> --top 50 # terminal view
pgw export <workspace>       # → vocabulary.csv for Anki
The image is multi-stage: a Node stage builds the TypeScript frontend, then a uv stage installs the Python wheel with all extras (transcribe, llm, vocab, export, serve). End users never need Node installed.
docker build -t pgw .
All commands work inside Docker — mount your project at /data and ensure .env has your API keys:
# Web UI (library + end-to-end pipeline)
docker run --rm -it -p 8321:8321 -v "$PWD:/data" pgw serve --no-open
# Full pipeline (CLI)
docker run --rm -it -v "$PWD:/data" pgw run /data/video.mp4 -l fr \
--translate en --backend api --llm-backend api --no-play
The mounted /data is also where pgw_workspace/ and pgw_workspace/.jobs/<id>.jsonl live — keep the volume mount stable across restarts so in-flight jobs reattach cleanly.
Dev mode — mount src/ to iterate on Python without rebuilding:
docker run --rm -it -p 8321:8321 \
-v "$PWD:/data" -v "$PWD/src:/app/src" \
pgw serve --no-open
docker build always rebuilds the React SPA from the TypeScript source via the js-builder stage, so you don't need Node locally to ship a Docker image. The host-side npm run build step is only needed when running pgw serve directly against your working tree (no Docker), since pgw serve reads src/pgw/templates/dist/ from disk.
src/pgw/
├── auth/ Argon2 passwords, sessions, CSRF, FastAPI deps, env-bootstrap
├── cli/ Typer commands (run, transcribe, translate, serve, worker, …)
├── core/ Config (Pydantic), pipeline orchestrator, events, JobContext
├── db/ SQLAlchemy 2.0 engine + ORM models (users, workspaces, vocab, workers)
├── downloader/ yt-dlp wrapper, URL resolver
├── llm/ OpenAI SDK client, translation, refinement, prompts
├── server/ FastAPI app + JobManager + routes/ (auth, workers, …)
├── subtitles/ Format conversion (VTT/SRT), PDF/EPUB export
├── transcriber/ Whisper backends (stable-ts local + API), segmentation
├── templates/ Built React SPA (templates/dist/) + favicon + brand mark
├── utils/ Audio extraction, cache, logging, spaCy, paths
├── vocab/ Vocabulary analysis + CEFR estimation
└── worker/ `pgw worker connect` agent + protocol (WebSocket to remote server)
frontend/ React SPA source (compiles → src/pgw/templates/dist/)
├── src/
│ ├── routes/ TanStack Router file-based routes (library, studio, player)
│ ├── components/ui/ shadcn-style primitives (Button, Card, Dialog, …)
│ ├── api/ typed fetch client + wire-format types
│ ├── lib/ cn(), VTT parser, formatters, theme hook
│ └── main.tsx entry: QueryClient + RouterProvider
├── vite.config.ts
└── tsconfig*.json
Build the frontend bundle:
cd frontend && npm ci && npm run build # → ../src/pgw/templates/dist/
cd frontend && npm run typecheck # tsc -b
cd frontend && npm run dev        # Vite dev server on :5173 (proxies /api → :8321)
Tight TypeScript dev loop:
# terminal 1 — backend
pgw serve --no-open --port 8321
# terminal 2 — Vite dev server (HMR + API proxy)
cd frontend && npm run dev
# open http://127.0.0.1:5173
| Role | Library |
|---|---|
| Transcription | stable-ts (local), OpenAI SDK (cloud / OpenAI-compatible) |
| LLMs | OpenAI SDK → Ollama / DeepSeek / Groq / OpenAI / Claude / custom |
| NLP | spaCy (POS, lemmatizer), wordfreq (frequency) |
| Subtitles | pysubs2 |
| Download | yt-dlp |
| Export | WeasyPrint (PDF), ebooklib (EPUB) |
| CLI | Typer + Rich |
| Web UI backend | FastAPI + uvicorn |
| Web UI frontend | React 19 + Vite + TanStack Router/Query + Tailwind v4 |
| Playback | mpv (CLI), browser `<video>` (web UI) |
