AI-powered pipeline for generating YouTube Shorts and Instagram Reels from a text summary.
You provide a 2-3 sentence description of the video topic. The pipeline:
- Researches the topic via Perplexity API for current facts and data
- Writes a script using a local LLM (Ollama) — produces scene-by-scene visual descriptions and narration text
- Generates speech (Kokoro TTS) and video (Wan 2.1, currently stubbed) in parallel
- Assembles everything into a final 9:16 MP4 with synced audio
"AI is transforming coding in 2026"
│
▼
┌──────────────┐
│ Research │ → facts, stats, sources
└──────┬───────┘
▼
┌──────────────┐
│ Script │ → 4 scenes with narration + visual prompts
└──────┬───────┘
┌────┴────┐
▼ ▼
┌─────┐ ┌───────┐
│ TTS │ │ Video │ (parallel)
└──┬──┘ └──┬────┘
└────┬────┘
▼
┌──────────────┐
│ Assembly │ → output/20260201_143022_ai_is.../final.mp4
└──────────────┘
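TTS and video generation are independent once the script exists, so the orchestrator runs them concurrently. A minimal sketch of that fan-out in `pipeline.py`, assuming async providers with hypothetical `synthesize`/`generate` method names (the real signatures live in each stage's Protocol):

```python
import asyncio

# Hypothetical sketch of the parallel step; method names on the providers
# are assumptions -- the real signatures live in each stage's Protocol.
async def run_media_stages(script, tts_engine, video_generator):
    audio, clips = await asyncio.gather(
        tts_engine.synthesize(script),     # narration audio for every scene
        video_generator.generate(script),  # one clip per scene (stubbed today)
    )
    return audio, clips
```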
Quick start:

```bash
# Install
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Configure
cp config.example.toml config.toml
# Edit config.toml: add your Perplexity API key, adjust Ollama model, etc.

# Ensure Ollama is running
ollama pull mistral

# Generate (dry run: research + script only, no media)
shortgen generate --dry-run "AI is transforming how developers write code in 2026"

# Generate (full pipeline with stub video)
shortgen generate "AI is transforming how developers write code in 2026"
```

Requirements:

- Python 3.11+
- FFmpeg: `brew install ffmpeg`
- Ollama: running locally with a model pulled (e.g., `ollama pull mistral`)
- Perplexity API key: set in `config.toml` or via the `SHORTGEN_PERPLEXITY_API_KEY` env var
For TTS (optional, needed for audio generation):
pip install -e ".[local]" # installs Kokoro TTSRun each pipeline stage individually. After the first command, subsequent stages auto-pick the most recent job — no need to pass paths:
```bash
shortgen script "AI tools for developers in 2026"  # research + write script
shortgen tts                                       # generate audio
shortgen video                                     # generate scene videos
shortgen assemble                                  # combine into final.mp4
```
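Stage commands can resolve "the most recent job" cheaply because job directories carry a sortable `YYYYMMDD_HHMMSS` prefix. A hypothetical version of that lookup (the project's actual helper may differ):

```python
from pathlib import Path

def latest_job_dir(output_root: str = "output") -> Path:
    """Hypothetical helper: the YYYYMMDD_HHMMSS_slug naming convention makes
    lexicographic order match chronological order."""
    jobs = sorted(p for p in Path(output_root).iterdir() if p.is_dir())
    if not jobs:
        raise FileNotFoundError(f"no job directories under {output_root}/")
    return jobs[-1]
```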
You can also point at a specific job directory:

```bash
shortgen tts output/20260201_143022_ai_tools_for_2026/
```

Or run everything end to end:

```bash
shortgen generate "your topic summary"  # runs all stages end-to-end
shortgen generate --dry-run "summary"   # research + script only
shortgen generate --verbose "summary"   # with debug logging
shortgen config                         # show resolved configuration
shortgen --version                      # show version
```

Copy `config.example.toml` to `config.toml` and edit. Key sections:
```toml
[research]
provider = "perplexity"        # research provider

[scriptwriter]
provider = "ollama"            # "ollama" for local LLM
target_scene_count = 4         # scenes per video
target_duration_seconds = 45   # target video length

[tts]
provider = "kokoro"            # local TTS on CPU

[video]
provider = "stub"              # "stub" until Wan 2.1 is implemented
```

Environment variables override config file values:

- `SHORTGEN_PERPLEXITY_API_KEY`
- `ANTHROPIC_API_KEY` (for a future Claude scriptwriter)
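Prefix-based overrides like this are commonly wired with pydantic-settings; the following is a sketch under that assumption, not a description of what `config.py` actually does:

```python
# Assumption: a pydantic-settings wiring like this would give the SHORTGEN_
# prefix behavior; config.py may handle file-vs-env precedence differently.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="SHORTGEN_")

    # SHORTGEN_PERPLEXITY_API_KEY in the environment fills this field,
    # taking precedence over the default below.
    perplexity_api_key: str = ""
```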
Each run creates a job directory under output/:
```
output/20260201_143022_ai_is_transforming/
├── job.json         # job metadata + stages_completed tracker
├── research.json    # research findings, sources, raw text
├── script.json      # generated script with scenes
├── tts.json         # TTS metadata (scene timings)
├── audio.wav        # TTS audio
├── scenes/          # per-scene video clips
│   ├── scene_000.mp4
│   ├── scene_001.mp4
│   └── ...
└── final.mp4        # assembled output
```
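The `stages_completed` tracker in `job.json` is what lets the stage commands resume a job where it left off. A quick way to inspect it; only the `stages_completed` key is documented above, so nothing else about the schema is assumed here:

```python
import json
from pathlib import Path

# Inspect a job's progress via its stages_completed tracker.
job_dir = Path("output/20260201_143022_ai_is_transforming")
job = json.loads((job_dir / "job.json").read_text())
print(job["stages_completed"])  # e.g. ["research", "script", "tts"]
```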
Source layout:

```
src/shortgen/
├── cli.py            # Click CLI entry point
├── config.py         # Pydantic config + component factory
├── pipeline.py       # Async orchestrator
├── models.py         # Data models (Scene, Script, TTSResult, etc.)
├── log.py            # Logging setup
├── research/         # Web research providers
│   ├── base.py       # Researcher Protocol
│   └── perplexity.py # Perplexity API
├── scriptwriter/     # Script generation providers
│   ├── base.py       # ScriptWriter Protocol
│   └── ollama.py     # Ollama (local LLM)
├── tts/              # Text-to-speech providers
│   ├── base.py       # TTSEngine Protocol
│   └── kokoro.py     # Kokoro TTS (local, CPU)
├── video/            # Video generation providers
│   ├── base.py       # VideoGenerator Protocol
│   └── stub.py       # Placeholder (colored rectangles)
└── assembly/         # Video assembly providers
    ├── base.py       # Assembler Protocol
    └── ffmpeg.py     # MoviePy + FFmpeg
```
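For a sense of what `assembly/ffmpeg.py` does, here is a stripped-down sketch of the core concatenate-and-mux step using MoviePy 1.x; the real assembler presumably also handles scene timing and 9:16 scaling, so treat this as illustrative:

```python
# Illustrative only: concatenate scene clips and mux in narration audio.
from moviepy.editor import AudioFileClip, VideoFileClip, concatenate_videoclips

def assemble(scene_paths: list[str], audio_path: str, out_path: str) -> None:
    clips = [VideoFileClip(p) for p in scene_paths]
    video = concatenate_videoclips(clips)               # scenes back to back
    video = video.set_audio(AudioFileClip(audio_path))  # overlay narration
    video.write_videofile(out_path, codec="libx264", audio_codec="aac")
```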
Each pipeline stage uses a Protocol interface. To add a new provider:
- Create a new file in the stage's directory (e.g., `tts/elevenlabs.py`)
- Implement the Protocol (e.g., `TTSEngine`; it just needs a matching method signature)
- Add a config model in `config.py`
- Register it in the factory function
- Add a config section to `config.example.toml`
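As a rough illustration, a new provider skeleton might look like the following; the `synthesize` method name and signature are assumptions, so mirror whatever `tts/base.py` actually declares:

```python
# tts/elevenlabs.py -- hypothetical skeleton; method name and signature
# below are assumptions, so match the real TTSEngine Protocol.
from pathlib import Path

class ElevenLabsTTS:
    """Satisfies the TTSEngine Protocol structurally: no inheritance needed,
    just a method whose signature matches the Protocol's."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def synthesize(self, text: str, out_path: Path) -> Path:
        # Call the ElevenLabs API here and write the audio to out_path.
        raise NotImplementedError
```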
See docs/extending.md for a step-by-step example.
pip install -e ".[dev]"
pytest # run tests (15 tests)
ruff check . # lint
mypy src/ # type check- Full pipeline scaffold with async orchestration
- Working providers: Perplexity research, Ollama scriptwriter, Kokoro TTS, FFmpeg assembly
- Video generation is stubbed (placeholder colored rectangles with text)
- 30 tests passing, lint clean
See docs/TODO.md for planned work and docs/sessions/ for session history.