
ShortGen

AI-powered pipeline for generating YouTube Shorts and Instagram Reels from a text summary.

How It Works

You provide a 2-3 sentence description of the video topic. The pipeline:

  1. Researches the topic via Perplexity API for current facts and data
  2. Writes a script using a local LLM (Ollama) — produces scene-by-scene visual descriptions and narration text
  3. Generates speech (Kokoro TTS) and video (Wan 2.1, currently stubbed) in parallel
  4. Assembles everything into a final 9:16 MP4 with synced audio
"AI is transforming coding in 2026"
         │
         ▼
  ┌──────────────┐
  │  Research    │ → facts, stats, sources
  └──────┬───────┘
         ▼
  ┌──────────────┐
  │  Script      │ → 4 scenes with narration + visual prompts
  └──────┬───────┘
    ┌────┴────┐
    ▼         ▼
 ┌─────┐  ┌───────┐
 │ TTS │  │ Video │  (parallel)
 └──┬──┘  └──┬────┘
    └────┬────┘
         ▼
  ┌──────────────┐
  │  Assembly    │ → output/20260201_143022_ai_is.../final.mp4
  └──────────────┘
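The parallel TTS + video step above can be sketched with asyncio. The stage functions below are hypothetical stand-ins, not ShortGen's actual API — they just show how the two stages run concurrently before assembly:

```python
import asyncio

async def run_tts(script: dict) -> str:
    # stand-in for the real TTS stage
    return "audio.wav"

async def run_video(script: dict) -> list[str]:
    # stand-in for the real video stage
    return ["scene_000.mp4", "scene_001.mp4"]

async def pipeline(script: dict) -> tuple[str, list[str]]:
    # TTS and video generation run concurrently; assembly waits on both
    audio, scenes = await asyncio.gather(run_tts(script), run_video(script))
    return audio, scenes

audio, scenes = asyncio.run(pipeline({"scenes": 4}))
```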

Quick Start

# Install
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Configure
cp config.example.toml config.toml
# Edit config.toml — add your Perplexity API key, adjust Ollama model, etc.

# Ensure Ollama is running
ollama pull mistral

# Generate (dry run — research + script only, no media)
shortgen generate --dry-run "AI is transforming how developers write code in 2026"

# Generate (full pipeline with stub video)
shortgen generate "AI is transforming how developers write code in 2026"

Prerequisites

  • Python 3.11+
  • FFmpeg: brew install ffmpeg (macOS) or your system package manager
  • Ollama: running locally with a model pulled (e.g., ollama pull mistral)
  • Perplexity API key: set in config.toml or SHORTGEN_PERPLEXITY_API_KEY env var

For TTS (optional, needed for audio generation):

pip install -e ".[local]"   # installs Kokoro TTS

CLI

Step-by-step (recommended)

Run each pipeline stage individually. After the first command, subsequent stages auto-pick the most recent job — no need to pass paths:

shortgen script "AI tools for developers in 2026"   # research + write script
shortgen tts                                          # generate audio
shortgen video                                        # generate scene videos
shortgen assemble                                     # combine into final.mp4

You can also point at a specific job directory:

shortgen tts output/20260201_143022_ai_tools_for_2026/

Full pipeline (convenience)

shortgen generate "your topic summary"     # runs all stages end-to-end
shortgen generate --dry-run "summary"      # research + script only
shortgen generate --verbose "summary"      # with debug logging

Other commands

shortgen config                            # show resolved configuration
shortgen --version                         # show version

Configuration

Copy config.example.toml to config.toml and edit. Key sections:

[research]
provider = "perplexity"          # research provider

[scriptwriter]
provider = "ollama"              # "ollama" for local LLM
target_scene_count = 4           # scenes per video
target_duration_seconds = 45     # target video length

[tts]
provider = "kokoro"              # local TTS on CPU

[video]
provider = "stub"                # "stub" until Wan 2.1 is implemented

Environment variables override config file values:

  • SHORTGEN_PERPLEXITY_API_KEY
  • ANTHROPIC_API_KEY (for future Claude scriptwriter)

Output

Each run creates a job directory under output/:

output/20260201_143022_ai_is_transforming/
├── job.json          # job metadata + stages_completed tracker
├── research.json     # research findings, sources, raw text
├── script.json       # generated script with scenes
├── tts.json          # TTS metadata (scene timings)
├── audio.wav         # TTS audio
├── scenes/           # per-scene video clips
│   ├── scene_000.mp4
│   ├── scene_001.mp4
│   └── ...
└── final.mp4         # assembled output
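A stage command can consult job.json's stages_completed tracker to decide whether it still needs to run. The exact schema is illustrative — only the stages_completed field is documented above:

```python
import json
from pathlib import Path

def needs_stage(job_dir: str, stage: str) -> bool:
    # True if the stage has not yet been recorded as completed
    meta = json.loads(Path(job_dir, "job.json").read_text())
    return stage not in meta.get("stages_completed", [])
```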

Project Structure

src/shortgen/
├── cli.py              # Click CLI entry point
├── config.py           # Pydantic config + component factory
├── pipeline.py         # Async orchestrator
├── models.py           # Data models (Scene, Script, TTSResult, etc.)
├── log.py              # Logging setup
├── research/           # Web research providers
│   ├── base.py         #   Researcher Protocol
│   └── perplexity.py   #   Perplexity API
├── scriptwriter/       # Script generation providers
│   ├── base.py         #   ScriptWriter Protocol
│   └── ollama.py       #   Ollama (local LLM)
├── tts/                # Text-to-speech providers
│   ├── base.py         #   TTSEngine Protocol
│   └── kokoro.py       #   Kokoro TTS (local, CPU)
├── video/              # Video generation providers
│   ├── base.py         #   VideoGenerator Protocol
│   └── stub.py         #   Placeholder (colored rectangles)
└── assembly/           # Video assembly providers
    ├── base.py         #   Assembler Protocol
    └── ffmpeg.py       #   MoviePy + FFmpeg

Extending

Each pipeline stage uses a Protocol interface. To add a new provider:

  1. Create a new file in the stage's directory (e.g., tts/elevenlabs.py)
  2. Implement the Protocol (e.g., TTSEngine — just needs matching method signature)
  3. Add a config model in config.py
  4. Register in the factory function
  5. Add config section to config.example.toml
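The steps above can be sketched as follows. The TTSEngine signature shown is hypothetical — the real Protocol lives in tts/base.py and may differ — but it illustrates the key point: providers satisfy the Protocol structurally, with no inheritance required:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class TTSEngine(Protocol):
    # hypothetical shape of the interface in tts/base.py
    async def synthesize(self, text: str, out_path: str) -> str: ...

class ElevenLabsTTS:
    """Example new provider (tts/elevenlabs.py): matches TTSEngine structurally."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    async def synthesize(self, text: str, out_path: str) -> str:
        # a real implementation would call the ElevenLabs API here
        raise NotImplementedError
```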

See docs/extending.md for a step-by-step example.

Development

pip install -e ".[dev]"
pytest                  # run tests
ruff check .            # lint
mypy src/               # type check

Current State (v0.1.0)

  • Full pipeline scaffold with async orchestration
  • Working providers: Perplexity research, Ollama scriptwriter, Kokoro TTS, FFmpeg assembly
  • Video generation is stubbed (placeholder colored rectangles with text)
  • 30 tests passing, lint clean

See docs/TODO.md for planned work and docs/sessions/ for session history.
