System design and component overview.
┌─────────────────────────────────────────────────────────────────────────┐
│ User Interface │
│ │
│ ./montage-ai.sh run hitchcock --cgpu --upscale │
│ │ │
└──────────────────────────────┼───────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Docker Container │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Creative │ │ Style │ │ Footage │ │
│ │ Director │────▶│ Templates │────▶│ Manager │ │
│ │ (LLM) │ │ (JSON) │ │ (Story Arc) │ │
│ └────────┬────────┘ └─────────────────┘ └────────┬────────┘ │
│ │ │ │
│ └───────────────────┬───────────────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Editor │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Beat │ │ Scene │ │ Clip │ │ Video │ │ │
│ │ │ Detection│─▶│ Detection│─▶│ Assembly │─▶│ Rendering│ │ │
│ │ │(FFmpeg) │ │(scenedet)│ │ │ │ (FFmpeg) │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Shorts Studio (v1.2+) │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Smart │ │ Caption │ │ Audio │ │ Highlight│ │ │
│ │ │ Reframe │ │ Burn │ │ Polish │ │ Detect │ │ │
│ │ │(MediaPipe)│ │ (Styles) │ │(Sidechain)│ │ (Multi) │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Enhancement Pipeline │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Upscale │ │ Stabilize│ │ Color │ │ Sharpen │ │ │
│ │ │(ESRGAN) │ │ (FFmpeg) │ │ Grade │ │ │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
└───────────────────────────────┼──────────────────────────────────────────┘
▼
/data/output/
montage_001.mp4
For systems with limited resources (e.g., laptops), Montage AI supports a hybrid mode where heavy compute tasks are offloaded to the cloud via cgpu.
See cgpu-setup.md for setup details.
- LLM Inference: Offloaded to Google Gemini via `cgpu serve`.
- Upscaling: Offloaded to Google Colab GPUs via `cgpu run`.
- Local: Orchestration, cutting, and basic rendering.
The editing engine has been refactored into a modular pipeline.
MontageBuilder (montage_builder.py)
The central orchestrator that executes the editing pipeline in phases:
- Setup: Initialize workspace and logging.
- Analyze: Process audio (beats/energy) and video (scenes/content).
- Plan: Select clips and map them to the timeline based on the story arc.
- Enhance: Apply stabilization, upscaling, and color grading.
- Render: Generate the final video file.
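A minimal sketch of this phase sequence (class shape and method names here are illustrative, not the real `montage_builder.py` API):

```python
class MontageBuilder:
    """Illustrative orchestrator: runs the five phases in order."""

    PHASES = ("setup", "analyze", "plan", "enhance", "render")

    def __init__(self, job_id: str):
        self.job_id = job_id
        self.completed: list[str] = []

    def run(self) -> list[str]:
        # Each phase runs in order; an exception aborts the pipeline early.
        for phase in self.PHASES:
            getattr(self, f"_{phase}")()
            self.completed.append(phase)
        return self.completed

    def _setup(self):   pass  # init workspace + logging
    def _analyze(self): pass  # beats/energy + scenes/content
    def _plan(self):    pass  # map clips to timeline via story arc
    def _enhance(self): pass  # stabilize, upscale, color grade
    def _render(self):  pass  # final encode
```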
Components:
| Module | Purpose |
|---|---|
| `audio_analysis.py` | Beat detection, tempo extraction, energy profiling (FFmpeg astats/tempo; librosa optional) |
| `scene_analysis.py` | Scene detection, content analysis, visual similarity with LRU cache |
| `auto_reframe.py` | Auto Reframe Engine: 9:16 conversion using convex optimization (L2) for cinematic camera paths |
| `video_metadata.py` | Technical metadata extraction (ffprobe wrapper) |
| `clip_enhancement.py` | Stabilization, upscaling, color matching (local/cloud hybrid) |
| `audio_enhancer.py` | Audio Polish: voice isolation, auto-ducking, loudness normalization (Pro Polish) |
| `shorts_workflow.py` | Shorts Studio: dedicated pipeline for vertical content generation |
| `ffmpeg_config.py` | GPU encoder detection (NVENC/VAAPI/QSV), encoding parameters |
| Optimization | Implementation | Impact |
|---|---|---|
| Preview Pipeline | Low-res (360p) + `ultrafast` preset | 10x faster iteration loop |
| LRU Histogram Cache | `@lru_cache` for frame extraction | 91% cache hit rate, 2-3x faster clip selection |
| Parallel Scene Detection | `ProcessPoolExecutor` (`max_workers = min(len(videos), max(4, cpu_count // 2), MAX_SCENE_WORKERS)`; ThreadPool fallback) | 3-4x speedup on multi-core |
| FFmpeg Beat Detection | `astats` + `tempo` filters (primary path) | Portable, no heavy deps; librosa optional via try/except |
| Auto GPU Encoding | NVENC > VAAPI > QSV > CPU | 2-6x encoding speedup |
| Hardware-Adjacent (Web) | Server-Sent Events (SSE) + `os.nice(10)` | Zero polling overhead, responsive UI under load |
| Lazy Loading (CLI) | Import heavy libs only when needed | Instant CLI startup |
| Cluster Efficiency | `imagePullPolicy: IfNotPresent` (dev overlay uses `Always`) | Minimized network traffic for cached images |
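The worker-count formula from the Parallel Scene Detection row, as a standalone function (the `MAX_SCENE_WORKERS` value here is illustrative; the real cap lives in the codebase):

```python
import os

MAX_SCENE_WORKERS = 8  # illustrative cap, not the project's actual constant

def scene_workers(num_videos: int) -> int:
    """Bounded by the number of videos, half the CPUs (at least 4),
    and a hard cap -- the formula from the optimization table."""
    cpu_count = os.cpu_count() or 1
    return min(num_videos, max(4, cpu_count // 2), MAX_SCENE_WORKERS)
```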
A thin wrapper that initializes the MontageBuilder and handles CLI arguments.
Translates natural language to editing parameters.
Responsibilities:
- Parse user prompts
- Query LLM (Ollama or Gemini)
- Validate JSON responses
- Map to style parameters
- Incorporate RegisseurMemory hints when available
- Emit schema-versioned outputs for compatibility (`schema_version`)
Backends:
| Backend | Protocol | Model |
|---|---|---|
| Ollama | REST API | llama3.1:70b |
| cgpu/Gemini | OpenAI-compatible | gemini-2.0-flash |
Flow:
User Prompt
│
▼
┌─────────────────┐
│ System Prompt │
│ + Style Options │
│ + Examples │
└────────┬────────┘
│
▼
┌─────────────────┐
│ LLM (Ollama or │
│ Gemini) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ JSON Validation │
│ + Schema Check │
└────────┬────────┘
│
▼
Editing Instructions
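The JSON validation step can be sketched with the stdlib alone. The field names below are hypothetical; the real schema is defined by the Creative Director:

```python
import json

# Hypothetical required fields -- the real schema may differ.
REQUIRED = {"style", "cut_speed", "schema_version"}

def parse_llm_response(raw: str) -> dict:
    """Validate the LLM's JSON reply before mapping it to style parameters."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM returned invalid JSON: {exc}") from exc
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["schema_version"] != 1:
        raise ValueError(f"unsupported schema_version: {data['schema_version']}")
    return data
```

Rejecting malformed replies here, before they reach the editing pipeline, is what makes an unreliable LLM backend safe to depend on.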
Professional-grade clip management with story arc awareness.
Key concepts:
| Concept | Description |
|---|---|
| UsageStatus | Track if clip is UNUSED, USED, or RESERVED |
| SceneType | Classify clips: ESTABLISHING, ACTION, DETAIL, PORTRAIT, SCENIC |
| StoryPhase | Timeline position: INTRO, BUILD, CLIMAX, SUSTAIN, OUTRO |
| FootageClip | Data class with all clip metadata |
| FootagePoolManager | Manages available clip pool |
| StoryArcController | Maps timeline position to requirements |
Selection algorithm:
Current Position → Story Phase → Required Energy + Scene Type
│
▼
┌───────────────────────┐
│ Score Available Clips │
│ - Energy match │
│ - Scene type match │
│ - Visual interest │
│ - Variety bonus │
└───────────┬───────────┘
│
▼
Select Highest Score
Mark as USED
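A compressed sketch of the scoring loop above. The weights and data fields are illustrative, not the real `FootagePoolManager` implementation:

```python
from dataclasses import dataclass
from enum import Enum, auto

class SceneType(Enum):
    ESTABLISHING = auto()
    ACTION = auto()
    DETAIL = auto()
    PORTRAIT = auto()
    SCENIC = auto()

@dataclass
class Clip:
    name: str
    energy: float            # 0..1
    scene_type: SceneType
    visual_interest: float   # 0..1
    used: bool = False

def score(clip: Clip, want_energy: float, want_type: SceneType,
          recent_types: list) -> float:
    """Weighted scoring per the diagram; weights are made up for illustration."""
    s = 1.0 - abs(clip.energy - want_energy)          # energy match
    s += 1.0 if clip.scene_type is want_type else 0.0 # scene type match
    s += 0.5 * clip.visual_interest                   # visual interest
    if clip.scene_type not in recent_types:           # variety bonus
        s += 0.25
    return s

def select(pool: list, want_energy: float, want_type: SceneType,
           recent_types: list) -> Clip:
    best = max((c for c in pool if not c.used),
               key=lambda c: score(c, want_energy, want_type, recent_types))
    best.used = True  # mark as USED so it is not picked again
    return best
```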
Loads and validates JSON style presets.
Responsibilities:
- Discover preset files
- Parse and validate JSON
- Merge defaults with overrides
- Cache loaded templates
File discovery order:
- Built-in: `src/montage_ai/styles/*.json`
- Env override: `STYLE_PRESET_DIR/*.json`
- Single file: `STYLE_PRESET_PATH`
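One reading of that discovery order as code, with the most specific source winning; the real loader may merge sources rather than short-circuit:

```python
import os
from pathlib import Path

def discover_presets(builtin_dir: str = "src/montage_ai/styles") -> list:
    """Resolve style preset files: explicit file beats env dir beats built-ins."""
    single = os.environ.get("STYLE_PRESET_PATH")
    if single:
        return [Path(single)]
    override = os.environ.get("STYLE_PRESET_DIR")
    if override:
        return sorted(Path(override).glob("*.json"))
    return sorted(Path(builtin_dir).glob("*.json"))
```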
Offloads AI upscaling to free cloud GPUs.
Flow:
Video Frames (local)
│
▼
┌─────────────────┐
│ cgpu connect │
│ (Google Colab) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Real-ESRGAN │
│ (T4/A100 GPU) │
└────────┬────────┘
│
▼
Upscaled Frames (local)
Real-time decision logging for debugging.
Logged events:
- Clip selection decisions
- Beat alignment choices
- Energy level changes
- Phase transitions
- Performance metrics
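A minimal sketch of such a decision logger; the real event names and output sink may differ:

```python
import json
import time

class DecisionLog:
    """Append-only, real-time decision log, streamed as JSON lines."""

    def __init__(self):
        self.events = []

    def log(self, kind: str, **details):
        event = {"t": time.time(), "kind": kind, **details}
        self.events.append(event)
        print(json.dumps(event))  # stream to stdout for live debugging

log = DecisionLog()
log.log("clip_selected", clip="beach_042.mp4", score=2.31, phase="BUILD")
log.log("phase_transition", from_phase="BUILD", to_phase="CLIMAX")
```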
/data/input/*.mp4
│
├──▶ Scene Detection ──▶ Scene List
│
├──▶ Energy Analysis ──▶ Clip Scores
│
└──▶ Metadata Extraction ──▶ Clip Database
/data/music/*.mp3
│
├──▶ Beat Detection ──▶ Beat Timestamps
│
├──▶ Tempo Analysis ──▶ BPM
│
└──▶ Energy Curve ──▶ Energy Timeline
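The FFmpeg side of the energy curve can be sketched as follows. The `astats`/`ametadata` filter graph is standard FFmpeg; the exact graph used by `audio_analysis.py` may differ:

```python
def astats_command(audio_path: str) -> list:
    """Build an FFmpeg command that prints one RMS reading per analysis
    window (reset=1), which parse_rms turns into an energy curve."""
    return [
        "ffmpeg", "-i", audio_path, "-af",
        "astats=metadata=1:reset=1,"
        "ametadata=print:key=lavfi.astats.Overall.RMS_level:file=-",
        "-f", "null", "-",
    ]

def parse_rms(lines: list) -> list:
    """Pull the RMS values (in dB) out of ametadata's printed output."""
    values = []
    for line in lines:
        if "RMS_level=" in line:
            values.append(float(line.split("=", 1)[1]))
    return values
```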
Beat Timeline + Clip Database + Style Parameters
│
▼
┌─────────────────┐
│ For each beat: │
│ - Get story │
│ phase │
│ - Score clips │
│ - Select best │
│ - Mark used │
└────────┬────────┘
│
▼
Clip Sequence
Clip Sequence
│
├──▶ Crop/Scale to STANDARD_WIDTH x STANDARD_HEIGHT (1080x1920)
│
├──▶ Optional: Upscale (Real-ESRGAN)
│
├──▶ Optional: Stabilize (vidstab 2-pass)
│
├──▶ Color Grade (20+ presets)
│
└──▶ Progressive Renderer
│
├──▶ Batch clips (default 25)
├──▶ Write segments to disk
├──▶ FFmpeg concat (-c copy)
├──▶ Optional: xfade transitions
└──▶ Audio mix + Logo overlay
│
▼
/data/output/montage.mp4
The system uses ProgressiveRenderer (in segment_writer.py) to prevent OOM crashes.
┌─────────────────────────────────────────────────────────────┐
│ Progressive Renderer │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Clip 1 │ │ Clip 2 │ │ ... │ │ Clip 25 │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ └─────────────┴─────────────┴─────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ flush_batch() │ │
│ │ - Normalize │ │
│ │ - Write segment │ │
│ │ - GC + cleanup │ │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ segment_0001.mp4 (disk) │
│ │
│ ... repeat for all batches ... │
│ │
│ ┌──────────────────┐ │
│ │ finalize() │ │
│ │ - FFmpeg concat │ │
│ │ - Audio mix │ │
│ │ - Logo overlay │ │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ output.mp4 │
└─────────────────────────────────────────────────────────────┘
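The batching logic condenses to something like this. Method names mirror the diagram; the real `segment_writer.py` also normalizes clips, encodes each segment, and runs GC/cleanup:

```python
class ProgressiveRenderer:
    """Buffer clips, flush each full batch to a disk segment, then
    concatenate the segments losslessly at the end."""

    def __init__(self, batch_size: int = 25):
        self.batch_size = batch_size
        self.buffer = []
        self.segments = []

    def add(self, clip: str):
        self.buffer.append(clip)
        if len(self.buffer) >= self.batch_size:
            self.flush_batch()

    def flush_batch(self):
        if not self.buffer:
            return
        seg = f"segment_{len(self.segments) + 1:04d}.mp4"
        # real code: normalize clips, encode the segment, gc + cleanup
        self.segments.append(seg)
        self.buffer.clear()

    def finalize(self) -> list:
        self.flush_batch()  # flush any partial final batch
        # real code: ffmpeg -f concat ... -c copy, audio mix, logo overlay
        return self.segments
```

Because only one batch of clips is ever in memory, peak RAM stays flat no matter how long the montage is.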
Key Constants (Dynamically Determined):
These constants are automatically determined from input footage using determine_output_profile():
| Constant | Default | Determination Method |
|---|---|---|
| `STANDARD_WIDTH` | 1080 | Weighted median of input widths, snapped to standard presets |
| `STANDARD_HEIGHT` | 1920 | Weighted median of input heights, snapped to standard presets |
| `STANDARD_FPS` | 30 | Weighted median of input frame rates |
| `STANDARD_PIX_FMT` | yuv420p | Dominant pixel format from inputs (by duration) |
| `TARGET_CODEC` | libx264 | Dominant codec from inputs, or env `OUTPUT_CODEC` |
| `TARGET_PROFILE` | high | Auto-selected based on resolution (level 4.1 for HD, 5.1 for 4K) |
| `TARGET_BITRATE` | auto | Weighted median of input bitrates, or calculated from pixel count |
Output Profile Heuristics:
- Orientation (horizontal/vertical/square) determined by weighted aspect ratios
- Resolution snapped to common presets (1080p, 720p, 4K) if within 12% variance
- Avoids upscaling beyond maximum input resolution
- Honors environment overrides: `OUTPUT_CODEC`, `OUTPUT_PIX_FMT`, `OUTPUT_PROFILE`, `OUTPUT_LEVEL`
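The weighted-median-plus-snapping heuristic, sketched below. Weighting by clip duration and the 12% variance window follow the text above; the function names are illustrative:

```python
def weighted_median(values: list, weights: list) -> float:
    """Median of `values` where each value counts with its weight
    (e.g. clip duration), so long clips dominate the output profile."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= half:
            return value
    return pairs[-1][0]

def snap(value: float, presets=(720, 1080, 2160), tolerance=0.12) -> int:
    """Snap to the nearest standard preset if within 12% variance,
    otherwise keep the (rounded) measured value."""
    for preset in presets:
        if abs(value - preset) / preset <= tolerance:
            return preset
    return round(value)
```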
| Dependency | Purpose | Version |
|---|---|---|
| FFmpeg | Video encoding/processing (beat detection via astats/tempo) | Latest |
| MoviePy | Video manipulation | 2.2+ |
| OpenCV | Frame processing | 4.12+ |
| scenedetect | Scene detection | 0.6+ |
| numpy | Numerical operations | 2.0+ |
| Pydantic | Data validation | 2.0+ |
| Real-ESRGAN | AI upscaling | Latest |
| OpenAI SDK | cgpu/Gemini client | 1.0+ |
Note: librosa has been removed. Beat detection now uses FFmpeg `astats`/`tempo` filters as the primary engine (portable, no heavy deps). See `audio_analysis.py`.
- Clip enhancement runs in parallel threads
- Frame upscaling can use cloud GPU
- FFmpeg uses multi-threading
- Clips loaded on-demand, not all at once
- Temporary files cleaned after processing
- Large videos processed in chunks
- Auto-detection of available GPU encoders
- Fallback chain: NVENC → VAAPI → QSV → CPU
- Cloud GPU option for heavy workloads
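The fallback chain can be probed by parsing `ffmpeg -encoders` output; a sketch follows (the encoder names are the standard FFmpeg ones, but `ffmpeg_config.py` may instead verify each encoder with a test encode):

```python
import subprocess

# Preference order from the fallback chain: NVENC > VAAPI > QSV > CPU.
PREFERENCE = ["h264_nvenc", "h264_vaapi", "h264_qsv", "libx264"]

def pick_encoder(encoders_output: str) -> str:
    """Return the first preferred encoder mentioned in `ffmpeg -encoders` output."""
    for enc in PREFERENCE:
        if enc in encoders_output:
            return enc
    return "libx264"  # CPU fallback is always available

def detect() -> str:
    out = subprocess.run(["ffmpeg", "-hide_banner", "-encoders"],
                         capture_output=True, text=True).stdout
    return pick_encoder(out)
```

Note that an encoder being listed does not guarantee the matching hardware is usable, which is why a test encode is a more robust probe.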