Animation Pipeline Architecture

Core Principle: The scene specification is the product. The engine is a set of loosely coupled, swappable adapters.

System Overview

┌─────────────────────────────────────────────────────────────────┐
│                     Scene Specification                          │
│  scenes/two_pointers/scene.json                                  │
│  - narration steps (text)                                        │
│  - visualization config (array, pointers, highlights)            │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Pipeline Orchestrator                       │
│  build.py                                                        │
│  - Loads scene spec                                              │
│  - Wires adapters together                                       │
│  - Executes build pipeline                                       │
└─────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│  TTS Adapter │      │  Animation   │      │   Merger     │
│              │      │   Adapter    │      │   Adapter    │
│ - generate() │      │ - render()   │      │ - merge()    │
│ - duration() │      │ - duration() │      │              │
└──────────────┘      └──────────────┘      └──────────────┘
        │                     │                     │
   Implementations:     Implementations:      Implementations:
   - MacOSSay           - HTMLAnimation       - FFmpegMerger
   - Piper              - ManimAnimation
   - OpenAI             (future)

Directory Structure

animations/
├── ARCHITECTURE.md           # This file
├── build.py                  # Pipeline orchestrator (CLI entry point)
├── pipeline/
│   ├── __init__.py
│   ├── orchestrator.py       # Main pipeline logic
│   ├── adapters/
│   │   ├── __init__.py
│   │   ├── base.py           # Abstract base classes (interfaces)
│   │   ├── tts/
│   │   │   ├── __init__.py
│   │   │   ├── macos_say.py  # macOS say command
│   │   │   ├── piper.py      # Piper TTS (future)
│   │   │   └── openai.py     # OpenAI TTS (future)
│   │   ├── animation/
│   │   │   ├── __init__.py
│   │   │   ├── html.py       # HTML/CSS/JS renderer
│   │   │   └── manim.py      # Manim renderer (future)
│   │   ├── recorder/
│   │   │   ├── __init__.py
│   │   │   └── playwright.py # Playwright screen recorder
│   │   └── merger/
│   │       ├── __init__.py
│   │       └── ffmpeg.py     # FFmpeg audio/video merger
│   └── schema.py             # Scene spec validation (Pydantic)
├── scenes/                   # Scene specifications (the product)
│   └── two_pointers/
│       ├── scene.json        # Narration + visualization config
│       └── assets/           # Any scene-specific assets
├── templates/                # HTML animation templates
│   └── array_animation.html  # Jinja2 template for array scenes
├── output/                   # Build artifacts (gitignored)
│   └── two_pointers/
│       ├── timing.json       # Auto-generated durations
│       ├── step_*.aiff       # Individual audio segments
│       ├── audio.m4a         # Concatenated audio
│       ├── animation.html    # Rendered animation page
│       ├── video.webm        # Recorded animation
│       └── final.mp4         # Merged output
└── tests/
    ├── __init__.py
    ├── test_schema.py
    ├── test_tts_adapters.py
    ├── test_animation_adapters.py
    ├── test_merger_adapters.py
    ├── test_orchestrator.py
    └── fixtures/
        └── sample_scene.json

Scene Specification Schema

{
  "$schema": "scene.schema.json",
  "id": "two_pointers_basic",
  "title": "Two Pointers: O(n)",
  "description": "Demonstrates two pointer technique on sorted array",

  "visualization": {
    "type": "array_pointers",
    "config": {
      "array": [2, 7, 11, 15],
      "target": 9,
      "theme": "dark"
    }
  },

  "steps": [
    {
      "id": "init",
      "narration": "We start with two pointers at each end of the sorted array.",
      "state": {
        "left": 0,
        "right": 3,
        "highlight": null,
        "message": "Initialize pointers"
      }
    },
    {
      "id": "step1",
      "narration": "Two plus fifteen equals seventeen. That's bigger than our target of nine.",
      "state": {
        "left": 0,
        "right": 3,
        "highlight": "sum",
        "message": "2 + 15 = 17 > 9"
      }
    },
    {
      "id": "move_right",
      "narration": "Since the sum is too big, we move the right pointer left.",
      "state": {
        "left": 0,
        "right": 2,
        "highlight": "right_move",
        "message": "Move right pointer"
      }
    }
  ]
}
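The spec above maps naturally onto a few small models. The following is a minimal, dependency-free sketch using dataclasses; the project itself uses Pydantic v2 in pipeline/schema.py, so treat this only as an illustration of the shape. The model names and the get_narrations() / get_step_ids() helpers come from the Phase 1 notes; load_scene is a hypothetical helper, and the field defaults are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StepState:
    # Every field is optional so other visualization types can reuse the model.
    left: Optional[int] = None
    right: Optional[int] = None
    highlight: Optional[str] = None
    message: Optional[str] = None


@dataclass
class Step:
    id: str
    narration: str
    state: StepState = field(default_factory=StepState)


@dataclass
class SceneSpec:
    id: str
    title: str
    steps: List[Step]
    description: str = ""
    visualization: Dict[str, Any] = field(default_factory=dict)

    def get_narrations(self) -> List[str]:
        """Narration text of every step, in order."""
        return [step.narration for step in self.steps]

    def get_step_ids(self) -> List[str]:
        """Identifier of every step, in order."""
        return [step.id for step in self.steps]


def load_scene(raw: str) -> SceneSpec:
    """Parse a scene.json string (hypothetical helper).

    Raises KeyError for a missing required key and TypeError for an
    unknown key inside a step's "state" object.
    """
    data = json.loads(raw)
    steps = [
        Step(id=s["id"], narration=s["narration"], state=StepState(**s.get("state", {})))
        for s in data["steps"]
    ]
    return SceneSpec(
        id=data["id"],
        title=data["title"],
        steps=steps,
        description=data.get("description", ""),
        visualization=data.get("visualization", {}),
    )
```

Unlike Pydantic, this sketch does not check field types; it only shows which keys are required and which are optional.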

Design Principles

Adapter Encapsulation

Adapter-specific details must stay inside the adapter. The orchestrator and other components should not hardcode assumptions about adapter behavior.

Examples:

Output formats: Each TTS adapter declares its output_extension (.aiff, .wav, .mp3). The orchestrator uses self.tts.output_extension rather than hardcoding .aiff.
Dependencies: Adapters handle their own lazy-loading of dependencies (e.g., OpenAI client, Piper models).
Configuration: Adapter-specific config (voice, model, paths) lives in the adapter constructor.

This enables truly swappable adapters without touching orchestrator code.
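As an illustration of the output-format rule, here is a trimmed-down sketch in which the orchestrator side asks the adapter for its extension instead of assuming .aiff. FakeWavTTS and step_audio_paths are hypothetical names used only for this example, and the step_{i} file naming is an assumption based on the step_*.aiff files listed in the directory structure.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class TTSAdapter(ABC):
    """Reduced interface: only the parts needed for this illustration."""

    @property
    @abstractmethod
    def output_extension(self) -> str:
        """File extension this adapter writes, including the dot."""

    @abstractmethod
    def generate(self, text: str, output_path: Path) -> Path:
        """Generate audio for the text and return the path written."""


class FakeWavTTS(TTSAdapter):
    """Hypothetical adapter that claims to produce WAV files."""

    @property
    def output_extension(self) -> str:
        return ".wav"

    def generate(self, text: str, output_path: Path) -> Path:
        return output_path  # a real adapter would write audio here


def step_audio_paths(tts: TTSAdapter, output_dir: Path, step_count: int) -> List[Path]:
    """Build per-step audio paths without hardcoding the extension."""
    return [output_dir / f"step_{i}{tts.output_extension}" for i in range(step_count)]
```

Swapping FakeWavTTS for an adapter that returns ".mp3" changes the generated paths without any change to step_audio_paths.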

Adapter Interfaces

TTSAdapter (Abstract Base)

from abc import ABC, abstractmethod
from pathlib import Path


class TTSAdapter(ABC):
    @abstractmethod
    def generate(self, text: str, output_path: Path) -> Path:
        """Generate audio file from text. Returns path to audio file."""
        pass

    @abstractmethod
    def get_duration(self, audio_path: Path) -> float:
        """Get duration of audio file in seconds."""
        pass

    @property
    @abstractmethod
    def name(self) -> str:
        """Adapter identifier for logging (a property, per the Phase 1 decision)."""
        pass

AnimationAdapter (Abstract Base)

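from abc import ABC, abstractmethod
from pathlib import Path
from typing import List

from pipeline.schema import SceneSpec


class AnimationAdapter(ABC):
    @abstractmethod
    def render(self, scene_spec: SceneSpec, timing: List[float], output_path: Path) -> Path:
        """Render the animation for the scene. Returns path to the rendered file."""
        pass

    @property
    @abstractmethod
    def name(self) -> str:
        """Adapter identifier for logging (a property, per the Phase 1 decision)."""
        pass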

MergerAdapter (Abstract Base)

from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class MergerAdapter(ABC):
    @abstractmethod
    def merge(self, video_path: Path, audio_path: Path, output_path: Path) -> Path:
        """Merge video and audio. Returns path to merged file."""
        pass

    @abstractmethod
    def concat_audio(self, audio_paths: List[Path], output_path: Path) -> Path:
        """Concatenate multiple audio files. Returns path to combined file."""
        pass
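
RecorderAdapter (Abstract Base)

The Phase 1 notes mention a fourth interface, RecorderAdapter, that is not spelled out in this section. A plausible sketch follows, assuming its single method mirrors the record(html_path, duration, output_path) call described in the Phase 3 notes; the actual definition lives in pipeline/adapters/base.py and may differ.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class RecorderAdapter(ABC):
    @abstractmethod
    def record(self, html_path: Path, duration: float, output_path: Path) -> Path:
        """Record the rendered page for `duration` seconds. Returns path to video file."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Adapter identifier for logging."""
```

Keeping recording separate from rendering means an HTML renderer can be paired with a different capture backend without either side changing.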

Implementation Checklist

Phase 1: Core Infrastructure

Create directory structure
Implement pipeline/schema.py (Pydantic models for scene spec)
Implement pipeline/adapters/base.py (abstract interfaces)
Write basic tests for schema validation

Phase 2: TTS Adapter

Implement pipeline/adapters/tts/macos_say.py
Add duration extraction using ffprobe
Write tests for TTS adapter

Phase 3: Animation Adapter

Create templates/array_animation.html (Jinja2 template)
Implement pipeline/adapters/animation/html.py
Implement pipeline/adapters/recorder/playwright.py
Write tests for animation rendering

Phase 4: Merger Adapter

Implement pipeline/adapters/merger/ffmpeg.py
Add audio concatenation
Add video/audio merge with proper sync
Write tests for merger

Phase 5: Pipeline Orchestrator

Implement pipeline/orchestrator.py
Wire all adapters together
Implement build.py CLI entry point
Write integration tests

Phase 6: Scene Migration

Create scenes/two_pointers/scene.json from existing narration
Test full pipeline end-to-end
Generate final synced video

CLI Usage (Target)

# Build a single scene
python build.py scenes/two_pointers/scene.json

# Build with specific adapters
python build.py scenes/two_pointers/scene.json --tts openai --animation html

# Build all scenes
python build.py scenes/

# Dry run (validate only)
python build.py scenes/two_pointers/scene.json --dry-run

# Watch mode (rebuild on change)
python build.py scenes/two_pointers/scene.json --watch

Testing Strategy

Unit tests: Each adapter in isolation with mock inputs
Integration tests: Full pipeline with a minimal test scene
Fixtures: tests/fixtures/sample_scene.json with known expected outputs

# Run tests
pytest animations/tests/ -v

# Run with coverage
pytest animations/tests/ --cov=pipeline
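
One way to test an adapter in isolation is to patch the subprocess call so no external tool runs. The sketch below is illustrative only: run_say is a hypothetical stand-in for the part of an adapter that shells out, not the project's actual code.

```python
import subprocess
from pathlib import Path
from unittest import mock


def run_say(text: str, output_path: Path, voice: str = "Samantha") -> Path:
    """Hypothetical stand-in for an adapter's call to the macOS `say` command."""
    subprocess.run(["say", "-v", voice, "-o", str(output_path), text], check=True)
    return output_path


def test_run_say_builds_expected_command() -> None:
    # Replace subprocess.run so the test passes on machines without `say`.
    with mock.patch("subprocess.run") as fake_run:
        result = run_say("Hello world", Path("out.aiff"))

    fake_run.assert_called_once_with(
        ["say", "-v", "Samantha", "-o", "out.aiff", "Hello world"], check=True
    )
    assert result == Path("out.aiff")
```

Passing the command as an argument list (rather than a shell string) also avoids quoting problems when the narration contains quotes.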

Handoff Protocol for Subagents

Each subagent must:

Check off completed items in this document
Add a summary comment under "Implementation Notes" section
Run any relevant tests before handoff
Note any issues or decisions made

Implementation Notes

Subagents add notes here during implementation

Phase 1 Notes

Completed 2026-01-19

Files Created:

pipeline/__init__.py - Package init
pipeline/schema.py - Pydantic models: StepState, Step, VisualizationConfig, SceneSpec
pipeline/adapters/__init__.py - Package init
pipeline/adapters/base.py - Abstract base classes: TTSAdapter, AnimationAdapter, RecorderAdapter, MergerAdapter
pipeline/adapters/tts/__init__.py - Package init
pipeline/adapters/animation/__init__.py - Package init
pipeline/adapters/recorder/__init__.py - Package init
pipeline/adapters/merger/__init__.py - Package init
tests/__init__.py - Package init
tests/test_schema.py - 18 tests for schema validation
tests/fixtures/sample_scene.json - Sample scene fixture

Decisions:

Used Pydantic v2 for schema validation (installed as dependency)
Added helper methods to SceneSpec: get_narrations() and get_step_ids() for convenience
Made name a property (not method) on all adapters for cleaner access
RecorderAdapter added as separate interface from AnimationAdapter for flexibility (HTML adapter renders template, recorder captures it to video)
All StepState fields are optional to support various visualization types

Tests: 18 tests pass in tests/test_schema.py

Phase 2 Notes

Completed 2026-01-19

Files Created:

pipeline/adapters/tts/macos_say.py - MacOSSayAdapter implementation
tests/test_tts_adapters.py - 23 tests (21 pass, 2 skipped when ffprobe unavailable)

Files Modified:

pipeline/adapters/tts/__init__.py - Exports MacOSSayAdapter and TTSError

Implementation Details:

MacOSSayAdapter class with configurable voice (default: Samantha) and ffprobe path
generate(text, output_path) - Uses macOS say -v {voice} -o {path} "{text}" command
get_duration(audio_path) - Uses ffprobe with JSON output to extract duration
TTSError exception class for all TTS-related errors

Design Decisions:

ffprobe path defaults to ~/.local/bin/ffprobe but is configurable via constructor
Empty/whitespace-only text raises TTSError early (before calling say)
Output directories are created automatically if they don't exist
Integration tests are skipped gracefully when say or ffprobe are unavailable
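
Duration extraction with ffprobe's JSON output can be sketched as follows. This is a minimal illustration rather than the adapter's actual code: the function names are hypothetical, the exact ffprobe flags used by MacOSSayAdapter are not stated in these notes, and the default binary name here is an assumption (the adapter defaults to ~/.local/bin/ffprobe).

```python
import json
import subprocess
from pathlib import Path


def parse_ffprobe_duration(ffprobe_json: str) -> float:
    """Extract the container duration in seconds from ffprobe's JSON output."""
    data = json.loads(ffprobe_json)
    return float(data["format"]["duration"])


def get_duration(audio_path: Path, ffprobe: str = "ffprobe") -> float:
    """Run ffprobe on the file and return its duration in seconds."""
    completed = subprocess.run(
        [
            ffprobe,
            "-v", "error",                        # suppress informational logging
            "-show_entries", "format=duration",   # only request the duration field
            "-of", "json",                        # machine-readable output
            str(audio_path),
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ffprobe_duration(completed.stdout)
```

Splitting parsing from the subprocess call lets the parsing logic be unit-tested without ffprobe installed.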

Tests: 21 unit tests pass, 2 integration tests skipped (ffprobe not installed on test system)

Usage Example:

from pathlib import Path

from pipeline.adapters.tts import MacOSSayAdapter

adapter = MacOSSayAdapter(voice="Samantha")
audio_path = adapter.generate("Hello world", Path("output.aiff"))
duration = adapter.get_duration(audio_path)

Phase 3 Notes

Completed 2026-01-19

Files Created:

templates/array_animation.html - Jinja2 template for array pointer animations
pipeline/adapters/animation/html.py - HTMLAnimationAdapter implementation
pipeline/adapters/recorder/playwright.py - PlaywrightRecorder implementation
tests/test_animation_adapters.py - 16 tests (14 unit tests, 2 integration tests)

Files Modified:

pipeline/adapters/animation/__init__.py - Exports HTMLAnimationAdapter
pipeline/adapters/recorder/__init__.py - Exports PlaywrightRecorder
pipeline/adapters/__init__.py - Exports all adapters for easy imports

Implementation Details:

HTMLAnimationAdapter (pipeline/adapters/animation/html.py)
- Renders SceneSpec to HTML using Jinja2 templates
- render(spec, timing, output_path) - Generates HTML file with embedded animation
- Maps visualization types to templates (array_pointers -> array_animation.html)
- Validates timing list length matches number of steps
- Creates parent directories automatically

PlaywrightRecorder (pipeline/adapters/recorder/playwright.py)
- Records HTML animations to WebM video using Playwright
- record(html_path, duration, output_path) - Captures video of specified duration
- Configurable viewport size (default: 1280x720)
- Waits for window.animationReady before recording
- Async implementation with sync wrapper for easy use

array_animation.html (Jinja2 template)
- Dark theme with clean design
- CSS animations for pointer movement and highlights
- JavaScript timeline that steps through animation states
- Sets window.animationDuration for recorder
- Supports highlight modes: sum, found, left_move, right_move
- Responsive array cells with pointer labels (L/R)

Template Variables:

{{ title }} - Scene title
{{ array }} - Array values (e.g., [2, 7, 11, 15])
{{ target }} - Target sum
{{ steps }} - List of step objects with state
{{ timing }} - List of durations per step in seconds

Design Decisions:

HTML adapter only generates HTML; recording is separate (RecorderAdapter)
Playwright records to temp directory, then moves to final location
WebM format is native Playwright output; format conversion left to merger
Animation waits for window.animationReady = true before starting
Duration buffer of 0.5s added to ensure complete capture
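
The relationship between per-step timing, the animation timeline, and the recording length can be sketched as below. Both function names are hypothetical, and where exactly the 0.5s buffer is applied (inside the recorder or by its caller) is an assumption not settled by these notes.

```python
from itertools import accumulate
from typing import List


def step_start_offsets(timing: List[float]) -> List[float]:
    """Start time of each step, in seconds from the beginning of the animation."""
    if not timing:
        return []
    # Step i starts once all earlier steps have finished.
    return [0.0] + list(accumulate(timing))[:-1]


def recording_duration(timing: List[float], buffer: float = 0.5) -> float:
    """Total capture length: all step durations plus a safety buffer."""
    return sum(timing) + buffer
```

For the timing used in the usage example, [2.0, 3.0, 2.5], the steps start at 0.0, 2.0, and 5.0 seconds, and the capture runs for 8.0 seconds.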

Tests: 14 unit tests pass, 2 integration tests pass (require Playwright browsers)

Unit tests use tempfile for isolated file operations
Integration tests marked as slow for optional skipping
Integration tests verify actual video file creation

Usage Example:

import json
from pathlib import Path
from pipeline.schema import SceneSpec
from pipeline.adapters.animation import HTMLAnimationAdapter
from pipeline.adapters.recorder import PlaywrightRecorder

# Load scene spec
scene_data = json.loads(Path("tests/fixtures/sample_scene.json").read_text())
spec = SceneSpec(**scene_data)
timing = [2.0, 3.0, 2.5]  # seconds per step; length must match the number of steps

# Render HTML
html_adapter = HTMLAnimationAdapter(Path("templates"))
html_path = html_adapter.render(spec, timing, Path("output/animation.html"))

# Record to video
recorder = PlaywrightRecorder(viewport_width=1280, viewport_height=720)
total_duration = sum(timing)
video_path = recorder.record(html_path, total_duration, Path("output/video.webm"))

Phase 4 Notes

Completed 2026-01-19

Files Created:

pipeline/adapters/merger/ffmpeg.py - FFmpegMerger implementation
tests/test_merger_adapters.py - 20 tests for merger adapter

Files Modified:

pipeline/adapters/merger/__init__.py - Exports FFmpegMerger and FFmpegMergerError

Implementation Details:

FFmpegMerger class with configurable FFmpeg path (default: ~/.local/bin/ffmpeg)
concat_audio(audio_paths, output_path) - Concatenates multiple audio files using FFmpeg concat demuxer
- Automatically re-encodes when input/output formats differ (e.g., AIFF to M4A)
- Uses stream copy (-c copy) when formats match for efficiency
merge(video_path, audio_path, output_path) - Merges video and audio streams
- Uses tpad filter with stop_mode=clone and stop=-1 to freeze last frame indefinitely
- Uses -shortest flag so FFmpeg stops when audio ends
- Output uses H.264 video (libx264) and AAC audio for wide compatibility
FFmpegMergerError exception class for all FFmpeg-related errors

Design Decisions:

FFmpeg path defaults to ~/.local/bin/ffmpeg but is configurable via constructor
Missing input files raise FFmpegMergerError early (before calling FFmpeg)
Output directories are created automatically if they don't exist
Commands are logged for debugging (using Python logging)
When concatenating a single file, copies/converts directly without using concat demuxer (optimization)
Format detection based on file extensions to decide between stream copy and re-encoding

FFmpeg Command Details:

Concat: ffmpeg -f concat -safe 0 -i filelist.txt -c copy output.m4a (or -c:a aac for format conversion)
Merge: ffmpeg -i video.webm -i audio.aiff -filter_complex "[0:v]tpad=stop_mode=clone:stop=-1[v]" -map "[v]" -map 1:a -c:v libx264 -c:a aac -shortest output.mp4
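
The two commands above can be assembled as argument lists, which avoids shell quoting issues. This is a sketch under assumptions: the function names are hypothetical, the default binary name is "ffmpeg" rather than the adapter's ~/.local/bin/ffmpeg default, and the same_format flag stands in for the extension-based format detection described above.

```python
from pathlib import Path
from typing import List


def concat_list_text(audio_paths: List[Path]) -> str:
    """Contents of the list file read by FFmpeg's concat demuxer."""
    lines = []
    for path in audio_paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(
    list_file: Path, output_path: Path, same_format: bool, ffmpeg: str = "ffmpeg"
) -> List[str]:
    """Stream-copy when formats match, otherwise re-encode the audio to AAC."""
    codec = ["-c", "copy"] if same_format else ["-c:a", "aac"]
    return [ffmpeg, "-f", "concat", "-safe", "0", "-i", str(list_file), *codec, str(output_path)]


def merge_command(
    video_path: Path, audio_path: Path, output_path: Path, ffmpeg: str = "ffmpeg"
) -> List[str]:
    """Freeze the last video frame indefinitely; -shortest then ends at the audio."""
    return [
        ffmpeg,
        "-i", str(video_path),
        "-i", str(audio_path),
        "-filter_complex", "[0:v]tpad=stop_mode=clone:stop=-1[v]",
        "-map", "[v]",
        "-map", "1:a",
        "-c:v", "libx264",
        "-c:a", "aac",
        "-shortest",
        str(output_path),
    ]
```

Note that the concat demuxer resolves relative paths in the list file against the list file's own directory, so writing absolute paths into it is the safer choice.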

Tests: 20 tests pass (18 unit tests with mocked subprocess, 2 integration tests with real FFmpeg)

Usage Example:

from pathlib import Path

from pipeline.adapters.merger import FFmpegMerger

merger = FFmpegMerger()

# Concatenate audio segments
audio_path = merger.concat_audio(
    [Path("step_0.aiff"), Path("step_1.aiff"), Path("step_2.aiff")],
    Path("output/audio.m4a")
)

# Merge video and audio (video extended if audio is longer)
final_path = merger.merge(
    Path("output/video.webm"),
    Path("output/audio.m4a"),
    Path("output/final.mp4")
)

Phase 5 Notes

Completed 2026-01-19

Files Created:

pipeline/orchestrator.py - PipelineOrchestrator class and BuildResult dataclass
build.py - CLI entry point for building scenes
tests/test_orchestrator.py - 22 tests (21 unit tests pass, 1 integration test skipped)

Implementation Details:

PipelineOrchestrator (pipeline/orchestrator.py)
- Wires all adapters together (TTS, Animation, Recorder, Merger)
- build(scene_spec, dry_run=False) - Executes the full pipeline (listed under Pipeline Steps)
- build_from_file(scene_path) - Loads JSON and builds
- build_from_file_dry_run(scene_path) - Validates without building
- Creates output subdirectory per scene (output/{scene_id}/)
- Saves intermediate files: step_*.aiff, timing.json, animation.html, video.webm, audio.m4a, final.mp4
- Logs progress at each step using Python logging
- Returns BuildResult with timing info and success status

BuildResult (dataclass)
- scene_id: Scene identifier
- output_path: Path to final.mp4
- timing: List of step durations in seconds
- total_duration: Sum of all step durations
- success: Boolean success flag
- error: Error message if failed
- intermediate_files: Dict of intermediate file paths for debugging

build.py (CLI)
- python build.py scene.json - Build single scene
- python build.py scenes/ --all - Build all scenes in directory
- --tts macos_say - Select TTS adapter
- --voice Samantha - Select TTS voice
- -o output/ - Specify output directory
- --dry-run - Validate without building
- -v - Verbose logging
- Prints summary with success/failure counts
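
The flags listed for build.py could be declared with argparse roughly as follows. This is a sketch, not the actual build.py: the long option names --output and --verbose, the help strings, and the default values are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build.py", description="Build animation scenes")
    parser.add_argument("path", type=Path, help="scene.json file, or a directory with --all")
    parser.add_argument("--all", action="store_true", help="build every scene in the directory")
    parser.add_argument("--tts", default="macos_say", help="TTS adapter to use")
    parser.add_argument("--voice", default="Samantha", help="TTS voice")
    parser.add_argument("-o", "--output", type=Path, default=Path("output"), help="output directory")
    parser.add_argument("--dry-run", action="store_true", help="validate without building")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose logging")
    return parser
```

Calling build_parser().parse_args(["scenes/", "--all", "-v"]) yields a namespace with all and verbose set to True and the remaining options at their defaults.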

Pipeline Steps:

Generate TTS audio for each narration step (step_*.aiff)
Extract duration from each audio file
Save timing.json for debugging
Render animation HTML with timing
Record animation to WebM video
Concatenate audio segments to M4A
Merge video + audio to final MP4
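
The steps above can be sketched as a single function that receives the four adapters. This is an illustration of the data flow only, not the PipelineOrchestrator implementation: run_pipeline is a hypothetical name, the adapters are duck-typed, the step_{index} file naming is assumed, and writing timing.json as a plain list is an assumption about its format.

```python
import json
from pathlib import Path
from typing import Any, List


def run_pipeline(
    spec: Any, tts: Any, animation: Any, recorder: Any, merger: Any, out_dir: Path
) -> Path:
    """Run the build steps in order and return the path to the final video."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # Generate one audio file per narration step and measure each one.
    audio_paths: List[Path] = []
    timing: List[float] = []
    for index, narration in enumerate(spec.get_narrations()):
        path = tts.generate(narration, out_dir / f"step_{index}{tts.output_extension}")
        audio_paths.append(path)
        timing.append(tts.get_duration(path))

    # Persist the measured durations for debugging.
    (out_dir / "timing.json").write_text(json.dumps(timing))

    # Render the HTML with the measured timing, then record it to video.
    html_path = animation.render(spec, timing, out_dir / "animation.html")
    video_path = recorder.record(html_path, sum(timing), out_dir / "video.webm")

    # Join the audio segments and mux them with the video.
    audio_path = merger.concat_audio(audio_paths, out_dir / "audio.m4a")
    return merger.merge(video_path, audio_path, out_dir / "final.mp4")
```

Because narration audio is generated first, the animation timing is derived from measured durations rather than estimated, which is what keeps video and audio in sync.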

Error Handling:

Each step wrapped in try/except
Failed builds return BuildResult(success=False, error=...)
Intermediate files preserved for debugging failed builds
PipelineError raised for file not found or parse errors
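
A compact way to picture this error handling: run each step inside try/except, record what it produced, and stop at the first failure while keeping earlier outputs. The BuildResult fields below follow the list in Implementation Details; run_steps, the field defaults, and the error message format are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class BuildResult:
    scene_id: str
    output_path: Optional[Path] = None
    timing: List[float] = field(default_factory=list)
    total_duration: float = 0.0
    success: bool = False
    error: Optional[str] = None
    intermediate_files: Dict[str, Path] = field(default_factory=dict)


def run_steps(scene_id: str, steps: List[Tuple[str, Callable[[], Path]]]) -> BuildResult:
    """Run labelled steps in order, stopping at the first one that raises."""
    result = BuildResult(scene_id=scene_id)
    for label, step in steps:
        try:
            result.intermediate_files[label] = step()
        except Exception as exc:
            # Earlier outputs stay in intermediate_files for debugging.
            result.error = f"{label} failed: {exc}"
            return result
    result.success = True
    return result
```

Returning a result object instead of raising lets a multi-scene build continue past one failed scene and still print a success/failure summary.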

Tests: 21 unit tests pass, 1 integration test skipped (requires all dependencies)

Unit tests use mock adapters for isolation
Integration test marked with @pytest.mark.slow
Integration test checks for real dependencies before running

Usage Example:

# Validate a scene
python build.py tests/fixtures/sample_scene.json --dry-run

# Build a scene
python build.py tests/fixtures/sample_scene.json -o output/

# Build all scenes with verbose logging
python build.py scenes/ --all -v

Phase 6 Notes

Pending implementation
