Core Principle: The scene specification is the product. The engine is a set of loosely coupled, swappable adapters.
```
┌─────────────────────────────────────────────────────────────────┐
│                       Scene Specification                       │
│                 scenes/two_pointers/scene.json                  │
│   - narration steps (text)                                      │
│   - visualization config (array, pointers, highlights)          │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Pipeline Orchestrator                      │
│                            build.py                             │
│   - Loads scene spec                                            │
│   - Wires adapters together                                     │
│   - Executes build pipeline                                     │
└─────────────────────────────────────────────────────────────────┘
                                │
          ┌─────────────────────┼─────────────────────┐
          ▼                     ▼                     ▼
  ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
  │ TTS Adapter  │      │  Animation   │      │    Merger    │
  │              │      │   Adapter    │      │   Adapter    │
  │ - generate() │      │ - render()   │      │ - merge()    │
  │ - duration() │      │ - duration() │      │              │
  └──────────────┘      └──────────────┘      └──────────────┘
          │                     │                     │
  Implementations:      Implementations:      Implementations:
  - MacOSSay            - HTMLAnimation       - FFmpegMerger
  - Piper               - ManimAnimation
  - OpenAI (future)
```
```
animations/
├── ARCHITECTURE.md              # This file
├── build.py                     # Pipeline orchestrator (CLI entry point)
├── pipeline/
│   ├── __init__.py
│   ├── orchestrator.py          # Main pipeline logic
│   ├── adapters/
│   │   ├── __init__.py
│   │   ├── base.py              # Abstract base classes (interfaces)
│   │   ├── tts/
│   │   │   ├── __init__.py
│   │   │   ├── macos_say.py     # macOS say command
│   │   │   ├── piper.py         # Piper TTS (future)
│   │   │   └── openai.py        # OpenAI TTS (future)
│   │   ├── animation/
│   │   │   ├── __init__.py
│   │   │   ├── html.py          # HTML/CSS/JS renderer
│   │   │   └── manim.py         # Manim renderer (future)
│   │   ├── recorder/
│   │   │   ├── __init__.py
│   │   │   └── playwright.py    # Playwright screen recorder
│   │   └── merger/
│   │       ├── __init__.py
│   │       └── ffmpeg.py        # FFmpeg audio/video merger
│   └── schema.py                # Scene spec validation (Pydantic)
├── scenes/                      # Scene specifications (the product)
│   └── two_pointers/
│       ├── scene.json           # Narration + visualization config
│       └── assets/              # Any scene-specific assets
├── templates/                   # HTML animation templates
│   └── array_animation.html     # Jinja2 template for array scenes
├── output/                      # Build artifacts (gitignored)
│   └── two_pointers/
│       ├── timing.json          # Auto-generated durations
│       ├── step_*.aiff          # Individual audio segments
│       ├── audio.m4a            # Concatenated audio
│       ├── video.webm           # Recorded animation
│       └── final.mp4            # Merged output
└── tests/
    ├── __init__.py
    ├── test_tts_adapters.py
    ├── test_orchestrator.py
    └── fixtures/
        └── sample_scene.json
```
```json
{
  "$schema": "scene.schema.json",
  "id": "two_pointers_basic",
  "title": "Two Pointers: O(n)",
  "description": "Demonstrates two pointer technique on sorted array",
  "visualization": {
    "type": "array_pointers",
    "config": {
      "array": [2, 7, 11, 15],
      "target": 9,
      "theme": "dark"
    }
  },
  "steps": [
    {
      "id": "init",
      "narration": "We start with two pointers at each end of the sorted array.",
      "state": {
        "left": 0,
        "right": 3,
        "highlight": null,
        "message": "Initialize pointers"
      }
    },
    {
      "id": "step1",
      "narration": "Two plus fifteen equals seventeen. That's bigger than our target of nine.",
      "state": {
        "left": 0,
        "right": 3,
        "highlight": "sum",
        "message": "2 + 15 = 17 > 9"
      }
    },
    {
      "id": "move_right",
      "narration": "Since the sum is too big, we move the right pointer left.",
      "state": {
        "left": 0,
        "right": 2,
        "highlight": "right_move",
        "message": "Move right pointer"
      }
    }
  ]
}
```

Adapter-specific details must stay inside the adapter. The orchestrator and other components should not hardcode assumptions about adapter behavior.
Examples:
- Output formats: Each TTS adapter declares its `output_extension` (`.aiff`, `.wav`, `.mp3`). The orchestrator uses `self.tts.output_extension` rather than hardcoding `.aiff`.
- Dependencies: Adapters handle their own lazy-loading of dependencies (e.g., OpenAI client, Piper models).
- Configuration: Adapter-specific config (voice, model, paths) lives in the adapter constructor.

This enables truly swappable adapters without touching orchestrator code.
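As a condensed illustration of this boundary (the names and stub bodies here are hypothetical; the real interfaces live in `pipeline/adapters/base.py`), the orchestrator composes file names from the adapter's declared extension instead of assuming one:

```python
from abc import ABC, abstractmethod
from pathlib import Path


class TTSAdapter(ABC):
    """Minimal TTS interface; each adapter declares its native format."""

    # File extension of the audio this adapter produces (including the dot).
    output_extension: str = ".wav"

    @abstractmethod
    def generate(self, text: str, output_path: Path) -> Path: ...


class MacOSSayAdapter(TTSAdapter):
    output_extension = ".aiff"  # macOS `say` writes AIFF

    def generate(self, text: str, output_path: Path) -> Path:
        # Real implementation shells out to `say`; stubbed here.
        return output_path


class Orchestrator:
    def __init__(self, tts: TTSAdapter, out_dir: Path):
        self.tts = tts
        self.out_dir = out_dir

    def audio_path_for(self, step_index: int) -> Path:
        # Ask the adapter for its extension rather than hardcoding
        # `.aiff`: swapping in an MP3-producing adapter then requires
        # no orchestrator changes.
        return self.out_dir / f"step_{step_index}{self.tts.output_extension}"
```

With a `MacOSSayAdapter`, `audio_path_for(0)` yields `output/step_0.aiff`; an adapter declaring `.mp3` would yield `output/step_0.mp3` with the same orchestrator code.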
```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List

from pipeline.schema import SceneSpec


class TTSAdapter(ABC):
    @abstractmethod
    def generate(self, text: str, output_path: Path) -> Path:
        """Generate audio file from text. Returns path to audio file."""
        pass

    @abstractmethod
    def get_duration(self, audio_path: Path) -> float:
        """Get duration of audio file in seconds."""
        pass

    @abstractmethod
    def name(self) -> str:
        """Adapter identifier for logging."""
        pass


class AnimationAdapter(ABC):
    @abstractmethod
    def render(self, scene_spec: SceneSpec, timing: List[float], output_path: Path) -> Path:
        """Render animation video. Returns path to video file."""
        pass

    @abstractmethod
    def name(self) -> str:
        """Adapter identifier for logging."""
        pass


class MergerAdapter(ABC):
    @abstractmethod
    def merge(self, video_path: Path, audio_path: Path, output_path: Path) -> Path:
        """Merge video and audio. Returns path to merged file."""
        pass

    @abstractmethod
    def concat_audio(self, audio_paths: List[Path], output_path: Path) -> Path:
        """Concatenate multiple audio files. Returns path to combined file."""
        pass
```

- Create directory structure
- Implement `pipeline/schema.py` (Pydantic models for scene spec)
- Implement `pipeline/adapters/base.py` (abstract interfaces)
- Write basic tests for schema validation
- Implement `pipeline/adapters/tts/macos_say.py`
- Add duration extraction using ffprobe
- Write tests for TTS adapter
- Create `templates/array_animation.html` (Jinja2 template)
- Implement `pipeline/adapters/animation/html.py`
- Implement `pipeline/adapters/recorder/playwright.py`
- Write tests for animation rendering
- Implement `pipeline/adapters/merger/ffmpeg.py`
- Add audio concatenation
- Add video/audio merge with proper sync
- Write tests for merger
- Implement `pipeline/orchestrator.py`
- Wire all adapters together
- Implement `build.py` CLI entry point
- Write integration tests
- Create `scenes/two_pointers/scene.json` from existing narration
- Test full pipeline end-to-end
- Generate final synced video
```bash
# Build a single scene
python build.py scenes/two_pointers/scene.json

# Build with specific adapters
python build.py scenes/two_pointers/scene.json --tts openai --animation html

# Build all scenes
python build.py scenes/

# Dry run (validate only)
python build.py scenes/two_pointers/scene.json --dry-run

# Watch mode (rebuild on change)
python build.py scenes/two_pointers/scene.json --watch
```

- Unit tests: Each adapter in isolation with mock inputs
- Integration tests: Full pipeline with a minimal test scene
- Fixtures: `tests/fixtures/sample_scene.json` with known expected outputs
```bash
# Run tests
pytest animations/tests/ -v

# Run with coverage
pytest animations/tests/ --cov=pipeline
```

Each subagent must:
- Check off completed items in this document
- Add a summary comment under the "Implementation Notes" section
- Run any relevant tests before handoff
- Note any issues or decisions made
Subagents add notes here during implementation
Completed 2026-01-19
Files Created:
- `pipeline/__init__.py` - Package init
- `pipeline/schema.py` - Pydantic models: `StepState`, `Step`, `VisualizationConfig`, `SceneSpec`
- `pipeline/adapters/__init__.py` - Package init
- `pipeline/adapters/base.py` - Abstract base classes: `TTSAdapter`, `AnimationAdapter`, `RecorderAdapter`, `MergerAdapter`
- `pipeline/adapters/tts/__init__.py` - Package init
- `pipeline/adapters/animation/__init__.py` - Package init
- `pipeline/adapters/recorder/__init__.py` - Package init
- `pipeline/adapters/merger/__init__.py` - Package init
- `tests/__init__.py` - Package init
- `tests/test_schema.py` - 18 tests for schema validation
- `tests/fixtures/sample_scene.json` - Sample scene fixture
Decisions:
- Used Pydantic v2 for schema validation (installed as dependency)
- Added helper methods to `SceneSpec`: `get_narrations()` and `get_step_ids()` for convenience
- Made `name` a property (not method) on all adapters for cleaner access
- `RecorderAdapter` added as separate interface from `AnimationAdapter` for flexibility (HTML adapter renders template, recorder captures it to video)
- All `StepState` fields are optional to support various visualization types

Tests: 18 tests pass in `tests/test_schema.py`
Completed 2026-01-19
Files Created:
- `pipeline/adapters/tts/macos_say.py` - MacOSSayAdapter implementation
- `tests/test_tts_adapters.py` - 23 tests (21 pass, 2 skipped when ffprobe unavailable)
Files Modified:
- `pipeline/adapters/tts/__init__.py` - Exports MacOSSayAdapter and TTSError
Implementation Details:
- `MacOSSayAdapter` class with configurable voice (default: Samantha) and ffprobe path
- `generate(text, output_path)` - Uses macOS `say -v {voice} -o {path} "{text}"` command
- `get_duration(audio_path)` - Uses ffprobe with JSON output to extract duration
- `TTSError` exception class for all TTS-related errors
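The duration extraction can be sketched as follows (an illustrative helper, not the adapter's actual code; the flags match ffprobe's JSON output mode described above):

```python
import json
import subprocess
from pathlib import Path


def parse_ffprobe_duration(ffprobe_stdout: str) -> float:
    """Pull the duration (in seconds) out of ffprobe's JSON report,
    e.g. {"format": {"duration": "2.453000", ...}}."""
    return float(json.loads(ffprobe_stdout)["format"]["duration"])


def probe_duration(audio_path: Path, ffprobe: str = "ffprobe") -> float:
    """Run ffprobe on an audio file and return its duration in seconds."""
    result = subprocess.run(
        [ffprobe, "-v", "quiet", "-print_format", "json",
         "-show_format", str(audio_path)],
        capture_output=True, text=True, check=True,
    )
    return parse_ffprobe_duration(result.stdout)
```

Keeping the JSON parsing in its own function lets unit tests cover it with canned ffprobe output, while the subprocess call is only exercised in integration tests.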
Design Decisions:
- ffprobe path defaults to `~/.local/bin/ffprobe` but is configurable via constructor
- Empty/whitespace-only text raises `TTSError` early (before calling `say`)
- Output directories are created automatically if they don't exist
- Integration tests are skipped gracefully when say or ffprobe are unavailable
Tests: 21 unit tests pass, 2 integration tests skipped (ffprobe not installed on test system)
Usage Example:

```python
from pathlib import Path

from pipeline.adapters.tts import MacOSSayAdapter

adapter = MacOSSayAdapter(voice="Samantha")
audio_path = adapter.generate("Hello world", Path("output.aiff"))
duration = adapter.get_duration(audio_path)
```

Completed 2026-01-19
Files Created:
- `templates/array_animation.html` - Jinja2 template for array pointer animations
- `pipeline/adapters/animation/html.py` - HTMLAnimationAdapter implementation
- `pipeline/adapters/recorder/playwright.py` - PlaywrightRecorder implementation
- `tests/test_animation_adapters.py` - 16 tests (14 unit tests, 2 integration tests)
Files Modified:
- `pipeline/adapters/animation/__init__.py` - Exports HTMLAnimationAdapter
- `pipeline/adapters/recorder/__init__.py` - Exports PlaywrightRecorder
- `pipeline/adapters/__init__.py` - Exports all adapters for easy imports
Implementation Details:
- HTMLAnimationAdapter (`pipeline/adapters/animation/html.py`)
  - Renders SceneSpec to HTML using Jinja2 templates
  - `render(spec, timing, output_path)` - Generates HTML file with embedded animation
  - Maps visualization types to templates (array_pointers -> array_animation.html)
  - Validates timing list length matches number of steps
  - Creates parent directories automatically
- PlaywrightRecorder (`pipeline/adapters/recorder/playwright.py`)
  - Records HTML animations to WebM video using Playwright
  - `record(html_path, duration, output_path)` - Captures video of specified duration
  - Configurable viewport size (default: 1280x720)
  - Waits for `window.animationReady` before recording
  - Async implementation with sync wrapper for easy use
- array_animation.html (Jinja2 template)
  - Dark theme with clean design
  - CSS animations for pointer movement and highlights
  - JavaScript timeline that steps through animation states
  - Sets `window.animationDuration` for recorder
  - Supports highlight modes: sum, found, left_move, right_move
  - Responsive array cells with pointer labels (L/R)
Template Variables:
- `{{ title }}` - Scene title
- `{{ array }}` - Array values (e.g., [2, 7, 11, 15])
- `{{ target }}` - Target sum
- `{{ steps }}` - List of step objects with state
- `{{ timing }}` - List of durations per step in seconds
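The `{{ timing }}` values drive the JavaScript timeline: per-step durations become cumulative start offsets. A Python sketch of that conversion (illustrative only, not part of the template):

```python
from itertools import accumulate


def step_offsets(timing: list[float]) -> list[float]:
    """Start time of each step, given per-step durations in seconds."""
    # Running totals shifted by one: step i starts when steps 0..i-1 end.
    return [0.0, *accumulate(timing)][:-1]
```

For `timing = [2.0, 3.0, 2.5]`, the steps start at 0.0, 2.0, and 5.0 seconds, and the whole animation runs `sum(timing)` seconds.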
Design Decisions:
- HTML adapter only generates HTML; recording is separate (RecorderAdapter)
- Playwright records to temp directory, then moves to final location
- WebM format is native Playwright output; format conversion left to merger
- Animation waits for `window.animationReady = true` before starting
- Duration buffer of 0.5s added to ensure complete capture
Tests: 14 unit tests pass, 2 integration tests pass (require Playwright browsers)
- Unit tests use tempfile for isolated file operations
- Integration tests marked as `slow` for optional skipping
- Integration tests verify actual video file creation
Usage Example:

```python
from pathlib import Path

from pipeline.schema import SceneSpec
from pipeline.adapters.animation import HTMLAnimationAdapter
from pipeline.adapters.recorder import PlaywrightRecorder

# Load scene spec
spec = SceneSpec(**scene_data)
timing = [2.0, 3.0, 2.5]  # seconds per step

# Render HTML
html_adapter = HTMLAnimationAdapter(Path("templates"))
html_path = html_adapter.render(spec, timing, Path("output/animation.html"))

# Record to video
recorder = PlaywrightRecorder(viewport_width=1280, viewport_height=720)
total_duration = sum(timing)
video_path = recorder.record(html_path, total_duration, Path("output/video.webm"))
```

Completed 2026-01-19
Files Created:
- `pipeline/adapters/merger/ffmpeg.py` - FFmpegMerger implementation
- `tests/test_merger_adapters.py` - 20 tests for merger adapter
Files Modified:
- `pipeline/adapters/merger/__init__.py` - Exports FFmpegMerger and FFmpegMergerError
Implementation Details:
- `FFmpegMerger` class with configurable FFmpeg path (default: `~/.local/bin/ffmpeg`)
- `concat_audio(audio_paths, output_path)` - Concatenates multiple audio files using the FFmpeg concat demuxer
  - Automatically re-encodes when input/output formats differ (e.g., AIFF to M4A)
  - Uses stream copy (`-c copy`) when formats match for efficiency
- `merge(video_path, audio_path, output_path)` - Merges video and audio streams
  - Uses `tpad` filter with `stop_mode=clone` and `stop=-1` to freeze last frame indefinitely
  - Uses `-shortest` flag so FFmpeg stops when audio ends
  - Output uses H.264 video (`libx264`) and AAC audio for wide compatibility
- Uses `FFmpegMergerError` exception class for all FFmpeg-related errors
Design Decisions:
- FFmpeg path defaults to `~/.local/bin/ffmpeg` but is configurable via constructor
- Missing input files raise `FFmpegMergerError` early (before calling FFmpeg)
- Output directories are created automatically if they don't exist
- Commands are logged for debugging (using Python logging)
- When concatenating a single file, copies/converts directly without using concat demuxer (optimization)
- Format detection based on file extensions to decide between stream copy and re-encoding
FFmpeg Command Details:
- Concat: `ffmpeg -f concat -safe 0 -i filelist.txt -c copy output.m4a` (or `-c:a aac` for format conversion)
- Merge: `ffmpeg -i video.webm -i audio.aiff -filter_complex "[0:v]tpad=stop_mode=clone:stop=-1[v]" -map "[v]" -map 1:a -c:v libx264 -c:a aac -shortest output.mp4`
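A sketch of how that merge command line might be assembled as an argv list (hypothetical helper mirroring the flags above; the adapter's actual code may differ):

```python
from pathlib import Path


def build_merge_command(video: Path, audio: Path, output: Path,
                        ffmpeg: str = "ffmpeg") -> list[str]:
    """Assemble the ffmpeg argv for merging video + audio."""
    return [
        ffmpeg,
        "-i", str(video),
        "-i", str(audio),
        # Clone the last video frame indefinitely so the video never
        # runs out before the narration does...
        "-filter_complex", "[0:v]tpad=stop_mode=clone:stop=-1[v]",
        "-map", "[v]",
        "-map", "1:a",
        "-c:v", "libx264",
        "-c:a", "aac",
        # ...then stop at the shorter stream, i.e. when the audio ends.
        "-shortest",
        str(output),
    ]
```

Building the argv as a pure function keeps it testable with mocked `subprocess` calls, which is how the unit tests above avoid requiring a real FFmpeg install.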
Tests: 20 tests pass (18 unit tests with mocked subprocess, 2 integration tests with real FFmpeg)
Usage Example:

```python
from pathlib import Path

from pipeline.adapters.merger import FFmpegMerger

merger = FFmpegMerger()

# Concatenate audio segments
audio_path = merger.concat_audio(
    [Path("step_0.aiff"), Path("step_1.aiff"), Path("step_2.aiff")],
    Path("output/audio.m4a")
)

# Merge video and audio (video extended if audio is longer)
final_path = merger.merge(
    Path("output/video.webm"),
    Path("output/audio.m4a"),
    Path("output/final.mp4")
)
```

Completed 2026-01-19
Files Created:
- `pipeline/orchestrator.py` - PipelineOrchestrator class and BuildResult dataclass
- `build.py` - CLI entry point for building scenes
- `tests/test_orchestrator.py` - 22 tests (21 unit tests pass, 1 integration test skipped)
Implementation Details:
- PipelineOrchestrator (`pipeline/orchestrator.py`)
  - Wires all adapters together (TTS, Animation, Recorder, Merger)
  - `build(scene_spec, dry_run=False)` - Executes the full 6-step pipeline
  - `build_from_file(scene_path)` - Loads JSON and builds
  - `build_from_file_dry_run(scene_path)` - Validates without building
  - Creates output subdirectory per scene (`output/{scene_id}/`)
  - Saves intermediate files: `step_*.aiff`, `timing.json`, `animation.html`, `video.webm`, `audio.m4a`, `final.mp4`
  - Logs progress at each step using Python logging
  - Returns `BuildResult` with timing info and success status
- BuildResult (dataclass)
  - `scene_id`: Scene identifier
  - `output_path`: Path to final.mp4
  - `timing`: List of step durations in seconds
  - `total_duration`: Sum of all step durations
  - `success`: Boolean success flag
  - `error`: Error message if failed
  - `intermediate_files`: Dict of intermediate file paths for debugging
- build.py (CLI)
  - `python build.py scene.json` - Build single scene
  - `python build.py scenes/ --all` - Build all scenes in directory
  - `--tts macos_say` - Select TTS adapter
  - `--voice Samantha` - Select TTS voice
  - `-o output/` - Specify output directory
  - `--dry-run` - Validate without building
  - `-v` - Verbose logging
  - Prints summary with success/failure counts
Pipeline Steps:
1. Generate TTS audio for each narration step (`step_*.aiff`)
2. Extract duration from each audio file
3. Save `timing.json` for debugging
4. Render animation HTML with timing
5. Record animation to WebM video
6. Concatenate audio segments to M4A
7. Merge video + audio to final MP4
Error Handling:
- Each step wrapped in try/except
- Failed builds return `BuildResult(success=False, error=...)`
- Intermediate files preserved for debugging failed builds
- `PipelineError` raised for file not found or parse errors
Tests: 21 unit tests pass, 1 integration test skipped (requires all dependencies)
- Unit tests use mock adapters for isolation
- Integration test marked with `@pytest.mark.slow`
- Integration test checks for real dependencies before running
Usage Example:

```bash
# Validate a scene
python build.py tests/fixtures/sample_scene.json --dry-run

# Build a scene
python build.py tests/fixtures/sample_scene.json -o output/

# Build all scenes with verbose logging
python build.py scenes/ --all -v
```

Pending implementation