A native Windows desktop app that turns a folder of vacation photos and videos into a finished MP4 recap — without opening a video editor.
The edit recipe is deliberately simple: photos and videos in chronological order, 1-second cross-dissolves, an opening title card, optional per-section title cards, optional captions, and background music ducked under video audio with a side-chain compressor. That workflow is fully scriptable; this app replaces the manual editor with curation, trim, and one-click render.
- Native desktop UI — PyWebView window with real OS file dialogs; media stays on disk (no uploads, no Docker).
- Photo curation — import files or folders, Lightroom star ratings, minimum-rating filter, per-photo include/caption.
- Video trimming — in/out points with keyboard shortcuts (`I`/`O`, `J`/`L`, frame step) and HTML5 preview via byte-range streaming.
- Section markers — inject title cards at any point on the chronological timeline (e.g. "Saguaro National Park / Stop 1").
- Music — multi-track playlist, drag-to-reorder, waveform trimmer, silence auto-detect, cross-fade between songs, length advisor.
- Order — chronological timeline view of all included photos and videos; edit or bulk-nudge capture timestamps to fix ordering before render.
- Settings — 4K/1080p, photo duration, cross-fade, optional Ken Burns, duck level, music cross-fade, codec (H.264/H.265), encoder selection (auto/NVENC/AMF/QSV/software), quality tier.
- Output — H.264 or H.265/HEVC MP4 with all effects baked in.
- Project files — `MyTrip.recap.json` autosaves selections, trims, and settings; recent projects on the start screen.
- Create or open a `.recap.json` project.
- Add photos, videos, and music from wherever they live on disk (`C:\Pictures\...`, `D:\Camera\...`, etc.).
- Curate: ratings, trims, section markers, captions, title text.
- Generate — pick an output path; the engine builds a timeline, renders overlays with Pillow, and encodes the final MP4 with FFmpeg.
Estimated recap length updates live: title duration + photos × photo duration + trimmed video lengths − crossfade overlap. Music loops the last track if short; fades out if long.
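The length formula above can be sketched in a few lines. This is only an illustration: the function and parameter names are hypothetical, and it assumes every adjacent pair of segments (title card included) overlaps by exactly one crossfade and that section cards are ignored.

```python
def estimate_recap_seconds(title_s, photo_count, photo_s, video_lengths_s, crossfade_s):
    """Title + photos + trimmed videos, minus the time lost to crossfade overlap."""
    # Assumption: the title card counts as one segment, like each photo and video.
    segment_count = 1 + photo_count + len(video_lengths_s)
    # N segments joined end to end share N - 1 crossfades.
    overlap = crossfade_s * max(segment_count - 1, 0)
    total = title_s + photo_count * photo_s + sum(video_lengths_s) - overlap
    return max(total, 0.0)


# 5 s title, ten 4 s photos, two trimmed clips, 1 s crossfades
print(estimate_recap_seconds(5, 10, 4, [12.5, 30], 1))  # 75.5
```

The overlap term is why adding a short photo lengthens the recap by less than its nominal duration.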
Windows 10/11. Download VacationRecap.exe from GitHub Releases (when published). No Python, FFmpeg, or Node install needed.
Two tools to install globally — that's it:
| Tool | Notes |
|---|---|
| uv | Python version manager + package manager. Installs Python 3.13 automatically from .python-version. |
| fnm | Node version manager. Installs Node.js automatically from .node-version. |
Python 3.13 and Node.js 22 LTS are not manual installs — uv and fnm handle them when you run .\bootstrap.ps1.
FFmpeg and ExifTool require no system install either. imageio-ffmpeg ships an FFmpeg binary inside the Python venv. ffprobe and ExifTool are downloaded into the gitignored vendor\ directory by bootstrap.ps1.
This is a native Windows app — develop and run it directly on Windows (no Docker). The toolchain is two single-binary tools that install in one line each and keep their state in their own per-user dirs:
# 1. Install uv (Python toolchain) and fnm (Node version manager) — one time, global
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
winget install Schniz.fnm
# (restart the terminal so both land on PATH)
# 2. One-time project setup: Python + Node deps + vendor binaries
.\bootstrap.ps1
# 3. Day-to-day dev loop (backend + Vite + native window, all hot-reloading)
.\dev.ps1

bootstrap.ps1 runs uv sync, installs Node per .node-version, and downloads the vendored ffprobe.exe + ExifTool into vendor\. dev.ps1 then starts the FastAPI backend, the Vite dev server, and the native PyWebView window (pointed at Vite via RECAP_DEV_URL, so OS file dialogs and Svelte HMR both work). Closing the window stops everything.
The sections below cover the same steps in more detail, plus testing and the release build.
# uv — Python toolchain (installs Python 3.13 + deps, no system Python needed)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# fnm — Node version manager
winget install Schniz.fnm

Restart your terminal, then verify: uv --version and fnm --version. Both are single self-contained binaries; uninstalling later is just removing the binary + its per-user cache dir.
git clone https://github.com/YOUR_USERNAME/vacation-video-generator.git
cd vacation-video-generator
.\bootstrap.ps1

bootstrap.ps1 does everything else:
- `uv sync --extra dev` — reads `.python-version` (3.13) + `pyproject.toml`, creates `.venv/` with all runtime + dev deps. No `pip`, no manual venv activation (`uv run` handles it).
- `fnm use --install-if-missing` (reads `.node-version`) + `npm install` in `frontend/`.
- Downloads the vendored `ffprobe.exe` and ExifTool into `vendor/` (gitignored). These run both in dev (via `FFPROBE_PATH` / `EXIFTOOL_PATH`, set by `dev.ps1`) and in the bundled `.exe`. FFmpeg itself ships inside `imageio-ffmpeg` — nothing to install.
To re-fetch the vendor binaries on their own: uv run python build/vendor_ffprobe.py and uv run python build/vendor_exiftool.py.
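How the backend reads `FFPROBE_PATH` and `EXIFTOOL_PATH` is not shown in this README. A minimal sketch of one plausible lookup follows; `resolve_tool` is a hypothetical helper, and the fallback to the system `PATH` is an assumption rather than documented behaviour.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def resolve_tool(env_var: str, fallback_name: str) -> Optional[str]:
    """Prefer the path in env_var if it points at a real file, else search PATH."""
    configured = os.environ.get(env_var)
    if configured and Path(configured).is_file():
        return configured
    # Assumption: fall back to whatever is installed system-wide, if anything.
    return shutil.which(fallback_name)


# Returns None when the variable is unset or stale and nothing is on PATH.
print(resolve_tool("FFPROBE_PATH", "ffprobe"))
```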
.\dev.ps1

This is the everyday workflow. It launches three processes and wires them together:
- Backend — `watchfiles` runs `python -m uvicorn backend.app:create_app --factory --host 127.0.0.1 --port 8000`, restarting it on Python edits. (`dev.ps1` uses the `watchfiles` CLI rather than `uvicorn --reload`: uvicorn's own Windows reloader sends a console `Ctrl+C` that the OS broadcasts to every process sharing the terminal, which would also tear down the native window.)
- Vite — the dev server on `http://localhost:5173` with HMR, proxying `/api` → `127.0.0.1:8000`.
- Native window (`python -m desktop.main`) — the PyWebView shell, pointed at the Vite server via `RECAP_DEV_URL`.
Both sides hot-reload: Svelte via Vite HMR, Python via watchfiles. Native file dialogs work because they run in the window's process through the PyWebView JS bridge (window.pywebview.api, wired up in desktop/main.py) — independent of which process serves the page. Closing the window tears down the backend and Vite.
dev.ps1 is just orchestration (it sets FFPROBE_PATH, EXIFTOOL_PATH, and RECAP_DEV_URL for you). To run by hand:
# Backend, fixed port for Vite's proxy (one terminal)
$env:FFPROBE_PATH = "$PWD\vendor\ffprobe\ffprobe.exe"
$env:EXIFTOOL_PATH = "$PWD\vendor\exiftool\exiftool.exe"
uv run uvicorn backend.app:create_app --factory --port 8000 --reload
# Vite (another terminal)
cd frontend; npm run dev
# The native window, pointed at the Vite server (a third terminal)
$env:RECAP_DEV_URL = "http://localhost:5173"; uv run python -m desktop.main

Window URL: with `RECAP_DEV_URL` set, the shell skips its in-process backend and loads that URL. Without it (the production/exe path), the shell starts FastAPI on a random free port and serves the built SPA from `frontend/dist/` — run `npm run build` first in that case.

Browser-direct (`http://localhost:5173` in a normal browser) is fine for pure layout/CSS work, but file dialogs and other native features only exist inside the PyWebView window (the `window.pywebview.api` bridge is absent in a plain browser, so pickers resolve to empty).
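The URL decision described above could look roughly like this. It is a sketch, not the code in `desktop/main.py`: `find_free_port` and `choose_window_url` are hypothetical names, and binding to port 0 is just one common way to get a random free port.

```python
import os
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def choose_window_url(environ=os.environ):
    """Return (url, start_backend) for the native shell."""
    dev_url = environ.get("RECAP_DEV_URL")
    if dev_url:
        # Dev mode: Vite serves the page, so no in-process backend is started.
        return dev_url, False
    # Production path: the shell would start FastAPI itself on this port.
    return f"http://127.0.0.1:{find_free_port()}", True


print(choose_window_url({"RECAP_DEV_URL": "http://localhost:5173"}))
```

Note that the probed port is released before the server binds it, so another process could claim it in between; passing an already-bound socket to the server avoids that race.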
cd frontend
npm run build

Output goes to frontend/dist/. The PyWebView window (and PyInstaller bundle) serve from this directory.

uv run pytest

Runs all tests under tests/. The suite covers the timeline model, xfade offset math, Ken Burns keyframe generation, EXIF metadata scanning, playback-speed handling, and the two-stage render pipeline / encoder registry.

uv run pytest -v # verbose

The smoke test (tests/test_e2e_smoke.py) builds a real project from committed assets in tests/test-media/, runs the full FFmpeg render pipeline, and checks that test_render.mp4 is created and valid. Use -s so progress lines from the render print to the terminal.
.\smoke.ps1
# equivalent to: uv run pytest tests/test_e2e_smoke.py -v -s

Output is written to tests/test-output/ (test_render.mp4; gitignored). The test uses 1080p settings for a faster encode; allow a few minutes on first run.
The render has two stages: stage 1 encodes each segment independently in parallel (the slow part, ~1–2 h for a full project); stage 2 assembles the intermediates with xfade, audio mixing, and final encode (fast, seconds to minutes).
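The two-stage shape can be illustrated with a thread pool. Everything here is a stand-in: `render_segment` is a hypothetical worker that only returns a file name, where a real one would launch one FFmpeg process per segment, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def render_segment(index: int, name: str) -> str:
    """Hypothetical stage-1 worker; a real one would run FFmpeg for this segment."""
    return f"inter_{index:04d}_{name}.mp4"


def render_two_stage(segments, max_workers=4):
    # Stage 1: segments do not depend on each other, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, which preserves timeline order.
        intermediates = list(pool.map(render_segment, range(len(segments)), segments))
    # Stage 2: one sequential assembly pass over the intermediates (placeholder).
    return {"inputs": intermediates, "output": "final.mp4"}


print(render_two_stage(["title", "photo", "clip"])["inputs"])
```

Threads are enough for this pattern because each worker would mostly wait on an external encoder process rather than run Python code.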
When debugging stage 2 — tweaking audio levels, encoder settings, transitions, etc. — set VRG_REUSE_INTERMEDIATES=1 before launching the app. Stage-1 intermediates are then saved to a stable directory keyed to the output path, and any already-rendered segment is skipped on the next run. The directory is printed to the log at startup.
$env:VRG_REUSE_INTERMEDIATES = "1"
.\dev.ps1 # or however you launch the app

The first run with the flag is still a full render (it builds and keeps the intermediates). Every run after that skips stage 1 and jumps straight to assembly.

Cache invalidation: the key is the output path only — it does not detect changes to resolution, fps, crossfade, trims, captions, or source media. If you change any render setting or clip, delete the printed `vrg_inter_reuse_*` directory (in `%TEMP%`) to force a full rebuild, or unset the variable.

To turn reuse off for the current session, remove the variable (this does not delete intermediates already on disk):
Remove-Item Env:\VRG_REUSE_INTERMEDIATES
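A small sketch shows why a cache keyed only on the output path cannot notice changed settings. The `vrg_inter_reuse_` prefix comes from the text above; the hashing scheme, the helper names, and the segment file names are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def reuse_dir_for(output_path, temp_root=None):
    """Stable directory derived from the output path and nothing else."""
    root = Path(temp_root or tempfile.gettempdir())
    # Assumption: a short hash of the path string forms the directory suffix.
    digest = hashlib.sha256(str(output_path).encode("utf-8")).hexdigest()[:12]
    return root / f"vrg_inter_reuse_{digest}"


def segments_to_render(segment_names, cache_dir):
    """Skip any segment whose intermediate file already exists."""
    return [name for name in segment_names if not (Path(cache_dir) / name).exists()]


print(reuse_dir_for("C:/Videos/trip.mp4").name)
```

Because resolution, trims, and captions never enter the key, a stale intermediate looks identical to a fresh one, which is why the directory has to be deleted by hand after such changes.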
uv run ruff check .
uv run ruff format --check .

# 1. Build the Svelte SPA
cd frontend
npm run build
cd ..
# 2. Confirm vendor binaries exist (bootstrap.ps1 fetches these; or run
# `uv run python build/vendor_ffprobe.py` and `build/vendor_exiftool.py`)
# 3. Run PyInstaller
uv run pyinstaller build/recap.spec

Output: dist/VacationRecap.exe (~180 MB one-file bundle containing Python 3.13, FFmpeg, ExifTool, WebView2 shims, and the built SPA).
build.ps1 wraps these three steps (vendor check → SPA build → PyInstaller) into one command, and .github/workflows/build-exe.yml runs the same build in CI on a tag push — the canonical release path.
First launch unpacks to a temp directory — a 1–2 s startup delay is normal.
┌─────────────────────────────────────────────────────────────┐
│ desktop/main.py — PyWebView window, OS file dialogs, │
│ File menu (New / Open / Recent) │
└──────────────────────────┬──────────────────────────────────┘
                           │ http://127.0.0.1:<random port>
┌──────────────────────────▼──────────────────────────────────┐
│ backend/app.py — FastAPI: REST + SSE + /media byte-range │
│ + static SPA serving │
├─────────────────────────────────────────────────────────────┤
│ engine.py Timeline → FFmpeg filter graph → MP4 │
│ overlays.py Pillow: title card, section cards, │
│ caption pill PNGs (hash-cached) │
│ metadata.py EXIF/XMP via ExifTool, ffprobe, │
│ thumbnail generation, hash cache │
│ silence.py ffmpeg silencedetect → music trim hints │
│ models.py Pydantic data model (Project, items, …) │
└─────────────────────────────────────────────────────────────┘
↑ served as static files from frontend/dist/
┌─────────────────────────────────────────────────────────────┐
│ Svelte 4 SPA (frontend/src/) │
│ sections/ Start Title Photos Videos Order Music │
│ Sections Settings Generate │
│ components/ VideoTrim AudioTrim │
│ PhotoLightbox VideoLightbox │
│ api.ts typed REST client │
│ store.ts writable stores + autosave │
└─────────────────────────────────────────────────────────────┘
- Scan inputs — EXIF/XMP for photos (ExifTool), `ffprobe` for video/audio; results cached in memory and persisted in project JSON.
- Build timeline — selected, trimmed clips sorted chronologically; section markers interleaved at their declared positions.
- Render overlays — title cards, section cards (bracket frame + "Stop N" + hero blur background), caption pills — all PNGs via Pillow, content-hash cached.
- FFmpeg filter graph (two-stage, parallel) — stage 1 renders each segment independently in parallel (blur-pad, Ken Burns, caption overlay); stage 2 assembles intermediates with `xfade` chain, video-clip audio, `sidechaincompress` music ducking, and fade in/out. Encoder is pluggable: NVENC → AMF → QSV → software (auto-detected; quality tier and codec — H.264 or H.265/HEVC — selectable in Settings).
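The timeline-building step can be sketched as a filter plus a sort. The `Item` class below is hypothetical and much simpler than the Pydantic models in `models.py`; it also assumes a section marker's "declared position" is a timestamp, which the text does not confirm.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    kind: str            # "photo", "video", or "section"
    label: str
    when: datetime       # capture time, or the marker's declared position
    included: bool = True


def build_timeline(items):
    """Keep included items and order them chronologically."""
    chosen = [item for item in items if item.included]
    # At equal timestamps a section card sorts ahead of the media it introduces.
    return sorted(chosen, key=lambda i: (i.when, 0 if i.kind == "section" else 1))


items = [
    Item("video", "hike.mp4", datetime(2024, 5, 2, 9, 30)),
    Item("photo", "cactus.jpg", datetime(2024, 5, 1, 17, 0)),
    Item("section", "Stop 1", datetime(2024, 5, 2, 9, 30)),
]
print([i.label for i in build_timeline(items)])  # ['cactus.jpg', 'Stop 1', 'hike.mp4']
```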
Reference output spec: 3840×2160, 30 fps (ntsc=FALSE), 1 s cross-dissolves, fade to black at end.
| Effect | Filter |
|---|---|
| Aspect-preserving blur pad | split→scale-fill+boxblur bg / scale-fit fg → overlay |
| Ken Burns | zoompan=z='1+t*s':x='...':y='...':d=<frames>:fps=<fps> |
| Cross-dissolve chain | [v0][v1]xfade=fade:duration=1:offset=T (cumulative offsets) |
| Music ducking | [music][va]sidechaincompress=threshold=…:ratio=4:attack=200:release=1000 |
| Loudness normalisation | loudnorm=I=-16:TP=-1.5:LRA=11 per clip + amix normalize=0 + alimiter |
| Silence detect | ffmpeg -af silencedetect=noise=-50dB:d=0.3 -f null - |
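The "cumulative offsets" in the cross-dissolve row follow from how `xfade` works: each transition starts `duration` seconds before the end of everything assembled so far. A minimal sketch, assuming every clip is longer than the fade; the stream labels `[v0]`, `[x1]`, and so on are placeholders, not the labels the engine uses.

```python
def xfade_offsets(durations, fade=1.0):
    """Offset of each transition, measured on the output assembled so far."""
    offsets = []
    running = 0.0
    for clip_length in durations[:-1]:
        # Each join shortens the running output by one fade length.
        running += clip_length - fade
        offsets.append(running)
    return offsets


def xfade_chain(durations, fade=1.0):
    """Build a chained xfade filter graph string for clips [v0], [v1], ..."""
    parts = []
    previous = "[v0]"
    for k, offset in enumerate(xfade_offsets(durations, fade), start=1):
        out = f"[x{k}]"
        parts.append(
            f"{previous}[v{k}]xfade=transition=fade:duration={fade:g}:offset={offset:g}{out}"
        )
        previous = out
    return ";".join(parts)


print(xfade_offsets([5, 4, 10]))  # [4.0, 7.0]
print(xfade_chain([5, 4, 10]))
```

For clips of 5 s, 4 s, and 10 s the first dissolve starts at 4 s; the joined result is 8 s long, so the second starts at 7 s.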
| Key | Action |
|---|---|
| Space | Play / pause |
| I | Set in-point to playhead |
| O | Set out-point to playhead |
| ← / → | Step ±1 frame |
| Shift+← / Shift+→ | Step ±1 second |
| J / L | Jump −10 s / +10 s |
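The seek keys in the table reduce to a time delta that depends on the frame rate. The real handler lives in the Svelte `VideoTrim` component; this Python sketch only shows the arithmetic, and the key names and the clamping to the clip bounds are assumptions.

```python
def seek_delta(key: str, fps: float, shift: bool = False) -> float:
    """Seconds to move the playhead for a given key press."""
    if key in ("ArrowLeft", "ArrowRight"):
        sign = -1.0 if key == "ArrowLeft" else 1.0
        # Plain arrows step one frame (1/fps s); with Shift they step one second.
        return sign * (1.0 if shift else 1.0 / fps)
    if key in ("j", "l"):
        return -10.0 if key == "j" else 10.0
    return 0.0


def next_playhead(playhead_s, key, fps, duration_s, shift=False):
    """Apply the delta and keep the playhead inside the clip."""
    target = playhead_s + seek_delta(key, fps, shift)
    return min(max(target, 0.0), duration_s)


print(next_playhead(5.0, "ArrowLeft", 30, 60.0, shift=True))  # 4.0
```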
desktop/
  main.py                    PyWebView shell, JS API bridge
backend/
  app.py                     FastAPI app (REST + SSE + static SPA)
  engine.py                  Timeline → two-stage parallel FFmpeg render
  encoders.py                Pluggable encoder registry (NVENC/AMF/QSV/software, H.264+H.265)
  overlays.py                Pillow PNG renderers
  metadata.py                EXIF / XMP / ffprobe / thumbnail cache
  silence.py                 ffmpeg silencedetect wrapper
  winjob.py                  Kill-on-close job: ties ffmpeg children to backend lifetime (Windows)
  models.py                  Pydantic project data model
frontend/
  src/
    App.svelte               Top-level layout + nav
    api.ts                   Typed REST client
    store.ts                 Svelte stores + autosave
    sections/                Start Title Photos Videos Order
                             Music Sections Settings Generate
    components/
      VideoTrim.svelte       Canvas timeline + HTML5 video + keyboard
      AudioTrim.svelte       wavesurfer.js waveform + drag handles
      PhotoLightbox.svelte   Full-screen photo preview
      VideoLightbox.svelte   Full-screen video preview
  dist/                      Built SPA (committed or generated by npm run build)
build/
  recap.spec                 PyInstaller spec
tests/
  test-media/                Committed assets for the e2e smoke test
  test-output/               Smoke-test render output (gitignored)
  test_models.py             Pydantic model round-trips
  test_metadata.py           EXIF/XMP scanning and ExifTool integration
  test_engine_speed.py       Playback-speed / atempo decomposition
  test_render_pipeline.py    Two-stage pipeline and encoder registry
vendor/
  exiftool/                  ExifTool Windows exe (gitignored, fetched by bootstrap.ps1)
  ffprobe/                   ffprobe.exe (gitignored, fetched by bootstrap.ps1)
pyproject.toml               Project metadata + uv/pip dependencies
uv.lock                      Locked dependency versions
.python-version              3.13 (read by uv)
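The "atempo decomposition" covered by `test_engine_speed.py` refers to a common FFmpeg technique: a speed change outside the range a single `atempo` instance handles well is split into a chain of smaller factors. A sketch under the assumption that each factor is kept within 0.5 to 2.0; whether `engine.py` uses exactly this split is not stated here, and the function names are illustrative.

```python
def atempo_chain(speed: float):
    """Split a playback speed into factors that each lie within [0.5, 2.0]."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    factors = []
    remaining = speed
    while remaining > 2.0:       # peel off 2x steps for fast playback
        factors.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:       # peel off 0.5x steps for slow playback
        factors.append(0.5)
        remaining /= 0.5
    factors.append(remaining)
    return factors


def atempo_filter(speed: float) -> str:
    """Render the chain as a comma-separated FFmpeg audio filter string."""
    return ",".join(f"atempo={factor:g}" for factor in atempo_chain(speed))


print(atempo_filter(3.0))   # atempo=2,atempo=1.5
print(atempo_filter(0.25))  # atempo=0.5,atempo=0.5
```

The product of the factors always equals the requested speed, so the audio stays in sync with video retimed by the same amount.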
When validating a build end-to-end:
- Smoke test — 8 photos (mixed ratings), 4 videos, 2 music tracks. Rating ≥4 filter, video trim via keyboard, music silence auto-detect, title card, Generate → MP4 with cross-dissolves and ducked music, fade out.
- Ken Burns — render at strength 0, 0.3, and 1.0; confirm static / subtle / aggressive.
- Sections & captions — two markers, custom hero, subtitle override + ↺ restore, caption on a photo and on a video clip (First 3 s mode). Confirm section cards appear at right positions in MP4.
- Persistence — close and reopen `.recap.json`; all selections, trim points, and settings restored.
- Distribution — copy `VacationRecap.exe` to a fresh Windows machine with no dev tools; double-click; confirm end-to-end.
Everything installed by bootstrap.ps1 is isolated and fully removable.
These are all gitignored. Deleting them has no effect on git history and
bootstrap.ps1 will recreate them on next run.
# Python venv (recreated by: uv sync --extra dev)
Remove-Item -Recurse -Force .venv
# Vendored binaries (recreated by: .\bootstrap.ps1)
Remove-Item -Recurse -Force vendor
# Node modules (recreated by: npm --prefix frontend install)
Remove-Item -Recurse -Force frontend\node_modules
# Built SPA (recreated by: npm --prefix frontend run build)
Remove-Item -Recurse -Force frontend\dist
# Smoke-test render output
Remove-Item -Recurse -Force tests\test-output

The app writes its cache and recent-projects list here — separate from the dev tools and not touched by uninstalling them:
%APPDATA%\VacationRecap\
Delete this folder to remove all cached thumbnails and the recent-projects list.
Your .recap.json project files are wherever you saved them and are unaffected.
- Beat-synced cuts to music (possible later via `librosa`)
- Vision-model "best moment" selection inside clips
- macOS / Linux distribution (code is largely portable; only Windows `.exe` is planned)