Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,11 +11,16 @@ permissions:
jobs:
typecheck-and-unit:
runs-on: ubuntu-latest
strategy:
matrix:
# exercise the advertised engines floor (>=20) alongside current LTS, so
# a Node-20-incompatible API can't slip in against the stated support
node-version: [20, 22]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
with:
node-version: 22
node-version: ${{ matrix.node-version }}
cache: npm
- run: npm ci
- run: npm run typecheck
Expand Down
18 changes: 10 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -203,7 +203,8 @@ Procedural palettes (generated at render time, no asset): `aurora`, `midnight`,

## 🎵 Music

Videos are silent by default. `--music` (on `render` and `generate`) muxes a looped,
`render` is silent by default; on `generate` the AI director picks the bundled track
matching your app's look. `--music` (on `render` and `generate`) muxes a looped,
loudness-normalized track with fade-in/out under the video — never re-encoding the
video and never changing its length:

Expand All @@ -216,14 +217,15 @@ supercut render --take out/take --music path/to/your-track.mp3 # your own fi
Bundled tracks (in `assets/music/` — original instrumentals made for supercut;
provenance in `assets/music/CREDITS.md`):

| track | vibe |
| ---------- | ------------ |
| `pulse` | minimal-tech |
| `daybreak` | warm |
| `midnight` | cinematic |
| `momentum` | energetic |
| track | vibe |
| ---------- | ----------------------- |
| `pulse` | minimal tech-house |
| `daybreak` | bright melodic house |
| `midnight` | dark synthwave/techno |
| `momentum` | driving minimal techno |

`--music off` (or omitting the flag) keeps the video silent.
`--music off` forces a silent cut; on `render`, omitting the flag does too. `--music`
always outranks the director's pick on `generate`.

## 🔒 Privacy

Expand Down
34 changes: 23 additions & 11 deletions assets/music/CREDITS.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,30 @@
# Bundled music — provenance & license

All four tracks are **original instrumental works produced for supercut**.
They were generated with MiniMax Music 2.5 from original style prompts written
for this project, then post-processed with ffmpeg (crossfade extension to
~90–110s beds and loudness normalization to −16 LUFS). No pre-existing songs,
samples, or melodies were referenced or used as input.
All four tracks are **original instrumental works produced for supercut**,
synthesized from pure oscillators (sub bass, drum machine, filtered noise
hats, saw-wave pads, arpeggios) by `tools/synth-music.py` — there is no vocal
source, no sample, and no pre-existing song anywhere in the signal chain, so
the tracks cannot contain vocals.

**Reproducing the beds end-to-end.** `tools/synth-music.py` regenerates
equivalent beds (not necessarily bit-identical) from scratch. It needs:

* Python packages `numpy` and `scipy` — `pip install numpy scipy`
* `ffmpeg` on your PATH

Run `python3 tools/synth-music.py assets/music`. For each mood it synthesizes a
short WAV loop, then ffmpeg self-crossfades that loop to ~92s
(`acrossfade=d=1` ×3 → `atrim=0:92`), loudness-normalizes it
(`loudnorm=I=-15:TP=-1.5:LRA=9`), and encodes a 192 kbit/s 44.1 kHz stereo MP3 —
the exact pipeline that produced the checked-in beds.

To the extent the maintainers hold any rights in these recordings, they are
dedicated to the public domain under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/).
Use them in your videos — commercial or not — with no attribution required.

| track | vibe | length | bpm |
| -------------- | ---------------------------- | ------ | ---- |
| `pulse.mp3` | minimal tech, sleek | 100s | ~104 |
| `daybreak.mp3` | warm piano, optimistic | 93s | ~92 |
| `midnight.mp3` | cinematic ambient, premium | 110s | ~80 |
| `momentum.mp3` | driving electronic, punchy | 100s | ~122 |
| track | vibe | length | bpm |
| -------------- | ----------------------------- | ------ | ---- |
| `pulse.mp3` | minimal tech-house, sleek | 95s | ~104 |
| `daybreak.mp3` | bright melodic house, upbeat | 95s | ~110 |
| `midnight.mp3` | dark synthwave/techno, premium| 100s | ~100 |
| `momentum.mp3` | driving minimal techno | 95s | ~122 |
Binary file modified assets/music/daybreak.mp3
Binary file not shown.
Binary file modified assets/music/midnight.mp3
Binary file not shown.
Binary file modified assets/music/momentum.mp3
Binary file not shown.
Binary file modified assets/music/pulse.mp3
Binary file not shown.
21 changes: 10 additions & 11 deletions examples/demo.recipe.json
Original file line number Diff line number Diff line change
@@ -1,30 +1,29 @@
{
"version": 0,
"app_url": "http://127.0.0.1:4173",
"music_track": "institutional-01",
"music_track": "daybreak",
"scenes": [
{
"name": "landing-cta",
"name": "the-pitch-and-signup",
"priority": 1,
"entry": { "url": "http://127.0.0.1:4173/", "prelude": [] },
"depends_on": [],
"actions": [
{ "kind": "click", "selector": "#cta", "duration_ms": 1800 },
{ "kind": "type", "selector": "#email", "text": "ada@lumon.dev", "duration_ms": 2200 },
{ "kind": "click", "selector": "#join", "duration_ms": 1400 }
{ "kind": "click", "selector": "#cta", "duration_ms": 1400 },
{ "kind": "type", "selector": "#email", "text": "ada@lumon.dev", "duration_ms": 1900 },
{ "kind": "click", "selector": "#join", "focus_selector": "#signup", "duration_ms": 1500 }
],
"hold_ms": 800
"hold_ms": 900
},
{
"name": "dashboard",
"name": "the-live-product",
"priority": 2,
"entry": { "url": "http://127.0.0.1:4173/dash", "prelude": [] },
"entry": { "url": "http://127.0.0.1:4173/dash/", "prelude": [] },
"depends_on": [],
"actions": [
{ "kind": "hover", "selector": "#task-ship", "duration_ms": 1600 },
{ "kind": "wait", "duration_ms": 1200 }
{ "kind": "hover", "selector": "#task-ship", "focus_selector": "#tasks", "duration_ms": 1600 }
],
"hold_ms": 600
"hold_ms": 1000
}
]
}
2 changes: 1 addition & 1 deletion examples/pandora-demo.recipe.json
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
{
"version": 0,
"app_url": "http://127.0.0.1:8455",
"music_track": "institutional-01",
"music_track": "pulse",
"scenes": [
{
"name": "trace-one-company",
Expand Down
20 changes: 20 additions & 0 deletions examples/pulse-demo.recipe.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
{
"version": 0,
"app_url": "http://127.0.0.1:4100",
"music_track": "midnight",
"scenes": [
{
"name": "triage-the-fleet",
"priority": 1,
"entry": { "url": "http://127.0.0.1:4100/", "prelude": [] },
"depends_on": [],
"actions": [
{ "kind": "click", "selector": "[data-testid=\"service-search\"]", "duration_ms": 900 },
{ "kind": "type", "selector": "[data-testid=\"service-search\"]", "text": "payments", "submit": true, "focus_selector": "[data-testid=\"metrics-panel\"]", "duration_ms": 1700 },
{ "kind": "click", "selector": ":nth-match([data-testid=\"service-item\"], 1)", "focus_selector": "[data-testid=\"kpi-row\"]", "duration_ms": 1500 },
{ "kind": "click", "selector": ":nth-match([data-testid=\"service-item\"], 3)", "focus_selector": "[data-testid=\"metrics-panel\"]", "duration_ms": 1500 }
],
"hold_ms": 900
}
]
}
110 changes: 92 additions & 18 deletions src/director/analyze.ts
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,17 @@ import { extractJson, type ChatPart, type LlmClient } from "./llm.js";
import type { PageDigest } from "./inventory.js";
import { redactForPrompt } from "../security/redaction.js";

/** the bundled soundtrack library (assets/music/) — the director picks one as
* part of understanding the product, so every generate run ships with music */
export const MUSIC_TRACKS = ["pulse", "daybreak", "midnight", "momentum"] as const;

export const appAnalysis = z.object({
product_summary: z.string().min(10).max(600),
/** the brand/product name for the title + close cards (e.g. "Meridian") */
/** the brand/product name for the title + close cards (e.g. "Acme") */
product_name: z.string().min(2).max(40),
/** bundled track matching the app's look/energy — enum here so a made-up
* track name bounces back at validation, never reaching the render */
music_track: z.enum(MUSIC_TRACKS),
/** the launch HOOK — the problem/promise the video opens on, in the
* customer's words, not a feature ("Three of your sites bleed cash. Which?").
* This is what removes ambiguity about what the video is selling. */
Expand All @@ -40,27 +47,73 @@ export const appAnalysis = z.object({

export type AppAnalysis = z.infer<typeof appAnalysis>;

/**
* Heal a selector the model copied with trailing junk. Inventory lines read
* `` `<selector>` [tag] "text" ``; a model that grabs past the closing backtick
* appends the ` [tag]…` annotation (e.g. `:nth-match(…, 1) [button]`). Selectors
* themselves contain `]`/`)`/quotes, so we can't regex-strip safely — instead we
* accept the LONGEST real inventory selector that `raw` starts with AND whose
* remainder is only the display annotation. A bare `startsWith` heals too much:
* `#cta-danger` starts with `#cta`, so prefix-healing would silently rewrite a
* hallucinated sibling into a real selector and bypass the whitelist gate. We
* only heal when what follows the matched selector is whitespace-then-`[tag]`
* (the annotation shape) or nothing — never a selector-continuation character.
*/
const ANNOTATION_TAIL_RE = /^\s+\[[a-z0-9-]+\]/i;

export function coerceSelector(raw: string, valid: Set<string>): string {
const s = raw.trim();
if (valid.has(s)) return s;
let best = "";
for (const v of valid) {
if (!s.startsWith(v) || v.length <= best.length) continue;
const rest = s.slice(v.length);
// `#cta-danger` / `#cta2` have a real-selector remainder → leave untouched so
// the gate rejects them; only annotation junk or pure whitespace heals
if (rest.trim() === "" || ANNOTATION_TAIL_RE.test(rest)) best = v;
}
Comment on lines +68 to +74

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject prefix-only selector matches

If the model returns a different selector whose text merely begins with a valid one, for example an inventory containing #cta and a response of #cta2 or #cta-secondary, this startsWith check rewrites it to #cta and the whitelist accepts it. That bypasses the anti-hallucination retry path and can film/click the wrong element; the heal should only apply when the suffix is the known copied annotation boundary, not any arbitrary prefix.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 00151d5. coerceSelector only heals when the remainder after a valid-selector prefix is the display-annotation shape (whitespace + [tag]) or empty. #cta2 / #cta-secondary / #cta_alt have selector-continuation remainders → returned unchanged → correctly rejected. Regression test added.

return best || s;
}

export function validateAnalysis(raw: unknown, digests: PageDigest[]): AppAnalysis {
const parsed = appAnalysis.parse(raw);
const byPage = new Map(digests.map((d) => [d.url, new Set(d.inventory.map((i) => i.selector))]));
// Models often answer with a relative path ("/setup") instead of the full
// crawled URL. Coerce by pathname match so a correct beat isn't rejected on a
// formatting nit — downstream (script.ts) needs the full crawled URL.
const byPathname = new Map<string, string>();
// formatting nit — downstream (script.ts) needs the full crawled URL. The
// crawler keys pages on pathname+search, so ONE pathname can map to several
// distinct crawled URLs (/results?view=chart vs ?view=table). Track all
// candidates per pathname: a lone candidate coerces; MULTIPLE means a bare
// path is ambiguous and must NOT be silently rewritten onto the wrong page.
const byPathname = new Map<string, string[]>();
for (const d of digests) {
try { byPathname.set(new URL(d.url).pathname.replace(/\/$/, "") || "/", d.url); } catch { /* skip */ }
try {
const key = new URL(d.url).pathname.replace(/\/$/, "") || "/";
const list = byPathname.get(key) ?? [];
if (!list.includes(d.url)) list.push(d.url);
byPathname.set(key, list);
} catch { /* skip */ }
}
for (const moment of parsed.money_moments) {
if (!byPage.has(moment.page_url)) {
let key = moment.page_url;
try { key = new URL(moment.page_url, digests[0]?.url ?? "http://localhost").pathname; } catch { /* keep */ }
const full = byPathname.get((key.replace(/\/$/, "") || "/"));
if (full) moment.page_url = full;
const candidates = byPathname.get((key.replace(/\/$/, "") || "/"));
if (candidates && candidates.length > 1) {
throw new Error(
`money moment "${moment.title}" page_url "${moment.page_url}" is ambiguous — ` +
`${candidates.length} crawled pages share that pathname (${candidates.join(", ")}); ` +
`use the full URL INCLUDING its query string to pick one`,
);
}
if (candidates && candidates.length === 1) moment.page_url = candidates[0]!;
}
const selectors = byPage.get(moment.page_url);
if (!selectors) {
throw new Error(`money moment "${moment.title}" page_url "${moment.page_url}" is not a crawled page`);
}
// heal an appended ` [tag]` annotation in-place before the whitelist check
moment.elements = moment.elements.map((sel) => coerceSelector(sel, selectors));
for (const selector of moment.elements) {
if (!selectors.has(selector)) {
throw new Error(`money moment "${moment.title}" selector "${selector}" is not in the inventory for ${moment.page_url}`);
Expand All @@ -72,49 +125,70 @@ export function validateAnalysis(raw: unknown, digests: PageDigest[]): AppAnalys

function digestText(d: PageDigest): string {
const inv = d.inventory
.map((i) => ` ${i.selector} [${i.tag}] "${redactForPrompt(i.text)}"${i.href ? ` → ${redactForPrompt(i.href)}` : ""}${i.hidden ? " (HIDDEN until revealed)" : ""}`)
.map((i) => ` \`${i.selector}\` [${i.tag}] "${redactForPrompt(i.text)}"${i.href ? ` → ${redactForPrompt(i.href)}` : ""}${i.hidden ? " (HIDDEN until revealed)" : ""}`)
.join("\n");
return `PAGE ${d.url}\ntitle: ${d.title}\nheadings: ${d.headings.join(" | ")}\nelements:\n${inv}`;
// look signal: lets a text-only model ground vibe/music choices in the
// page's actual appearance, not just its copy
const look = d.theme ? `\ntheme: ${d.theme}${d.accentColor ? ` (accent ${d.accentColor})` : ""}` : "";
// title/headings are egress and display-only, so redact them — a secret in a
// page title reaches the provider otherwise. The URL is NOT redacted here: it
// is a validation KEY (scene.entry.url must round-trip exactly against the raw
// crawled URL), so a redacted URL would break the recipe gate. Pages whose URL
// itself carries a secret are dropped upstream in crawlApp, so no token-URL
// reaches this prompt to leak.
const title = redactForPrompt(d.title);
const headings = d.headings.map(redactForPrompt).join(" | ");
return `PAGE ${d.url}\ntitle: ${title}${look}\nheadings: ${headings}\nelements:\n${inv}`;
}

const SYSTEM = `You are the director AND copywriter of a 60-second product launch video (Screen-Studio / ChatGPT-launch style), not a website tour. You study a web product and turn it into a PERSUASIVE STORY with a crystal-clear message: a viewer must understand within seconds what problem it solves and why it's good. Ambiguity is failure.
const SYSTEM = `You are the director AND copywriter of a 60-second product launch video — a polished launch film, not a website tour. You study a web product and turn it into a PERSUASIVE STORY with a crystal-clear message: a viewer must understand within seconds what problem it solves and why it's good. Ambiguity is failure.

Write the story as a problem → solution → payoff arc:
- headline: the HOOK. Open on the customer's PAIN or the promise, in their words — not a feature. ("You run 12 sites. Three bleed cash — which?") This single line must make the whole video unambiguous.
- money_moments (2-4), ordered as the storyboard:
1. hook beat: the first move that starts solving the problem
2. proof beat: the core workflow / differentiator
3. payoff beat: the most visual result — the moment the value lands
- Prefer beats where the UI VISIBLY RESPONDS — a panel switches, results appear, a form confirms. A beat that only points at static content films as dead air.
- For EACH beat write a "caption": ONE short benefit line (≤52 chars) in outcome voice — what the viewer GAINS, never a feature label. "Record a location" is a label (BAD). "Drop in every site in seconds" is a caption (GOOD). "See ranked revenue" is a label (BAD). "Your weakest sites, surfaced instantly" is a caption (GOOD).
- product_name: the brand name for the title/close cards. tagline: the closing line under it.
- music_track: the bundled soundtrack matching the app's LOOK and energy. "pulse" = sleek, minimal tools and dev products; "daybreak" = bright, friendly consumer/marketing SaaS; "midnight" = dark-themed, premium data/infra products; "momentum" = fast, energetic, action-heavy products. Ground the choice in each page's "theme:" line (dark/light + accent), the copy, and the screenshots when provided — a dark dashboard with a bright cheerful track (or the reverse) feels wrong.

Prefer beats with visible payoff (something appears, changes, completes). The "title" field stays a short internal label; the "caption" is the on-screen copy and must be benefit-framed.

Prefer beats with visible payoff (something appears, changes, completes). The "title" field stays a short internal label; the "caption" is the on-screen copy and must be benefit-framed. Respond ONLY with a JSON object matching:
{ "product_summary": string, "product_name": string, "headline": string, "tagline": string, "money_moments": [{ "title": string, "caption": string, "why": string, "page_url": string (one crawled URL), "elements": [selector strings COPIED EXACTLY from the inventory] }] }`;
Each inventory line is: \`<selector>\` [tag] "text". In "elements", copy ONLY the exact text INSIDE the backticks — never the [tag] or the "text" that follows it. Respond ONLY with a JSON object matching:
{ "product_summary": string, "product_name": string, "headline": string, "tagline": string, "music_track": "pulse"|"daybreak"|"midnight"|"momentum", "money_moments": [{ "title": string, "caption": string, "why": string, "page_url": string (one crawled URL), "elements": [selectors copied from between the backticks] }] }`;

export async function analyzeApp(
llm: LlmClient,
digests: PageDigest[],
repoNotes?: string,
): Promise<AppAnalysis> {
const parts: ChatPart[] = [];
parts.push({
const textPart: ChatPart = {
type: "text",
text:
(repoNotes ? `REPO NOTES:\n${repoNotes.slice(0, 4000)}\n\n` : "") +
digests.map(digestText).join("\n\n"),
});
};
const imageParts: ChatPart[] = [];
for (const d of digests) {
if (d.screenshotB64) {
parts.push({ type: "text", text: `screenshot of ${d.url}:` });
parts.push({ type: "image", dataUrl: `data:image/jpeg;base64,${d.screenshotB64}` });
imageParts.push({ type: "text", text: `screenshot of ${d.url}:` });
imageParts.push({ type: "image", dataUrl: `data:image/jpeg;base64,${d.screenshotB64}` });
}
}

let feedback = "";
for (let attempt = 0; attempt < 3; attempt++) {
// Screenshots are sent ONCE, on attempt 0. A schema-retry resends the text
// digest + the corrective feedback but NOT the images — each stateless call
// that re-uploads every JPEG would multiply the vision-token bill for a
// formatting fix the text feedback already pinpoints. Tradeoff: the retry
// reasons from the DOM digest, not the pixels; acceptable because the digest
// carries the selectors/labels a correction needs.
const user: ChatPart[] = feedback
? [...parts, { type: "text", text: `Your previous response was invalid: ${feedback}. Return corrected JSON only.` }]
: parts;
? [textPart, { type: "text", text: `Your previous response was invalid: ${feedback}. Return corrected JSON only.` }]
: [textPart, ...imageParts];
// generous budget: a richer source-seeded crawl (many pages) means a bigger
// prompt AND a bigger response; 4k truncated mid-JSON on real apps
const raw = await llm.chat({ system: SYSTEM, user, json: true, maxTokens: 8000 });
Expand Down
Loading
Loading