Merged
5 changes: 5 additions & 0 deletions .gitignore
@@ -16,3 +16,8 @@ test/.tmp/
examples/demo-app/node_modules/
examples/demo-app/.next/
.roast/

# launch working dir + scratch demo app (not part of the package)
demos/
examples/pulse-app/
node_modules
2 changes: 2 additions & 0 deletions README.md
@@ -73,6 +73,8 @@ node dist/cli/index.js doctor # check Chromium + ffmpeg are installed

> Browser + video need Chromium and ffmpeg: `npx playwright install chromium` and an `ffmpeg` on your PATH.

> Any command accepts `--help` to print its own usage (e.g. `node dist/cli/index.js generate --help`).

### 🤖 Or: let your coding agent set it up

Already living in **Claude Code, Codex, opencode, Cursor, or Cline**? Don't run the
52 changes: 45 additions & 7 deletions src/cli/doctor.ts
@@ -65,16 +65,54 @@ const checks: Check[] = [
{
// render needs the FULL chromium channel (the headless shell has no
// WebCodecs) — a doctor that only checks the shell passes while render
// cannot launch
name: "full chromium (render)",
// cannot launch.
//
// A4: launching is necessary but NOT sufficient — render encodes via the
// in-page WebCodecs VideoEncoder, so actually probe H.264 support here
// rather than punting it to render time (the old check only launched and
// closed, hiding a missing/unsupported codec until 10 min into a run).
name: "Chromium + WebCodecs H.264",
run: async () => {
let server: import("node:http").Server | undefined;
let browser: import("playwright").Browser | undefined;
try {
// import INSIDE the try: a missing/broken playwright must surface as a
// FAILED check (doctor's whole job) — not throw past doctor() to the
// top-level handler, which is exactly the dep-diagnosis path doctor exists for.
const { chromium } = await import("playwright");
const browser = await chromium.launch({ headless: true, channel: "chromium", timeout: 20_000 });
await browser.close();
return { ok: true, detail: "launches (WebCodecs verified at render time)" };
} catch {
return { ok: false, detail: "cannot launch — run `npx playwright install chromium` (installs both)" };
const { createServer } = await import("node:http");
// VideoEncoder is SecureContext-gated, so it's undefined on the opaque
// about:blank origin — evaluating there would falsely FAIL. Probe over a
// real 127.0.0.1 origin, which Chromium treats as a secure context.
server = createServer((_req, res) => res.end("<!doctype html>"));
await new Promise<void>((r) => server!.listen(0, "127.0.0.1", r));
const { port } = server.address() as { port: number };
browser = await chromium.launch({ channel: "chromium", timeout: 20_000 });
const page = await browser.newPage();
await page.goto(`http://127.0.0.1:${port}/`);
const supported = await page.evaluate(async () => {
if (typeof VideoEncoder === "undefined") return false;
const r = await VideoEncoder.isConfigSupported({
codec: "avc1.640028",
width: 1920,
height: 1080,
bitrate: 8_000_000,
framerate: 60,
});
return !!r.supported;
});
return supported
? { ok: true, detail: "ok" }
: { ok: false, detail: "FAIL — Chromium launched but WebCodecs H.264 (avc1.640028) is unsupported" };
} catch (err) {
return {
ok: false,
detail: `FAIL — ${err instanceof Error ? err.message : String(err)} (run \`npx playwright install chromium\`)`,
};
} finally {
// always release the browser + server, even if import/launch threw mid-way
await browser?.close().catch(() => {});
if (server) await new Promise<void>((r) => server!.close(() => r()));
}
},
},
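
The probe above passes the codec string `avc1.640028` to `VideoEncoder.isConfigSupported`. As a reading aid, here is a minimal sketch of how that string decomposes, assuming the usual `avc1.PPCCLL` layout (profile, constraint flags, level, each as a hex byte). The `parseAvc1` helper and the `AvcCodecInfo` type are hypothetical and not part of this PR.

```typescript
// Hypothetical helper (not in the PR): split an "avc1.PPCCLL" codec string
// into its three hex-encoded bytes.
interface AvcCodecInfo {
  profileIdc: number; // 0x64 = 100, the High profile
  constraintFlags: number; // constraint_set flags byte
  levelIdc: number; // level times ten, so 0x28 = 40 means Level 4.0
}

function parseAvc1(codec: string): AvcCodecInfo | null {
  const m = /^avc1\.([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})$/.exec(codec);
  if (!m) return null; // not an avc1 string in the six-hex-digit form
  return {
    profileIdc: parseInt(m[1]!, 16),
    constraintFlags: parseInt(m[2]!, 16),
    levelIdc: parseInt(m[3]!, 16),
  };
}

// The string used by the doctor check: High profile, no constraint flags, Level 4.0.
const probed = parseAvc1("avc1.640028");
```

One observation, offered tentatively: the H.264 level tables put Level 4.0 at roughly 1080p30, while the probe asks for 1080p at 60 fps. Whether a browser's `isConfigSupported` enforces level limits is implementation-dependent, so this may or may not matter in practice.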
58 changes: 48 additions & 10 deletions src/cli/index.ts
@@ -5,7 +5,7 @@ import { doctor } from "./doctor.js";
/**
* supercut — point it at your app, get the supercut.
*
* supercut generate --url <app> [--repo <path>] [--config <file>] full pipeline
* supercut generate --url <app> [--repo <path>] full pipeline
* supercut record --recipe <file> [--out <dir>] [--seed <n>] stage 3 only
* supercut render --take <dir> [--out <mp4>] [--bg <stage>] stage 5 only
* supercut doctor check deps
@@ -14,7 +14,7 @@ import { doctor } from "./doctor.js";
const HELP = `supercut — institutional-grade 60s launch videos from your real app

Usage:
supercut generate --url <running app URL> [--repo <path>] [--config <file>]
supercut generate --url <running app URL> [--repo <path>]
supercut record --recipe <recipe.json> [--out <dir>] [--seed <n>]
supercut render --take <dir> [--out <file.mp4>] [--bg aurora|midnight|dusk|paper|<asset>|<image>]
supercut doctor
@@ -26,8 +26,20 @@ async function main(): Promise<number> {

switch (command) {
case "doctor":

Owner Author

`doctor --help` still runs the dependency checks instead of printing usage, despite the top-level help/README saying any command accepts `--help`. This can take seconds and can exit non-zero on a machine missing ffmpeg/Chromium. Please intercept `rest.includes('--help') || rest.includes('-h')` for `doctor` too, or narrow the docs to only `generate`/`record`/`render`.

if (rest.includes("--help") || rest.includes("-h")) {
console.log("usage: supercut doctor (checks ffmpeg + Chromium/WebCodecs H.264 — takes no flags)");
return 0;
}
return doctor();
case "record": {
const recordUsage =
"usage: supercut record --recipe <recipe.json> [--out <dir>] [--seed <n>] [--block-private-network]";
// A1: subcommands advertised "--help" but strict parseArgs would throw on
// it — intercept before parsing and print this command's usage.
if (rest.includes("--help") || rest.includes("-h")) {
console.log(recordUsage);
return 0;
}
const { values } = parseArgs({
args: rest,
options: {
@@ -39,9 +51,16 @@ async function main(): Promise<number> {
},
});
if (!values.recipe) {
console.error("usage: supercut record --recipe <recipe.json> [--out <dir>] [--seed <n>] [--block-private-network]");
console.error(recordUsage);
return 1;
}
// A1: --allow-private-network is parsed for back-compat but ignored;
// warn that it no longer does anything so callers don't rely on it.
if (values["allow-private-network"]) {
console.error(
"--allow-private-network is deprecated and ignored; private/localhost is allowed by default — use --block-private-network to restrict",
);
}
const { readFileSync } = await import("node:fs");
const { parseRecipe } = await import("../schema/index.js");
const { record } = await import("../capture/index.js");
Expand All @@ -64,6 +83,15 @@ async function main(): Promise<number> {
return res.aborted ? 1 : 0;
}
case "render": {
const renderUsage =
"usage: supercut render --take <take dir from record> [--out <file.mp4>] " +
"[--bg aurora|midnight|dusk|paper|<image path>]";
// A1: print this command's usage on --help instead of letting strict
// parseArgs throw on the unknown flag.
if (rest.includes("--help") || rest.includes("-h")) {
console.log(renderUsage);
return 0;
}
const { values } = parseArgs({
args: rest,
options: {
@@ -73,10 +101,7 @@ async function main(): Promise<number> {
},
});
if (!values.take) {
console.error(
"usage: supercut render --take <take dir from record> [--out <file.mp4>] " +
"[--bg aurora|midnight|dusk|paper|<image path>]",
);
console.error(renderUsage);
return 1;
}
const { renderTake } = await import("../render/index.js");
@@ -94,6 +119,15 @@ async function main(): Promise<number> {
return 0;
}
case "generate": {
const generateUsage =
"usage: supercut generate --url <running app URL> [--repo <path>] [--out <dir>] " +
"[--bg <stage>] [--seed <n>] [--model <id>] [--env-file <file>] [--block-private-network] [--allow-destructive] [--no-vision]";
// A1: print this command's usage on --help instead of letting strict
// parseArgs throw on the unknown flag.
if (rest.includes("--help") || rest.includes("-h")) {
console.log(generateUsage);
return 0;
}
const { values } = parseArgs({
args: rest,
options: {
@@ -120,11 +154,15 @@ async function main(): Promise<number> {
},
});
if (!values.url) {
console.error(generateUsage);
return 1;
}
// A1: --allow-private-network is parsed for back-compat but ignored;
// warn that it no longer does anything so callers don't rely on it.
if (values["allow-private-network"]) {
console.error(
"usage: supercut generate --url <running app URL> [--repo <path>] [--out <dir>] " +
"[--bg <stage>] [--seed <n>] [--model <id>] [--env-file <file>] [--block-private-network] [--allow-destructive] [--no-vision]",
"--allow-private-network is deprecated and ignored; private/localhost is allowed by default — use --block-private-network to restrict",
);
return 1;
}
const { loadDotEnv, resolveProvider } = await import("../director/config.js");
const { generate } = await import("../director/generate.js");
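
The diff above repeats the same `--help` / `-h` interception in each `case`, which is how `doctor` was initially missed. One possible alternative, sketched here under assumptions, is a single lookup that runs before the `switch`. The `USAGE` table and `helpFor` function are hypothetical names, and the usage strings are abbreviated from the ones in the diff.

```typescript
// Hypothetical table (not in the PR): one usage line per subcommand.
// Strings are shortened versions of the ones shown in the diff.
const USAGE: Record<string, string> = {
  doctor: "usage: supercut doctor",
  record: "usage: supercut record --recipe <recipe.json> [--out <dir>] [--seed <n>]",
  render: "usage: supercut render --take <dir> [--out <file.mp4>] [--bg <stage>]",
  generate: "usage: supercut generate --url <running app URL> [--repo <path>]",
};

// Returns the usage text when the caller asked for help on a known command,
// otherwise null so the normal (strict) argument parsing can proceed.
function helpFor(command: string, rest: string[]): string | null {
  const asked = rest.includes("--help") || rest.includes("-h");
  if (!asked) return null;
  return USAGE[command] ?? null;
}
```

With this shape, adding a new subcommand means adding one table entry, and every command gains `--help` handling without a per-case check. The trade-off is that the usage strings live away from the option definitions they describe.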
20 changes: 13 additions & 7 deletions src/director/inventory.ts
@@ -58,14 +58,20 @@ const cssEscape = (s: string) => s.replace(/["\\]/g, "\\$&");
* non-destructive actions (Sign in, Submit, Add, Save, Open, View, Create,
* Next, Continue) do NOT match.
*/
// Deliberately NARROW: only genuinely irreversible / data-destroying /
// money-committing verbs. We do NOT match common, reversible, or hero-action
// words (send, remove, reset, disable, archive, unsubscribe, transfer) — those
// are exactly the core interactions a launch video exists to show, and silently
// dropping a chat app's "Send" or a list's "Remove" would gut the demo. The set
// here is "things you'd almost never want filmed and that can't be undone."
// Deliberately NARROW: only clearly-irreversible / high-blast-radius verbs that
// are almost never a legitimate demo "money moment". We still do NOT match
// common, reversible, or hero-action words (send, remove, reset, disable,
// archive, unsubscribe, save, submit, search) — those are exactly the core
// interactions a launch video exists to show, and silently dropping a chat
// app's "Send" or a list's "Remove" would gut the demo.
// B4 (review): broadened conservatively with publish, transfer, regenerate,
// suspend, terminate, downgrade — each is irreversible or high blast-radius
// (goes public, moves money/ownership, throws away generated state, kills
// access/an account, drops a paid tier) and would almost never be the action
// you intend to film. Criterion: add a verb ONLY if firing it by accident on a
// live app is genuinely costly AND it is rarely the intended payoff beat.
export const DESTRUCTIVE_RE =
/\b(delete|deactivate|wipe|erase|destroy|cancel\s+(subscription|account|plan)|pay|purchase|buy\s+now|checkout|place\s+order|withdraw|confirm\s+(payment|order)|revoke)\b/i;
/\b(delete|deactivate|wipe|erase|destroy|cancel\s+(subscription|account|plan)|pay|purchase|buy\s+now|checkout|place\s+order|withdraw|confirm\s+(payment|order)|revoke|publish|transfer\s+(funds|money|ownership|account|domain)|regenerate|suspend|terminate|downgrade)\b/i;

// links the crawler must NOT navigate to: file downloads (PDF/zip/images/docs),
// and non-http protocols. Navigating to a PDF triggers a download that crashes
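
To make the "deliberately narrow" criterion concrete, the following sketch runs the new `DESTRUCTIVE_RE` (copied from the added line above) against a few button labels. The `isDestructive` wrapper and the sample labels are illustrative assumptions, not part of the PR.

```typescript
// Copied from the new DESTRUCTIVE_RE in the diff above.
const DESTRUCTIVE_RE =
  /\b(delete|deactivate|wipe|erase|destroy|cancel\s+(subscription|account|plan)|pay|purchase|buy\s+now|checkout|place\s+order|withdraw|confirm\s+(payment|order)|revoke|publish|transfer\s+(funds|money|ownership|account|domain)|regenerate|suspend|terminate|downgrade)\b/i;

// Hypothetical wrapper for the demo. The regex has no `g` flag, so test() is stateless.
function isDestructive(label: string): boolean {
  return DESTRUCTIVE_RE.test(label);
}

// Labels that match: irreversible or high blast-radius actions.
const blocked = ["Delete project", "Transfer funds", "Publish", "Cancel subscription", "Regenerate API key"];

// Labels that do not match: hero actions, bare verbs without the qualifying
// noun, and inflected forms (the trailing \b rejects "Deleted" and "Payment").
const allowed = ["Send", "Remove", "Transfer", "Cancel", "Save", "Payment settings", "Deleted items"];

const allBlockedMatch = blocked.every(isDestructive);
const noneAllowedMatch = allowed.every((label) => !isDestructive(label));
```

Note that the word boundaries cut both ways: `Transfer` on its own is allowed, as intended, but so are inflections such as `Deleting…` or `Publishing`, which may or may not be what a crawler wants.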
10 changes: 7 additions & 3 deletions src/director/llm.ts
@@ -118,14 +118,18 @@ export class OpenAICompatibleClient implements LlmClient {
if (!text) throw new Error(`LLM returned an empty response (${this.label})`);
return text;
}
// A2: drain the body, but the raw provider response can echo prompt text
// or account metadata. Only surface it when SUPERCUT_VERBOSE is set;
// otherwise keep status + provider label (+ auth hint) and omit the body.
const snippet = (await res.text()).slice(0, 300);
const detail = process.env.SUPERCUT_VERBOSE ? ` ${snippet}` : "";
if (res.status === 401 || res.status === 403) {
throw new Error(`LLM auth failed (${res.status}, ${this.label}) — check your API key. ${snippet}`);
throw new Error(`LLM auth failed (${res.status}, ${this.label}) — check your API key.${detail}`);
}
if (res.status !== 429 && res.status < 500) {
throw new Error(`LLM request rejected (${res.status}, ${this.label}): ${snippet}`);
throw new Error(`LLM request rejected (${res.status}, ${this.label}):${detail}`);
}
lastErr = `${res.status}: ${snippet}`;
lastErr = `${res.status}:${detail}`;
await new Promise((r) => setTimeout(r, 1500 * (attempt + 1)));
}
throw new Error(`LLM unavailable after 4 attempts (${this.label}): ${lastErr}`);
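
The gating and retry arithmetic in this hunk can be isolated into two small functions. This is a sketch under assumptions: `errorDetail` and `backoffMs` are hypothetical names, the environment is passed in as a plain object instead of reading `process.env`, and `attempt` is assumed to start at 0 (the loop bounds are not visible in the diff).

```typescript
// Hypothetical helper: mirrors the snippet/detail logic from the diff.
// The body is truncated to 300 characters and only surfaced in verbose mode.
function errorDetail(body: string, env: Record<string, string | undefined>): string {
  const snippet = body.slice(0, 300);
  return env["SUPERCUT_VERBOSE"] ? ` ${snippet}` : "";
}

// Hypothetical helper: linear backoff matching `1500 * (attempt + 1)`.
// Assumes attempts are numbered from 0.
function backoffMs(attempt: number): number {
  return 1500 * (attempt + 1);
}

// Delays before each retry if there are four attempts numbered 0..3.
const delays = [0, 1, 2, 3].map(backoffMs);
```

Because the check is a plain truthiness test on a string, any non-empty value turns verbose mode on, including `SUPERCUT_VERBOSE=0` or `SUPERCUT_VERBOSE=false`. If that is not intended, an explicit comparison against `"1"` would be one way to tighten it.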
57 changes: 47 additions & 10 deletions src/director/qc.ts
@@ -98,14 +98,14 @@ async function frameJpegB64(takeDir: string, t: number): Promise<string | null>
}
}

const SYSTEM = `You are the quality judge for a cinematic product launch video. For each scene you get a captured frame at its key interaction moment. Judge ONLY:
const SYSTEM = `You are the quality judge for a cinematic product launch video. For each scene you get SEVERAL captured frames sampled across the scene (its key interaction moment, a mid point, and its final hold). Judge the scene across ALL of its frames. Judge ONLY:
- is the interaction's payoff visible (did something happen)?
- is there an error page, blank screen, overlay, or cookie banner ruining the shot?
- is there an error page, blank screen, overlay, or cookie banner ruining the shot — in ANY of the frames?
- does the scene need a longer hold to land (slow content)?
Respond ONLY with JSON: { "verdicts": [{ "scene": string, "verdict": "ok"|"patch"|"cut", "reason": string, "patch": { "hold_ms"?: int } }] }
Rules: "cut" only for ruined shots (error/blank/banner). "patch" with hold_ms 400-2000 for shots that need breathing room. Otherwise "ok". One verdict per scene, scene names exactly as given.`;
Rules: if ANY sampled frame is an error page, blank/empty screen, or shows a banner ruining the shot, prefer "cut" (a late error still ruins the clip). "patch" with hold_ms 400-2000 for shots that need breathing room. Otherwise "ok". One verdict per scene, scene names exactly as given.`;

/** Layer (b): vision QC on the event frame of each scene. */
/** Layer (b): vision QC on multiple frames per scene. */
export async function visionQc(
llm: LlmClient,
takeDir: string,
@@ -115,20 +115,57 @@ export async function visionQc(
const parts: ChatPart[] = [];
const sceneNames: string[] = [];

// the take's last CAPTURED frame time. Capture keeps emitting frames through
// hold_ms without emitting any event, so the final scene's hold must be
// sampled against the last frame, not the last event (else a late blank/error
// during a closing hold is missed). Fall back to event time if no index.
let lastFrameT = 0;
try {
const idx = JSON.parse(readFileSync(join(takeDir, "frames-index.json"), "utf8")) as { t_source: number }[];
lastFrameT = idx.reduce((m, e) => Math.max(m, e.t_source), 0);
} catch {
/* no frame index — final scene falls back to last event time below */
}

for (let i = 0; i < scenes.length; i++) {
const s = scenes[i]!;
if (s.type !== "scene") continue;
const end = i + 1 < scenes.length ? scenes[i + 1]!.t : Infinity;
const firstInteraction = log.events.find(
(e) => (e.type === "click" || e.type === "hover" || e.type === "type") && e.t >= s.t && e.t < end,
);
// judge the moment AFTER the payoff, not the moment of the click
const judgeT = (firstInteraction?.t ?? s.t) + 800;
const b64 = await frameJpegB64(takeDir, judgeT);
if (!b64) continue;
// B6 (review): one frame per scene let LATE errors (a result that errors out
// after the click, a modal that pops during the hold) pass QC. Sample up to
// 3 frames per scene — the key moment (after the payoff), a mid frame, and
// the scene's final hold frame — so a late blank/error is caught. Capped at
// 3 to bound vision token cost.
const keyT = (firstInteraction?.t ?? s.t) + 800;
// the last frame we can attribute to this scene; for the final scene `end`
// is Infinity, so fall back to the take's last captured frame time.
const lastEventT = log.events.reduce((m, e) => Math.max(m, e.t), s.t);
// final scene: end at the last captured FRAME (covers the hold), not the
// last event — see lastFrameT note above.
const sceneEndT = end === Infinity ? Math.max(lastFrameT, lastEventT) : end;
const holdT = Math.max(keyT, sceneEndT - 200); // just inside the final hold
Comment on lines +145 to +149


P2: Sample the final scene using the last captured frame

When this is the last scene, there is no following scene marker, and `lastEventT` is only the maximum event timestamp, not the take's last captured frame time. The recorder does not emit an event at the end of an action or at `hold_ms`, so a closing scene with a long action/hold still samples around `firstInteraction + 800ms` and can miss the late blank/error/overlay this change is meant to catch.

const midT = (keyT + holdT) / 2;
// de-dupe near-identical sample times (short scenes collapse to one frame)
const sampleTs = [keyT, midT, holdT].filter(
(t, idx, arr) => arr.findIndex((u) => Math.abs(u - t) < 200) === idx,
);

const labels = ["its key moment", "mid-scene", "its final hold"];
const sceneParts: ChatPart[] = [];
for (let k = 0; k < sampleTs.length; k++) {
const b64 = await frameJpegB64(takeDir, sampleTs[k]!);
if (!b64) continue;
const label = sampleTs.length === 1 ? "its key moment" : (labels[k] ?? "another moment");
sceneParts.push({ type: "text", text: `scene "${s.name}" — ${label}:` });
sceneParts.push({ type: "image", dataUrl: `data:image/jpeg;base64,${b64}` });
}
// need at least one real frame to judge the scene at all
if (sceneParts.length === 0) continue;
sceneNames.push(s.name);
parts.push({ type: "text", text: `scene "${s.name}" at its key moment:` });
parts.push({ type: "image", dataUrl: `data:image/jpeg;base64,${b64}` });
parts.push(...sceneParts);
}
if (sceneNames.length === 0) return [];
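
The per-scene sampling arithmetic above can be pulled out into a pure function to see how it behaves on short and long scenes. This is a sketch: `sampleTimes` is a hypothetical name, and it takes `keyT` and `sceneEndT` as already computed (all times in milliseconds), using the same expressions as the diff.

```typescript
// Hypothetical extraction of the sampling logic from visionQc.
// keyT: 800 ms after the first interaction; sceneEndT: start of the next
// scene, or the last captured frame time for the final scene.
function sampleTimes(keyT: number, sceneEndT: number): number[] {
  const holdT = Math.max(keyT, sceneEndT - 200); // just inside the final hold
  const midT = (keyT + holdT) / 2;
  // Keep a time only if no EARLIER entry of the original array lies within 200 ms.
  return [keyT, midT, holdT].filter(
    (t, idx, arr) => arr.findIndex((u) => Math.abs(u - t) < 200) === idx,
  );
}

const longScene = sampleTimes(1000, 5000); // [1000, 2900, 4800]
const tinyScene = sampleTimes(1000, 1100); // [1000]
const shortScene = sampleTimes(1000, 1500); // [1000]
const borderline = sampleTimes(1000, 1600); // [1000, 1200, 1400]
```

A consequence of comparing against the unfiltered array is that the result always has either one or three entries: `midT` sits exactly halfway, so whenever it is dropped for being within 200 ms of `keyT`, `holdT` is within 200 ms of `midT` and is dropped too. In the `shortScene` case the hold frame is 300 ms after the key moment but is still not sampled.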
