refactor(codegen): delete codegen.mjs; outer agent owns script generation by ziruihao · Pull Request #128 · browserbase/skills

ziruihao · 2026-06-04T22:04:54Z

Summary

PR #125 introduced scripts/codegen.mjs as a one-shot completion-API pipeline that templates a framework prompt, calls the LLM, writes the emitted message text as the script, verifies, and rewrites on failure. The sub-process boundary turned out to be the wrong contract.

Every bug we have shipped a fix for since #125 merged traces back to that boundary:

| Bug | Wouldn't exist if outer agent owned codegen |
| --- | --- |
| LLM preamble bleeds into `.ts` file | `Write({content})` is a structured argument, so script bytes never ride the natural-language channel |
| Shared `--out` dir `package.json` collision (`70dae51`) | One agent writes one `package.json`, deduped at source |
| `pkg-hash` install-stamp gate (`55e7d4c`) | Agent re-installs when it knows deps changed |
| Cross-runner `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD` leak | Agent sets env when it spawns verify |
| Parent verify timeout < child runner timeout (`8638b7e`) | One process, no nested timeouts |
| Stagehand script vanishing pre-upload | The agent that decides "stagehand failed" is the same one curling the upload |

Verified end-to-end against the 2026-06-04 preview run: 4 of 5 browser/hybrid skills lost their stagehand.ts because the LLM's rewrite output prepended a paragraph of self-narration before the imports. Cache file at /tmp/skill/etsy.com/search-products/autobrowse/codegen-cache/6c78b599d4d5a9d4.txt:

"The error is clear: the previous attempt's output started with explanation text ("Looking at the error...") before the imports, which caused esbuild to fail parsing. The script content must start directly with imports. Here is the complete corrected script:"

…followed by valid imports. tsx chokes on the prose, verify fails, retry loop never converges. The earlier draft of this PR added a stripPreamble defensive boundary; this version removes the boundary problem entirely.
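To make the failure mode concrete (this is not the fix this PR adopts), here is a minimal TypeScript sketch of a check that flags a completion whose first non-blank line is prose rather than code. `looksLikeScriptStart`, `CODE_PREFIXES`, and the two sample strings are hypothetical and exist only for illustration; the sample import is a stand-in, not the content of the cache file.

```typescript
// Illustrative only: these names are hypothetical and not part of this repo.
// Prefixes that plausibly open a TypeScript module.
const CODE_PREFIXES: string[] = ["import ", "import{", "export ", "//", "/*", "#!"];

// Returns true when the first non-blank line looks like code, false when it
// looks like natural-language narration (or when the text is empty).
function looksLikeScriptStart(text: string): boolean {
  const firstLine = text
    .split("\n")
    .map((line) => line.trim())
    .find((line) => line.length > 0);
  if (firstLine === undefined) return false;
  return CODE_PREFIXES.some((prefix) => firstLine.startsWith(prefix));
}

// Abbreviated stand-ins for a narrated completion and a clean one.
const narrated =
  'The error is clear: ... Here is the complete corrected script:\n\nimport { run } from "./lib";\n';
const clean = 'import { run } from "./lib";\n';

console.log(looksLikeScriptStart(narrated), looksLikeScriptStart(clean)); // prints: false true
```

A heuristic like this can only detect or patch the symptom after the fact. Passing the script through a structured `content` argument avoids the shared channel altogether, which is why this version of the PR drops the defensive approach.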

What changes

- Delete scripts/codegen.mjs (515 lines)
- Delete codegen/runners/ (tsx-runner.mjs, playwright.mjs, stagehand.mjs)
- Delete codegen/scaffolds/ (inlined into the new reference docs)
- Move + reframe codegen/prompts/{playwright,stagehand}.md → references/codegen/{playwright,stagehand}.md. Technical content (CDP-attach pattern, Stagehand v3 constructor shape, locator priorities, snap convention, JSON stdout contract) is preserved; framing shifts from "you ARE this LLM emitting verbatim" to "here's the spec for the file you're writing".
- Update SKILL.md's "Generate a runnable script" section to describe the agent-driven loop (Read trace/refs → Write script → Bash verify → iterate or delete on persistent failure).
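The shape of the agent-driven loop can be sketched in TypeScript as below. This is a model of the control flow only: in practice the agent performs these steps with its Read / Write / Bash tools rather than calling a function. `LoopTools`, `generateWithRetries`, and the `maxAttempts` cap are all hypothetical; the PR does not state how many attempts count as "persistent failure".

```typescript
// Hypothetical model of the Write -> verify -> iterate-or-delete loop.
type VerifyResult = { ok: boolean; stderr: string };

interface LoopTools {
  // Writes the script, optionally informed by the previous stderr; returns its path.
  writeScript(feedback: string | null): string;
  // Runs the script (e.g. via tsx) and reports success plus captured stderr.
  verify(path: string): VerifyResult;
  // Deletes a script that never passed verification.
  remove(path: string): void;
}

// Returns the verified script path, or null if every attempt failed
// (in which case the last written script is removed before upload).
function generateWithRetries(tools: LoopTools, maxAttempts: number): string | null {
  let feedback: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const path = tools.writeScript(feedback);
    const result = tools.verify(path);
    if (result.ok) return path;
    feedback = result.stderr; // feed the error into the next rewrite
    if (attempt === maxAttempts) tools.remove(path);
  }
  return null;
}
```

Because the same loop both decides that a script failed and removes it, there is no second process that could upload a file the first one had already given up on.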

Net diff: −626 lines.

Companion change

The browse.sh §4b rewrite — replacing the node codegen.mjs --frameworks ... invocation with the inlined Read/Write/Bash loop — ships in a separate browse.sh PR (to follow).

Test plan

- No external callers in this repo reference the deleted paths
- Companion browse.sh PR opens
- After both land + BB_SKILLS_SHA bumps on the preview, regenerate the 5 stagehand-missing skills (etsy, google-search-flights, nursys, amazon-search-products, uspto-search-patents) and confirm both playwright.ts and stagehand.ts upload

🤖 Generated with Claude Code

Note

Medium Risk
Large removal of the codegen/verify harness shifts behavior to agent-driven steps; any caller still invoking codegen.mjs or npm run codegen will break until updated (e.g. browse.sh companion PR).

Overview
Removes the standalone codegen pipeline (scripts/codegen.mjs plus runners, scaffolds, and LLM prompt templates) and moves script generation to the outer agent via Read / Write / Bash as documented in SKILL.md.

The old node codegen.mjs flow (cached Anthropic completion, scaffold drop, nested verify runners) is replaced by an explicit loop: read converged trace + strategy.md + references/codegen/<framework>.md, write playwright.ts / stagehand.ts and merged scaffolds, run npm install + npx tsx against a fresh Browserbase session, iterate on stderr or delete broken scripts before upload.

Playwright and Stagehand specs are reframed from “system prompts for a sub-process LLM” to references/codegen/playwright.md and stagehand.md, with inlined package.json / tsconfig guidance (including HTTP-only Playwright, Stagehand skip rules, and shared-dir package.json merge). Net effect is a large deletion (~600+ lines) with workflow ownership shifted to the skill agent; browse.sh’s codegen.mjs invocation is expected to follow in a companion PR.
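For the shared-dir `package.json` merge mentioned above, a minimal TypeScript sketch follows. It assumes one plausible policy (union the dependency maps, keep the first file's other top-level fields, let the later file win on a version conflict); the actual rules live in the reference docs and may differ. `mergePackageJson`, the `PackageJson` shape, and the package names and versions in the example are hypothetical placeholders.

```typescript
// Hypothetical sketch of merging two frameworks' package.json files into one.
type DepMap = Record<string, string>;

interface PackageJson {
  name?: string;
  type?: string;
  dependencies?: DepMap;
  devDependencies?: DepMap;
}

// Union of two dependency maps; entries in `b` win on conflict (an assumption).
function mergeDeps(a: DepMap = {}, b: DepMap = {}): DepMap {
  return { ...a, ...b };
}

// Keeps `base`'s scalar fields, fills gaps from `extra`, and unions both dep maps.
function mergePackageJson(base: PackageJson, extra: PackageJson): PackageJson {
  return {
    ...extra,
    ...base,
    dependencies: mergeDeps(base.dependencies, extra.dependencies),
    devDependencies: mergeDeps(base.devDependencies, extra.devDependencies),
  };
}

// Placeholder package names and versions, for illustration only.
const first: PackageJson = { name: "skill", dependencies: { "framework-a": "^1.0.0" } };
const second: PackageJson = { type: "module", dependencies: { "framework-b": "^2.0.0" } };
const merged = mergePackageJson(first, second);
console.log(Object.keys(merged.dependencies ?? {}).sort().join(","));
```

With a single writer, the merge happens once when the file is written, so there is no second process to race against or to re-stamp the install.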

Reviewed by Cursor Bugbot for commit 7dbe854. Bugbot is set up for automated code reviews on this repo.

…tion

PR #125 introduced scripts/codegen.mjs as a one-shot completion-API pipeline that templates a framework prompt, calls the LLM, writes the emitted message text to disk as the script, then verifies and rewrites on failure. The sub-process boundary turned out to be the wrong contract:

• Script content rides the model's natural-language output channel, so it competes with the model's conversational instincts. The LLM keeps prepending self-narration ("The error is clear:", "Here is the corrected script:") on the rewrite path, breaking tsx parse; see /tmp/skill/etsy.com/search-products/autobrowse/codegen-cache/6c78b599d4d5a9d4.txt from the 2026-06-04 preview run.
• Multi-framework runs into a shared --out dir collide on package.json + node_modules (PR #125 fixed this with deep-merge + pkg-hash stamp; the bug only existed because of the sub-process split).
• Runner timeouts and the parent verify timeout had to be hand-aligned so the parent doesn't SIGTERM a healthy child mid-install.
• Trace/strategy/script artifacts get reasoned about in two places (codegen.mjs writes scripts, the outer agent's bash uploads them).

All of those classes of bug disappear when the outer agent owns codegen. It already has the context, the tools (Read/Write/Bash), and the judgment loop. The Write tool's structured `content` argument means script bytes never ride the natural-language channel: no preamble bug. A single agent process means no cross-process timeout coordination, no deps merging across sub-process invocations, and no separate place to reason about "this stagehand failed, drop it before upload".

Changes:
- Delete scripts/codegen.mjs (515 lines)
- Delete codegen/runners/ (tsx-runner.mjs, playwright.mjs, stagehand.mjs)
- Delete codegen/scaffolds/ (inlined into the new reference docs)
- Move + reframe codegen/prompts/{playwright,stagehand}.md to references/codegen/{playwright,stagehand}.md. The technical content (CDP attach pattern, Stagehand v3 constructor shape, locator priorities, snap convention, JSON stdout contract) is preserved; what changed is framing: these are now reference docs an outer agent reads on demand, not completion-API system prompts.
- Update SKILL.md's "Generate a runnable script" section to describe the agent-driven loop (Read trace/refs → Write script → Bash verify → iterate or delete on persistent failure).

Net diff: -626 lines.

The companion change in browse.sh's §4b system prompt (replacing the `node codegen.mjs --frameworks ...` invocation with the inlined Read/Write/Bash loop) lives in a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit a768cb2.

The previous SKILL.md step 2 mixed two valid output-dir layouts in one example, `tasks/<task>/<framework>/<task>.ts` (per-framework subdir, script named after task) and "flattened upload root" (script named after framework), but step 4's verify command only mentioned `<framework>.ts`. An agent following the per-framework-subdir example would `Write` `<task>.ts` and then `tsx <framework>.ts` against a file that doesn't exist. Make the two shapes explicit, pick one per task, and key steps 4/5/7's filenames off step 2's choice rather than baking one of the two conventions into every step.

Also fix the trace path to `run-NNN/{trace.json,unified-events.jsonl}` (matching what unify-trace.mjs actually writes) instead of just `latest/`; autobrowse maintains the `latest` symlink but the explicit zero-padded form is what the rest of the docs use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
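The "pick one layout, then key every later step off it" rule can be sketched in TypeScript as follows. The helper names (`scriptFile`, `verifyCommand`, `runDir`) and the `Layout` labels are hypothetical; the flattened path is given relative to the upload root because the commit message does not name that directory, and the three-digit width is inferred from the `run-NNN` pattern.

```typescript
// Hypothetical helpers: derive every filename from a single layout choice.
type Layout = "per-framework-subdir" | "flattened";
type Framework = "playwright" | "stagehand";

// Step 2: where the script is written, relative to the output root.
function scriptFile(layout: Layout, task: string, framework: Framework): string {
  return layout === "per-framework-subdir"
    ? `tasks/${task}/${framework}/${task}.ts` // script named after the task
    : `${framework}.ts`;                      // script named after the framework
}

// Step 4: the verify command reuses the same path instead of hard-coding one convention.
function verifyCommand(layout: Layout, task: string, framework: Framework): string {
  return `npx tsx ${scriptFile(layout, task, framework)}`;
}

// Explicit zero-padded trace directory, e.g. run 7 becomes "run-007".
function runDir(runNumber: number, width: number = 3): string {
  return `run-${String(runNumber).padStart(width, "0")}`;
}

console.log(verifyCommand("per-framework-subdir", "search-products", "stagehand"));
console.log(`${runDir(7)}/trace.json`);
```

Deriving the verify command from the same function that names the written file rules out the mismatch described above, where one step writes `<task>.ts` and a later step runs `<framework>.ts`.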

cursor Bot reviewed Jun 4, 2026

Comment thread skills/autobrowse/scripts/codegen.mjs Outdated

shrey150 approved these changes Jun 5, 2026

ziruihao force-pushed the fix/codegen-strip-llm-preamble branch from 32393d5 to a768cb2 Compare June 5, 2026 22:09

ziruihao changed the title ~~fix(codegen): strip LLM-introduced preamble before writing scripts~~ refactor(codegen): delete codegen.mjs; outer agent owns script generation Jun 5, 2026

cursor Bot reviewed Jun 5, 2026

Comment thread skills/autobrowse/SKILL.md Outdated

ziruihao wants to merge 2 commits into main from fix/codegen-strip-llm-preamble

2 participants
