refactor(codegen): delete codegen.mjs; outer agent owns script generation#128
Open
ziruihao wants to merge 2 commits into
shrey150
approved these changes
Jun 5, 2026
…tion

PR #125 introduced scripts/codegen.mjs as a one-shot completion-API pipeline that templates a framework prompt, calls the LLM, writes the emitted message text to disk as the script, then verifies and rewrites on failure. The sub-process boundary turned out to be the wrong contract:

• Script content rides the model's natural-language output channel, so it competes with the model's conversational instincts. The LLM keeps prepending self-narration ("The error is clear:", "Here is the corrected script:") on the rewrite path, breaking tsx parse; see /tmp/skill/etsy.com/search-products/autobrowse/codegen-cache/6c78b599d4d5a9d4.txt from the 2026-06-04 preview run.
• Multi-framework runs into a shared --out dir collide on package.json + node_modules (PR #125 fixed this with deep-merge + pkg-hash stamp; the bug only existed because of the sub-process split).
• Runner timeouts and the parent verify timeout had to be hand-aligned so the parent doesn't SIGTERM a healthy child mid-install.
• Trace/strategy/script artifacts get reasoned about in two places (codegen.mjs writes scripts, the outer agent's bash uploads them).

All of those classes of bug disappear when the outer agent owns codegen. It already has the context, the tools (Read/Write/Bash), and the judgment loop. The Write tool's structured `content` argument means script bytes never ride the natural-language channel, so there is no preamble bug. A single agent process means no cross-process timeout coordination, no deps merging across sub-process invocations, and no separate place to reason about "this stagehand failed, drop it before upload".

Changes:
- Delete scripts/codegen.mjs (515 lines)
- Delete codegen/runners/ (tsx-runner.mjs, playwright.mjs, stagehand.mjs)
- Delete codegen/scaffolds/ (inlined into the new reference docs)
- Move + reframe codegen/prompts/{playwright,stagehand}.md to references/codegen/{playwright,stagehand}.md. The technical content (CDP attach pattern, Stagehand v3 constructor shape, locator priorities, snap convention, JSON stdout contract) is preserved; what changed is framing: these are now reference docs an outer agent reads on demand, not completion-API system prompts.
- Update SKILL.md's "Generate a runnable script" section to describe the agent-driven loop (Read trace/refs → Write script → Bash verify → iterate or delete on persistent failure).

Net diff: -626 lines.

The companion change in browse.sh's §4b system prompt, which replaces the `node codegen.mjs --frameworks ...` invocation with the inlined Read/Write/Bash loop, lives in a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
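To make the preamble failure concrete, here is a minimal sketch of why narration ahead of the imports breaks a file that is written to disk verbatim. `firstCodeLineIndex` is a hypothetical helper invented for illustration (it is not part of this repo), and the regex for what counts as a plausible first line of a script is an assumption. The two narration strings are the ones quoted in the commit message; the import line is a placeholder.

```typescript
// Hypothetical helper (not from the repo): index of the first line that looks
// like code, or -1 if none does.
function firstCodeLineIndex(text: string): number {
  // Assumption: a script starts with an import, a comment, or a shebang.
  const codeStart = /^(import\s|\/\/|\/\*|#!)/;
  return text.split("\n").findIndex((line) => codeStart.test(line.trim()));
}

// Message text as the old pipeline would have written it to disk.
const emitted = [
  "The error is clear:",
  "Here is the corrected script:",
  'import fs from "node:fs";', // placeholder import, for illustration only
].join("\n");

// The same script with nothing in front of the import.
const clean = 'import fs from "node:fs";\n';

console.log(firstCodeLineIndex(emitted)); // 2: two lines of narration precede the import
console.log(firstCodeLineIndex(clean)); // 0: the file starts with code
```

With the Write tool, the script travels as a structured argument, so a check like this should not be needed at all; the sketch only shows what the old contract had to defend against.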
32393d5 to
a768cb2
Compare
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit a768cb2. Configure here.
The previous SKILL.md step 2 mixed two valid output-dir layouts in one
example — `tasks/<task>/<framework>/<task>.ts` (per-framework subdir,
script named after task) and "flattened upload root" (script named after
framework) — but step 4's verify command only mentioned
`<framework>.ts`. An agent following the per-framework-subdir example
would `Write` `<task>.ts` and then `tsx <framework>.ts` against a file
that doesn't exist.
Make the two shapes explicit, pick one per task, and key steps 4/5/7's
filenames off step 2's choice rather than baking one of the two
conventions into every step. Also fix the trace path to
`run-NNN/{trace.json,unified-events.jsonl}` (matching what
unify-trace.mjs actually writes) instead of just `latest/` — autobrowse
maintains the `latest` symlink but the explicit zero-padded form is
what the rest of the docs use.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
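The two layouts described in this commit can be sketched as a single path function, so that later steps derive filenames from one choice instead of hard-coding either convention. The names `Layout`, `scriptPath`, `verifyCommand`, and `runDir`, and the `outDir` parameter, are hypothetical. The three-digit padding is inferred from the `run-NNN` notation and may not match what unify-trace.mjs does in every case.

```typescript
// Hypothetical names throughout; only the path shapes come from the commit message.
type Layout = "per-framework-subdir" | "flattened-root";
type Framework = "playwright" | "stagehand";

// Step 2's choice decides the filename that later steps must reuse.
function scriptPath(layout: Layout, outDir: string, task: string, framework: Framework): string {
  return layout === "per-framework-subdir"
    ? `${outDir}/tasks/${task}/${framework}/${task}.ts` // script named after the task
    : `${outDir}/${framework}.ts`; // script named after the framework
}

// The verify command is keyed off the same path rather than assuming <framework>.ts.
function verifyCommand(path: string): string {
  return `npx tsx ${path}`;
}

// Assumption: "NNN" means a zero-padded, three-digit run number.
function runDir(n: number): string {
  return `run-${String(n).padStart(3, "0")}`;
}

const path = scriptPath("per-framework-subdir", ".", "search-products", "stagehand");
console.log(path);
console.log(verifyCommand(path));
console.log(`${runDir(7)}/trace.json`);
```

Because the verify command takes the path that was actually written, the mismatch described above (writing `<task>.ts`, then running `<framework>.ts`) cannot arise in this sketch.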

Summary
PR #125 introduced `scripts/codegen.mjs` as a one-shot completion-API pipeline that templates a framework prompt, calls the LLM, writes the emitted message text as the script, verifies, and rewrites on failure. The sub-process boundary turned out to be the wrong contract.

Every bug we shipped fixes for since #125 merged was caused by it:

- `.ts` file: `Write({content})` is a structured argument, so script bytes never ride the natural-language channel
- `--out` dir `package.json` collision (70dae51): `package.json`, deduped at source
- `pkg-hash` install-stamp gate (55e7d4c)
- `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD` leak

Verified end-to-end against the 2026-06-04 preview run: 4 of 5 browser/hybrid skills lost their `stagehand.ts` because the LLM's rewrite output prepended a paragraph of self-narration before the imports. Cache file at `/tmp/skill/etsy.com/search-products/autobrowse/codegen-cache/6c78b599d4d5a9d4.txt`:

…followed by valid imports.

`tsx` chokes on the prose, verify fails, retry loop never converges. The earlier draft of this PR added a `stripPreamble` defensive boundary; this version removes the boundary problem entirely.

What changes
- Delete `scripts/codegen.mjs` (515 lines)
- Delete `codegen/runners/` (`tsx-runner.mjs`, `playwright.mjs`, `stagehand.mjs`)
- Delete `codegen/scaffolds/`; contents inlined into the new reference docs
- Move `codegen/prompts/{playwright,stagehand}.md` → `references/codegen/{playwright,stagehand}.md`. Technical content (CDP-attach pattern, Stagehand v3 constructor shape, locator priorities, snap convention, JSON stdout contract) is preserved; framing shifts from "you ARE this LLM emitting verbatim" to "here's the spec for the file you're writing".
- Update `SKILL.md`'s "Generate a runnable script" section to describe the agent-driven loop (Read trace/refs → Write script → Bash verify → iterate or delete on persistent failure).

Net diff: −626 lines.
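The control flow of the agent-driven loop (Write script → verify → iterate or delete on persistent failure) can be sketched roughly as follows. In the PR the loop is carried out by the outer agent through its Read/Write/Bash tools, not by code in the repo; `LoopTools`, `generateScript`, and the default of three attempts are all hypothetical and exist only to illustrate the shape. The tools are stubbed in memory so the sketch runs on its own.

```typescript
interface VerifyResult {
  ok: boolean;
  stderr: string;
}

// Hypothetical stand-ins for the agent's tools and its own drafting step.
interface LoopTools {
  draft: (previousStderr: string | null) => string; // agent writes or revises the script
  write: (path: string, content: string) => void; // structured content, no prose channel
  verify: (path: string) => VerifyResult; // e.g. a tsx run; stubbed below
  remove: (path: string) => void; // drop a persistently broken script before upload
}

function generateScript(path: string, tools: LoopTools, maxAttempts = 3): boolean {
  let stderr: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    tools.write(path, tools.draft(stderr));
    const result = tools.verify(path);
    if (result.ok) return true; // keep the script
    stderr = result.stderr; // feed the failure into the next draft
  }
  tools.remove(path); // persistent failure: delete rather than upload
  return false;
}

// In-memory stubs: the first draft fails verification, the revision passes.
const files = new Map<string, string>();
const log: string[] = [];
const tools: LoopTools = {
  draft: (prev) => (prev === null ? "// first draft" : `// fixed after: ${prev}`),
  write: (p, c) => { files.set(p, c); log.push("write"); },
  verify: (p) => {
    const ok = (files.get(p) ?? "").includes("fixed");
    log.push(ok ? "verify:ok" : "verify:fail");
    return { ok, stderr: ok ? "" : "stub failure" };
  },
  remove: (p) => { files.delete(p); log.push("remove"); },
};

const kept = generateScript("stagehand.ts", tools);
console.log(kept, log.join(" > "));
```

Everything happens in one process, so there is no parent/child timeout to align and no second place where a failed script has to be remembered and dropped.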
Companion change
The browse.sh §4b rewrite, which replaces the `node codegen.mjs --frameworks ...` invocation with the inlined Read/Write/Bash loop, ships in a separate browse.sh PR (to follow).

Test plan
- After `BB_SKILLS_SHA` bumps on the preview, re-regenerate the 5 stagehand-missing skills (etsy, google-search-flights, nursys, amazon-search-products, uspto-search-patents) and confirm both `playwright.ts` and `stagehand.ts` upload

🤖 Generated with Claude Code
Note
Medium Risk
Large removal of the codegen/verify harness shifts behavior to agent-driven steps; any caller still invoking `codegen.mjs` or `npm run codegen` will break until updated (e.g. browse.sh companion PR).

Overview
Removes the standalone codegen pipeline (`scripts/codegen.mjs` plus runners, scaffolds, and LLM prompt templates) and moves script generation to the outer agent via `Read`/`Write`/`Bash` as documented in `SKILL.md`.

The old `node codegen.mjs` flow (cached Anthropic completion, scaffold drop, nested verify runners) is replaced by an explicit loop: read converged trace + `strategy.md` + `references/codegen/<framework>.md`, write `playwright.ts`/`stagehand.ts` and merged scaffolds, run `npm install` + `npx tsx` against a fresh Browserbase session, iterate on stderr or delete broken scripts before upload.

Playwright and Stagehand specs are reframed from “system prompts for a sub-process LLM” to `references/codegen/playwright.md` and `stagehand.md`, with inlined `package.json`/`tsconfig` guidance (including HTTP-only Playwright, Stagehand skip rules, and shared-dir `package.json` merge). Net effect is a large deletion (~600+ lines) with workflow ownership shifted to the skill agent; browse.sh’s `codegen.mjs` invocation is expected to follow in a companion PR.

Reviewed by Cursor Bugbot for commit 7dbe854. Bugbot is set up for automated code reviews on this repo.
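The shared-dir `package.json` merge mentioned in the overview could look something like the sketch below. This is an illustration under assumptions, not the guidance in the reference docs: `mergePackageJson` is a hypothetical name, the package names are illustrative, the versions are placeholders, and "incoming wins on conflict" is an assumed policy.

```typescript
interface PackageJson {
  name?: string;
  type?: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Hypothetical merge: keep one package.json and union the dependency maps.
// Assumption: on a conflicting key, the incoming value wins.
function mergePackageJson(base: PackageJson, incoming: PackageJson): PackageJson {
  return {
    ...base,
    ...incoming,
    dependencies: { ...base.dependencies, ...incoming.dependencies },
    devDependencies: { ...base.devDependencies, ...incoming.devDependencies },
  };
}

// Placeholder versions ("x.y.z"); real scaffolds would pin actual versions.
const playwrightPkg: PackageJson = {
  name: "skill-scripts",
  type: "module",
  dependencies: { playwright: "x.y.z" },
  devDependencies: { tsx: "x.y.z" },
};
const stagehandPkg: PackageJson = {
  dependencies: { "@browserbasehq/stagehand": "x.y.z" },
};

const merged = mergePackageJson(playwrightPkg, stagehandPkg);
console.log(Object.keys(merged.dependencies ?? {}));
```

When a single agent writes both scripts into one directory, a merge like this happens once at the source rather than across separate sub-process invocations.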