refactor(codegen): delete codegen.mjs; outer agent owns script generation #128

Open
ziruihao wants to merge 2 commits into main from fix/codegen-strip-llm-preamble

ziruihao commented Jun 4, 2026

Summary

PR #125 introduced scripts/codegen.mjs as a one-shot completion-API pipeline that templates a framework prompt, calls the LLM, writes the emitted message text as the script, verifies, and rewrites on failure. The sub-process boundary turned out to be the wrong contract.

Every bug we have shipped a fix for since #125 merged traces back to that boundary:

| Bug | Wouldn't exist if outer agent owned codegen |
| --- | --- |
| LLM preamble bleeds into .ts file | Write({content}) is a structured argument; script bytes never ride the natural-language channel |
| Shared --out dir package.json collision (70dae51) | One agent writes one package.json, deduped at source |
| pkg-hash install-stamp gate (55e7d4c) | Agent re-installs when it knows deps changed |
| Cross-runner PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD leak | Agent sets env when it spawns verify |
| Parent verify timeout < child runner timeout (8638b7e) | One process, no nested timeouts |
| Stagehand script vanishing pre-upload | The agent that decides "stagehand failed" is the same one curling the upload |
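
For the package.json row, a minimal sketch of what "deduped at source" could look like when one writer assembles the manifest. The `PackageManifest` shape, the `mergeManifests` helper, and the package names and version ranges are all hypothetical placeholders, not taken from this repo.

```typescript
// Hypothetical manifest shape: only the fields this sketch needs.
interface PackageManifest {
  name: string;
  dependencies: Record<string, string>;
}

// Merge per-framework dependency maps into a single manifest.
// A conflicting version range throws instead of silently picking one.
function mergeManifests(
  name: string,
  parts: Array<Record<string, string>>,
): PackageManifest {
  const dependencies: Record<string, string> = {};
  for (const part of parts) {
    for (const [pkg, range] of Object.entries(part)) {
      const existing = dependencies[pkg];
      if (existing !== undefined && existing !== range) {
        throw new Error(`conflicting ranges for ${pkg}: ${existing} vs ${range}`);
      }
      dependencies[pkg] = range;
    }
  }
  return { name, dependencies };
}

// Placeholder package names and ranges, for illustration only.
const merged = mergeManifests("skill-scripts", [
  { "framework-a": "^1.0.0", "shared-runner": "^4.0.0" },
  { "framework-b": "^3.0.0", "shared-runner": "^4.0.0" },
]);
console.log(JSON.stringify(merged, null, 2));
```

Because a single writer sees every framework's needs before writing, the shared dependency is listed once and a disagreement surfaces as an error rather than as a later install collision.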

Verified end-to-end against the 2026-06-04 preview run: 4 of 5 browser/hybrid skills lost their stagehand.ts because the LLM's rewrite output prepended a paragraph of self-narration before the imports. Cache file at /tmp/skill/etsy.com/search-products/autobrowse/codegen-cache/6c78b599d4d5a9d4.txt:

"The error is clear: the previous attempt's output started with explanation text ("Looking at the error...") before the imports, which caused esbuild to fail parsing. The script content must start directly with imports. Here is the complete corrected script:"

…followed by valid imports. tsx chokes on the prose, verify fails, retry loop never converges. The earlier draft of this PR added a stripPreamble defensive boundary; this version removes the boundary problem entirely.
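
To make the failure mode concrete, here is a rough sketch of a check that flags a file whose first non-blank line is prose rather than code. `startsWithCode` is a hypothetical helper and a heuristic only; it is not the `stripPreamble` from the earlier draft and is not something this PR ships. The sample strings are illustrative.

```typescript
// Heuristic only: treat a file as "starting with code" when its first
// non-blank line looks like an import, a comment, or a shebang.
function startsWithCode(source: string): boolean {
  const first = source.split("\n").find((line) => line.trim() !== "") ?? "";
  return /^(import\s|\/\/|\/\*|#!)/.test(first.trim());
}

// Illustrative inputs: one with leading narration, one without.
const narrated = 'The error is clear: ...\n\nimport fs from "node:fs";\n';
const clean = 'import fs from "node:fs";\n';

console.log(startsWithCode(narrated)); // false
console.log(startsWithCode(clean)); // true
```

A check like this only detects the symptom. Passing the script as a structured `content` argument avoids mixing prose and code in the first place.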

What changes

  • Delete scripts/codegen.mjs (515 lines)
  • Delete codegen/runners/ (tsx-runner.mjs, playwright.mjs, stagehand.mjs)
  • Delete codegen/scaffolds/ — inlined into the new reference docs
  • Move + reframe codegen/prompts/{playwright,stagehand}.md → references/codegen/{playwright,stagehand}.md. Technical content (CDP-attach pattern, Stagehand v3 constructor shape, locator priorities, snap convention, JSON stdout contract) is preserved; framing shifts from "you ARE this LLM emitting verbatim" to "here's the spec for the file you're writing".
  • Update SKILL.md's "Generate a runnable script" section to describe the agent-driven loop (Read trace/refs → Write script → Bash verify → iterate or delete on persistent failure).
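
The agent-driven loop in the last bullet could be sketched roughly as below. This is an illustration of the control flow only: the real loop is carried out by the agent with Read / Write / Bash, and `LoopDeps`, `generateWithRetries`, and the attempt cap are hypothetical names and values.

```typescript
type VerifyResult =
  | { status: "pass" }
  | { status: "fail"; stderr: string };

// Hypothetical stand-ins for the agent's Write and Bash-verify steps.
interface LoopDeps {
  write: (feedback: string | null) => string; // produce script text, given the last stderr
  verify: (script: string) => VerifyResult;
}

// Write, verify, and iterate on stderr. Returns null on persistent failure
// so the caller can delete the script instead of uploading it.
function generateWithRetries(deps: LoopDeps, maxAttempts: number): string | null {
  let feedback: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const script = deps.write(feedback);
    const result = deps.verify(script);
    if (result.status === "pass") return script;
    feedback = result.stderr;
  }
  return null;
}

// Fake writer and verifier: the first draft fails, the second passes.
let verifyCalls = 0;
const finalScript = generateWithRetries(
  {
    write: (feedback) => (feedback === null ? "first draft" : "second draft"),
    verify: (script) => {
      verifyCalls++;
      return script === "second draft"
        ? { status: "pass" }
        : { status: "fail", stderr: "parse error" };
    },
  },
  3,
);
console.log(finalScript, verifyCalls); // second draft 2
```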

Net diff: −626 lines.

Companion change

The browse.sh §4b rewrite — replacing the node codegen.mjs --frameworks ... invocation with the inlined Read/Write/Bash loop — ships in a separate browse.sh PR (to follow).

Test plan

  • No external callers in this repo reference the deleted paths
  • Companion browse.sh PR opens
  • After both land + BB_SKILLS_SHA bumps on the preview, regenerate the 5 stagehand-missing skills (etsy, google-search-flights, nursys, amazon-search-products, uspto-search-patents) and confirm both playwright.ts and stagehand.ts upload

🤖 Generated with Claude Code


Note

Medium Risk
Large removal of the codegen/verify harness shifts behavior to agent-driven steps; any caller still invoking codegen.mjs or npm run codegen will break until updated (e.g. browse.sh companion PR).

Overview
Removes the standalone codegen pipeline (scripts/codegen.mjs plus runners, scaffolds, and LLM prompt templates) and moves script generation to the outer agent via Read / Write / Bash as documented in SKILL.md.

The old node codegen.mjs flow (cached Anthropic completion, scaffold drop, nested verify runners) is replaced by an explicit loop: read converged trace + strategy.md + references/codegen/<framework>.md, write playwright.ts / stagehand.ts and merged scaffolds, run npm install + npx tsx against a fresh Browserbase session, iterate on stderr or delete broken scripts before upload.
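
As a sketch of the verify step, the snippet below builds a spawn description with an environment scoped to that one invocation and a single timeout. `SpawnSpec`, `verifySpec`, the base environment, and the timeout value are hypothetical; which variables the agent actually sets, and for which framework, is not specified here.

```typescript
// Hypothetical description of one verify invocation.
interface SpawnSpec {
  command: string;
  args: string[];
  env: Record<string, string>;
  timeoutMs: number;
}

// Copy the base environment and layer per-spawn overrides on top,
// so nothing leaks into the parent or into other runs.
function verifySpec(
  scriptPath: string,
  baseEnv: Record<string, string>,
  extraEnv: Record<string, string>,
  timeoutMs: number,
): SpawnSpec {
  return {
    command: "npx",
    args: ["tsx", scriptPath],
    env: { ...baseEnv, ...extraEnv },
    timeoutMs,
  };
}

const base = { PATH: "/usr/bin" }; // placeholder base environment
const spec = verifySpec(
  "stagehand.ts",
  base,
  { PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1" },
  120000, // placeholder timeout
);
console.log(spec.args.join(" ")); // tsx stagehand.ts
console.log("PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD" in base); // false
```

The fields map onto Node's `child_process.spawnSync(command, args, { env, timeout })`, so one process owns both the environment and the only timeout.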

Playwright and Stagehand specs are reframed from “system prompts for a sub-process LLM” to references/codegen/playwright.md and stagehand.md, with inlined package.json / tsconfig guidance (including HTTP-only Playwright, Stagehand skip rules, and shared-dir package.json merge). Net effect is a large deletion (~600+ lines) with workflow ownership shifted to the skill agent; browse.sh’s codegen.mjs invocation is expected to follow in a companion PR.

Reviewed by Cursor Bugbot for commit 7dbe854.

Comment thread skills/autobrowse/scripts/codegen.mjs Outdated
…tion

PR #125 introduced scripts/codegen.mjs as a one-shot completion-API
pipeline that templates a framework prompt, calls the LLM, writes the
emitted message text to disk as the script, then verifies and rewrites
on failure. The sub-process boundary turned out to be the wrong contract:

  • Script content rides the model's natural-language output channel,
    so it competes with the model's conversational instincts. The LLM
    keeps prepending self-narration ("The error is clear:", "Here is the
    corrected script:") on the rewrite path, breaking tsx parse — see
    /tmp/skill/etsy.com/search-products/autobrowse/codegen-cache/
    6c78b599d4d5a9d4.txt from the 2026-06-04 preview run.
  • Multi-framework runs into a shared --out dir collide on package.json
    + node_modules (PR #125 fixed this with deep-merge + pkg-hash stamp;
    the bug only existed because of the sub-process split).
  • Runner timeouts and the parent verify timeout had to be hand-aligned
    so the parent doesn't SIGTERM a healthy child mid-install.
  • Trace/strategy/script artifacts get reasoned about in two places
    (codegen.mjs writes scripts, the outer agent's bash uploads them).

All of those classes of bug disappear when the outer agent owns codegen.
It already has the context, the tools (Read/Write/Bash), and the
judgment loop. The Write tool's structured `content` argument means
script bytes never ride the natural-language channel — no preamble bug.
A single agent process means no cross-process timeout coordination, no
deps merging across sub-process invocations, and no separate place to
reason about "this stagehand failed, drop it before upload".
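
The timeout hand-alignment that the sub-process split required amounts to an invariant like the one sketched here. `parentOutlivesChild` and the millisecond values are hypothetical; they illustrate the constraint, not the values the old runners used.

```typescript
// The parent must wait at least as long as the child, plus some margin,
// or it can kill a healthy child that is still installing.
function parentOutlivesChild(
  parentMs: number,
  childMs: number,
  marginMs: number,
): boolean {
  return parentMs >= childMs + marginMs;
}

// Placeholder values for illustration.
console.log(parentOutlivesChild(60000, 90000, 5000)); // false
console.log(parentOutlivesChild(120000, 90000, 5000)); // true
```

With a single agent process there is only one timeout, so there is nothing left to keep aligned.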

Changes:
- Delete scripts/codegen.mjs (515 lines)
- Delete codegen/runners/ (tsx-runner.mjs, playwright.mjs, stagehand.mjs)
- Delete codegen/scaffolds/ (inlined into the new reference docs)
- Move + reframe codegen/prompts/{playwright,stagehand}.md to
  references/codegen/{playwright,stagehand}.md. The technical content
  (CDP attach pattern, Stagehand v3 constructor shape, locator
  priorities, snap convention, JSON stdout contract) is preserved; what
  changed is framing — these are now reference docs an outer agent
  reads on demand, not completion-API system prompts.
- Update SKILL.md's "Generate a runnable script" section to describe
  the agent-driven loop (Read trace/refs → Write script → Bash verify
  → iterate or delete on persistent failure).

Net diff: -626 lines.

The companion change in browse.sh's §4b system prompt — replacing the
`node codegen.mjs --frameworks ...` invocation with the inlined
Read/Write/Bash loop — lives in a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ziruihao force-pushed the fix/codegen-strip-llm-preamble branch from 32393d5 to a768cb2 on June 5, 2026 22:09
ziruihao changed the title from "fix(codegen): strip LLM-introduced preamble before writing scripts" to "refactor(codegen): delete codegen.mjs; outer agent owns script generation" on Jun 5, 2026

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit a768cb2.

Comment thread skills/autobrowse/SKILL.md Outdated
The previous SKILL.md step 2 mixed two valid output-dir layouts in one
example — `tasks/<task>/<framework>/<task>.ts` (per-framework subdir,
script named after task) and "flattened upload root" (script named after
framework) — but step 4's verify command only mentioned
`<framework>.ts`. An agent following the per-framework-subdir example
would `Write` `<task>.ts` and then `tsx <framework>.ts` against a file
that doesn't exist.

Make the two shapes explicit, pick one per task, and key steps 4/5/7's
filenames off step 2's choice rather than baking one of the two
conventions into every step. Also fix the trace path to
`run-NNN/{trace.json,unified-events.jsonl}` (matching what
unify-trace.mjs actually writes) instead of just `latest/` — autobrowse
maintains the `latest` symlink but the explicit zero-padded form is
what the rest of the docs use.
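
A small sketch of keying filenames off the layout chosen in step 2, so the path that gets written and the path that gets verified cannot diverge. `Layout`, `scriptPath`, and `runDir` are hypothetical names; the three-digit padding is an assumption read off the `run-NNN` form, and the flattened path is taken relative to the upload root.

```typescript
type Layout = "per-framework-subdir" | "flattened";

// One function decides the script path; Write and verify both call it.
function scriptPath(layout: Layout, task: string, framework: string): string {
  return layout === "per-framework-subdir"
    ? `tasks/${task}/${framework}/${task}.ts`
    : `${framework}.ts`;
}

// Explicit zero-padded run directory, assuming three digits.
function runDir(n: number): string {
  return `run-${String(n).padStart(3, "0")}`;
}

console.log(scriptPath("per-framework-subdir", "search-products", "stagehand"));
// tasks/search-products/stagehand/search-products.ts
console.log(scriptPath("flattened", "search-products", "stagehand")); // stagehand.ts
console.log(`${runDir(7)}/trace.json`); // run-007/trace.json
```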

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 participants