These checks validate the real Chrome automation path and the optional live Responses API smoke suite. Run the browser steps whenever you touch Chrome automation (lifecycle, cookie sync, prompt injection, Markdown capture, etc.), and run the live API suite before shipping major transport changes.
- macOS with Chrome installed (default profile signed in to ChatGPT Pro).
- Node 22+ and `pnpm install` already completed.
- Headful display access (no `--browser-headless`).
- When debugging, add `--browser-keep-browser` so Chrome stays open after Oracle exits, then connect with `pnpm exec tsx scripts/browser-tools.ts ...` (screenshot, eval, DOM picker, etc.).
- Ensure no Chrome instances are force-terminated mid-run; let Oracle clean up once you’re done capturing state.
- Clipboard checks (`browser-tools.ts eval "navigator.clipboard.readText()"`) trigger a permission dialog in Chrome. Approve it for debugging, but remember that we can’t rely on `readText` in unattended runs.
`pnpm test:browser` launches headful Chrome and checks that the DevTools endpoint is reachable. Set `ORACLE_BROWSER_PORT` (or `ORACLE_BROWSER_DEBUG_PORT`) to reuse a fixed port when you’ve already opened a firewall rule.
Run this whenever you touch the Gemini web client or the --generate-image / --edit-image plumbing.
Prereqs:
- Chrome profile is signed into `gemini.google.com`.
- Generate an image: `pnpm run oracle -- --engine browser --model gemini-3-pro --prompt "a cute robot holding a banana" --generate-image /tmp/gemini-gen.jpg --aspect 1:1 --wait --verbose`
  - Confirm the output file exists and is a real image (`file /tmp/gemini-gen.jpg`).
- Edit an image: `pnpm run oracle -- --engine browser --model gemini-3-pro --prompt "add sunglasses" --edit-image /tmp/gemini-gen.jpg --output /tmp/gemini-edit.jpg --wait --verbose`
  - Confirm `/tmp/gemini-edit.jpg` exists.
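The "is a real image" check can also be scripted. The following is a minimal sketch, not part of Oracle: `sniffImageKind` and `isRealImage` are hypothetical helper names, and the sketch only recognizes JPEG, PNG, and WebP by their leading magic bytes, which is a rough stand-in for what `file` reports.

```typescript
import { readFileSync } from "node:fs";

type ImageKind = "jpeg" | "png" | "webp" | "unknown";

/** Identify a few common image formats from their leading magic bytes. */
function sniffImageKind(bytes: Uint8Array): ImageKind {
  const startsWith = (signature: number[], offset = 0): boolean =>
    signature.every((byte, i) => bytes[offset + i] === byte);

  // JPEG files begin with FF D8 FF.
  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
  // PNG files begin with the fixed 8-byte signature.
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  // WebP files carry "RIFF" at offset 0 and "WEBP" at offset 8.
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return "webp";
  }
  return "unknown";
}

/** Hypothetical wrapper: true when the file on disk looks like an image. */
function isRealImage(filePath: string): boolean {
  return sniffImageKind(readFileSync(filePath)) !== "unknown";
}
```

For example, `isRealImage("/tmp/gemini-gen.jpg")` would return `false` for an HTML error page saved under a `.jpg` name, which is the failure this manual check is meant to catch.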
Run this whenever you touch the session store, CLI session views, or TUI wiring for multi-model runs.
- Kick off an API multi-run: `pnpm run oracle -- --models "gpt-5.1-pro,gemini-3-pro" --prompt "Compare the moon & sun."`
  - Expect stdout to print sequential sections, one per model (`[gpt-5.1-pro] …` followed by `[gemini-3-pro] …`). No interleaved tokens.
- Capture the session ID from the summary line. Run `oracle session --status --model gpt-5.1-pro`.
  - Table should collapse to sessions that include GPT-5.1 Pro and show status icons (✓/⌛/✖) per model.
- Inspect detailed logs: `oracle session <id>`
  - The metadata header now includes a `Models:` block with one line per model plus token counts.
  - When prompted, pick `View gemini-3-pro log` and confirm only that model’s stream renders. Refresh should keep completed models intact even if others still run.
- Model filter path: `oracle session <id> --model gemini-3-pro`
  - Attach mode should error if that model is missing (double-check by filtering for a bogus model); otherwise it should render the prompt + single-model log only.
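The "no interleaved tokens" expectation from the first step can be checked mechanically on captured stdout. This is a rough sketch under one assumption: that model output lines start with a bracketed tag such as `[gpt-5.1-pro]`, as the expected output above suggests. The function names are hypothetical, and the real output format may differ.

```typescript
/** Collect the leading "[model]" tag from every stdout line that has one. */
function modelTags(stdout: string): string[] {
  const tags: string[] = [];
  for (const line of stdout.split("\n")) {
    const tag = /^\[([^\]]+)\]/.exec(line)?.[1];
    if (tag !== undefined) tags.push(tag);
  }
  return tags;
}

/** True when each model's tagged lines form a single contiguous run. */
function sectionsAreSequential(stdout: string): boolean {
  const finished = new Set<string>();
  let current: string | undefined;
  for (const tag of modelTags(stdout)) {
    if (tag === current) continue;
    // A model that reappears after another one started means interleaving.
    if (finished.has(tag)) return false;
    if (current !== undefined) finished.add(current);
    current = tag;
  }
  return true;
}
```

Untagged lines are ignored, so the check works whether the tag appears on every line or only on each section header.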
Run this when touching session serialization, file IO helpers, or CLI flag plumbing.
- `ORACLE_LIVE_TEST=1 OPENAI_API_KEY=<real key> pnpm vitest run tests/live/write-output-live.test.ts --runInBand`
  - Expect the test to create a temp `write-output-live.md` file containing `write-output e2e`.
- Manual spot-check: `oracle --prompt "answer file smoke" --write-output /tmp/out.md --wait`
  - Confirm `/tmp/out.md` exists with the answer text and a trailing newline.
- Multi-model spot-check: `oracle --models "gpt-5.1-pro,gemini-3-pro" --prompt "two files" --write-output /tmp/out.md --wait`
  - Confirm `/tmp/out.gpt-5.1-pro.md` and `/tmp/out.gemini-3-pro.md` exist with distinct content.
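The per-model file names in the multi-model spot-check follow a visible pattern: the model name is inserted before the extension. The sketch below reproduces that pattern so the expected paths can be computed rather than typed by hand. `perModelOutputPath`, `endsWithNewline`, and `allDistinct` are hypothetical helpers for illustration; Oracle's own naming logic may handle edge cases differently.

```typescript
import * as path from "node:path";

/** Derive the expected per-model path, e.g. /tmp/out.md -> /tmp/out.<model>.md. */
function perModelOutputPath(basePath: string, model: string): string {
  // POSIX path handling keeps the result stable regardless of host OS.
  const { dir, name, ext } = path.posix.parse(basePath);
  return path.posix.join(dir, `${name}.${model}${ext}`);
}

/** The single-model spot-check expects a trailing newline in the file. */
function endsWithNewline(text: string): boolean {
  return text.endsWith("\n");
}

/** The multi-model spot-check expects every file to have distinct content. */
function allDistinct(contents: string[]): boolean {
  return new Set(contents).size === contents.length;
}

const expected = ["gpt-5.1-pro", "gemini-3-pro"].map((model) =>
  perModelOutputPath("/tmp/out.md", model),
);
console.log(expected.join("\n"));
```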
Before running any agent-driven debugging, you can rely on the TypeScript CLI in `scripts/browser-tools.ts`:

    # Show help / available commands
    pnpm tsx scripts/browser-tools.ts --help
    # Launch Chrome with your normal profile so you stay logged in
    pnpm tsx scripts/browser-tools.ts start --profile
    # Drive the active tab
    pnpm tsx scripts/browser-tools.ts nav https://example.com
    pnpm tsx scripts/browser-tools.ts eval 'document.title'
    pnpm tsx scripts/browser-tools.ts screenshot
    pnpm tsx scripts/browser-tools.ts pick "Select checkout button"
    pnpm tsx scripts/browser-tools.ts cookies
    pnpm tsx scripts/browser-tools.ts inspect # show DevTools-enabled Chrome PIDs/ports/tabs
    pnpm tsx scripts/browser-tools.ts kill --all --force # tear down straggler DevTools sessions

This mirrors Mario Zechner’s “What if you don’t need MCP?” technique and is handy when you just need a few quick interactions without spinning up additional tooling.
Debug note: when you have a live ChatGPT tab open under a DevTools port and need a quick DOM dump of the last assistant turn, run pnpm tsx scripts/debug/extract-chatgpt-response.ts <port>.
- Prompt Submission & Model Switching
  - With Chrome signed in and cookie sync enabled, run `pnpm run oracle -- --engine browser --model "GPT-5.2" --prompt "Line 1\nLine 2\nLine 3"`.
  - Observe logs for:
    - `Prompt textarea ready (xxx chars queued)` (twice: initial + after model switch).
    - `Model picker: ... 5.2 ...`.
    - `Clicked send button` (or Enter fallback).
  - In the attached Chrome window, verify the multi-line prompt appears exactly as sent.
- Markdown Capture
  - Prompt: `pnpm run oracle -- --engine browser --model "GPT-5.2" --prompt "Produce a short bullet list with code fencing."`
  - Expected CLI output:
    - `Answer:` section containing bullet list with Markdown preserved (e.g., `- item`, fenced code).
    - Session log (`oracle session <id>`) should show the assistant markdown (confirm via ``grep -n '```' ~/.oracle/sessions/<id>/output.log``).
- Stop Button Handling
  - Start a long prompt (`"Write a detailed essay about browsers"`) and once ChatGPT responds, manually click “Stop generating” inside Chrome.
  - Oracle should detect the assistant message (partial) and still store the markdown.
- Override Flag
  - Run with `--browser-allow-cookie-errors` while intentionally breaking bindings.
  - Confirm log shows `Cookie sync failed (continuing with override)` and the run proceeds headless/logged-out.
- Remember: the browser composer now pastes only the user prompt (plus any inline file blocks). If you see the default “You are Oracle…” text or other system-prefixed content in the ChatGPT composer, something regressed in `assembleBrowserPrompt` and you should stop and file a bug.
- Heartbeats: Browser runs do not emit `--heartbeat` logs today. Heartbeat settings apply to streaming API runs only; ignore heartbeat toggles when validating browser mode.
- `oracle session <id>` should replay the transcript with markdown.
- `~/.oracle/sessions/<id>/meta.json` must include `browser.config` metadata (model label, cookie settings) and `browser.runtime` (PID/port).
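The `meta.json` requirement can be spot-checked with a few lines of code. The sketch below assumes that `browser.config` and `browser.runtime` are nested objects under a top-level `browser` key, which is an inference from the dotted names above rather than a confirmed schema; `missingBrowserMetadata` is a hypothetical helper and deliberately does not check individual fields.

```typescript
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Return the dotted names of required browser sections absent from meta. */
function missingBrowserMetadata(meta: unknown): string[] {
  const browser = isObject(meta) ? meta["browser"] : undefined;
  const missing: string[] = [];
  for (const key of ["config", "runtime"]) {
    // Assumption: each section is stored as a nested object under `browser`.
    if (!isObject(browser) || !isObject(browser[key])) {
      missing.push(`browser.${key}`);
    }
  }
  return missing;
}

// Usage with an inline sample; in practice, parse the session's meta.json.
const sample: unknown = JSON.parse('{"browser": {"config": {}}}');
console.log(missingBrowserMetadata(sample)); // prints [ 'browser.runtime' ]
```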
Document results (pass/fail, session IDs) in PR descriptions so reviewers can audit real-world behavior.
- 2025-11-18: API gpt-5.1 (`api-smoke-give-two-words`): returned “blue sky” in 2.5s.
- 2025-11-18: API gpt-5.1-pro (`api-smoke-pro-three-words`): completed in 3m08s with “Fast API verification”.
- 2025-11-18: Browser gpt-5.1 Instant (`browser-smoke-instant-two-words`): completed in ~10s; replied with a clarification prompt.
- 2025-11-18: Browser gpt-5.1-pro (`browser-smoke-pro-three-words`): completed in ~1m33s; response noted “Search tool used.”
- 2025-11-18 (rerun): API gpt-5.1 (`api-smoke-give-two-words`): reconfirmed OK; same answer + cost bracket.
- 2025-11-18 (rerun): Browser gpt-5.1-pro (`browser-smoke-pro-three-words`): reconfirmed OK; included heartbeat progress and search tool note.
- 2025-11-20: Browser gpt-5.1 via `oracle serve` (remote host on same Mac): fetched https://example.com; title “Example Domain”; first sentence “This domain is for use in documentation examples without needing permission.” (ran via tmux sessions `oracle-serve` and `oracle-client`).
Run these four smoke tests whenever we touch browser automation:
Fast-path note:
- Tests 1-4 below are quick browser-path checks only. They use `gpt-5.2-instant`, which currently targets the ChatGPT Instant 5.3 picker. They are not a substitute for Pro validation.
1. Fast browser simple prompt

       pnpm run oracle -- --engine browser --model gpt-5.2-instant --prompt "Return exactly one line and nothing else: pro-ok"

   Expect the answer body to contain `pro-ok` verbatim on its own line. Note the session ID.
2. Fast browser exact-line prompt

       pnpm run oracle -- --engine browser --model gpt-5.2-instant --prompt "Return exactly these three lines and nothing else:\n\`\`\`js\nconsole.log('thinking-ok')\n\`\`\`"

   Confirm the answer includes the fenced `js` code block and `console.log('thinking-ok')` verbatim.
3. Fast browser + attachment

   Prepare `/tmp/browser-md.txt` with a short note, then run

       pnpm run oracle -- --engine browser --model gpt-5.2-instant --prompt "Return exactly one line and nothing else: note=<paste the file contents exactly>" --file /tmp/browser-md.txt

   Ensure upload logs show “Attachment queued” and the answer contains `note=` plus the attached file contents exactly.
4. Fast browser + attachment (verbose)

   Prepare `/tmp/browser-report.txt` with faux metrics, then run

       pnpm run oracle -- --engine browser --model gpt-5.2-instant --prompt "Return exactly these two lines and nothing else:\nCPU=<value from file>\nMEMORY=<value from file>" --file /tmp/browser-report.txt --verbose

   Verify verbose logs show attachment upload and the final answer contains the exact CPU and memory values from the file.
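The "verbatim on its own line" and "fenced code block" expectations in these smoke tests can be checked against a captured answer instead of by eye. This is a minimal sketch with hypothetical helper names (`hasExactLine`, `hasFencedBlock`); it assumes the answer text has already been saved to a string and does not talk to Oracle.

```typescript
// Build the fence marker programmatically so this example can itself sit
// inside a fenced block without terminating it early.
const FENCE = "`".repeat(3);

/** True when some line of the answer equals `expected` after trimming. */
function hasExactLine(answer: string, expected: string): boolean {
  return answer.split(/\r?\n/).some((line) => line.trim() === expected);
}

/** True when the answer holds a fenced block in `lang` whose body is `body`. */
function hasFencedBlock(answer: string, lang: string, body: string): boolean {
  const lines = answer.split(/\r?\n/).map((line) => line.trimEnd());
  const open = lines.indexOf(FENCE + lang);
  if (open === -1) return false;
  const close = lines.indexOf(FENCE, open + 1);
  if (close === -1) return false;
  return lines.slice(open + 1, close).join("\n") === body;
}

const sampleAnswer = [FENCE + "js", "console.log('thinking-ok')", FENCE].join("\n");
console.log(hasFencedBlock(sampleAnswer, "js", "console.log('thinking-ok')")); // prints true
```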
Run these when the change might affect Pro-specific behavior, long thinking, or reattach.
- Pro markdown capture

      pnpm run oracle -- --engine browser --model gpt-5.4-pro --prompt "Return exactly these three lines and nothing else:\n\`\`\`js\nconsole.log('thinking-ok')\n\`\`\`"

  Confirm the answer preserves the fenced `js` code block.
- Pro reattach flow

  Use `scripts/browser-smoke.sh` or run a manual `--browser-keep-browser` session with `gpt-5.4-pro`, then kill the controller and verify `oracle session <slug> --render-plain` still shows the expected answer.
Record session IDs and outcomes in the PR description (pass/fail, notable delays). This ensures reviewers can audit real runs.
Run this whenever you touch CDP connection logic (remote chrome lifecycle, attachment transfer) or before executing remote sessions in CI.
- Launch a throwaway Chrome instance with remote debugging enabled (adjust the path per OS):

      REMOTE_PROFILE=/tmp/oracle-remote-test-profile
      rm -rf "$REMOTE_PROFILE"
      "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
        --headless=new \
        --disable-gpu \
        --remote-debugging-port=9333 \
        --remote-allow-origins=* \
        --user-data-dir="$REMOTE_PROFILE" \
        >/tmp/oracle-remote-chrome.log 2>&1 &
      export REMOTE_CHROME_PID=$!
      sleep 3

- Run the helper to verify CDP connectivity:

      pnpm tsx scripts/test-remote-chrome.ts localhost 9333

  Expect ✓ logs for connection, protocol info, navigation to https://chatgpt.com/, and the final “POC successful!” line.
- Tear down the temporary browser:

      kill "$REMOTE_CHROME_PID"
      rm -rf "$REMOTE_PROFILE"

  Use `pkill -f oracle-remote-test-profile` if Chrome refuses to exit cleanly.

Capture the pass/fail result (include the helper’s log snippet) in your PR description alongside other manual browser tests.
Use this when you need to inspect the live ChatGPT composer (DOM state, markdown text, screenshots, etc.). For smaller ad‑hoc pokes, you can often rely on pnpm tsx scripts/browser-tools.ts … instead.
- Launch within tmux

      tmux new -d -s oracle-browser \
        "pnpm run oracle -- --engine browser --browser-keep-browser \
          --model 'GPT-5.4 Pro' --prompt 'Debug via DevTools.'"

  Keeping the run in tmux prevents your shell from blocking and ensures Chrome stays open afterward.
- Grab the DevTools port
  - Run `tmux capture-pane -pt oracle-browser` to read the logs (`Launched Chrome … on port 56663`).
  - Verify the endpoint with `curl http://127.0.0.1:<PORT>/json/version` and note the `webSocketDebuggerUrl` for reference.
- Attach Chrome DevTools MCP
  - One-off: `CHROME_DEVTOOLS_URL=http://127.0.0.1:<PORT> npx -y chrome-devtools-mcp@latest`
  - `mcporter` config snippet:

        {
          "chrome-devtools": {
            "command": "npx",
            "args": ["-y", "chrome-devtools-mcp@latest", "--browserUrl", "http://127.0.0.1:<PORT>"]
          }
        }

  - Once the server prints `chrome-devtools-mcp exposes…`, you can list/call tools via `mcporter`.
- Interact & capture
  - Use MCP tools (`click`, `evaluate_js`, `screenshot`, etc.) to debug the composer contents.
  - Record any manual actions you take (e.g., “fired evaluate_js to dump #prompt-textarea.innerText”).
- Cleanup
  - `tmux kill-session -t oracle-browser`
  - `pkill -f oracle-browser-<slug>` if Chrome is still running.

Tip: Running `npx chrome-devtools-mcp@latest --help` lists additional switches (custom Chrome binary, headless, viewport, etc.).
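The "Grab the DevTools port" step can be scripted when you repeat it often. The sketch below assumes the log line contains the words `Launched Chrome` followed later by `on port <number>`, as in the example above; the text in between is elided in this document, so the pattern is deliberately loose. `parseDevToolsPort` and `versionEndpoint` are hypothetical names, not Oracle APIs.

```typescript
/** Return the port from the most recent "Launched Chrome … on port N" line. */
function parseDevToolsPort(log: string): number | undefined {
  // `.` does not cross newlines, so each match stays within one log line.
  const pattern = /Launched Chrome.*? on port (\d+)/g;
  let port: number | undefined;
  for (let match = pattern.exec(log); match !== null; match = pattern.exec(log)) {
    port = Number(match[1]);
  }
  return port;
}

/** Build the URL that the curl verification step queries. */
function versionEndpoint(port: number): string {
  return `http://127.0.0.1:${port}/json/version`;
}

const pane = "Starting Oracle\nLaunched Chrome (sample text) on port 56663\nWaiting";
const port = parseDevToolsPort(pane);
if (port !== undefined) console.log(versionEndpoint(port));
```

Feeding it the output of `tmux capture-pane -pt oracle-browser` yields the endpoint to pass to `curl` or to `--browserUrl`. Taking the last match rather than the first guards against stale lines from an earlier launch in the same pane.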
These Vitest cases hit the real OpenAI API to exercise both transports:
- Export a real key and explicitly opt in (default runs stay fast):

      export OPENAI_API_KEY=sk-...
      export ORACLE_LIVE_TEST=1
      pnpm vitest run tests/live/openai-live.test.ts

- The first two tests target the current fast browser picker path (`gpt-5.2-instant` aliasing to Instant 5.3). The later background tests send `gpt-5.4-pro` and `gpt-5.2-pro` prompts and expect the CLI to stay in background mode until OpenAI finishes (up to 30 minutes).
- Watch the console for `Reconnected to OpenAI background response...` if you're debugging transport flakiness; the test will fail if the response status isn't `completed` or if the text doesn't contain the hard-coded smoke strings.

Skip these unless you're intentionally validating the production API; they are
fully gated behind ORACLE_LIVE_TEST=1 to avoid accidental CI runs.