Add OpenAI Realtime + Browserbase voice agent example #85
A voice agent (OpenAI Realtime) talks with the user while a persistent Claude browser agent operates a shared Browserbase session underneath it, so the conversation stays in sync with what the browser is actually doing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 9ab41f8. Configure here.
    }

    for (const item of getFunctionCalls(event)) {
      void handleBrowserFunctionCall(sideband, item);
Overlapping browser tool races
High Severity
Each control_browser handler is started with void and is not serialized. A second call can assign a new activeRunId while an earlier handler is still in waitForDemoRunToSettle for the previous run. That waiter then exits immediately or hangs until timeout and may return the wrong snapshot to the voice model.
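
One way to remove the race described above is to funnel every control_browser call through a single promise chain, so a later call cannot reassign activeRunId while an earlier one is still waiting. The sketch below is a minimal illustration, not code from the PR: SerialQueue is a hypothetical helper, and the usage comment assumes the getFunctionCalls and handleBrowserFunctionCall names from the quoted diff.

```typescript
// Hypothetical helper (not from the PR): runs async tasks strictly one at a time.
class SerialQueue {
  // Tail of the chain; it never rejects, so one failed task cannot block the rest.
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Start this task only after every previously enqueued task has settled.
    const run = this.tail.then(() => task());
    // Swallow the outcome on the chain itself; callers still see it via `run`.
    this.tail = run.then(
      () => undefined,
      () => undefined,
    );
    return run;
  }
}

// Assumed usage inside the loop quoted above:
//   const browserQueue = new SerialQueue();
//   for (const item of getFunctionCalls(event)) {
//     browserQueue
//       .enqueue(() => handleBrowserFunctionCall(sideband, item))
//       .catch(console.error);
//   }
```

Serializing trades latency for correctness: a second instruction waits for the first to settle. If the voice model should be able to interrupt a run, cancelling the in-flight run before enqueueing the next one would be a separate design decision.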
Additional Locations (2)
    const browserSession = await browserbase.sessions.create({
      projectId: browserbaseProjectId,
      keepAlive: true
    });
Browserbase sessions never released
Medium Severity
The demo creates Browserbase sessions with keepAlive: true and stores each demoId in a global in-memory map, but nothing removes entries or ends sessions when voice ends or the tab reloads. Every new client UUID leaves another long-lived remote browser running.
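
A possible shape for the missing cleanup is a small registry that ends a session explicitly when the call stops and sweeps idle ones on a timer. This is a sketch under assumptions: DemoSessionRegistry, ReleaseFn, and the idle timeout are hypothetical, and the release callback stands in for whatever Browserbase call the app uses to end a keepAlive session (that call is deliberately not spelled out here).

```typescript
// Hypothetical registry (not the PR's actual state module).
// `release` is injected; in the app it would wrap the Browserbase call that ends a session.
type ReleaseFn = (sessionId: string) => Promise<void>;

interface DemoEntry {
  sessionId: string;
  lastSeenAt: number; // epoch ms of the last client activity
}

class DemoSessionRegistry {
  private readonly entries = new Map<string, DemoEntry>();
  private readonly release: ReleaseFn;
  private readonly idleMs: number;

  constructor(release: ReleaseFn, idleMs: number) {
    this.release = release;
    this.idleMs = idleMs;
  }

  // Record a new session, or refresh the timestamp of an existing one.
  touch(demoId: string, sessionId: string, now: number = Date.now()): void {
    this.entries.set(demoId, { sessionId, lastSeenAt: now });
  }

  // End one demo explicitly, e.g. when the voice call stops.
  async end(demoId: string): Promise<boolean> {
    const entry = this.entries.get(demoId);
    if (!entry) return false;
    this.entries.delete(demoId); // remove first so concurrent calls cannot double-release
    try {
      await this.release(entry.sessionId);
    } catch (err) {
      this.entries.set(demoId, entry); // keep it so a later sweep can retry
      throw err;
    }
    return true;
  }

  // End every demo idle for at least idleMs; returns the ids that were ended.
  async sweep(now: number = Date.now()): Promise<string[]> {
    const ended: string[] = [];
    for (const [demoId, entry] of Array.from(this.entries.entries())) {
      if (now - entry.lastSeenAt >= this.idleMs && (await this.end(demoId))) {
        ended.push(demoId);
      }
    }
    return ended;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In the demo, end(demoId) could be called from a stop route or a page-unload beacon, with sweep() on an interval as a backstop for tabs that close without notifying the server.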
Additional Locations (1)
    function sendRealtimeEvent(ws: WebSocket, event: Record<string, unknown>) {
      if (ws.readyState !== WebSocket.OPEN) {
        return;
      }
Tool output dropped if socket closes
Low Severity
The exported DemoStartResponse interface is defined but never imported or referenced anywhere in the example app, so it is dead API surface that can drift from real routes.
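
Going by the heading and the quoted guard in sendRealtimeEvent, a send on a closed socket currently returns silently, so the caller cannot tell that a tool output was lost. A minimal sketch of one alternative is to report whether the event was actually written. The names trySendRealtimeEvent and SocketLike are hypothetical, and the socket is reduced to the two members the function needs so the example stays self-contained.

```typescript
// Minimal stand-in for the WebSocket members used here (assumption for the sketch).
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

const WS_OPEN = 1; // same numeric value as WebSocket.OPEN

// Returns true if the event was written, false if the socket was not open.
function trySendRealtimeEvent(ws: SocketLike, event: Record<string, unknown>): boolean {
  if (ws.readyState !== WS_OPEN) {
    return false;
  }
  ws.send(JSON.stringify(event));
  return true;
}
```

A caller could then log or retry when the return value is false, instead of assuming the voice model received the browser result.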
Additional Locations (1)


Summary
Adds examples/integrations/openai/, a runnable prototype that gives a voice agent access to the whole web.
A voice agent (OpenAI Realtime, speech-to-speech) talks with the user. A persistent Claude browser agent operates a real Browserbase session underneath it (opening sites, clicking, reading pages) and remembers the whole call, so the user can refer back ("go back to the first result and compare"). Because the tool call only returns once the browser work has actually happened, and answers are grounded in and quoted from the live page, the spoken conversation stays in sync with what's on screen instead of narrating ahead of it.
It's the OpenAI counterpart to the ElevenLabs example, and is meant to inspire anyone building voice agents — the pattern works with any speech-to-speech runtime in front of a Browserbase-backed browser agent.
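
The "tool call only returns once the browser work has actually happened" idea can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: answerToolCall, runBrowser, and send are hypothetical names, while the two event types (conversation.item.create carrying a function_call_output item, then response.create) are the Realtime API's client events for returning a tool result and requesting the next response.

```typescript
type RealtimeEvent = Record<string, unknown>;

// Hypothetical bridge: wait for the browser step, then hand its result to the voice model.
async function answerToolCall(
  callId: string,
  instruction: string,
  runBrowser: (instruction: string) => Promise<string>, // assumed: resolves with grounded page text
  send: (event: RealtimeEvent) => void, // assumed: writes one event to the Realtime connection
): Promise<void> {
  let output: string;
  try {
    // Nothing is sent until the browser work has actually finished.
    output = await runBrowser(instruction);
  } catch (err) {
    // Surface failures as tool output so the model can tell the user what happened.
    output = `Browser step failed: ${err instanceof Error ? err.message : String(err)}`;
  }

  // Attach the result to the pending function call...
  send({
    type: "conversation.item.create",
    item: { type: "function_call_output", call_id: callId, output },
  });
  // ...and only then ask the model to speak again.
  send({ type: "response.create" });
}
```

Because the response is requested only after the output item is sent, the model's next utterance is built from what the page showed rather than from what it expected to happen.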
How it works
control_browser.navigate/click/type_text/press_key/go_back/read_page) via the Browse CLI, shown live in an iframe.
Standalone Next.js app (pnpm install && pnpm dev, http://127.0.0.1:3002). Requires OPENAI_API_KEY, ANTHROPIC_API_KEY, BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID. Also adds the openai/ entry to the top-level README tree.
Type of Change
🤖 Generated with Claude Code
Note
Low Risk
Self-contained example under examples/integrations/openai/ with no changes to shared packages or production code paths.
Overview
Adds a new examples/integrations/openai/ Next.js demo and documents it in the root README tree.
The app pairs OpenAI Realtime (WebRTC voice + control_browser tool) with a persistent Claude browser agent on Browserbase, bridged by a server WebSocket sideband that runs instructions and returns grounded page context before the voice model speaks again. The UI shows a live Browserbase iframe, SSE session updates, and a merged voice/browser transcript.
Supporting pieces include demo REST routes (/api/realtime/connect, /api/demo/*), in-memory session state with Browse CLI automation, and env/setup docs (.env.example, integration README).