Add OpenAI Realtime + Browserbase voice agent example#85

Open
shubh24 wants to merge 1 commit into main from shubh24/openai-integration

@shubh24 shubh24 commented Jun 10, 2026

Summary

Adds examples/integrations/openai/ — a runnable prototype that gives a voice agent access to the whole web.

A voice agent (OpenAI Realtime, speech-to-speech) talks with the user. A persistent Claude browser agent operates a real Browserbase session underneath it — opening sites, clicking, reading pages — and remembers the whole call, so the user can refer back ("go back to the first result and compare"). Because the tool call only returns once the browser work has actually happened, and answers are grounded in and quoted from the live page, the spoken conversation stays in sync with what's on screen instead of narrating ahead of it.

It's the OpenAI counterpart to the ElevenLabs example, and is meant to inspire anyone building voice agents — the pattern works with any speech-to-speech runtime in front of a Browserbase-backed browser agent.

How it works

  • Voice plane — browser ↔ OpenAI Realtime over WebRTC; the voice agent has one tool, control_browser.
  • Server bridge — the connect route creates the Realtime call, then attaches a server-side WebSocket to the same call to answer tool calls and speak results back in the same conversation.
  • Browser plane — one persistent Claude agent per call drives the Browserbase session through compact tools (navigate/click/type_text/press_key/go_back/read_page) via the Browse CLI, shown live in an iframe.
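The server bridge step above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the tool description, the `instruction` parameter, and the `buildToolReply` helper are assumptions. The event types (`conversation.item.create` carrying a `function_call_output` item, followed by `response.create`) are the ones the Realtime API uses to return a tool result and ask the model to speak again.

```typescript
// Tool definition in the flat shape the Realtime API expects for function tools.
// ASSUMPTION: the description text and the `instruction` parameter are illustrative.
const controlBrowserTool = {
  type: "function",
  name: "control_browser",
  description: "Carry out an instruction in the live browser and report what the page shows.",
  parameters: {
    type: "object",
    properties: { instruction: { type: "string" } },
    required: ["instruction"],
  },
} as const;

// Hypothetical helper: the two events the sideband would send once the
// browser work has finished. The tool output goes first, then a request
// for the voice model to produce its next response.
function buildToolReply(
  callId: string,
  output: unknown,
): Array<Record<string, unknown>> {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: callId,
        output: JSON.stringify(output), // tool output is sent as a string
      },
    },
    { type: "response.create" },
  ];
}
```

Sending the output only after the browser agent has settled is what keeps the spoken reply from running ahead of the page.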

Standalone Next.js app (pnpm install && pnpm dev, http://127.0.0.1:3002). Requires OPENAI_API_KEY, ANTHROPIC_API_KEY, BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID. Also adds the openai/ entry to the top-level README tree.
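For anyone trying the example locally, a small startup check along these lines could catch a missing key before the first call. The variable names come from the list above; the `missingEnv` helper is hypothetical and not part of this PR.

```typescript
// The four variables the example needs, as listed in the PR description.
const REQUIRED_ENV = [
  "OPENAI_API_KEY",
  "ANTHROPIC_API_KEY",
  "BROWSERBASE_API_KEY",
  "BROWSERBASE_PROJECT_ID",
] as const;

// Hypothetical helper: returns the names that are unset or empty.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// Possible usage in a server entry point:
//   const missing = missingEnv(process.env);
//   if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```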

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Refactoring (no functional changes)

🤖 Generated with Claude Code


Note

Low Risk
Self-contained example under examples/integrations/openai/ with no changes to shared packages or production code paths.

Overview
Adds a new examples/integrations/openai/ Next.js demo and documents it in the root README tree.

The app pairs OpenAI Realtime (WebRTC voice + control_browser tool) with a persistent Claude browser agent on Browserbase, bridged by a server WebSocket sideband that runs instructions and returns grounded page context before the voice model speaks again. The UI shows a live Browserbase iframe, SSE session updates, and a merged voice/browser transcript.

Supporting pieces include demo REST routes (/api/realtime/connect, /api/demo/*), in-memory session state with Browse CLI automation, and env/setup docs (.env.example, integration README).

Reviewed by Cursor Bugbot for commit 9ab41f8. Bugbot is set up for automated code reviews on this repo. Configure here.

A voice agent (OpenAI Realtime) talks with the user while a persistent
Claude browser agent operates a shared Browserbase session underneath it,
so the conversation stays in sync with what the browser is actually doing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shubh24 shubh24 requested a review from a team as a code owner June 10, 2026 01:41

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

}

for (const item of getFunctionCalls(event)) {
  void handleBrowserFunctionCall(sideband, item);

Overlapping browser tool races

High Severity

Each control_browser handler is started with void and is not serialized. A second call can assign a new activeRunId while an earlier handler is still in waitForDemoRunToSettle for the previous run. That waiter then exits immediately or hangs until timeout and may return the wrong snapshot to the voice model.
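One possible way to address this is to serialize the handlers per call, so a second control_browser call does not start until the first has settled. The sketch below is an assumption about how that could look, not the fix adopted in this PR; `SerialQueue` is a hypothetical helper.

```typescript
// Hypothetical helper: runs async tasks one at a time, in arrival order.
class SerialQueue {
  // Always holds a promise that never rejects, so chaining is safe.
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start the task only after everything queued before it has settled.
    const result = this.tail.then(() => task());
    // Swallow errors on the internal chain so one failed call does not
    // block later ones; the caller still sees the rejection via `result`.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

// Possible usage in the sideband loop (names taken from the quoted diff):
//   const browserCalls = new SerialQueue();
//   for (const item of getFunctionCalls(event)) {
//     void browserCalls.run(() => handleBrowserFunctionCall(sideband, item));
//   }
```

With one queue per voice call, each handler would only ever wait on the run it started, at the cost of making a second instruction wait for the first.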

Additional Locations (2)

const browserSession = await browserbase.sessions.create({
  projectId: browserbaseProjectId,
  keepAlive: true
});

Browserbase sessions never released

Medium Severity

The demo creates Browserbase sessions with keepAlive: true and stores each demoId in a global in-memory map, but nothing removes entries or ends sessions when voice ends or the tab reloads. Every new client UUID leaves another long-lived remote browser running.
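A rough sketch of the kind of bookkeeping that could close this gap: track when each demo was last active, end it explicitly when voice stops, and sweep idle entries on a timer. Everything here is an assumption for illustration. `DemoSessionRegistry` is hypothetical, and `release` stands in for whatever call actually ends the remote Browserbase session, which this sketch deliberately leaves to the caller.

```typescript
// ASSUMPTION: the caller supplies the function that ends a remote session.
type Release = (sessionId: string) => Promise<void>;

// Hypothetical in-memory registry keyed by demoId.
class DemoSessionRegistry {
  private sessions = new Map<string, { sessionId: string; lastSeen: number }>();
  private release: Release;
  private idleMs: number;

  constructor(release: Release, idleMs: number) {
    this.release = release;
    this.idleMs = idleMs;
  }

  register(demoId: string, sessionId: string, now: number = Date.now()): void {
    this.sessions.set(demoId, { sessionId, lastSeen: now });
  }

  // Call on any activity (tool call, SSE heartbeat) to keep the demo alive.
  touch(demoId: string, now: number = Date.now()): void {
    const entry = this.sessions.get(demoId);
    if (entry) entry.lastSeen = now;
  }

  // End one demo: drop the map entry first, then release the remote session.
  async end(demoId: string): Promise<boolean> {
    const entry = this.sessions.get(demoId);
    if (!entry) return false;
    this.sessions.delete(demoId);
    await this.release(entry.sessionId);
    return true;
  }

  // End every demo idle for at least idleMs; returns the demoIds ended.
  async sweep(now: number = Date.now()): Promise<string[]> {
    const ended: string[] = [];
    for (const [demoId, entry] of Array.from(this.sessions.entries())) {
      if (now - entry.lastSeen >= this.idleMs && (await this.end(demoId))) {
        ended.push(demoId);
      }
    }
    return ended;
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

Calling `end` from a voice-ended route and `sweep` from an interval would cover both the explicit hang-up and the reloaded-tab case described above.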

Additional Locations (1)

function sendRealtimeEvent(ws: WebSocket, event: Record<string, unknown>) {
  if (ws.readyState !== WebSocket.OPEN) {
    return;
  }

Tool output dropped if socket closes

Low Severity

The exported DemoStartResponse interface is defined but never imported or referenced anywhere in the example app, so it is dead API surface that can drift from real routes.
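The finding title and the quoted `sendRealtimeEvent` guard point at a separate risk: when the socket is not open, the function returns without sending and the caller is never told. A minimal sketch of one way to surface that, assuming a variant that reports whether the event went out. `trySendRealtimeEvent`, `SocketLike`, and the `onDrop` callback are hypothetical names, not code from this PR.

```typescript
// Minimal socket shape so the sketch does not depend on a WebSocket library.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

const SOCKET_OPEN = 1; // same value as WebSocket.OPEN

// Hypothetical variant of sendRealtimeEvent: returns false instead of
// silently dropping, and lets the caller observe the dropped event.
function trySendRealtimeEvent(
  ws: SocketLike,
  event: Record<string, unknown>,
  onDrop?: (event: Record<string, unknown>) => void,
): boolean {
  if (ws.readyState !== SOCKET_OPEN) {
    if (onDrop) onDrop(event);
    return false;
  }
  ws.send(JSON.stringify(event));
  return true;
}
```

A caller that gets `false` back for a tool output could log it or mark the run as undelivered rather than assuming the voice model received the result.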

Additional Locations (1)