Stagehand Project

This is a project that uses Stagehand V3, a browser automation framework with AI-powered act, extract, observe, and agent methods.

The main class can be imported as Stagehand from @browserbasehq/stagehand.

Key Classes:

Stagehand: Main orchestrator class providing act, extract, observe, and agent methods
context: A V3Context object that manages browser contexts and pages
page: Individual page objects accessed via stagehand.context.pages()[i] or created with stagehand.context.newPage()

Initialize

import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
  env: "LOCAL", // or "BROWSERBASE"
  verbose: 2, // 0, 1, or 2
  model: "openai/gpt-4.1-mini", // or any supported model
});

await stagehand.init();

// Access the browser context and pages
const page = stagehand.context.pages()[0];
const context = stagehand.context;

// Create new pages if needed
const page2 = await stagehand.context.newPage();

Act

Actions are called on the stagehand instance (not the page). Use atomic, specific instructions:

// Act on the current active page
await stagehand.act("click the sign in button");

// Act on a specific page (when you need to target a page that isn't currently active)
await stagehand.act("click the sign in button", { page: page2 });

Important: Act instructions should be atomic and specific:

✅ Good: "Click the sign in button" or "Type 'hello' into the search input"
❌ Bad: "Order me pizza" or "Type in the search bar and hit enter" (multi-step)

Observe + Act Pattern (Recommended)

Cache the results of observe to avoid unexpected DOM changes:

const instruction = "Click the sign in button";

// Get candidate actions
const actions = await stagehand.observe(instruction);

// Execute the first action
await stagehand.act(actions[0]);

To target a specific page:

const actions = await stagehand.observe("select blue as the favorite color", {
  page: page2,
});
await stagehand.act(actions[0], { page: page2 });

Extract

Extract data from pages using natural language instructions. The extract method is called on the stagehand instance.

Basic Extraction (with schema)

import { z } from "zod";

// Extract with explicit schema
const data = await stagehand.extract(
  "extract all apartment listings with prices and addresses",
  z.object({
    listings: z.array(
      z.object({
        price: z.string(),
        address: z.string(),
      }),
    ),
  }),
);

console.log(data.listings);

Simple Extraction (without schema)

// Extract returns a default object with 'extraction' field
const result = await stagehand.extract("extract the sign in button text");

console.log(result);
// Output: { extraction: "Sign in" }

// Or destructure directly
const { extraction } = await stagehand.extract(
  "extract the sign in button text",
);
console.log(extraction); // "Sign in"

Targeted Extraction

Extract data from a specific element using a selector:

const reason = await stagehand.extract(
  "extract the reason why script injection fails",
  z.string(),
  { selector: "/html/body/div[2]/div[3]/iframe/html/body/p[2]" },
);

URL Extraction

When extracting links or URLs, use z.string().url():

const { links } = await stagehand.extract(
  "extract all navigation links",
  z.object({
    links: z.array(z.string().url()),
  }),
);

Extracting from a Specific Page

// Extract from a specific page (when you need to target a page that isn't currently active)
const data = await stagehand.extract(
  "extract the placeholder text on the name field",
  { page: page2 },
);

Observe

Plan actions before executing them. Returns an array of candidate actions:

// Get candidate actions on the current active page
const [action] = await stagehand.observe("Click the sign in button");

// Execute the action
await stagehand.act(action);

Observing on a specific page:

// Target a specific page (when you need to target a page that isn't currently active)
const actions = await stagehand.observe("find the next page button", {
  page: page2,
});
await stagehand.act(actions[0], { page: page2 });

Agent

Use the agent method to autonomously execute complex, multi-step tasks.

Basic Agent Usage

const page = stagehand.context.pages()[0];
await page.goto("https://www.google.com");

const agent = stagehand.agent({
  model: "google/gemini-2.0-flash",
  executionModel: "google/gemini-2.0-flash",
});

const result = await agent.execute({
  instruction: "Search for the stock price of NVDA",
  maxSteps: 20,
});

console.log(result.message);

Computer Use Agent (CUA)

For more advanced scenarios using computer-use models:

const agent = stagehand.agent({
  mode: "cua", // Enable Computer Use Agent mode
  model: "anthropic/claude-sonnet-4-20250514",
  // or "google/gemini-2.5-computer-use-preview-10-2025"
  systemPrompt: `You are a helpful assistant that can use a web browser.
    Do not ask follow up questions, the user will trust your judgement.`,
});

await agent.execute({
  instruction: "Apply for a library card at the San Francisco Public Library",
  maxSteps: 30,
});

Agent with Custom Model Configuration

const agent = stagehand.agent({
  model: {
    modelName: "google/gemini-2.5-computer-use-preview-10-2025",
    apiKey: process.env.GEMINI_API_KEY,
  },
  systemPrompt: `You are a helpful assistant.`,
});

Agent with Integrations (MCP/External Tools)

const agent = stagehand.agent({
  integrations: [`https://mcp.exa.ai/mcp?exaApiKey=${process.env.EXA_API_KEY}`],
  systemPrompt: `You have access to the Exa search tool.`,
});

Agent Hybrid Mode

Hybrid mode uses both DOM-based and coordinate-based tools (act, click, type, dragAndDrop) for visual interactions. This requires experimental: true and models that support reliable coordinate-based actions.

Recommended models for hybrid mode:

google/gemini-3-flash-preview
anthropic/claude-sonnet-4-20250514, anthropic/claude-sonnet-4-5-20250929, anthropic/claude-haiku-4-5-20251001

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true, // Required for hybrid mode
});
await stagehand.init();

const agent = stagehand.agent({
  mode: "hybrid",
  model: "google/gemini-3-flash-preview",
});

await agent.execute({
  instruction: "Click the submit button and fill the form",
  maxSteps: 20,
  highlightCursor: true, // Enabled by default in hybrid mode
});

Agent modes:

"dom" (default): Uses DOM-based tools (act, fillForm) - works with any model
"hybrid": Uses both DOM-based and coordinate-based tools (act, click, type, dragAndDrop) - requires grounding-capable models
"cua": Uses Computer Use Agent providers

Advanced Features

DeepLocator (XPath Targeting)

Target specific elements across shadow DOM and iframes:

await page
  .deepLocator("/html/body/div[2]/div[3]/iframe/html/body/p")
  .highlight({
    durationMs: 5000,
    contentColor: { r: 255, g: 0, b: 0 },
  });

Multi-Page Workflows

const page1 = stagehand.context.pages()[0];
await page1.goto("https://example.com");

const page2 = await stagehand.context.newPage();
await page2.goto("https://example2.com");

// Act/extract/observe operate on the current active page by default
// Pass { page } option to target a specific page
await stagehand.act("click button", { page: page1 });
await stagehand.extract("get title", { page: page2 });

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stagehand Project

Initialize

Act

Observe + Act Pattern (Recommended)

Extract

Basic Extraction (with schema)

Simple Extraction (without schema)

Targeted Extraction

URL Extraction

Extracting from a Specific Page

Observe

Agent

Basic Agent Usage

Computer Use Agent (CUA)

Agent with Custom Model Configuration

Agent with Integrations (MCP/External Tools)

Agent Hybrid Mode

Advanced Features

DeepLocator (XPath Targeting)

Multi-Page Workflows

FilesExpand file tree

claude.md

Latest commit

History

claude.md

File metadata and controls

Stagehand Project

Initialize

Act

Observe + Act Pattern (Recommended)

Extract

Basic Extraction (with schema)

Simple Extraction (without schema)

Targeted Extraction

URL Extraction

Extracting from a Specific Page

Observe

Agent

Basic Agent Usage

Computer Use Agent (CUA)

Agent with Custom Model Configuration

Agent with Integrations (MCP/External Tools)

Agent Hybrid Mode

Advanced Features

DeepLocator (XPath Targeting)

Multi-Page Workflows