WebBrain

Open-source AI browser agent for Chrome and Firefox. Chat with any web page, automate browser tasks, and run multi-step agent workflows, powered by your choice of LLM.

Features

  • Page Reading: Extracts text, links, forms, tables, and interactive elements from any page
  • Browser Actions: Click, type, scroll, navigate, and interact with page elements
  • Ask / Act Modes: Read-only mode by default, full agent mode with confirmation
  • Multi-Step Agent: Autonomous task execution with tool-use loops (configurable, default 60 steps)
  • Continue from Limit: When the agent hits the step limit, click Continue to keep going
  • Multi-Provider LLM: Supports local and cloud models:
    • llama.cpp (local, default): no API key needed. Ollama and LM Studio are also supported
    • OpenAI (GPT-5.5, etc.)
    • Anthropic Claude (native API)
    • Google Gemini, Mistral AI, DeepSeek, xAI Grok, Groq
    • MiniMax, Alibaba Cloud (Qwen)
    • Nvidia NIM
    • OpenRouter (default model: stepfun/step-3.7-flash; access 100+ models)
  • Onboarding Wizard: First-launch walkthrough covering Act mode safety and provider setup
  • Side Panel UI: Clean chat interface that lives alongside your browsing
  • Per-Tab Conversations: Each tab has its own chat history
  • Streaming: Real-time token streaming from all providers
  • Smart Context: Token-aware auto-compaction (summarizes older turns once the conversation nears the model's context window, with a visible "Context automatically compacted" notice), tool result limits, and emergency overflow recovery
  • Copy Support: Copy buttons on code blocks and full messages
  • Page Inspection Banner: Visual indicator when the agent is interacting with the page
  • Stop Button: Abort the agent mid-execution at any time
  • Deterministic Act Mode: Act mode uses temperature 0.15 for browser-control decisions; Ask mode uses 0.3, and dedicated vision screenshot descriptions use 0
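
The multi-step loop and the Continue-from-limit behaviour listed above can be pictured roughly as follows. This is a minimal sketch, not WebBrain's actual agent code: `runAgentLoop`, `nextAction`, and the status strings are hypothetical names chosen for illustration, and the step limit is passed in rather than read from settings.

```javascript
// Hypothetical sketch of a step-limited tool-use loop; not WebBrain's actual agent code.
// nextAction(history) stands in for "ask the LLM what to do next" and returns
// either { tool: 'done' } or { tool: '<name>', args: {...} }.
function runAgentLoop(nextAction, maxSteps, history = []) {
  let steps = 0;
  while (steps < maxSteps) {
    const action = nextAction(history);
    steps += 1;
    history.push(action);
    if (action.tool === 'done') {
      return { status: 'done', steps, history };
    }
  }
  // Step limit reached: the caller can offer "Continue" and call again with the same history.
  return { status: 'limit', steps, history };
}

// Usage: a fake model that finishes on its 5th decision.
const fakeModel = (history) =>
  history.length >= 4 ? { tool: 'done' } : { tool: 'scroll', args: {} };

const first = runAgentLoop(fakeModel, 3);                  // stops at the limit
const resumed = runAgentLoop(fakeModel, 3, first.history); // "Continue"
console.log(first.status, resumed.status); // limit done
```

Passing the accumulated history back in is one simple way to model Continue: the second call picks up where the first stopped instead of starting over.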

Quick Start

Chrome

git clone https://github.com/esokullu/webbrain.git
  1. Open Chrome → chrome://extensions/
  2. Enable Developer mode (top right)
  3. Click Load unpacked → select the webbrain folder

Firefox

git clone https://github.com/esokullu/webbrain.git
  1. Open Firefox → about:debugging#/runtime/this-firefox
  2. Click Load Temporary Add-on
  3. Navigate to src/firefox/ and select manifest.json

Note: Temporary add-ons are removed when Firefox restarts. For permanent installation, the extension needs to be signed via addons.mozilla.org.

Start a local LLM (default)

# Using llama.cpp
llama-server -m your-model.gguf --port 8080

# Or using Ollama (OpenAI-compatible)
ollama serve
# Then set base URL to http://localhost:11434/v1 in settings

Context window: For reliable agent runs, load a local model with at least a 16k-token context window (the usable minimum). 8k can work with Compact mode enabled (Settings → per-provider checkbox); 4k is too small to hold the system prompt and tool schemas. WebBrain auto-compacts the conversation as it nears the window. It assumes 16k for local models unless you set an explicit context size, so give the model server enough room (e.g. llama-server -c 16384).
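
To make the "auto-compacts as it nears the window" idea concrete, here is a rough sketch of what such a check could look like. It is not WebBrain's actual logic: the 4-characters-per-token estimate, the 80% threshold, and the function names are all assumptions made for illustration. The 16384 default mirrors the llama-server example above.

```javascript
// Hypothetical sketch of a token-aware compaction check; not WebBrain's actual logic.
// Assumptions: ~4 characters per token, and compaction triggers at 80% of the window.
const DEFAULT_LOCAL_CONTEXT = 16384; // 16k, the size assumed for local models

function estimateTokens(messages) {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

function shouldCompact(messages, contextSize = DEFAULT_LOCAL_CONTEXT, threshold = 0.8) {
  return estimateTokens(messages) >= contextSize * threshold;
}

const shortChat = [{ role: 'user', content: 'Summarize this page' }];
const longChat = [{ role: 'tool', content: 'x'.repeat(60000) }];
console.log(shouldCompact(shortChat), shouldCompact(longChat)); // false true
```

A real implementation would use the provider's tokenizer or reported usage rather than a character count; the point here is only that the trigger depends on the configured context size.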

Use it

Click the WebBrain icon to open the side panel. Type a message like:

  • "Summarize this page"
  • "Find all links about pricing"
  • "Fill in the search box with 'AI agents' and click Search"
  • "Navigate to github.com and find trending repositories"

Configuration

Click the gear icon or go to the extension's Options page to configure:

Display Settings:

  • Verbose Mode: Show full tool call JSON (off by default)
  • Screenshot Fallback: Use screenshots when DOM reading fails
  • Max Agent Steps: Configurable step limit (5-200, default 60)

Providers:

| Provider | Base URL | API Key | Default Model |
| --- | --- | --- | --- |
| llama.cpp | http://localhost:8080 | Not needed | (your loaded model) |
| Ollama | http://localhost:11434/v1 | Not needed | (your loaded model) |
| LM Studio | http://localhost:1234/v1 | Not needed | (your loaded model) |
| OpenAI | https://api.openai.com/v1 | Required | gpt-5.5 |
| Anthropic Claude | https://api.anthropic.com | Required | claude-sonnet-4-6 |
| Google Gemini | https://generativelanguage.googleapis.com/v1beta/openai | Required | gemini-3.1-flash |
| Mistral AI | https://api.mistral.ai/v1 | Required | mistral-large-latest |
| DeepSeek | https://api.deepseek.com/v1 | Required | deepseek-v4-flash |
| xAI Grok | https://api.x.ai/v1 | Required | grok-4.3 |
| Nvidia NIM | https://integrate.api.nvidia.com/v1 | Required | meta/llama-3.1-8b-instruct |
| Groq | https://api.groq.com/openai/v1 | Required | llama-3.3-70b-versatile |
| MiniMax | https://api.minimax.chat/v1 | Required | minimax-m2.7 |
| Alibaba Cloud (Qwen) | https://dashscope.aliyuncs.com/compatible-mode/v1 | Required | qwen-max |
| OpenRouter | https://openrouter.ai/api/v1 | Required | stepfun/step-3.7-flash |
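
Most of the base URLs above point at OpenAI-compatible endpoints, so a request against them can be sketched as below. This is an illustration only, not the extension's provider code: `buildChatRequest` is a hypothetical helper, it assumes the provider accepts `POST {baseUrl}/chat/completions`, and it does not apply to Anthropic's native API, which uses a different request shape.

```javascript
// Sketch: build an OpenAI-style chat completion request for a base URL from the table.
// Assumes the provider exposes POST {baseUrl}/chat/completions (not true for Anthropic's native API).
function buildChatRequest({ baseUrl, apiKey, model, messages, temperature = 0.3 }) {
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`; // local servers need no key
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/chat/completions`,
    options: {
      method: 'POST',
      headers,
      body: JSON.stringify({ model, messages, temperature, stream: false }),
    },
  };
}

const req = buildChatRequest({
  baseUrl: 'http://localhost:11434/v1',
  model: 'your-model',
  messages: [{ role: 'user', content: 'Summarize this page' }],
});
console.log(req.url); // http://localhost:11434/v1/chat/completions
// To send it: fetch(req.url, req.options).then((r) => r.json())
```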

Architecture

src/chrome/                        src/firefox/
├── manifest.json (MV3)            ├── manifest.json (MV2)
├── src/                           ├── src/
│   ├── background.js              │   ├── background.js (+ background.html)
│   ├── agent/                     │   ├── agent/
│   ├── content/                   │   ├── content/
│   ├── providers/                 │   ├── providers/
│   ├── network/                   │   ├── network/
│   ├── trace/                     │   ├── trace/
│   ├── ui/                        │   └── ui/
│   └── offscreen/                 ├── styles/
├── styles/                        ├── icons/
└── icons/                         └── LICENSE

web/
├── index.html
├── privacy.html
└── vercel.json

Key difference: Chrome uses Manifest V3 (service worker, chrome.scripting, sidePanel API), while Firefox uses Manifest V2 (background page, browser.tabs.executeScript, sidebar_action).
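
As an illustration of how one codebase can bridge that difference, script injection can be chosen by feature detection. The sketch below is not taken from WebBrain's source: `injectContentScript` is a hypothetical helper, and the extension namespace is passed in as `api` so the function can be exercised without a browser.

```javascript
// Sketch: pick the script-injection API by feature detection.
// `api` is the extension namespace (chrome or browser); passing it in keeps this testable.
function injectContentScript(api, tabId, file) {
  if (api.scripting && typeof api.scripting.executeScript === 'function') {
    // Manifest V3 (Chrome): chrome.scripting.executeScript takes a single details object.
    return api.scripting.executeScript({ target: { tabId }, files: [file] });
  }
  // Manifest V2 (Firefox): browser.tabs.executeScript takes the tab id and a details object.
  return api.tabs.executeScript(tabId, { file });
}

// In an extension this might be called as (file name is a placeholder):
//   injectContentScript(globalThis.browser ?? globalThis.chrome, tab.id, 'content.js');
```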

Deeper docs live in docs/: architecture, site adapters, providers and models, security model, prompt-injection defense, privacy and data flow, accessibility tree and refs, localization, adding a tool, and test scenarios.

Agent Tools

| Tool | Ask | Act | Compact | Description |
| --- | --- | --- | --- | --- |
| get_accessibility_tree | Yes | Yes | Yes | Flat indented text of the page's accessibility tree with persistent ref_ids |
| read_page | Yes | Yes | Yes | Extract page text, links, forms (legacy prose fallback) |
| read_pdf | Yes | Yes | -- | Extract text from PDF documents via vendored pdfjs-dist |
| screenshot | Yes | Yes | Yes | Capture visible tab (with optional save:true to Downloads) |
| full_page_screenshot | Yes | Yes | -- | Capture full scrollable page (Chrome only) |
| get_interactive_elements | Yes | Yes | -- | List all clickable/interactive elements (legacy, pierces shadow DOM) |
| get_frames | Yes | Yes | -- | List all iframes on the page |
| get_shadow_dom | Yes | Yes | -- | Read shadow DOM trees |
| scroll | Yes | Yes | Yes | Scroll the page |
| extract_data | Yes | Yes | Yes | Extract tables, headings, images |
| get_selection | Yes | Yes | Yes | Get highlighted text |
| click_ax | -- | Yes | Yes | Click an element by accessibility tree ref_id (preferred) |
| type_ax | -- | Yes | Yes | Type into a field by ref_id. Supports lang: "tr-deasciify" |
| set_field | -- | Yes | Yes | One-shot focus + clear + type + verify by ref_id. Supports lang: "tr-deasciify" |
| click | -- | Yes | Yes | Click elements by selector, index, or coordinates (legacy fallback) |
| type_text | -- | Yes | Yes | Type into input fields. Supports lang: "tr-deasciify" |
| press_keys | -- | Yes | Yes | Press Escape, Tab, or Enter |
| hover | -- | Yes | -- | CDP-trusted hover for reveal-on-hover menus (Chrome only) |
| drag_drop | -- | Yes | -- | Drag-and-drop via CDP pointer events (Chrome only) |
| navigate | -- | Yes | Yes | Go to a URL |
| new_tab | -- | Yes | Yes | Open a new tab |
| wait_for_element | -- | Yes | Yes | Wait for a selector to appear |
| wait_for_stable | -- | Yes | -- | Wait until page is idle (no DOM mutations + no network) |
| upload_file | -- | Yes | -- | Upload a file to a file input (Chrome only) |
| execute_js | -- | Yes | -- | Run custom JavaScript (Firefox only; blocked by MV3 CSP on Chrome) |
| fetch_url | Yes | Yes | Yes | Fetch a URL from the background with the user's cookies |
| research_url | Yes | Yes | -- | Open a URL in a hidden tab, wait for JS rendering, return content |
| download_files | -- | Yes | -- | Download one or more files (single url or array, max 3 concurrent) |
| download_resource_from_page | -- | Yes | -- | Download an <img>/<video>/blob URL from the current page |
| download_social_media | -- | Yes | Yes | One-shot social media download; DOM/CDN first, optional visible-media vision crop fallback |
| list_downloads | Yes | Yes | -- | List recent downloads with status and source URLs |
| read_downloaded_file | -- | Yes | -- | Re-fetch a downloaded file's content (text or base64) |
| iframe_read / iframe_click / iframe_type | -- | Yes | -- | Read/click/type inside cross-origin iframes |
| record_tab / stop_recording | -- | Yes | -- | Record tab video+audio into .webm with optional Whisper transcription (Chrome only) |
| scratchpad_write | Yes | Yes | Yes | Pin a note in context that survives summarization |
| clarify | Yes | Yes | Yes | Pause and ask the user a question |
| verify_form | -- | Yes | -- | Verify form fields before submitting |
| solve_captcha | -- | Yes | Yes | Solve CAPTCHAs via CapSolver API (optional, requires API key) |
| done | Yes | Yes | Yes | Signal task completion |

Compact mode is a reduced tool set plus a shorter system prompt, designed for small local models (2B-8B). In both the Chrome and Firefox builds, it cuts the Act-mode schema from 40+ tools to about 20, reducing decision surface and hallucination. Enable it per-provider in Settings (checkbox on llama.cpp, Ollama, LM Studio; off by default).
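
The Ask / Act / Compact columns amount to a filter over the tool list. A minimal sketch of that filter, using a handful of rows from the table, might look like the following. The data shape and the `toolsFor` name are hypothetical, and the sketch assumes the Compact column applies the same way in both modes.

```javascript
// Sketch of mode-based tool filtering using a few rows from the Agent Tools table.
// The data shape and function name are hypothetical, not WebBrain's actual schema.
const TOOLS = [
  { name: 'get_accessibility_tree', ask: true, act: true, compact: true },
  { name: 'read_pdf', ask: true, act: true, compact: false },
  { name: 'click_ax', ask: false, act: true, compact: true },
  { name: 'hover', ask: false, act: true, compact: false },
  { name: 'done', ask: true, act: true, compact: true },
];

function toolsFor(mode, compactMode = false) {
  return TOOLS
    .filter((t) => (mode === 'act' ? t.act : t.ask))
    .filter((t) => !compactMode || t.compact)
    .map((t) => t.name);
}

console.log(toolsFor('ask'));       // [ 'get_accessibility_tree', 'read_pdf', 'done' ]
console.log(toolsFor('act', true)); // [ 'get_accessibility_tree', 'click_ax', 'done' ]
```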

Shadow DOM note: The accessibility tree only traverses light DOM. On Web Component-heavy pages (Stripe, Salesforce, Shopify), use get_interactive_elements (pierces open shadow roots) or get_shadow_dom / shadow_dom_query for targeted reads.

LM Studio plugin

The fetch_url and research_url tools also ship as a standalone LM Studio plugin at webbrain/web-tools, for users who want web-fetching tool-use inside LM Studio chats without running the full browser extension. Pure Node, no headless browser.

lms clone webbrain/web-tools

Source: lmstudio-plugin/.

Slash Commands

WebBrain accepts slash commands as the first thing on a line in the input box. Type /help to see the list inside the panel.

| Command | What it does |
| --- | --- |
| /help | Show the list of available commands |
| /allow-api | Per-conversation API mutation override. Lifts the UI-first restriction so the agent may use POST/PUT/PATCH/DELETE via fetch_url when UI is failing. Badge appears while active; clears on /reset. |
| /compact | Toggle verbose/compact tool display (same as the toolbar button) |
| /reset | Clear the conversation and all per-conversation flags |
| /screenshot | Capture the visible tab and display the image inline in chat |
| /export | Download the current conversation as a Markdown file |
| /profile | Toggle profile auto-fill on/off without opening Settings |
| /vision | Toggle vision mode (screenshot understanding) on the active provider |
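
A rough sketch of how input could be recognised as one of these commands is shown below. It is illustrative only: `parseSlashCommand` is a hypothetical name, and the sketch assumes that leading whitespace is ignored and that unknown commands fall through as ordinary chat messages.

```javascript
// Sketch of slash-command parsing; names are illustrative, not WebBrain's actual code.
const COMMANDS = ['help', 'allow-api', 'compact', 'reset', 'screenshot', 'export', 'profile', 'vision'];

function parseSlashCommand(input) {
  // The command must be the first thing in the input: "/name" plus optional arguments.
  const match = /^\/([a-z-]+)(?:\s+(.*))?$/.exec(input.trim());
  if (!match || !COMMANDS.includes(match[1])) return null; // treat as a normal chat message
  return { command: match[1], args: match[2] || '' };
}

console.log(parseSlashCommand('/allow-api'));            // { command: 'allow-api', args: '' }
console.log(parseSlashCommand('Please run /help'));      // null
```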

The default UI-first rule exists because API actions are invisible (you don't see what's being sent), often require separate auth tokens you may not have configured, and can have a much larger blast radius than a visible mis-click. Only use /allow-api when you've decided you want that tradeoff for a specific job.
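
The UI-first rule can be thought of as a gate in front of fetch_url. The sketch below shows one way such a gate might be written. It is an assumption-laden illustration, not the extension's code: `isFetchAllowed` and the `allowApi` flag are hypothetical names for the per-conversation override that /allow-api sets.

```javascript
// Sketch of the UI-first rule: mutating HTTP methods need the per-conversation override.
// `allowApi` is a hypothetical flag name for the state toggled by /allow-api.
const MUTATING_METHODS = new Set(['POST', 'PUT', 'PATCH', 'DELETE']);

function isFetchAllowed(method, conversation) {
  const m = (method || 'GET').toUpperCase();
  return !MUTATING_METHODS.has(m) || conversation.allowApi === true;
}

const conversation = { allowApi: false };
console.log(isFetchAllowed('GET', conversation));  // true
console.log(isFetchAllowed('POST', conversation)); // false
conversation.allowApi = true;                      // user typed /allow-api
console.log(isFetchAllowed('POST', conversation)); // true
```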

Known Issues

  • Firefox is meaningfully weaker than Chrome. Firefox has no equivalent to Chrome DevTools Protocol via chrome.debugger, so several Chrome-only features are missing in the Firefox build:
    • Click/type goes through the content-script path (document.querySelector + el.click()) instead of CDP Input.dispatchMouseEvent. This means no shadow-DOM piercing, no real trusted mouse events (some React/Vue handlers won't fire), no closed-shadow-root traversal, and no resolveSelector retry budget.
    • No SPA-navigation-aware retry extension.
    • No conversation persistence across background restarts.
    • No CDP screenshots. Auto-screenshot uses tabs.captureVisibleTab instead, which works for active tabs only and at slightly lower quality.
    • No closed shadow root support for read/extract tools.
    • Not affected: site adapters, vision detection, loop detection, the auto-screenshot loop, and the opt-in compact prompt/tool set are all mirrored to Firefox.
  • SPA navigation detection in Firefox. Some single-page applications may not trigger content-script re-injection after client-side navigation.
  • Firefox temporary add-on. During development, Firefox requires the extension to be loaded as a temporary add-on, which is removed on restart.

What's New

See CHANGELOG.md for the full version history. Recent highlights: native PDF reading with Claude passthrough (8.x), 65+ bug fixes in 8.5.0, compact mode going fully opt-in (8.3.0), Turkish deasciification (8.2.x), on-page agent indicator and tab-group-scoped side panel (6.0.x).

Roadmap

  • Conversation export/import: Save and load chat histories
  • Custom tool definitions: User-defined tools via settings
  • Keyboard shortcuts: Hotkeys for opening panel, sending messages, switching modes
  • Context menu integration: Right-click → "Ask WebBrain about this"
  • Screenshot/vision tool: Send screenshots to multimodal models for visual understanding
  • Chrome Web Store / Firefox AMO: Official store listings

Adding a New Provider

  1. Create a new class extending BaseLLMProvider in src/providers/
  2. Implement chat() and optionally chatStream()
  3. Register it in src/providers/manager.js

All providers normalize to a common response format:

{ content: string, toolCalls: Array|null, usage: Object|null }
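
As a rough illustration of those three steps, a provider for an OpenAI-style endpoint might look like the sketch below. Everything here is an assumption except the response shape: the `BaseLLMProvider` class is a stand-in defined only so the example runs on its own, and the constructor config, the `chat(messages, options)` signature, and `ExampleProvider` are hypothetical.

```javascript
// Stand-in for WebBrain's BaseLLMProvider; the real constructor and methods may differ.
class BaseLLMProvider {
  constructor(config) { this.config = config; }
}

// Hypothetical provider for an OpenAI-style endpoint.
class ExampleProvider extends BaseLLMProvider {
  // Map an OpenAI-style response body onto the common response format.
  normalize(data) {
    const message = data.choices?.[0]?.message ?? {};
    const calls = message.tool_calls;
    return {
      content: message.content ?? '',
      toolCalls: calls && calls.length ? calls : null,
      usage: data.usage ?? null,
    };
  }

  async chat(messages, options = {}) {
    const res = await fetch(`${this.config.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${this.config.apiKey}` },
      body: JSON.stringify({ model: this.config.model, messages, ...options }),
    });
    if (!res.ok) throw new Error(`Provider returned HTTP ${res.status}`);
    return this.normalize(await res.json());
  }
}
```

Keeping normalization in its own method makes it easy to unit-test the response mapping without a network call.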

License

MIT. Built by Emre Sokullu
