- 🔧 Tool Router Mode: Composio's intelligent tool routing for accessing Gmail, Slack, GitHub, and 500+ integrations
- ◉ Browser Tools Mode: Gemini 2.5 Computer Use for visual browser automation with screenshots, clicks, typing, scrolling, and navigation
- Sidebar Chat Interface: Clean, modern React-based chat UI accessible from any tab
- Direct Browser Automation: No backend required; all API calls are made directly from the extension
- Visual Feedback: Blue click indicators and element highlighting during automation
- Smart Coordinate Scaling: Automatically scales Gemini's 1000x1000 coordinate system to actual viewport dimensions
- Safety Features: Confirmation dialogs for sensitive actions (checkout, payment, etc.)
- Node.js 18+ and npm
- Chrome or Edge browser (Manifest V3 support)
- Google API key for Gemini (required)
- Composio API key (optional, for Tool Router mode)
- Clone this repository
- Install dependencies:
npm install
- Build the extension:
npm run build
- Load the extension in Chrome:
  - Open Chrome and navigate to chrome://extensions/
  - Enable "Developer mode" in the top right
  - Click "Load unpacked"
  - Select the dist folder
- Open Settings (⚙️ icon) to configure your API keys
- Google API Key (Required)
  - Get your key from Google AI Studio
  - Add it in Settings under "Google API Key"
  - Supports: Gemini 2.5 Pro, Flash, and Flash Lite
- Composio API Key (Optional, for Tool Router mode)
  - Get your key from Composio Dashboard
  - Add it in Settings under "Composio API Key"
  - Enables access to 500+ app integrations
- Enable Browser Tools by clicking the ◉ button in the chat header
- The extension automatically uses Gemini 2.5 Computer Use Preview
- Provide natural language instructions to control the browser
Example prompts:
- "Navigate to reddit.com and scroll down"
- "Click on the search box and type 'puppies'"
- "Take a screenshot of this page"
- "Click the first image on the page"
- Add your Composio API key in Settings
- Click ◉ to disable Browser Tools (or keep it off)
- Chat normally - the AI will automatically use Composio tools when needed
Example prompts:
- "Check my Gmail for unread messages"
- "Create a GitHub issue titled 'Bug in login flow'"
- "Send a Slack message to #general with 'Hello team!'"
Run with hot reload:
npm run dev
Then reload the extension in Chrome after each change.
Atlas consists of three main components that work together:
┌─────────────────┐
│  sidepanel.tsx  │ ← React UI with chat interface
│   (React UI)    │
└────────┬────────┘
         │
         ↓ Messages
┌─────────────────┐
│  background.ts  │ ← Service worker (screenshots, navigation)
│    (Service     │
│     Worker)     │
└────────┬────────┘
         │
         ↓ Execute actions
┌─────────────────┐
│   content.ts    │ ← Content script (DOM manipulation)
│   (Injected     │
│  on all tabs)   │
└─────────────────┘
The main React component handles:
Browser Tools Mode (streamWithGeminiComputerUse):
- Takes initial screenshot of current page
- Sends screenshot + conversation history to Gemini 2.5 Computer Use
- Receives function calls (click, type, navigate, scroll, etc.)
- Executes actions via executeBrowserAction()
- Re-takes screenshot after each action
- Scales coordinates from Gemini's 1000x1000 grid to actual viewport
- Supports up to 30 turns of action-response loops
- Shows visual feedback in UI
Tool Router Mode (streamWithAISDKAndMCP):
- Connects to Composio's MCP (Model Context Protocol) server
- Uses Vercel AI SDK for streaming responses
- Auto-discovers and calls tools via MCP
- Manages MCP client lifecycle and session persistence
Key Functions:
- executeTool() - Sends message to background script to perform browser actions
- scaleCoordinates() - Converts Gemini coordinates to viewport coordinates
- executeBrowserAction() - Maps Gemini function names to actual browser actions
- requiresUserConfirmation() - Safety checks for sensitive actions
- loadSettings() - Manages Composio session initialization
The service worker provides:
Screenshot Functionality:
- Captures visible tab using chrome.tabs.captureVisibleTab()
- Auto-handles restricted pages (chrome://, about:, etc.) by navigating to Google.com
- Filters for actual visible tabs (not devtools or hidden windows)
- Returns data URL for screenshot
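The restricted-page handling above can be sketched as a small pure check that runs before capture. This is a minimal sketch, not the code in background.ts: the helper names, the exact prefix list, and the fallback URL string are assumptions made for illustration.

```typescript
// Hypothetical helpers; names and the prefix list are illustrative assumptions.
const RESTRICTED_PREFIXES: string[] = [
  "chrome://",
  "chrome-extension://",
  "edge://",
  "about:",
];

// Assumed fallback target, based on the "navigating to Google.com" note above.
const FALLBACK_URL = "https://www.google.com";

// Returns true when a tab URL is missing or uses a scheme that cannot be captured.
function isRestrictedUrl(url: string | undefined): boolean {
  if (!url) return true;
  return RESTRICTED_PREFIXES.some((prefix) => url.startsWith(prefix));
}

// Picks the URL the tab should show before a screenshot is attempted.
function resolveCaptureUrl(url: string | undefined): string {
  if (!url || isRestrictedUrl(url)) return FALLBACK_URL;
  return url;
}
```

In a service worker, the result of `resolveCaptureUrl()` would decide whether to navigate the tab first and only then call chrome.tabs.captureVisibleTab().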
Action Execution:
- Relays commands from sidepanel to content script
- Ensures content script is injected before execution
- Handles tab targeting and message passing
Browser APIs:
- Gets browser history
- Manages bookmarks
- Navigates tabs
- Tracks recent pages for memory
Key Message Types:
- TAKE_SCREENSHOT - Capture current tab
- GET_PAGE_CONTEXT - Extract page metadata
- EXECUTE_ACTION - Run browser action (click, type, etc.)
- NAVIGATE - Change page URL
- GET_HISTORY - Fetch browsing history
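One way to keep these message types consistent between the side panel and the service worker is a discriminated union. The sketch below uses the five message names listed above; the payload fields (`action`, `params`, `url`, `query`) and the `describeMessage` helper are assumptions, not the definitions in types.ts.

```typescript
// Message names come from the list above; payload fields are assumed for illustration.
type AtlasMessage =
  | { type: "TAKE_SCREENSHOT" }
  | { type: "GET_PAGE_CONTEXT" }
  | { type: "EXECUTE_ACTION"; action: string; params?: Record<string, unknown> }
  | { type: "NAVIGATE"; url: string }
  | { type: "GET_HISTORY"; query?: string };

// Hypothetical router: the switch narrows the union so each branch sees its own payload.
function describeMessage(msg: AtlasMessage): string {
  switch (msg.type) {
    case "TAKE_SCREENSHOT":
      return "capture current tab";
    case "GET_PAGE_CONTEXT":
      return "extract page metadata";
    case "EXECUTE_ACTION":
      return `run browser action: ${msg.action}`;
    case "NAVIGATE":
      return `navigate to ${msg.url}`;
    case "GET_HISTORY":
      return "fetch browsing history";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a message type is added but not handled.
      const unreachable: never = msg;
      return unreachable;
    }
  }
}
```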
Injected into every webpage to:
Extract Page Context:
- URL, title, text content
- Links, images, forms
- Viewport dimensions (width, height, scroll position)
- Metadata (description, keywords, author)
Execute DOM Actions:
Click (highlightElement() + executePageAction('click')):
- Can click by CSS selector OR by coordinates
- Dispatches full mouse event sequence (mousedown, mouseup, click)
- Shows blue pulsing animation at click location
- Highlights element with blue outline
- Returns element info for debugging
Type (keyboard_type action):
- Types character by character to simulate real keyboard input
- Works with regular inputs, textareas, and contenteditable elements
- Dispatches input/change events for React/Vue/Angular compatibility
- For type_text_at: clicks coordinates, waits for focus, clears existing text, then types
Scroll:
- Supports up/down by pixels
- Can scroll to top/bottom
- Can scroll element into view
Navigate:
- Uses chrome.tabs.update() to change URLs
Special Actions:
- hover - Mouse over at coordinates
- drag_drop - Drag from one point to another
- key_combination - Press keyboard shortcuts
- clear_input - Clear focused field
Visual Feedback:
- Blue outline on clicked elements
- Blue pulsing circle at click coordinates
- Animation automatically cleans up after 600ms
When you enable Browser Tools (◉ button):
1. User sends message: "Navigate to reddit.com and scroll"
↓
2. sidepanel.tsx → streamWithGeminiComputerUse()
↓
3. Take initial screenshot via executeTool('screenshot')
↓
4. Send to Gemini 2.5 Computer Use with:
- Screenshot as inline_data (base64 PNG)
- Conversation history
- System instruction with available functions
↓
5. Gemini responds with function call: navigate({url: "https://reddit.com"})
↓
6. executeBrowserAction() maps to executeTool('navigate')
↓
7. background.ts receives EXECUTE_ACTION message
↓
8. chrome.tabs.update() navigates to URL
↓
9. Wait 2.5 seconds for page to load
↓
10. Take new screenshot
↓
11. Send function_response back to Gemini:
{ url: "https://reddit.com", success: true, [screenshot] }
↓
12. Gemini calls scroll_down()
↓
13. Execute via content.ts → window.scrollBy()
↓
14. Take another screenshot
↓
15. Continue loop (up to 30 turns) until task complete
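The flow above is an action-response loop with a turn cap. The sketch below shows that shape with the model call, the screenshot, and the action executor injected as functions so it can run outside the browser. Every name here (`LoopDeps`, `runComputerUseLoop`, and so on) is hypothetical; the actual logic lives in streamWithGeminiComputerUse() and may differ.

```typescript
// Hypothetical types; the real request/response shapes belong to the Gemini API.
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

interface ModelTurn {
  functionCalls: FunctionCall[];
  text?: string;
}

interface LoopDeps {
  takeScreenshot: () => Promise<string>; // resolves to a data URL
  askModel: (screenshot: string, history: unknown[]) => Promise<ModelTurn>;
  executeAction: (call: FunctionCall) => Promise<Record<string, unknown>>;
}

interface LoopResult {
  turns: number;
  completed: boolean;
  finalText?: string;
}

// Runs until the model stops requesting actions or the turn cap is reached.
async function runComputerUseLoop(deps: LoopDeps, maxTurns: number = 30): Promise<LoopResult> {
  const history: unknown[] = [];
  let screenshot = await deps.takeScreenshot(); // initial screenshot
  for (let turn = 1; turn <= maxTurns; turn++) {
    const reply = await deps.askModel(screenshot, history);
    if (reply.functionCalls.length === 0) {
      // No further actions requested: treat the task as complete.
      return { turns: turn, completed: true, finalText: reply.text };
    }
    for (const call of reply.functionCalls) {
      const result = await deps.executeAction(call);
      screenshot = await deps.takeScreenshot(); // fresh screenshot after each action
      history.push({ call, result });
    }
  }
  return { turns: maxTurns, completed: false };
}
```

Injecting the three dependencies keeps the loop testable with stubs; in the extension they would be backed by executeTool('screenshot'), the Gemini request, and executeBrowserAction().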
Coordinate Scaling Explained:
Gemini Computer Use uses normalized coordinates (0-1000 on both axes). Atlas automatically scales them:
// Gemini returns: x=500, y=300 (in 1000x1000 space)
// Actual viewport: 1920x1080
scaledX = (500 / 1000) * 1920 = 960
scaledY = (300 / 1000) * 1080 = 324
This ensures clicks land in the right place regardless of screen size.
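The arithmetic above can be written as a small function. This is a sketch under assumptions: the README names scaleCoordinates() but does not show its signature, so the parameters, the `Point` type, and the rounding to whole pixels are illustrative choices.

```typescript
// Size of the normalized grid described above (0-1000 on both axes).
const GEMINI_GRID = 1000;

interface Point {
  x: number;
  y: number;
}

// Assumed signature: maps a point in the 1000x1000 grid to viewport pixels.
function scaleCoordinates(
  x: number,
  y: number,
  viewportWidth: number,
  viewportHeight: number
): Point {
  return {
    // Rounding to whole pixels is an assumption, not confirmed by the source.
    x: Math.round((x / GEMINI_GRID) * viewportWidth),
    y: Math.round((y / GEMINI_GRID) * viewportHeight),
  };
}

// Same numbers as the worked example: a 1920x1080 viewport.
const scaled = scaleCoordinates(500, 300, 1920, 1080);
console.log(scaled); // { x: 960, y: 324 }
```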
The extension has built-in safety checks:
Always Requires Confirmation:
- Keyboard combinations (Ctrl+A, Alt+Tab, etc.)
Context-Aware Confirmation:
- Sensitive Pages: Checkout, payment, login, admin pages
- Sensitive Data: Typing text that looks like a password or credit card number
- Form Submissions: When typing with press_enter: true
Confirmation appears as a browser dialog before executing the action.
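The rules above can be sketched as a single predicate. The README names requiresUserConfirmation() but not its inputs, so the `ActionContext` shape, the URL keyword pattern, and the card-number pattern below are all assumptions; the real checks may be stricter or looser.

```typescript
// Hypothetical description of a pending action.
interface ActionContext {
  action: string; // e.g. "key_combination", "type_text_at"
  pageUrl: string;
  text?: string; // text about to be typed, if any
  targetInputType?: string; // e.g. "password" when known
  pressEnter?: boolean;
}

// Assumed keyword match for checkout, payment, login, and admin pages.
const SENSITIVE_PAGE = /checkout|payment|login|admin/i;
// Rough pattern for 13-16 digit card numbers with optional spaces or dashes.
const CARD_NUMBER = /\b(?:\d[ -]?){13,16}\b/;

function requiresUserConfirmation(ctx: ActionContext): boolean {
  // Keyboard combinations always need confirmation.
  if (ctx.action === "key_combination") return true;
  // Any action on a sensitive page (an assumption: the real rule may be narrower).
  if (SENSITIVE_PAGE.test(ctx.pageUrl)) return true;
  // Typing into a password field or typing something shaped like a card number.
  if (ctx.targetInputType === "password") return true;
  if (ctx.text !== undefined && CARD_NUMBER.test(ctx.text)) return true;
  // Typing followed by Enter is treated as a form submission.
  if (ctx.pressEnter === true) return true;
  return false;
}
```

When the predicate returns true, the caller would show the browser dialog and run the action only if the user accepts.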
When Composio API key is provided:
1. initializeComposioToolRouter() creates session
↓
2. Gets MCP URLs (chat_session_mcp_url, tool_router_mcp_url)
↓
3. Connects to MCP via StreamableHTTPClientTransport
↓
4. Queries available tools: mcpClient.tools()
↓
5. Merges MCP tools with local tools (getBrowserHistory)
↓
6. Passes all tools to AI SDK: streamText({ tools: allTools })
↓
7. AI SDK orchestrates tool calls via MCP
↓
8. Composio executes integration actions
↓
9. Results streamed back to user
Available Tools in Tool Router Mode:
- Composio Tools - 500+ integrations (Gmail, Slack, GitHub, etc.)
- getBrowserHistory - Built-in browser history search
The MCP client persists across messages but is recreated on "New Chat" to refresh available tools.
Chrome Storage (chrome.storage.local):
- atlasSettings - API keys, model selection
- composioSessionId - Active Composio session
- composioChatMcpUrl - Chat MCP endpoint
- composioToolRouterMcpUrl - Tool Router MCP endpoint
- extensionUserId - Unique user ID for rate limiting
- browserMemory - Recent pages, preferences
Session Management:
- Composio sessions expire after 24 hours
- New chat resets session to refresh tools
- Session automatically recreated if expired
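The expiry rule can be sketched as a check against a stored creation time. The storage keys listed above do not include a timestamp, so the `createdAt` field and both helper names are assumptions added for illustration; session creation is injected as a callback rather than calling Composio directly.

```typescript
// 24-hour lifetime, as stated above.
const SESSION_TTL_MS = 24 * 60 * 60 * 1000;

interface StoredSession {
  sessionId: string;
  createdAt: number; // epoch milliseconds; hypothetical field
}

// A missing session counts as expired so callers always get a fresh one.
function isSessionExpired(session: StoredSession | undefined, now: number = Date.now()): boolean {
  if (!session) return true;
  return now - session.createdAt >= SESSION_TTL_MS;
}

// Reuses a valid session or creates a new one through the injected callback.
async function getOrCreateSession(
  stored: StoredSession | undefined,
  createSession: () => Promise<string>,
  now: number = Date.now()
): Promise<StoredSession> {
  if (stored && !isSessionExpired(stored, now)) return stored;
  return { sessionId: await createSession(), createdAt: now };
}
```

In the extension, `stored` would come from chrome.storage.local and the returned session would be written back after creation.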
When using Tool Router mode (with Composio API key), the agent has access to a built-in getBrowserHistory tool that allows it to search through your browsing history.
Tool Features:
- Search by keyword - Filter history by page title or URL
- Time range - Default searches last 7 days, configurable
- Result limit - Default returns 20 results, adjustable
Example Usage:
- "What GitHub repositories did I visit this week?"
- "Find the Reddit post I looked at yesterday"
- "Show me recent news articles I read"
- "What programming tutorials did I visit last month?"
The tool respects Chrome's history permissions and only accesses data you've already stored in your browser history.
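The defaults described above (last 7 days, 20 results) map onto the query object accepted by chrome.history.search(), which takes `text`, `startTime`, and `maxResults`. The sketch below only builds that object; the `buildHistoryQuery` helper and its parameter names are hypothetical, and how getBrowserHistory actually assembles its query is not shown in this README.

```typescript
// Shape of the fields used here from chrome.history.search's query argument.
interface HistoryQuery {
  text: string; // empty string matches all history entries
  startTime: number; // epoch milliseconds
  maxResults: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper applying the defaults described above.
function buildHistoryQuery(
  keyword: string = "",
  days: number = 7,
  limit: number = 20,
  now: number = Date.now()
): HistoryQuery {
  return {
    text: keyword,
    startTime: now - days * DAY_MS,
    maxResults: limit,
  };
}

// Example: search the last 7 days for "github", up to 20 results.
const query = buildHistoryQuery("github");
console.log(query.maxResults); // 20
```

In the service worker, the resulting object would be passed as `chrome.history.search(query)`.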
atlas/
├── sidepanel.tsx # Main React component (1426 lines)
│ ├── Browser Tools mode with Gemini Computer Use
│ ├── Tool Router mode with MCP
│ ├── Message parsing with React Markdown
│ └── Coordinate scaling and safety checks
│
├── content.ts # Content script (714 lines)
│ ├── Page context extraction
│ ├── DOM manipulation (click, type, scroll)
│ ├── Visual feedback (blue click indicators)
│ └── Keyboard simulation
│
├── background.ts # Service worker (302 lines)
│ ├── Screenshot capture
│ ├── Tab navigation
│ ├── Browser history
│ └── Message routing
│
├── settings.tsx # Settings page (163 lines)
│ └── API key configuration UI
│
├── tools.ts # Composio integration (68 lines)
│ └── MCP session management
│
├── types.ts # TypeScript definitions (271 lines)
│ ├── Zod schemas for validation
│ └── Interface definitions
│
├── manifest.json # Extension manifest (Manifest V3)
│ ├── Permissions (tabs, history, bookmarks, etc.)
│ └── Content script injection
│
└── vite.config.ts # Build configuration
- React 18 - UI framework with hooks
- TypeScript - Type safety
- Vite - Build tool and bundler
- Vercel AI SDK - Streaming AI responses
- React Markdown - Markdown rendering
- Zod - Runtime validation
- Chrome Extension APIs - Manifest V3
- Google Gemini API - AI models (2.5 Pro/Flash/Lite/Computer Use)
- Composio MCP - Tool Router integration
- StreamableHTTPClientTransport - MCP transport
When clicking on a page:
- Blue outline appears around element (3px solid #007AFF)
- Light blue background highlight (rgba(0, 122, 255, 0.1))
- Pulsing circle animation at click coordinates
- All effects auto-remove after 600ms
The keyboard_type action:
- Types character-by-character
- Dispatches keydown, keypress, keyup for each char
- Triggers input/change events for React compatibility
- Works with INPUT, TEXTAREA, and contenteditable elements
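The per-character sequence can be sketched as a plan of events that a content script would then dispatch on the focused element. The event ordering (keydown, keypress, input, keyup, with one final change) is an assumption modeled on typical browser behavior; the `planTypingEvents` helper is hypothetical and does not touch the DOM.

```typescript
type TypingEventName = "keydown" | "keypress" | "input" | "keyup" | "change";

interface PlannedEvent {
  type: TypingEventName;
  char?: string; // omitted for the final change event
  valueAfter: string; // field value once this event has been applied
}

// Builds the ordered list of events for typing `text` into an empty field.
function planTypingEvents(text: string): PlannedEvent[] {
  const events: PlannedEvent[] = [];
  let value = "";
  for (const char of text) {
    events.push({ type: "keydown", char, valueAfter: value });
    events.push({ type: "keypress", char, valueAfter: value });
    value += char; // the value changes between keypress and input
    events.push({ type: "input", char, valueAfter: value });
    events.push({ type: "keyup", char, valueAfter: value });
  }
  // A single change event once typing has finished.
  events.push({ type: "change", valueAfter: value });
  return events;
}
```

In content.ts, each planned entry would correspond to an element.dispatchEvent() call with a KeyboardEvent or a bubbling Event, which is what lets framework-controlled inputs notice the new value.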
- Screenshots have retry logic (3 attempts with 1.5s delays)
- Connection errors automatically retry
- Graceful fallbacks for missing elements
- Detailed error messages in UI
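The screenshot retry behavior (3 attempts, 1.5 s apart) can be sketched as a generic wrapper. The `withRetry` helper is hypothetical; the defaults mirror the numbers listed above, but the extension's own retry code may be structured differently.

```typescript
// Retries an async task a fixed number of times with a fixed delay between attempts.
async function withRetry<T>(
  task: () => Promise<T>,
  attempts: number = 3,
  delayMs: number = 1500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final failure.
      if (attempt < attempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

A screenshot call could then be wrapped as `withRetry(() => captureTab())`, where `captureTab` stands in for whatever function performs the capture.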
Contributions welcome! Please:
- Open an issue first to discuss changes
- Fork the repository
- Create a feature branch
- Submit a pull request
- Composio Tool Router Documentation - Learn how to use Tool Router to route tool calls across 500+ integrations
- Composio Platform - Intelligent tool routing for AI agents
- Composio GitHub - Python and TS SDK
- ChatGPT Atlas - OpenAI's browser automation AI agent
- Gemini Computer Use Model - Google's AI model for browser automation
- Gemini API Documentation - Official documentation for Gemini Computer Use
MIT
