| 1 | + |
| 2 | +--- |
| 3 | +name: agent-browser |
| 4 | +description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages. |
| 5 | +allowed-tools: Bash(agent-browser:*) |
| 6 | +--- |
| 7 | + |
| 8 | +# Browser Automation with agent-browser |
| 9 | + |
| 10 | +## Quick start |
| 11 | + |
| 12 | +```bash |
| 13 | +agent-browser open <url> # Navigate to page |
| 14 | +agent-browser snapshot -i # Get interactive elements with refs |
| 15 | +agent-browser click @e1 # Click element by ref |
| 16 | +agent-browser fill @e2 "text" # Fill input by ref |
| 17 | +agent-browser close # Close browser |
| 18 | +``` |
| 19 | + |
| 20 | +## Core workflow |
| 21 | + |
| 22 | +1. Navigate: `agent-browser open <url>` |
| 23 | +2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`) |
| 24 | +3. Interact using refs from the snapshot |
| 25 | +4. Re-snapshot after navigation or significant DOM changes |
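The four steps above can be wrapped in two small shell helpers. This is a minimal sketch, not part of the skill itself: the function names and the `AB` variable are hypothetical, and only commands already listed in this file are used.

```bash
# AB is an assumed convenience variable; it defaults to the agent-browser binary.
AB="${AB:-agent-browser}"

# open_and_snapshot URL
# Steps 1-2: navigate, then list interactive elements with their refs.
open_and_snapshot() {
  "$AB" open "$1" || return 1
  "$AB" snapshot -i
}

# click_and_resnapshot REF
# Steps 3-4: interact using a ref, wait for the page to settle,
# then take a new snapshot because the old refs may no longer be valid.
click_and_resnapshot() {
  "$AB" click "$1" || return 1
  "$AB" wait --load networkidle || return 1
  "$AB" snapshot -i
}
```

A typical call sequence would be `open_and_snapshot https://example.com` followed by `click_and_resnapshot @e1`, where `@e1` is whichever ref the first snapshot reported.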
| 26 | + |
| 27 | +## Commands |
| 28 | + |
| 29 | +### Navigation |
| 30 | +```bash |
| 31 | +agent-browser open <url> # Navigate to URL |
| 32 | +agent-browser back # Go back |
| 33 | +agent-browser forward # Go forward |
| 34 | +agent-browser reload # Reload page |
| 35 | +agent-browser close # Close browser |
| 36 | +``` |
| 37 | + |
| 38 | +### Snapshot (page analysis) |
| 39 | +```bash |
| 40 | +agent-browser snapshot # Full accessibility tree |
| 41 | +agent-browser snapshot -i # Interactive elements only (recommended) |
| 42 | +agent-browser snapshot -c # Compact output |
| 43 | +agent-browser snapshot -d 3 # Limit depth to 3 |
| 44 | +agent-browser snapshot -s "#main" # Scope to CSS selector |
| 45 | +``` |
| 46 | + |
| 47 | +### Interactions (use @refs from snapshot) |
| 48 | +```bash |
| 49 | +agent-browser click @e1 # Click |
| 50 | +agent-browser dblclick @e1 # Double-click |
| 51 | +agent-browser focus @e1 # Focus element |
| 52 | +agent-browser fill @e2 "text" # Clear and type |
| 53 | +agent-browser type @e2 "text" # Type without clearing |
| 54 | +agent-browser press Enter # Press key |
| 55 | +agent-browser press Control+a # Key combination |
| 56 | +agent-browser keydown Shift # Hold key down |
| 57 | +agent-browser keyup Shift # Release key |
| 58 | +agent-browser hover @e1 # Hover |
| 59 | +agent-browser check @e1 # Check checkbox |
| 60 | +agent-browser uncheck @e1 # Uncheck checkbox |
| 61 | +agent-browser select @e1 "value" # Select dropdown |
| 62 | +agent-browser scroll down 500 # Scroll page |
| 63 | +agent-browser scrollintoview @e1 # Scroll element into view |
| 64 | +agent-browser drag @e1 @e2 # Drag and drop |
| 65 | +agent-browser upload @e1 file.pdf # Upload files |
| 66 | +``` |
| 67 | + |
| 68 | +### Get information |
| 69 | +```bash |
| 70 | +agent-browser get text @e1 # Get element text |
| 71 | +agent-browser get html @e1 # Get innerHTML |
| 72 | +agent-browser get value @e1 # Get input value |
| 73 | +agent-browser get attr @e1 href # Get attribute |
| 74 | +agent-browser get title # Get page title |
| 75 | +agent-browser get url # Get current URL |
| 76 | +agent-browser get count ".item" # Count matching elements |
| 77 | +agent-browser get box @e1 # Get bounding box |
| 78 | +``` |
| 79 | + |
| 80 | +### Check state |
| 81 | +```bash |
| 82 | +agent-browser is visible @e1 # Check if visible |
| 83 | +agent-browser is enabled @e1 # Check if enabled |
| 84 | +agent-browser is checked @e1 # Check if checked |
| 85 | +``` |
| 86 | + |
| 87 | +### Screenshots & PDF |
| 88 | +```bash |
| 89 | +agent-browser screenshot # Screenshot to stdout |
| 90 | +agent-browser screenshot path.png # Save to file |
| 91 | +agent-browser screenshot --full # Full page |
| 92 | +agent-browser pdf output.pdf # Save as PDF |
| 93 | +``` |
| 94 | + |
| 95 | +### Video recording |
| 96 | +```bash |
| 97 | +agent-browser record start ./demo.webm # Start recording (uses current URL + state) |
| 98 | +agent-browser click @e1 # Perform actions |
| 99 | +agent-browser record stop # Stop and save video |
| 100 | +agent-browser record restart ./take2.webm # Stop current + start new recording |
| 101 | +``` |
| 102 | +Recording creates a fresh browser context but preserves cookies and storage from your session. If no URL is provided, the recording automatically returns to your current page. For smooth demos, explore the site first, then start recording. |
| 103 | + |
| 104 | +### Wait |
| 105 | +```bash |
| 106 | +agent-browser wait @e1 # Wait for element |
| 107 | +agent-browser wait 2000 # Wait milliseconds |
| 108 | +agent-browser wait --text "Success" # Wait for text |
| 109 | +agent-browser wait --url "**/dashboard" # Wait for URL pattern |
| 110 | +agent-browser wait --load networkidle # Wait for network idle |
| 111 | +agent-browser wait --fn "window.ready" # Wait for JS condition |
| 112 | +``` |
| 113 | + |
| 114 | +### Mouse control |
| 115 | +```bash |
| 116 | +agent-browser mouse move 100 200 # Move mouse |
| 117 | +agent-browser mouse down left # Press button |
| 118 | +agent-browser mouse up left # Release button |
| 119 | +agent-browser mouse wheel 100 # Scroll wheel |
| 120 | +``` |
| 121 | + |
| 122 | +### Semantic locators (alternative to refs) |
| 123 | +```bash |
| 124 | +agent-browser find role button click --name "Submit" |
| 125 | +agent-browser find text "Sign In" click |
| 126 | +agent-browser find label "Email" fill "user@test.com" |
| 127 | +agent-browser find first ".item" click |
| 128 | +agent-browser find nth 2 "a" text |
| 129 | +``` |
| 130 | + |
| 131 | +### Browser settings |
| 132 | +```bash |
| 133 | +agent-browser set viewport 1920 1080 # Set viewport size |
| 134 | +agent-browser set device "iPhone 14" # Emulate device |
| 135 | +agent-browser set geo 37.7749 -122.4194 # Set geolocation |
| 136 | +agent-browser set offline on # Toggle offline mode |
| 137 | +agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers |
| 138 | +agent-browser set credentials user pass # HTTP basic auth |
| 139 | +agent-browser set media dark # Emulate color scheme |
| 140 | +``` |
| 141 | + |
| 142 | +### Cookies & Storage |
| 143 | +```bash |
| 144 | +agent-browser cookies # Get all cookies |
| 145 | +agent-browser cookies set name value # Set cookie |
| 146 | +agent-browser cookies clear # Clear cookies |
| 147 | +agent-browser storage local # Get all localStorage |
| 148 | +agent-browser storage local key # Get specific key |
| 149 | +agent-browser storage local set k v # Set value |
| 150 | +agent-browser storage local clear # Clear all |
| 151 | +``` |
| 152 | + |
| 153 | +### Network |
| 154 | +```bash |
| 155 | +agent-browser network route <url> # Intercept requests |
| 156 | +agent-browser network route <url> --abort # Block requests |
| 157 | +agent-browser network route <url> --body '{}' # Mock response |
| 158 | +agent-browser network unroute [url] # Remove routes |
| 159 | +agent-browser network requests # View tracked requests |
| 160 | +agent-browser network requests --filter api # Filter requests |
| 161 | +``` |
| 162 | + |
| 163 | +### Tabs & Windows |
| 164 | +```bash |
| 165 | +agent-browser tab # List tabs |
| 166 | +agent-browser tab new [url] # New tab |
| 167 | +agent-browser tab 2 # Switch to tab |
| 168 | +agent-browser tab close # Close tab |
| 169 | +agent-browser window new # New window |
| 170 | +``` |
| 171 | + |
| 172 | +### Frames |
| 173 | +```bash |
| 174 | +agent-browser frame "#iframe" # Switch to iframe |
| 175 | +agent-browser frame main # Back to main frame |
| 176 | +``` |
| 177 | + |
| 178 | +### Dialogs |
| 179 | +```bash |
| 180 | +agent-browser dialog accept [text] # Accept dialog |
| 181 | +agent-browser dialog dismiss # Dismiss dialog |
| 182 | +``` |
| 183 | + |
| 184 | +### JavaScript |
| 185 | +```bash |
| 186 | +agent-browser eval "document.title" # Run JavaScript |
| 187 | +``` |
| 188 | + |
| 189 | +## Example: Form submission |
| 190 | + |
| 191 | +```bash |
| 192 | +agent-browser open https://example.com/form |
| 193 | +agent-browser snapshot -i |
| 194 | +# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3] |
| 195 | + |
| 196 | +agent-browser fill @e1 "user@example.com" |
| 197 | +agent-browser fill @e2 "password123" |
| 198 | +agent-browser click @e3 |
| 199 | +agent-browser wait --load networkidle |
| 200 | +agent-browser snapshot -i # Check result |
| 201 | +``` |
| 202 | + |
| 203 | +## Example: Authentication with saved state |
| 204 | + |
| 205 | +```bash |
| 206 | +# Login once |
| 207 | +agent-browser open https://app.example.com/login |
| 208 | +agent-browser snapshot -i |
| 209 | +agent-browser fill @e1 "username" |
| 210 | +agent-browser fill @e2 "password" |
| 211 | +agent-browser click @e3 |
| 212 | +agent-browser wait --url "**/dashboard" |
| 213 | +agent-browser state save auth.json |
| 214 | + |
| 215 | +# Later sessions: load saved state |
| 216 | +agent-browser state load auth.json |
| 217 | +agent-browser open https://app.example.com/dashboard |
| 218 | +``` |
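The login-once, reuse-later pattern above could be folded into a single helper that loads saved state when the file exists and logs in otherwise. This is a sketch under assumptions: `ensure_logged_in`, `AB`, and `STATE_FILE` are hypothetical names, and the refs `@e1`, `@e2`, `@e3` are copied from the example above, so they must be confirmed against the actual snapshot output.

```bash
# AB and STATE_FILE are assumed convenience variables, not agent-browser settings.
AB="${AB:-agent-browser}"
STATE_FILE="${STATE_FILE:-auth.json}"

# ensure_logged_in USERNAME PASSWORD
# Reuses saved state when available; otherwise logs in and saves it.
ensure_logged_in() {
  if [ -f "$STATE_FILE" ]; then
    "$AB" state load "$STATE_FILE"
  else
    "$AB" open https://app.example.com/login || return 1
    "$AB" snapshot -i || return 1
    # Refs below are assumed to match the login form; check the snapshot first.
    "$AB" fill @e1 "$1" || return 1
    "$AB" fill @e2 "$2" || return 1
    "$AB" click @e3 || return 1
    "$AB" wait --url "**/dashboard" || return 1
    "$AB" state save "$STATE_FILE"
  fi
}
```

After `ensure_logged_in` returns successfully, `agent-browser open https://app.example.com/dashboard` should land on the authenticated page in either branch.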
| 219 | + |
| 220 | +## Sessions (parallel browsers) |
| 221 | + |
| 222 | +```bash |
| 223 | +agent-browser --session test1 open site-a.com |
| 224 | +agent-browser --session test2 open site-b.com |
| 225 | +agent-browser session list |
| 226 | +``` |
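To actually run the two sessions side by side, each one can be started as a background job. The sketch below assumes that `--session` can precede `snapshot` the same way it precedes `open` above; the function names, the `AB` variable, and the output file names are hypothetical.

```bash
# AB is an assumed convenience variable; it defaults to the agent-browser binary.
AB="${AB:-agent-browser}"

# snapshot_in_session NAME URL
# Opens URL in the named session and prints its interactive elements.
snapshot_in_session() {
  "$AB" --session "$1" open "$2" || return 1
  "$AB" --session "$1" snapshot -i
}

# run_parallel OUT_DIR
# Runs both sessions concurrently and writes one snapshot file per session.
run_parallel() {
  out_dir="$1"
  snapshot_in_session test1 site-a.com > "$out_dir/test1.txt" &
  pid1=$!
  snapshot_in_session test2 site-b.com > "$out_dir/test2.txt" &
  pid2=$!
  status=0
  wait "$pid1" || status=1
  wait "$pid2" || status=1
  return "$status"
}
```

Waiting on each job separately means a failure in one session is reported without abandoning the other.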
| 227 | + |
| 228 | +## JSON output (for parsing) |
| 229 | + |
| 230 | +Add `--json` for machine-readable output: |
| 231 | +```bash |
| 232 | +agent-browser snapshot -i --json |
| 233 | +agent-browser get text @e1 --json |
| 234 | +``` |
| 235 | + |
| 236 | +## Debugging |
| 237 | + |
| 238 | +```bash |
| 239 | +agent-browser open example.com --headed # Show browser window |
| 240 | +agent-browser console # View console messages |
| 241 | +agent-browser errors # View page errors |
| 242 | +agent-browser record start ./debug.webm # Record from current page |
| 243 | +agent-browser record stop # Save recording |
| 245 | +agent-browser --cdp 9222 snapshot # Connect via CDP |
| 247 | +agent-browser console --clear # Clear console |
| 249 | +agent-browser errors --clear # Clear errors |
| 250 | +agent-browser highlight @e1 # Highlight element |
| 251 | +agent-browser trace start # Start recording trace |
| 252 | +agent-browser trace stop trace.zip # Stop and save trace |
| 253 | +``` |