You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If AUTH_TOKEN is set, all requests (except /health) require authentication:
Authorization: Bearer <token>
Or pass it as a query param: ?auth_token=<token> (useful for MCP clients that can't set headers).
Cluster Mode Restriction
When running in cluster mode (NUM_REPLICAS > 1), only run_script, ping, and sleep actions are allowed. All other actions return an error directing you to use run_script. This prevents stale content bugs from calls hitting different browser instances. See cluster-mode.md for details.
Request Serialization
In single-instance mode, only one request runs at a time. Additional requests queue up and execute sequentially. /health and /state are never blocked.
Request / Response Format
Command format:
POST http://localhost:8080/
Content-Type: application/json
{"action": "action_name", "param1": "value1", "param2": "value2"}
Response format:
{
"success": true,
"timestamp": 1234567890.123,
"data": { ... },
"error": "only present when success is false"
}
The data field contains action-specific results (page text, element coordinates, cookie values, etc.).
Example: Full Login Flow (Undetectable)
API=http://localhost:8080
# 1. Navigate to login page
curl -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "goto", "url": "https://example.com/login"}'# 2. Find all interactive elements (buttons, inputs, links)
curl -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "get_interactive_elements"}'# Returns elements with x, y coordinates, text, and CSS selectors# 3. Click the email field (use coordinates from step 2)
curl -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "system_click", "x": 400, "y": 200}'# 4. Type email with human-like keystrokes
curl -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "system_type", "text": "user@example.com"}'# 5. Tab to password field
curl -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "send_key", "key": "tab"}'# 6. Type password
curl -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "system_type", "text": "secretpassword"}'# 7. Press Enter to submit
curl -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "send_key", "key": "enter"}'# 8. Wait for redirect to dashboard
curl -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "wait_for_url", "url": "**/dashboard", "timeout": 15}'# 9. Verify we're logged in
curl -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "get_text"}'
Every interaction (clicks, typing, key presses) uses OS-level input. The site sees a real human typing at a natural speed with randomized delays. No CDP signals. No automation fingerprints.
Actions Reference
All actions are sent as POST / with JSON body {"action": "name", ...params}.
Navigation
Action
Parameters
What It Does
goto
url, wait_until, referer
Navigate to a URL. wait_until: "domcontentloaded" (default), "load", "networkidle". referer: set the HTTP Referer header (useful for sites that check referrer).
refresh
wait_until (optional)
Reload the current page. Returns URL and title.
System Input (OS-Level, Undetectable — Last Resort)
Action
Parameters
What It Does
system_click
x, y, duration
Moves the mouse to viewport coordinates with a human-like curved path (random jitter, eased acceleration), then clicks. Last resort — prefer click with a CSS selector. Only use this when the site detects DOM event injection. Requires calibrate to have been called first or coordinates will be wrong. duration controls movement time (random 0.2-0.6s if omitted).
mouse_move
x, y, duration
Moves the mouse with human-like movement but does not click. Use to hover over elements (trigger dropdown menus, tooltips) or simulate natural mouse behavior between actions.
mouse_click
x, y (optional)
Clicks at a position or wherever the mouse currently is. Unlike system_click, this does not do the smooth mouse movement first — it's a direct click. Use after mouse_move when you want to separate movement and click.
system_type
text, interval
Types text character-by-character via real OS keystrokes. Each key has a randomized delay (jittered around interval, default 0.08s) to mimic human typing speed. You must focus an input field first.
send_key
key
Sends a keyboard key or combo. Examples: "enter", "tab", "escape", "backspace", "ctrl+a", "ctrl+shift+t". Uses PyAutoGUI key names.
scroll
amount, x, y
Scrolls using the mouse wheel. Negative = scroll down, positive = scroll up. If x, y are provided, moves the mouse there first (useful for scrolling inside a specific element).
Playwright Input (DOM Events — Use These First)
Action
Parameters
What It Does
click
selector
Preferred click method. Clicks an element by CSS selector or XPath (xpath=//button[@id='submit']). Fast and reliable. Only fall back to system_click if the site actively detects and blocks DOM event injection.
fill
selector, value
Sets an input field's value instantly. Clears existing content first. Fast but doesn't generate individual keystroke events — detectable.
type
selector, text, delay
Types into an element character-by-character via Playwright. Middle ground between fill (instant) and system_type (OS-level). delay defaults to 0.05s between keys.
Page Inspection
Action
Parameters
What It Does
get_interactive_elements
visible_only
Scans the page and returns every interactive element (buttons, links, inputs, selects, textareas) with their viewport coordinates (x, y), dimensions (w, h), text, CSS selector, and visible status. This is how you find what to click.
get_text
—
Returns all visible text from the page body (truncated to 10,000 chars). Usually the first thing to call after navigating — tells you what's on the page without a screenshot.
get_html
—
Returns the full HTML source of the page. Use when get_text doesn't give enough structure.
eval
expression
Executes JavaScript in the page context and returns the result. Example: "document.title", "document.querySelectorAll('a').length".
Wait Conditions
Use these instead of sleep — they wait for actual page state, not arbitrary time.
Action
Parameters
What It Does
wait_for_element
selector, state, timeout
Waits for an element to reach a state. state: "visible" (default), "hidden", "attached", "detached". timeout in seconds (default 30). CSS or XPath.
wait_for_text
text, timeout
Waits for specific text to appear anywhere on the page (substring match).
wait_for_url
url, timeout
Waits for the URL to match a glob pattern. * matches any chars except /, ** matches everything. Example: "**/dashboard".
wait_for_network_idle
timeout
Waits until no network requests have been made for 500ms. Useful for pages that load content dynamically.
Tab Management
Action
Parameters
What It Does
list_tabs
—
Returns all open tabs with their index, URL, and which one is active.
new_tab
url, wait_until
Opens a new tab (becomes the active tab). Optionally navigates to a URL.
switch_tab
index
Switches the active tab by index (0-based). All subsequent actions operate on the active tab.
close_tab
index (optional)
Closes a tab. If no index, closes the active tab. After closing, the last remaining tab becomes active.
Dialog Handling
Browsers have modal dialogs (alert, confirm, prompt). By default, dialogs are auto-accepted (clicks OK). Use handle_dialog to dismiss or provide prompt text.
Action
Parameters
What It Does
handle_dialog
accept, text
Pre-configures how the next dialog will be handled. accept: true = click OK, false = click Cancel. text: response for prompt dialogs. Call this BEFORE the action that triggers the dialog. If you don't, the dialog is auto-accepted (clicks OK). You only need this if you want to dismiss (Cancel) or provide prompt text.
get_last_dialog
—
Returns info about the last dialog: type (alert/confirm/prompt/beforeunload), message, default_value, buttons.
Cookies
Action
Parameters
What It Does
get_cookies
urls (optional)
Returns all browser cookies. Optionally filter by URL list. Each cookie includes name, value, domain, path, httpOnly, secure, etc.
set_cookie
name, value, url/domain, ...
Sets a cookie. Needs at minimum: name, value, and either url or domain. Accepts all standard cookie fields (path, httpOnly, secure, sameSite, expires).
delete_cookies
—
Clears all cookies from the browser context.
Storage
Access the page's localStorage and sessionStorage. Storage is per-origin — you must be on the right page.
Action
Parameters
What It Does
get_storage
type
Returns all items as key-value pairs. type: "local" (default) or "session".
set_storage
type, key, value
Sets a single key-value pair.
clear_storage
type
Clears all items.
Downloads & Uploads
Action
Parameters
What It Does
get_last_download
—
Returns info about the most recent file download: url, filename, and local path inside the container. Returns null if nothing downloaded yet.
upload_file
selector, file_path
Programmatically sets a file on an <input type="file"> element without opening the OS file picker. File must exist inside the container (use docker cp to copy files in). You still need to submit the form after.
Network Logging
Record all HTTP requests and responses the page makes. Useful for finding API endpoints, debugging, or verifying resources loaded.
Action
Parameters
What It Does
enable_network_log
—
Starts recording. Each entry captures: URL, method, resource type (fetch/document/script/image/etc), status code, and timestamp.
disable_network_log
—
Stops recording. Already-captured entries remain.
get_network_log
—
Returns all captured entries with their count.
clear_network_log
—
Deletes captured entries. Keeps logging on if it was on.
getclear_network_log
—
Returns all captured entries and clears the log in one call.
Console Logging
Capture console.log, console.error, console.warn, and other console output from the page. Useful for debugging page behavior or extracting data logged by scripts.
Action
Parameters
What It Does
enable_console_log
—
Starts capturing console messages. Each entry has: type (log/error/warning/info/debug/trace/table/etc), text, location, timestamp.
disable_console_log
—
Stops capturing. Already-captured entries remain.
get_console_log
—
Returns all captured entries with their count.
clear_console_log
—
Deletes captured entries. Keeps capturing on if it was on.
getclear_console_log
—
Returns all captured entries and clears the log in one call.
Display & Calibration
Action
Parameters
What It Does
calibrate
—
Recalculates the mapping between viewport coordinates (from get_interactive_elements) and screen coordinates (what PyAutoGUI uses). The browser window has chrome (title bar, etc.) that offsets the content area. Call this after entering/exiting fullscreen, or if system_click seems to be hitting the wrong spot. Auto-calculated at startup.
get_resolution
—
Returns the virtual display resolution (width, height).
enter_fullscreen
—
Puts the browser in fullscreen mode (hides address bar and window chrome). Call calibrate after.
exit_fullscreen
—
Exits fullscreen mode. Call calibrate after.
Scrolling
Action
Parameters
What It Does
scroll_to_bottom
delay
Scrolls the entire page top-to-bottom using JavaScript (window.scrollBy), then back to top. Useful for triggering lazy-loaded content. delay (default 0.4s) is the pause between scroll steps. This is fast but uses JS, not OS-level input.
scroll_to_bottom_humanized
min_clicks, max_clicks, delay
Same goal as above but uses real OS-level mouse wheel scrolling (PyAutoGUI) with randomized scroll amounts and jittered delays. Undetectable by behavioral analysis. Slower but stealthy.
Utility
Action
Parameters
What It Does
run_script
steps or yaml, name, on_error
Execute multiple actions as a single atomic request. steps: list of action dicts. yaml: inline YAML string (same format as --script mode). on_error: "stop" (default) or "continue". Steps with output_id collect results.
ping
—
Health check that returns "pong" and the current page URL.
sleep
duration
Pauses for N seconds. Prefer wait_for_element or wait_for_text when waiting for page content.
close
—
Shuts down the browser. The container stops after this.
save_screenshot
output_id, path, type, width, height, whLargest
Captures a screenshot. type: "browser" (default) or "desktop". Optional path to also write PNG to disk. In script mode, output_id collects the base64 PNG into the outputs dict. Supports resize via width/height/whLargest.
Screenshots
Both screenshot endpoints support resize parameters. The default resolution is 1920x1080 — that's a big image. You almost always want to resize.
# Resize to 512px on longest side (best default — keeps aspect ratio, manageable size)
curl http://localhost:8080/screenshot/browser?whLargest=512 -o screenshot.png
# Resize to 800px wide
curl http://localhost:8080/screenshot/browser?width=800 -o screenshot.png
# Exact 400x400 dimensions
curl http://localhost:8080/screenshot/browser?width=400&height=400 -o screenshot.png
# Full desktop (includes browser chrome, taskbar, etc.)
curl http://localhost:8080/screenshot/desktop?whLargest=512 -o desktop.png
Parameter
What It Does
whLargest=512
Scales so the largest dimension is 512px, keeps aspect ratio. Use this by default.