HTTP API Reference

Endpoints

Endpoint	Method	What It Does
`/`	POST	Execute any browser action (see Actions Reference below)
`/screenshot/browser`	GET	Browser viewport as PNG — what the page looks like
`/screenshot/desktop`	GET	Full virtual desktop as PNG — including browser chrome
`/state`	GET	Current URL, page title, and window offset as JSON
`/health`	GET	Returns `ok` when the browser is ready
`/mcp/`	POST	MCP (Model Context Protocol) Streamable HTTP endpoint

Authentication

If AUTH_TOKEN is set, all requests (except /health) require authentication:

Authorization: Bearer <token>

Or pass it as a query param: ?auth_token=<token> (useful for MCP clients that can't set headers).

Cluster Mode Restriction

When running in cluster mode (NUM_REPLICAS > 1), only run_script, ping, and sleep actions are allowed. All other actions return an error directing you to use run_script. This prevents stale content bugs from calls hitting different browser instances. See cluster-mode.md for details.

Request Serialization

In single-instance mode, only one request runs at a time. Additional requests queue up and execute sequentially. /health and /state are never blocked.

Request / Response Format

Command format:

POST http://localhost:8080/
Content-Type: application/json

{"action": "action_name", "param1": "value1", "param2": "value2"}

Response format:

{
  "success": true,
  "timestamp": 1234567890.123,
  "data": { ... },
  "error": "only present when success is false"
}

The data field contains action-specific results (page text, element coordinates, cookie values, etc.).

Example: Full Login Flow (Undetectable)

API=http://localhost:8080

# 1. Navigate to login page
curl -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "goto", "url": "https://example.com/login"}'

# 2. Find all interactive elements (buttons, inputs, links)
curl -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "get_interactive_elements"}'
# Returns elements with x, y coordinates, text, and CSS selectors

# 3. Click the email field (use coordinates from step 2)
curl -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "system_click", "x": 400, "y": 200}'

# 4. Type email with human-like keystrokes
curl -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "system_type", "text": "user@example.com"}'

# 5. Tab to password field
curl -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "send_key", "key": "tab"}'

# 6. Type password
curl -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "system_type", "text": "secretpassword"}'

# 7. Press Enter to submit
curl -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "send_key", "key": "enter"}'

# 8. Wait for redirect to dashboard
curl -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "wait_for_url", "url": "**/dashboard", "timeout": 15}'

# 9. Verify we're logged in
curl -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "get_text"}'

Every interaction (clicks, typing, key presses) uses OS-level input. The site sees a real human typing at a natural speed with randomized delays. No CDP signals. No automation fingerprints.

Actions Reference

All actions are sent as POST / with JSON body {"action": "name", ...params}.

Navigation

Action	Parameters	What It Does
`goto`	`url`, `wait_until`, `referer`	Navigate to a URL. `wait_until`: `"domcontentloaded"` (default), `"load"`, `"networkidle"`. `referer`: set the HTTP Referer header (useful for sites that check referrer).
`refresh`	`wait_until` (optional)	Reload the current page. Returns URL and title.

System Input (OS-Level, Undetectable — Last Resort)

Action	Parameters	What It Does
`system_click`	`x`, `y`, `duration`	Moves the mouse to viewport coordinates with a human-like curved path (random jitter, eased acceleration), then clicks. Last resort — prefer `click` with a CSS selector. Only use this when the site detects DOM event injection. Requires `calibrate` to have been called first or coordinates will be wrong. `duration` controls movement time (random 0.2-0.6s if omitted).
`mouse_move`	`x`, `y`, `duration`	Moves the mouse with human-like movement but does not click. Use to hover over elements (trigger dropdown menus, tooltips) or simulate natural mouse behavior between actions.
`mouse_click`	`x`, `y` (optional)	Clicks at a position or wherever the mouse currently is. Unlike `system_click`, this does not do the smooth mouse movement first — it's a direct click. Use after `mouse_move` when you want to separate movement and click.
`system_type`	`text`, `interval`	Types text character-by-character via real OS keystrokes. Each key has a randomized delay (jittered around `interval`, default 0.08s) to mimic human typing speed. You must focus an input field first.
`send_key`	`key`	Sends a keyboard key or combo. Examples: `"enter"`, `"tab"`, `"escape"`, `"backspace"`, `"ctrl+a"`, `"ctrl+shift+t"`. Uses PyAutoGUI key names.
`scroll`	`amount`, `x`, `y`	Scrolls using the mouse wheel. Negative = scroll down, positive = scroll up. If `x`, `y` are provided, moves the mouse there first (useful for scrolling inside a specific element).

Playwright Input (DOM Events — Use These First)

Action	Parameters	What It Does
`click`	`selector`	Preferred click method. Clicks an element by CSS selector or XPath (`xpath=//button[@id='submit']`). Fast and reliable. Only fall back to `system_click` if the site actively detects and blocks DOM event injection.
`fill`	`selector`, `value`	Sets an input field's value instantly. Clears existing content first. Fast but doesn't generate individual keystroke events — detectable.
`type`	`selector`, `text`, `delay`	Types into an element character-by-character via Playwright. Middle ground between `fill` (instant) and `system_type` (OS-level). `delay` defaults to 0.05s between keys.

Page Inspection

Action	Parameters	What It Does
`get_interactive_elements`	`visible_only`	Scans the page and returns every interactive element (buttons, links, inputs, selects, textareas) with their viewport coordinates (`x`, `y`), dimensions (`w`, `h`), `text`, CSS `selector`, and `visible` status. This is how you find what to click.
`get_text`	—	Returns all visible text from the page body (truncated to 10,000 chars). Usually the first thing to call after navigating — tells you what's on the page without a screenshot.
`get_html`	—	Returns the full HTML source of the page. Use when `get_text` doesn't give enough structure.
`eval`	`expression`	Executes JavaScript in the page context and returns the result. Example: `"document.title"`, `"document.querySelectorAll('a').length"`.

Wait Conditions

Use these instead of sleep — they wait for actual page state, not arbitrary time.

Action	Parameters	What It Does
`wait_for_element`	`selector`, `state`, `timeout`	Waits for an element to reach a state. `state`: `"visible"` (default), `"hidden"`, `"attached"`, `"detached"`. `timeout` in seconds (default 30). CSS or XPath.
`wait_for_text`	`text`, `timeout`	Waits for specific text to appear anywhere on the page (substring match).
`wait_for_url`	`url`, `timeout`	Waits for the URL to match a glob pattern. `` matches any chars except `/`, `` matches everything. Example: `"*/dashboard"`.
`wait_for_network_idle`	`timeout`	Waits until no network requests have been made for 500ms. Useful for pages that load content dynamically.

Tab Management

Action	Parameters	What It Does
`list_tabs`	—	Returns all open tabs with their index, URL, and which one is active.
`new_tab`	`url`, `wait_until`	Opens a new tab (becomes the active tab). Optionally navigates to a URL.
`switch_tab`	`index`	Switches the active tab by index (0-based). All subsequent actions operate on the active tab.
`close_tab`	`index` (optional)	Closes a tab. If no index, closes the active tab. After closing, the last remaining tab becomes active.

Dialog Handling

Browsers have modal dialogs (alert, confirm, prompt). By default, dialogs are auto-accepted (clicks OK). Use handle_dialog to dismiss or provide prompt text.

Action	Parameters	What It Does
`handle_dialog`	`accept`, `text`	Pre-configures how the next dialog will be handled. `accept`: `true` = click OK, `false` = click Cancel. `text`: response for prompt dialogs. Call this BEFORE the action that triggers the dialog. If you don't, the dialog is auto-accepted (clicks OK). You only need this if you want to dismiss (Cancel) or provide prompt text.
`get_last_dialog`	—	Returns info about the last dialog: `type` (alert/confirm/prompt/beforeunload), `message`, `default_value`, `buttons`.

Cookies

Action	Parameters	What It Does
`get_cookies`	`urls` (optional)	Returns all browser cookies. Optionally filter by URL list. Each cookie includes name, value, domain, path, httpOnly, secure, etc.
`set_cookie`	`name`, `value`, `url`/`domain`, ...	Sets a cookie. Needs at minimum: `name`, `value`, and either `url` or `domain`. Accepts all standard cookie fields (path, httpOnly, secure, sameSite, expires).
`delete_cookies`	—	Clears all cookies from the browser context.

Storage

Access the page's localStorage and sessionStorage. Storage is per-origin — you must be on the right page.

Action	Parameters	What It Does
`get_storage`	`type`	Returns all items as key-value pairs. `type`: `"local"` (default) or `"session"`.
`set_storage`	`type`, `key`, `value`	Sets a single key-value pair.
`clear_storage`	`type`	Clears all items.

Downloads & Uploads

Action	Parameters	What It Does
`get_last_download`	—	Returns info about the most recent file download: `url`, `filename`, and local `path` inside the container. Returns `null` if nothing downloaded yet.
`upload_file`	`selector`, `file_path`	Programmatically sets a file on an `<input type="file">` element without opening the OS file picker. File must exist inside the container (use `docker cp` to copy files in). You still need to submit the form after.

Network Logging

Record all HTTP requests and responses the page makes. Useful for finding API endpoints, debugging, or verifying resources loaded.

Action	Parameters	What It Does
`enable_network_log`	—	Starts recording. Each entry captures: URL, method, resource type (fetch/document/script/image/etc), status code, and timestamp.
`disable_network_log`	—	Stops recording. Already-captured entries remain.
`get_network_log`	—	Returns all captured entries with their count.
`clear_network_log`	—	Deletes captured entries. Keeps logging on if it was on.
`getclear_network_log`	—	Returns all captured entries and clears the log in one call.

Console Logging

Capture console.log, console.error, console.warn, and other console output from the page. Useful for debugging page behavior or extracting data logged by scripts.

Action	Parameters	What It Does
`enable_console_log`	—	Starts capturing console messages. Each entry has: `type` (log/error/warning/info/debug/trace/table/etc), `text`, `location`, `timestamp`.
`disable_console_log`	—	Stops capturing. Already-captured entries remain.
`get_console_log`	—	Returns all captured entries with their count.
`clear_console_log`	—	Deletes captured entries. Keeps capturing on if it was on.
`getclear_console_log`	—	Returns all captured entries and clears the log in one call.

Display & Calibration

Action	Parameters	What It Does
`calibrate`	—	Recalculates the mapping between viewport coordinates (from `get_interactive_elements`) and screen coordinates (what PyAutoGUI uses). The browser window has chrome (title bar, etc.) that offsets the content area. Call this after entering/exiting fullscreen, or if `system_click` seems to be hitting the wrong spot. Auto-calculated at startup.
`get_resolution`	—	Returns the virtual display resolution (width, height).
`enter_fullscreen`	—	Puts the browser in fullscreen mode (hides address bar and window chrome). Call `calibrate` after.
`exit_fullscreen`	—	Exits fullscreen mode. Call `calibrate` after.

Scrolling

Action	Parameters	What It Does
`scroll_to_bottom`	`delay`	Scrolls the entire page top-to-bottom using JavaScript (`window.scrollBy`), then back to top. Useful for triggering lazy-loaded content. `delay` (default 0.4s) is the pause between scroll steps. This is fast but uses JS, not OS-level input.
`scroll_to_bottom_humanized`	`min_clicks`, `max_clicks`, `delay`	Same goal as above but uses real OS-level mouse wheel scrolling (PyAutoGUI) with randomized scroll amounts and jittered delays. Undetectable by behavioral analysis. Slower but stealthy.

Utility

Action	Parameters	What It Does
`run_script`	`steps` or `yaml`, `name`, `on_error`	Execute multiple actions as a single atomic request. `steps`: list of action dicts. `yaml`: inline YAML string (same format as `--script` mode). `on_error`: `"stop"` (default) or `"continue"`. Steps with `output_id` collect results.
`ping`	—	Health check that returns `"pong"` and the current page URL.
`sleep`	`duration`	Pauses for N seconds. Prefer `wait_for_element` or `wait_for_text` when waiting for page content.
`close`	—	Shuts down the browser. The container stops after this.
`save_screenshot`	`output_id`, `path`, `type`, `width`, `height`, `whLargest`	Captures a screenshot. `type`: `"browser"` (default) or `"desktop"`. Optional `path` to also write PNG to disk. In script mode, `output_id` collects the base64 PNG into the outputs dict. Supports resize via `width`/`height`/`whLargest`.

Screenshots

Both screenshot endpoints support resize parameters. The default resolution is 1920x1080 — that's a big image. You almost always want to resize.

# Resize to 512px on longest side (best default — keeps aspect ratio, manageable size)
curl http://localhost:8080/screenshot/browser?whLargest=512 -o screenshot.png

# Resize to 800px wide
curl http://localhost:8080/screenshot/browser?width=800 -o screenshot.png

# Exact 400x400 dimensions
curl http://localhost:8080/screenshot/browser?width=400&height=400 -o screenshot.png

# Full desktop (includes browser chrome, taskbar, etc.)
curl http://localhost:8080/screenshot/desktop?whLargest=512 -o desktop.png

Parameter	What It Does
`whLargest=512`	Scales so the largest dimension is 512px, keeps aspect ratio. Use this by default.
`width=800`	Scales to 800px wide, keeps aspect ratio.
`height=300`	Scales to 300px tall, keeps aspect ratio.
`width=400&height=400`	Forces exact dimensions (may stretch).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HTTP API Reference

Endpoints

Authentication

Cluster Mode Restriction

Request Serialization

Request / Response Format

Example: Full Login Flow (Undetectable)

Actions Reference

Navigation

System Input (OS-Level, Undetectable — Last Resort)

Playwright Input (DOM Events — Use These First)

Page Inspection

Wait Conditions

Tab Management

Dialog Handling

Cookies

Storage

Downloads & Uploads

Network Logging

Console Logging

Display & Calibration

Scrolling

Utility

Screenshots

FilesExpand file tree

api.md

Latest commit

History

api.md

File metadata and controls

HTTP API Reference

Endpoints

Authentication

Cluster Mode Restriction

Request Serialization

Request / Response Format

Example: Full Login Flow (Undetectable)

Actions Reference

Navigation

System Input (OS-Level, Undetectable — Last Resort)

Playwright Input (DOM Events — Use These First)

Page Inspection

Wait Conditions

Tab Management

Dialog Handling

Cookies

Storage

Downloads & Uploads

Network Logging

Console Logging

Display & Calibration

Scrolling

Utility

Screenshots