macos-cua is a macOS-only low-level computer-use runtime for agents.
It is the most compatible GUI computer-use runtime: if a human can do
it, macos-cua can usually do it too. It is not the optimal choice for
browser workflows or apps with strong accessibility (a11y) support.
In a larger CU stack:
- use macos-cua when the task is fundamentally pointer, keyboard, screenshot, clipboard, or window-control work
- use Codex Computer Use when the target app exposes a strong enough accessibility tree that element-level actions are cleaner than coordinate loops
- use bb-browser when the task is primarily a website workflow inside a controlled Chrome session
If you are choosing between runtimes, see the runtime selection guide.
Its default path is simple:
- coordinates are absolute
- coordinates are window-first
- stdout is human-readable by default
Build locally:
swift build
Run with either:
swift run macos-cua <command>
or:
./.build/debug/macos-cua <command>
macos-cua needs two standard macOS permissions:
- Accessibility for mouse, keyboard, and window actions
- Screen Recording for screenshots
First check readiness:
swift run macos-cua doctor
Then start the built-in onboarding flow:
swift run macos-cua onboard
If you are running in a terminal and want it to wait for you:
swift run macos-cua onboard --wait --timeout 180
Daily use should start with absolute coordinates.
- Inspect the current coordinate space:
swift run macos-cua state
- Capture the frontmost window:
swift run macos-cua screenshot /tmp/frontmost.png
- Or capture a region in the active coordinate space:
swift run macos-cua screenshot --region 100 100 300 200 /tmp/region.png
- Move and click with absolute coordinates:
swift run macos-cua move 800 400 --precise
swift run macos-cua click 800 400 --fast
- Use screen-global coordinates only when needed:
swift run macos-cua screenshot --screen /tmp/screen.png
swift run macos-cua move 800 400 --screen --precise
swift run macos-cua click 800 400 --screen --fast
Keep the command surface small. Outside the screenshot-and-action loop, the main helpers are:
- clipboard get|set|copy|paste for moving text across apps
- open-url <url> for direct URL handoff, while browser workflows should still prefer bb-browser
- app list|activate for bringing the target app frontmost
- window list|activate for picking the right window before resuming visual actions
Examples:
swift run macos-cua clipboard set "https://example.com"
swift run macos-cua open-url https://example.com
swift run macos-cua app activate Safari
swift run macos-cua window list
When an agent needs multiple stateful steps, prefer a single shell command with
&& so each step runs only after the previous one exits successfully.
This avoids race conditions between separate macos-cua processes, especially
for flows such as click -> type -> submit.
Prefer:
swift run macos-cua click 120 73 --precise \
&& swift run macos-cua type "Slack" \
&& swift run macos-cua wait 150 \
&& swift run macos-cua keypress return
Avoid launching independent macos-cua commands in parallel when order matters.
For example, do not fire type and keypress return at the same time.
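If the same click -> type -> submit flow recurs, the chain can be wrapped in a small bash function so that every caller gets the same ordering guarantee. The following is a minimal sketch, not part of macos-cua: the function name `focus_type_submit` and the `CUA_BIN` variable are hypothetical and exist only for this illustration. Only subcommands and flags already shown in this README are used.

```shell
# Sketch for bash. CUA_BIN is a hypothetical override (not a macos-cua
# feature) so the helper can target a debug binary or a test stub.
CUA_BIN="${CUA_BIN:-swift run macos-cua}"

# focus_type_submit X Y TEXT
# Clicks at (X, Y), types TEXT, waits briefly, then presses return.
# Each step runs only if the previous one exited successfully.
focus_type_submit() {
  local x="$1" y="$2" text="$3"
  # $CUA_BIN is left unquoted on purpose so that a multi-word command
  # such as "swift run macos-cua" splits into separate words.
  $CUA_BIN click "$x" "$y" --precise \
    && $CUA_BIN type "$text" \
    && $CUA_BIN wait 150 \
    && $CUA_BIN keypress return
}
```

A call such as `focus_type_submit 120 73 "Slack"` then behaves like the chained command above and returns a non-zero status as soon as any step fails.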
When a page is visually dense and the target is a small icon, URL, or toolbar
item, treat screenshot --region as the fallback inspection step instead of
guessing a final click from the full screenshot alone.
Recommended pattern:
- Capture the full frontmost window for global context.
- Take a second local crop tightly around the likely target area.
- Re-read the local crop, then issue the final click.
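The last step implies mapping a point located inside the local crop back into the active coordinate space. The sketch below shows that arithmetic under two assumptions that this README does not confirm: that `--region` takes `x y width height`, and that the crop is at 1:1 scale with the coordinate space (no Retina or resize factor). The helper name `crop_to_abs` is hypothetical.

```shell
# Sketch for bash. Assumes --region is "x y width height" and that one
# crop pixel equals one coordinate unit; adjust if your setup differs.

# crop_to_abs REGION_X REGION_Y LOCAL_X LOCAL_Y
# Prints the absolute "x y" for a point measured from the crop's top-left.
crop_to_abs() {
  local rx="$1" ry="$2" lx="$3" ly="$4"
  echo "$(( rx + lx )) $(( ry + ly ))"
}
```

For a crop whose origin is 720 88, a target found at 92 44 inside the crop maps to `crop_to_abs 720 88 92 44`, which prints `812 132`.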
Example:
swift run macos-cua screenshot /tmp/frontmost.png
swift run macos-cua screenshot --region 720 88 220 96 /tmp/toolbar-crop.png
swift run macos-cua click 812 132 --fast
Use absolute coordinates when your screenshot can be consumed at full useful resolution by the model. If the image must be resized or compressed before inference, switch to the relative-mode workflow in the docs.
As of 2026-04-18, public vendor docs indicate the following:
| Model | Documented vision resolution support |
|---|---|
| gpt-5.4 | Up to 6000 px max dimension with detail: "original"; 2048 px max dimension with detail: "high". |
| gpt-5.4-mini | Up to 2048 px max dimension with detail: "high". |
| Claude Opus 4.7 | Up to 2576 px on the long edge. |
| Claude Sonnet 4.5 / 4.6 and other current Claude vision models | Up to 1568 px on the long edge. See the relative-mode workflow. |
Older or weaker vision models are not recommended for coordinate-driven desktop control.
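One way to apply the table is to compare a screenshot's longest edge against the limit documented for the target model. The following is a rough sketch, assuming you already know the screenshot's pixel dimensions and take the limit from the table. The helper name `fits_model` is hypothetical, and the check ignores any other resizing a vendor may apply.

```shell
# Sketch for bash. The limit is whatever the table lists for your model;
# this function only compares the longest edge against it.

# fits_model WIDTH HEIGHT MAX_DIM
# Succeeds (exit 0) when the longest edge is within MAX_DIM pixels.
fits_model() {
  local w="$1" h="$2" max="$3"
  local long=$(( w > h ? w : h ))
  [ "$long" -le "$max" ]
}
```

For example, `fits_model 2560 1440 2576` succeeds, so absolute coordinates are reasonable, while `fits_model 2560 1440 1568` fails, which points to the relative-mode workflow. On macOS, `sips -g pixelWidth -g pixelHeight <file>` reports an image's pixel dimensions.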
Advanced workflows and reference material live under docs/README.md.