diff --git a/README.md b/README.md
index 5dd1ac4..f09898c 100644
--- a/README.md
+++ b/README.md
@@ -29,6 +29,7 @@ Other included sample [computer environments](#computer-environments):
 - [Docker](https://docker.com/) (containerized desktop)
 - [Browserbase](https://www.browserbase.com/) (remote browser, requires account)
 - [Scrapybara](https://scrapybara.com) (remote browser or computer, requires account)
+- [cua-computer](https://github.com/trycua/cua/tree/main/libs/computer) (virtual machines using lume virtualization)
 - ...or implement your own `Computer`!
 
 ## Overview
@@ -58,10 +59,17 @@ The CLI (`cli.py`) is the easiest way to get started with CUA. It accepts the fo
 
 ### Run examples (optional)
 
-The `examples` folder contains more examples of how to use CUA.
+The `examples` folder contains more examples of how to use CUA with different environments:
 
 ```shell
+# General weather example using Scrapybara
 python -m examples.weather_example
+
+# Example with function calling
+python -m examples.function_calling_example
+
+# Example for macOS Finder
+python -m examples.macos_finder_example # Work with Finder on macOS
 ```
 
 For reference, the file `simple_cua_loop.py` implements the basics of the CUA loop.
@@ -89,13 +97,13 @@ CUA can work with any `Computer` environment that can handle the [CUA actions](h
 
 This sample app provides a set of implemented `Computer` examples, but feel free to add your own!
 
-| Computer            | Option             | Type      | Description                       | Requirements                                                     |
-| ------------------- | ------------------ | --------- | --------------------------------- | ---------------------------------------------------------------- |
-| `LocalPlaywright`   | local-playwright   | `browser` | Local browser window              | [Playwright SDK](https://playwright.dev/)                        |
-| `Docker`            | docker             | `linux`   | Docker container environment      | [Docker](https://docs.docker.com/engine/install/) running        |
-| `Browserbase`       | browserbase        | `browser` | Remote browser environment        | [Browserbase](https://www.browserbase.com/) API key in `.env`    |
-| `ScrapybaraBrowser` | scrapybara-browser | `browser` | Remote browser environment        | [Scrapybara](https://scrapybara.com/dashboard) API key in `.env` |
-| `ScrapybaraUbuntu`  | scrapybara-ubuntu  | `linux`   | Remote Ubuntu desktop environment | [Scrapybara](https://scrapybara.com/dashboard) API key in `.env` |
+| Computer            | Option             | Type                 | Description                       | Requirements                                                     |
+| ------------------- | ------------------ | -------------------- | --------------------------------- | ---------------------------------------------------------------- |
+| `LocalPlaywright`   | local-playwright   | `browser`            | Local browser window              | [Playwright SDK](https://playwright.dev/)                        |
+| `Docker`            | docker             | `linux`              | Docker container environment      | [Docker](https://docs.docker.com/engine/install/) running        |
+| `Browserbase`       | browserbase        | `browser`            | Remote browser environment        | [Browserbase](https://www.browserbase.com/) API key in `.env`    |
+| `ScrapybaraBrowser` | scrapybara-browser | `browser`            | Remote browser environment        | [Scrapybara](https://scrapybara.com/dashboard) API key in `.env` |
+| `ScrapybaraUbuntu`  | scrapybara-ubuntu  | `linux`              | Remote Ubuntu desktop environment | [Scrapybara](https://scrapybara.com/dashboard) API key in `.env` |
 
 Using the CLI, you can run the sample app with different computer environments using the options listed above:
 
@@ -113,7 +121,7 @@ python cli.py --show --computer docker
 
 | Computer | Option | Type | Description | Requirements |
 | -------- | ------ | ---- | ----------- | ------------ |
-| `tbd`    | tbd    | tbd  | tbd         | tbd          |
+| `CuaMacOSComputer` | cua-macos | `mac` | macOS VM with lume virtualization | [cua-computer](https://github.com/trycua/cua/tree/main/libs/computer) package and [lume CLI](https://github.com/trycua/cua/tree/main/libs/lume) |
 
 > [!NOTE]
 > If you've implemented a new computer, please add it to the "Contributed Computers" section of the README.md file. Clearly indicate any auth / signup requirements. See the [Contributing](#contributing) section for more details.
@@ -145,6 +153,36 @@ docker run --rm -it --name cua-sample-app -p 5900:5900 --dns=1.1.1.3 -e DISPLAY=
 > docker rm -f cua-sample-app
 > ```
 
+### Cua macOS setup
+
+To use the `cua-macos` computer environment, you need to install the `cua-computer` package and the `lume` CLI:
+
+1. **Install cua-computer package**:
+   ```shell
+   pip install cua-computer
+   ```
+
+2. **Install lume CLI**:
+   ```shell
+   sudo /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
+   ```
+
+3. **Start the lume daemon**:
+   ```shell
+   lume serve
+   ```
+
+4. **Pull the macOS VM image**:
+   ```shell
+   lume pull macos-sequoia-cua:latest --no-cache
+   ```
+
+> [!NOTE]
+> - Initial download requires 80GB of free space
+> - After first run, space usage reduces to ~30GB due to macOS's sparse file system
+> - VMs are stored in `~/.lume`
+> - Cached images are stored in `~/.lume/cache`
+
 ### Hosted environment setup
 
 This repository contains example implementations of third-party hosted environments.
diff --git a/cli.py b/cli.py
index a96595e..79e574a 100644
--- a/cli.py
+++ b/cli.py
@@ -79,4 +79,4 @@ def main():
 
 
 if __name__ == "__main__":
-    main()
+    main()
\ No newline at end of file
diff --git a/computers/config.py b/computers/config.py
index 699f1a8..2f05739 100644
--- a/computers/config.py
+++ b/computers/config.py
@@ -7,4 +7,5 @@
     "browserbase": BrowserbaseBrowser,
     "scrapybara-browser": ScrapybaraBrowser,
     "scrapybara-ubuntu": ScrapybaraUbuntu,
+    "cua-macos": CuaMacOSComputer,
 }
diff --git a/computers/contrib/__init__.py b/computers/contrib/__init__.py
index e69de29..d783750 100644
--- a/computers/contrib/__init__.py
+++ b/computers/contrib/__init__.py
@@ -0,0 +1 @@
+from .cua import CuaMacOSComputer
\ No newline at end of file
diff --git a/computers/contrib/cua.py b/computers/contrib/cua.py
new file mode 100644
index 0000000..042d17e
--- /dev/null
+++ b/computers/contrib/cua.py
@@ -0,0 +1,210 @@
+import asyncio
+import base64
+import os
+import time
+from typing import Dict, List, Optional, Tuple, Literal
+
+try:
+    from computer import Computer as CuaComputer
+except ImportError:
+    raise ImportError("The cua-computer package is required. Install it with 'pip install cua-computer'")
+
+class CuaComputerAdapter:
+    """Adapter class to convert between sync and async methods for cua-computer."""
+
+    def __init__(self, computer):
+        self.computer = computer
+        self.loop = asyncio.get_event_loop()
+
+    def _run_async(self, coro):
+        """Run an async coroutine in a synchronous context."""
+        return self.loop.run_until_complete(coro)
+
+    def screenshot(self):
+        """Take a screenshot of the VM."""
+        screenshot_bytes = self._run_async(self.computer.interface.screenshot())
+        return base64.b64encode(screenshot_bytes).decode('utf-8')
+
+    def click(self, x: int, y: int, button: str = "left"):
+        """Click at the specified coordinates."""
+        self._run_async(self.computer.interface.move_cursor(x, y))
+        if button == "right":
+            self._run_async(self.computer.interface.right_click())
+        else:
+            self._run_async(self.computer.interface.left_click())
+
+    def double_click(self, x: int, y: int):
+        """Double click at the specified coordinates."""
+        self._run_async(self.computer.interface.move_cursor(x, y))
+        self._run_async(self.computer.interface.left_click())
+        time.sleep(0.1)
+        self._run_async(self.computer.interface.left_click())
+
+    def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int):
+        """Scroll at the specified coordinates."""
+        self._run_async(self.computer.interface.move_cursor(x, y))
+        self._run_async(self.computer.interface.scroll(scroll_y // 50))
+
+    def type(self, text: str):
+        """Type the specified text."""
+        self._run_async(self.computer.interface.type_text(text))
+
+    def wait(self, ms: int = 1000):
+        """Wait for the specified number of milliseconds."""
+        time.sleep(ms / 1000)
+
+    def move(self, x: int, y: int):
+        """Move the cursor to the specified coordinates."""
+        self._run_async(self.computer.interface.move_cursor(x, y))
+
+    def keypress(self, keys: List[str]):
+        """Press the specified keys."""
+        if len(keys) > 1:
+            self._run_async(self.computer.interface.hotkey(*keys))
+        else:
+            for key in keys:
+                # Map common key names to CUA equivalents
+                if key.lower() == "enter":
+                    self._run_async(self.computer.interface.press_key("return"))
+                elif key.lower() == "space":
+                    self._run_async(self.computer.interface.press_key("space"))
+                else:
+                    self._run_async(self.computer.interface.press_key(key))
+
+    def drag(self, path: List[Dict[str, int]]):
+        """Drag from the start point to the end point."""
+        if len(path) < 2:
+            return
+
+        # Move to start position (each point is a dict with "x" and "y" keys)
+        start = path[0]
+        self._run_async(self.computer.interface.move_cursor(start["x"], start["y"]))
+
+        # Start dragging
+        self._run_async(self.computer.interface.mouse_down())
+
+        # Move through each point in the path
+        for point in path[1:]:
+            self._run_async(self.computer.interface.move_cursor(point["x"], point["y"]))
+            time.sleep(0.05)  # Small delay between movements
+
+        # Release at final position
+        self._run_async(self.computer.interface.mouse_up())
+
+    def get_current_url(self) -> str:
+        """Get the current URL (only applicable for browser environments)."""
+        # Not directly available in cua-computer, but could be implemented
+        # in a more sophisticated way if needed
+        return ""
+
+
+class CuaBaseComputer:
+    """Base implementation of the Computer protocol using cua-computer and lume virtualization."""
+
+    def __init__(
+        self,
+        display: str = "1024x768",
+        memory: str = "4GB",
+        cpu: str = "2",
+        os: str = "macos",
+        image: Optional[str] = None
+    ):
+        self.display = display
+        self.memory = memory
+        self.cpu = cpu
+        self.os = os
+        self.image = image
+        self.computer = None
+        self.adapter = None
+        self._width, self._height = map(int, display.split('x'))
+
+    @property
+    def dimensions(self) -> Tuple[int, int]:
+        return (self._width, self._height)
+
+    def __enter__(self):
+        # Create and run the cua-computer instance
+        self.computer = CuaComputer(
+            display=self.display,
+            memory=self.memory,
+            cpu=self.cpu,
+            os=self.os,
+            image=self.image
+        )
+
+        # Run the VM
+        asyncio.get_event_loop().run_until_complete(self.computer.run())
+
+        # Create the adapter for sync operations
+        self.adapter = CuaComputerAdapter(self.computer)
+
+        return self
+
+    def __exit__(self, exc_type, exc_val, exc_tb):
+        # Stop the VM when we're done
+        if self.computer:
+            asyncio.get_event_loop().run_until_complete(self.computer.stop())
+
+    # Delegate all the Computer protocol methods to the adapter
+    def screenshot(self) -> str:
+        return self.adapter.screenshot()
+
+    def click(self, x: int, y: int, button: str = "left") -> None:
+        self.adapter.click(x, y, button)
+
+    def double_click(self, x: int, y: int) -> None:
+        self.adapter.double_click(x, y)
+
+    def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None:
+        self.adapter.scroll(x, y, scroll_x, scroll_y)
+
+    def type(self, text: str) -> None:
+        self.adapter.type(text)
+
+    def wait(self, ms: int = 1000) -> None:
+        self.adapter.wait(ms)
+
+    def move(self, x: int, y: int) -> None:
+        self.adapter.move(x, y)
+
+    def keypress(self, keys: List[str]) -> None:
+        self.adapter.keypress(keys)
+
+    def drag(self, path: List[Dict[str, int]]) -> None:
+        self.adapter.drag(path)
+
+    def get_current_url(self) -> str:
+        return self.adapter.get_current_url()
+
+    # Additional methods that could be useful for function calling
+    def goto(self, url: str) -> None:
+        """Navigate to a specific URL (emulating browser functionality)."""
+        # This would require launching a browser and typing the URL
+        self.adapter.type(url)
+        self.adapter.keypress(["Enter"])
+
+
+class CuaMacOSComputer(CuaBaseComputer):
+    """Implementation of the Computer protocol using cua-computer and lume virtualization for macOS."""
+
+    def __init__(
+        self,
+        display: str = "1024x768",
+        memory: str = "4GB",
+        cpu: str = "2"
+    ):
+        super().__init__(
+            display=display,
+            memory=memory,
+            cpu=cpu,
+            os="macos",
+            image="macos-sequoia-cua:latest"
+        )
+
+    @property
+    def environment(self) -> Literal["windows", "mac", "linux", "browser"]:
+        return "mac"
+
+    def back(self) -> None:
+        """Go back (browser functionality) on macOS."""
+        self.adapter.keypress(["Command", "Left"])
diff --git a/examples/macos_finder_example.py b/examples/macos_finder_example.py
new file mode 100644
index 0000000..0b6aeaa
--- /dev/null
+++ b/examples/macos_finder_example.py
@@ -0,0 +1,53 @@
+from agent import Agent
+from computers import CuaMacOSComputer
+
+def acknowledge_safety_check_callback(message: str) -> bool:
+    """Callback function to handle safety check warnings."""
+    print(f"Safety Check Warning: {message}")
+    response = input("Do you want to acknowledge and proceed? (y/n): ").lower()
+    return response == "y"
+
+def main():
+    """Example of using CuaMacOSComputer to interact with Finder and other macOS apps."""
+    print("Starting macOS environment...")
+    print("Task: Open Finder, create a new folder, and take a screenshot")
+    print("This may take a minute to initialize the VM...")
+
+    with CuaMacOSComputer() as computer:
+        # Create the agent with our computer and safety callback
+        agent = Agent(
+            computer=computer,
+            acknowledge_safety_check_callback=acknowledge_safety_check_callback
+        )
+
+        # Define the task: interact with macOS Finder
+        task = """
+        Follow these steps on macOS:
+        1. Open Finder
+        2. Create a new folder on the Desktop named "CUA Demo"
+        3. Open the folder
+        4. Open TextEdit and save a file in that folder
+        5. Take a screenshot with the keyboard shortcut Command+Shift+3
+        """
+
+        # Create the input items with our task
+        input_items = [{"role": "user", "content": task}]
+
+        # Run the agent and get the response items
+        print("\nExecuting macOS task...")
+        response_items = agent.run_full_turn(
+            input_items,
+            debug=True,
+            show_images=True
+        )
+
+        # Print the final response
+        if response_items and response_items[-1].get("role") == "assistant":
+            print("\nTask completed!")
+            print("Assistant's final response:")
+            print(response_items[-1]["content"][0]["text"])
+        else:
+            print("\nNo final response from assistant.")
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
index 13769fb..a406b5e 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,6 +3,7 @@ anyio==4.8.0
 browserbase==1.2.0
 certifi==2025.1.31
 charset-normalizer==3.4.1
+cua-computer>=0.1.0
 distro==1.9.0
 greenlet==3.1.1
 h11==0.14.0
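
Review note: `CuaComputerAdapter` bridges the async `cua-computer` interface to the synchronous `Computer` protocol by calling `run_until_complete` on the loop returned by `asyncio.get_event_loop()`. On recent Python versions that call may emit a deprecation warning, or fail, when no event loop has been set for the current thread. One possible alternative, shown here only as a sketch, is for the adapter to own a dedicated loop created with `asyncio.new_event_loop()`. The names `SyncBridge` and `fake_screenshot` are hypothetical stand-ins and are not part of this PR or of `cua-computer`.

```python
import asyncio


class SyncBridge:
    """Hypothetical helper: run coroutines to completion on a privately owned loop."""

    def __init__(self) -> None:
        # new_event_loop() returns a fresh loop and does not rely on a global one
        self._loop = asyncio.new_event_loop()

    def run(self, coro):
        """Block until the coroutine finishes and return its result."""
        return self._loop.run_until_complete(coro)

    def close(self) -> None:
        """Release the loop's resources once the bridge is no longer needed."""
        self._loop.close()

    @property
    def closed(self) -> bool:
        return self._loop.is_closed()


async def fake_screenshot() -> bytes:
    # Stand-in for an async call such as computer.interface.screenshot()
    await asyncio.sleep(0)
    return b"fake-png-bytes"


bridge = SyncBridge()
try:
    screenshot_bytes = bridge.run(fake_screenshot())
finally:
    bridge.close()
```

Under this assumption, `__enter__` and `__exit__` in `CuaBaseComputer` could reuse the same bridge for `computer.run()` and `computer.stop()`, so the VM lifecycle and the per-action calls all share one loop that is closed on exit.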