diff --git a/cookbook/_toc.yml b/cookbook/_toc.yml index ad9e6ee59..88ee72d63 100644 --- a/cookbook/_toc.yml +++ b/cookbook/_toc.yml @@ -13,6 +13,8 @@ parts: sections: - file: en/sandbox/advanced.md - file: en/sandbox/training_sandbox.md + - file: en/sandbox/cloud_api_sandbox.md + - file: en/sandbox/e2b_sandbox.md - file: en/sandbox/troubleshooting.md - file: en/tools/tools.md sections: @@ -76,6 +78,8 @@ parts: sections: - file: zh/sandbox/advanced.md - file: zh/sandbox/training_sandbox.md + - file: zh/sandbox/cloud_api_sandbox.md + - file: zh/sandbox/e2b_sandbox.md - file: zh/sandbox/troubleshooting.md - file: zh/tools/tools.md sections: diff --git a/cookbook/en/api/sandbox.md b/cookbook/en/api/sandbox.md index 44233fc2d..1a4d032f5 100644 --- a/cookbook/en/api/sandbox.md +++ b/cookbook/en/api/sandbox.md @@ -78,6 +78,34 @@ The main classes that users typically interact with are directly importable from :no-index: ``` + +### CloudPhoneSandbox +```{eval-rst} +.. autoclass:: agentscope_runtime.sandbox.CloudPhoneSandbox + :members: + :undoc-members: + :show-inheritance: + :no-index: +``` + +### CloudComputerSandbox +```{eval-rst} +.. autoclass:: agentscope_runtime.sandbox.CloudComputerSandbox + :members: + :undoc-members: + :show-inheritance: + :no-index: +``` + +### E2bSandBox +```{eval-rst} +.. autoclass:: agentscope_runtime.sandbox.E2bSandBox + :members: + :undoc-members: + :show-inheritance: + :no-index: +``` + ## Custom & Example ```{eval-rst} .. 
automodule:: agentscope_runtime.sandbox.custom.custom_sandbox diff --git a/cookbook/en/sandbox/cloud_api_sandbox.md b/cookbook/en/sandbox/cloud_api_sandbox.md new file mode 100644 index 000000000..34386fa9d --- /dev/null +++ b/cookbook/en/sandbox/cloud_api_sandbox.md @@ -0,0 +1,405 @@ +--- +jupytext: + formats: md:myst + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.11.5 +kernelspec: + display_name: Python 3.10 + language: python + name: python3 +--- + +# Cloud Computer & Cloud Phone API Sandbox + +## Overview + +Cloud Computer and Cloud Phone Sandbox are GUI sandbox environments built on Alibaba Cloud's Wuying Cloud Desktop and Wuying Cloud Phone API services, allowing users to remotely control Windows desktop or Android phone environments in the cloud. + +## Features + +### Cloud Computer Sandbox + +- **Environment Type**: Windows desktop environment +- **Provider**: Alibaba Cloud Wuying Cloud Desktop +- **Security Level**: High +- **Access Method**: Wuying Cloud Desktop Enterprise Edition OpenAPI Python SDK call at https://api.aliyun.com/document/ecd/2020-09-30/overview + +### Cloud Phone Sandbox + +- **Environment Type**: Android phone environment +- **Provider**: Alibaba Cloud Wuying Cloud Phone +- **Security Level**: High +- **Access Method**: Wuying Cloud Phone OpenAPI Python SDK call at https://api.aliyun.com/document/eds-aic/2023-09-30/overview + +## Supported Operations + +### Tools Supported by Cloud Computer + +Note: Since the current implementation of cloud computer tools depends on Python 3.10 or higher environment, please ensure that your cloud computer environment has installed Python 3.10 or higher version, as well as basic dependency packages and custom dependencies. +The temporary storage directory for screenshot tools on cloud computers is under the C drive, so make sure this disk exists. 
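The two prerequisites in the note above (Python 3.10 or higher, and an existing C drive for screenshot staging) can be checked on the cloud computer before any tool is called. The following is a minimal sketch rather than part of agentscope-runtime: the function name `check_cloud_computer_prereqs` and its parameters are hypothetical, and it relies only on the standard library.

```python
import os
import sys


def check_cloud_computer_prereqs(version_info=None, drive="C:\\"):
    """Return (python_ok, drive_ok) for the prerequisites in the note above.

    Hypothetical helper for illustration; not part of agentscope-runtime.
    """
    # Default to the interpreter that is running this script.
    if version_info is None:
        version_info = sys.version_info
    # The cloud computer tools are documented as needing Python 3.10 or higher.
    python_ok = tuple(version_info[:2]) >= (3, 10)
    # Screenshots are staged under the C drive, so that path has to exist.
    drive_ok = os.path.exists(drive)
    return python_ok, drive_ok


if __name__ == "__main__":
    python_ok, drive_ok = check_cloud_computer_prereqs()
    print(f"Python >= 3.10: {python_ok}")
    print(f"C drive present: {drive_ok}")
```

Running the script inside the cloud computer's PowerShell session gives a quick yes/no for each prerequisite; the `version_info` and `drive` parameters exist only so the check can be exercised on other machines.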
+ +#### Command Line Tools +- `run_shell_command`: Run commands in PowerShell +- `run_ipython_cell`: Execute Python code +- `write_file`: Write files +- `read_file`: Read files +- `remove_file`: Delete files + +#### Input Simulation Tools +- `press_key`: Press keys +- `click`: Click screen coordinates +- `right_click`: Right-click +- `click_and_type`: Click and input text +- `append_text`: Append text at specified position +- `mouse_move`: Mouse movement +- `scroll`: Scroll +- `scroll_pos`: Scroll at specified position + +#### System Control Tools +- `screenshot`: Screenshot +- `go_home`: Return to desktop +- `launch_app`: Launch applications + +### Tools Supported by Cloud Phone + +Note: Text input is currently implemented with the ADBKeyboard input method combined with the clipboard, so make sure ADBKeyboard.apk is installed on your cloud phone. + +#### Command Line Tools +- `run_shell_command`: Run ADB Shell commands + +#### Input Simulation Tools +- `click`: Click screen coordinates +- `type_text`: Input text +- `slide`: Slide screen + +#### Navigation Control Tools +- `go_home`: Return to home screen +- `back`: Back button +- `menu`: Menu button +- `enter`: Enter key +- `kill_front_app`: Kill foreground application + +#### System Tools +- `screenshot`: Screenshot +- `send_file`: Send file to cloud phone +- `remove_file`: Delete files on cloud phone + +#### Page Interaction +Unlike AgentBay, there is no OpenAPI for querying a remote page link. Instead, you can interact with the page through the Wuying client, or refer to the Wuying WebSDK to build a front-end HTML page for page interaction. + +WEBsdk: https://wuying.aliyun.com/wuyingWebSdk/docs/intro/quick-start + +## Integration of Cloud Computer & Cloud Phone API Sandbox into Agentscope-Runtime: + +Currently, Agentscope-Runtime's sandbox containers are implemented with Docker, while cloud containers are implemented with Kubernetes. 
Integrating Cloud Computer & Cloud Phone API into AgentScope-Runtime provides another choice of cloud sandbox environments for users of Agentscope-Runtime. Users can choose to use Wuying Cloud API sandbox instead of Docker container sandbox. + +### Core Idea: + +The core idea is to encapsulate Wuying Cloud Computer & Cloud Phone API into Cloud API Sandbox and integrate it into AgentScope-Runtime as another cloud sandbox option. Since Cloud API Sandbox does not depend on containers, we create a CloudSandbox base class that inherits from Sandbox class. This enables Agentscope-Runtime to support both traditional container sandboxes and cloud-native sandboxes, maintaining consistency with traditional container sandboxes as much as possible. + +### 1. Core Architecture Integration + +- **New Sandbox Types**: `SandboxType.CLOUD_COMPUTER`, `SandboxType.CLOUD_PHONE` enumerations for creating Cloud API Sandbox, supporting dynamic enumeration extension; +- **CloudSandbox Base Class**: Abstract base class providing unified interface for cloud service sandbox, not dependent on container management, communicating directly through cloud APIs, supporting expansion for different cloud providers; +- **CloudComputerSandbox Implementation**: Inherits from CloudSandbox, accesses cloud sandbox directly through WuYing Cloud Computer API, implementing complete tool mapping and error handling; +- **CloudPhoneSandbox Implementation**: Inherits from CloudSandbox, accesses cloud sandbox directly through WuYing Cloud Phone API, implementing complete tool mapping and error handling; +- **SandboxService Support**: Maintaining compatibility with existing `sandbox_service` calling methods, specially handling Cloud API sandbox types, resource cleanup; + +### 2. Class Hierarchy Structure + +``` +Sandbox (Base Class) +└── CloudSandbox (Cloud Sandbox Base Class) + ├── CloudComputerSandbox (Cloud Computer Implementation) + └── CloudPhoneSandbox (Cloud Phone Implementation) +``` + + +### 3. 
File Structure + +``` +src/agentscope_runtime/sandbox/ +├── enums.py # Added CLOUD_COMPUTER and CLOUD_PHONE enumerations +├── box/ +│ ├── cloud/ +│ │ ├── __init__.py # Added +│ │ └── cloud_sandbox.py # Added CloudSandbox base class +│ └── cloud_api/ +│ ├── __init__.py # Added +│ ├── cloud_computer_sandbox.py # Added CloudComputerSandbox implementation +│ └── cloud_phone_sandbox.py # Added CloudPhoneSandbox implementation +└── __init__.py # Updated exports +``` + + +### 4. Service Layer Integration + +- **Registration Mechanism**: Registered with the `@SandboxRegistry.register` decorator +- **Service Integration**: Special handling of `CLOUD_COMPUTER`, `CLOUD_PHONE` types in `SandboxService` +- **Compatibility**: Maintains full compatibility with existing sandbox interfaces +- **Lifecycle Management**: Supports creation, connection, and release of cloud resources + +## How to Use + +### 1. Setting Environment Variables + +##### 1.1.1 Obtain Alibaba Cloud Account AK, SK +Documentation: +https://help.aliyun.com/document_detail/53045.html?spm=5176.21213303.aillm.3.7df92f3d4XzQHZ&scm=20140722.S_%E9%98%BF%E9%87%8C%E4%BA%91sk._.RL_%E9%98%BF%E9%87%8C%E4%BA%91sk-LOC_aillm-OR_chat-V_3-RC_llm + +##### 1.1.2 Activate OSS +Documentation: +https://help.aliyun.com/zh/oss/?spm=5176.29463013.J_AHgvE-XDhTWrtotIBlDQQ.8.68b834deqSKlrh + +Note: After activating OSS, set the account credentials in the `EDS_OSS_*` environment variables below. `EDS_OSS_ACCESS_KEY_ID` and `EDS_OSS_ACCESS_KEY_SECRET` are the AK and SK of the Alibaba Cloud account that purchased OSS. + +##### 1.1.3 Activate Wuying Cloud Desktop +Purchase a cloud desktop; the Enterprise Edition is recommended (the Personal Edition requires obtaining an EndUserId from Wuying, which is used for the `ECD_USERNAME` environment variable). Only Windows is currently supported. 
+ +Personal edition documentation: +https://help.aliyun.com/zh/edsp?spm=a2c4g.11174283.d_help_search.i2 +Enterprise edition documentation: +https://help.aliyun.com/zh/wuying-workspace/product-overview/?spm=a2c4g.11186623.help-menu-68242.d_0.518d5bd7bpQxLq + +After purchase, set the cloud desktop details in the `ECD_*` environment variables below. `ECD_ALIBABA_CLOUD_ACCESS_KEY_ID` and `ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET` are the AK and SK of the Alibaba Cloud account that purchased the cloud desktop. + +##### 1.1.4 Activate Wuying Cloud Phone +Only Android is currently supported. + +Console: +https://wya.wuying.aliyun.com/instanceLayouts +Help documentation: +https://help.aliyun.com/zh/ecp/?spm=a2c4g.11186623.0.0.62dfe33avAMTwU + +After purchase, set the cloud phone details in the `EDS_*` environment variables below. `EDS_ALIBABA_CLOUD_ACCESS_KEY_ID` and `EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET` are the AK and SK of the Alibaba Cloud account that purchased the cloud phone. 
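Before constructing a sandbox, it can help to confirm that each group of credentials described above is actually set. The sketch below is illustrative only: `REQUIRED_ENV` and `missing_env_vars` are hypothetical names, the variable names are copied from the environment block that follows, and treating every one of them as mandatory is an assumption rather than documented behavior of the sandbox classes.

```python
import os

# Variable names copied from this guide's environment block. Treating all of
# them as required is an assumption made for this sketch.
REQUIRED_ENV = {
    "cloud_computer": [
        "ECD_USERNAME",
        "DESKTOP_ID",
        "ECD_ALIBABA_CLOUD_ENDPOINT",
        "ECD_ALIBABA_CLOUD_ACCESS_KEY_ID",
        "ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET",
    ],
    "cloud_phone": [
        "PHONE_INSTANCE_ID",
        "EDS_ALIBABA_CLOUD_ENDPOINT",
        "EDS_ALIBABA_CLOUD_ACCESS_KEY_ID",
        "EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET",
    ],
    "oss": [
        "EDS_OSS_ACCESS_KEY_ID",
        "EDS_OSS_ACCESS_KEY_SECRET",
        "EDS_OSS_BUCKET_NAME",
        "EDS_OSS_ENDPOINT",
    ],
}


def missing_env_vars(environ=None):
    """Map each credential group to the variables that are unset or empty."""
    env = os.environ if environ is None else environ
    report = {}
    for group, names in REQUIRED_ENV.items():
        missing = [name for name in names if not env.get(name)]
        if missing:
            report[group] = missing
    return report


if __name__ == "__main__":
    for group, names in missing_env_vars().items():
        print(f"{group}: missing {', '.join(names)}")
```

An empty result means every listed variable has a non-empty value; it says nothing about whether the credentials themselves are valid.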
+ +Edit the .env.template file in the current directory or set environment variables: + +```bash +# Cloud computer related environment variables +# Console authorized username +export ECD_USERNAME='' +export ECD_APP_STREAM_REGION_ID='cn-shanghai' +export DESKTOP_ID='' +export ECD_ALIBABA_CLOUD_REGION_ID='cn-hangzhou' +export ECD_ALIBABA_CLOUD_ENDPOINT='ecd.cn-hangzhou.aliyuncs.com' +export ECD_ALIBABA_CLOUD_ACCESS_KEY_ID='' +export ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET='' + +# Cloud phone related environment variables +export PHONE_INSTANCE_ID='' # Cloud phone instance ID +export EDS_ALIBABA_CLOUD_ENDPOINT='eds-aic.cn-shanghai.aliyuncs.com' +export EDS_ALIBABA_CLOUD_ACCESS_KEY_ID='' +export EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET='' + +# OSS storage related environment variables +export EDS_OSS_ACCESS_KEY_ID='' +export EDS_OSS_ACCESS_KEY_SECRET='' +export EDS_OSS_BUCKET_NAME='' +export EDS_OSS_ENDPOINT='' +export EDS_OSS_PATH='' + +# Docker runtime environment $home replaced with user home directory, no need to configure when using cloud sandbox directly +export DOCKER_HOST='unix:///$home/.colima/default/docker.sock' +``` + + +Dependency installation: + +```bash +# Install core dependencies +pip install agentscope-runtime + +# Install extensions +pip install "agentscope-runtime[ext]" +``` + + +### 2. 
Cloud Computer Python Dependency Installation + +All the following commands are executed in PowerShell on the cloud computer, which can be accessed by downloading the Wuying client and logging into the computer: + +```powershell +# Set download path and version +$version = "3.10.11" +$installerName = "python-$version-amd64.exe" +$downloadUrl = "https://mirrors.aliyun.com/python-release/windows/$installerName" +$pythonInstaller = "$env:TEMP\$installerName" + +# Default installation path (Python 3.10 installed to Program Files) +$installDir = "C:\Program Files\Python310" +$scriptsDir = "$installDir\Scripts" + +# Download Python installer (using Alibaba Cloud mirror) +Write-Host "Downloading $installerName from Alibaba Cloud..." -ForegroundColor Green +Invoke-WebRequest -Uri $downloadUrl -OutFile $pythonInstaller + +# Silent installation of Python (all users + attempt to add PATH) +Write-Host "Installing Python $version ..." -ForegroundColor Green +Start-Process -Wait -FilePath $pythonInstaller -ArgumentList "/quiet InstallAllUsers=1 PrependPath=0" # We add PATH ourselves, so disable built-in one + +# Delete installer package +Remove-Item -Force $pythonInstaller + +# ========== Manually add Python to system PATH ========== +Write-Host "Adding Python to system environment variable PATH..." -ForegroundColor Green + +# Get current system PATH (Machine level) +$currentPath = [Environment]::GetEnvironmentVariable("Path", "Machine") -split ";" + +# Paths to add +$pathsToAdd = @($installDir, $scriptsDir) + +# Check and add +$updated = $false +foreach ($path in $pathsToAdd) { + if (-not $currentPath.Contains($path) -and (Test-Path $path)) { + $currentPath += $path + $updated = $true + Write-Host "Added: $path" -ForegroundColor Cyan + } +} + +# Write back to system PATH +if ($updated) { + $newPath = $currentPath -join ";" + [Environment]::SetEnvironmentVariable("Path", $newPath, "Machine") + Write-Host "System PATH updated." 
-ForegroundColor Green +} else { + Write-Host "Python path already exists in system PATH." -ForegroundColor Yellow +} + +# ========== Update current PowerShell session PATH ========== +# Otherwise, python command won't work in current terminal +$env:Path = [Environment]::GetEnvironmentVariable("Path", "Machine") + ";" + [Environment]::GetEnvironmentVariable("Path", "User") + +# ========== Check if installation was successful ========== +Write-Host "`nChecking installation results:" -ForegroundColor Green +try { + python --version +} catch { + Write-Host "python command unavailable, please restart terminal." -ForegroundColor Red +} + +try { + pip --version +} catch { + Write-Host "pip command unavailable, please restart terminal." -ForegroundColor Red +} + +# Install dependency packages +python -m pip install pyautogui -i https://mirrors.aliyun.com/pypi/simple/ +python -m pip install requests -i https://mirrors.aliyun.com/pypi/simple/ +python -m pip install pyperclip -i https://mirrors.aliyun.com/pypi/simple/ +python -m pip install pynput -i https://mirrors.aliyun.com/pypi/simple/ +python -m pip install aiohttp -i https://mirrors.aliyun.com/pypi/simple/ +# asyncio is part of the Python standard library and does not need to be installed +``` + + +### 3. Direct Usage of Cloud Computer Sandbox + +Note: You need to create cloud desktop and cloud phone instances in the Alibaba Cloud console first. + +```python +from agentscope_runtime.sandbox import CloudComputerSandbox + +sandbox = CloudComputerSandbox( + desktop_id="your_desktop_id" +) + +# Run PowerShell command +result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"}) +print(result["output"]) + +# Screenshot +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` + + +### 4. 
Direct Usage of Cloud Phone Sandbox + +```python +from agentscope_runtime.sandbox import CloudPhoneSandbox + +sandbox = CloudPhoneSandbox( + instance_id="your_instance_id" +) + +# Click screen coordinates +result = sandbox.call_tool( + "click", + { + "x1": 151, + "y1": 404, + "x2": 151, + "y2": 404, + "width": 716, + "height": 1280 + } + ) + +# Screenshot +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` + + +### 5. Usage via SandboxService + +```python +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.engine.services.sandbox import SandboxService + +sandbox_service = SandboxService() +sandboxes = sandbox_service.connect( + session_id="session1", + user_id="user1", + sandbox_types=[SandboxType.CLOUD_COMPUTER, SandboxType.CLOUD_PHONE] +) +``` + + +## Configuration Parameters + +### Cloud Computer Sandbox Configuration + +| Parameter | Type | Description | +|-----------|------|-------------| +| desktop_id | str | Cloud desktop ID | +| timeout | int | Operation timeout (seconds), default 600 | +| auto_wakeup | bool | Whether to automatically wake up cloud computer, default True | +| screenshot_dir | str | Screenshot save directory | +| command_timeout | int | Command execution timeout (seconds), default 60 | + +### Cloud Phone Sandbox Configuration + +| Parameter | Type | Description | +|-----------|------|-------------| +| instance_id | str | Cloud phone instance ID | +| timeout | int | Operation timeout (seconds), default 600 | +| auto_start | bool | Whether to automatically start cloud phone, default True | + +## Notes + +1. Ensure that Wuying Cloud Desktop/Cloud Phone service has been activated on Alibaba Cloud before use +2. Need to correctly configure corresponding environment variables +3. Cloud computer and cloud phone will incur corresponding resource costs +4. 
Some operations may require specific software or drivers to be installed in the target environment to function properly + +## Running Demo + +```bash +# Sandbox demo +python examples/cloud_api_sandbox/cloud_api_sandbox_demo.py +``` diff --git a/cookbook/en/sandbox/e2b_sandbox.md b/cookbook/en/sandbox/e2b_sandbox.md new file mode 100644 index 000000000..17b39010b --- /dev/null +++ b/cookbook/en/sandbox/e2b_sandbox.md @@ -0,0 +1,186 @@ +--- +jupytext: + formats: md:myst + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.11.5 +kernelspec: + display_name: Python 3.10 + language: python + name: python3 +--- + +# E2B Desktop Sandbox + +## Overview + +E2bSandBox is a GUI sandbox environment built on the E2B cloud desktop service that allows users to remotely control desktop environments in the cloud. + +## Features + +### E2B Desktop Sandbox (E2bSandBox) + +- **Environment Type**: Cloud desktop environment +- **Provider**: E2B Desktop +- **Security Level**: High +- **Access Method**: E2B Desktop Python SDK invocation + +## Supported Operations + +### Desktop Control Tools + +- click: Click screen coordinates +- right_click: Right-click +- type_text: Input text +- press_key: Press key +- launch_app: Launch application +- click_and_type: Click and input text + +### Command Line Tools + +- run_shell_command: Run shell commands + +### System Tools + +- screenshot: Take screenshots + +## Integration with Agentscope-Runtime + +The E2B Desktop Sandbox has been integrated into Agentscope-Runtime, providing a similar user experience to Docker sandboxes. + +## E2B Sandbox Integration into Agentscope-Runtime: + +Currently, Agentscope-Runtime's sandbox containers are implemented based on Docker, and cloud containers are implemented based on Kubernetes. 
Integrating E2B Sandbox into AgentScope-Runtime provides users with another choice for cloud sandbox environments, allowing them to choose between Docker container sandboxes and E2B sandboxes. + +### Core Concept: + +The core idea is to wrap the E2B Sandbox as a Sandbox and integrate it into AgentScope-Runtime as another cloud sandbox option. Since E2B Sandbox does not depend on containers, we create the `CloudSandbox` base class, which inherits from the `Sandbox` class. This enables Agentscope-Runtime to support both traditional container sandboxes and cloud-native sandboxes while keeping usage consistent with traditional container sandboxes. + +### 1. Core Architecture Integration + +- **New Sandbox Type**: `SandboxType.E2B` enumeration for creating E2B Sandboxes, supporting dynamic enumeration extension +- **CloudSandbox Base Class**: Abstract base class providing unified interface for cloud service sandboxes, not dependent on container management, communicating directly through cloud APIs, extensible to different cloud providers +- **E2bSandBox Implementation**: Inherits from `CloudSandbox`, accesses cloud sandboxes directly through E2B SDK, implementing complete tool mapping and error handling +- **SandboxService Support**: Maintains compatibility with existing `sandbox_service` calling methods, with special handling of E2B sandbox types and resource cleanup + +### 2. 
Class Hierarchy Structure + +``` +Sandbox (Base Class) +└── CloudSandbox (Cloud Sandbox Base Class) + └── E2bSandBox (E2B Desktop Implementation) +``` + + +### 3. File Structure + +``` +src/agentscope_runtime/sandbox/ +├── enums.py # Added E2B enumeration +├── box/ +│ ├── cloud/ +│ │ ├── __init__.py # Added +│ │ └── cloud_sandbox.py # Added CloudSandbox base class +│ └── e2b/ +│ ├── __init__.py # Added +│ └── e2b_sandbox.py # Added E2bSandBox implementation +└── __init__.py # Updated exports +``` + + +### 4. Service Layer Integration + +- **Registration Mechanism**: Registered with the `@SandboxRegistry.register` decorator +- **Service Integration**: Special handling of E2B types in `SandboxService` +- **Compatibility**: Full compatibility with existing sandbox interfaces +- **Lifecycle Management**: Supports creation, connection, and release of cloud resources + +## How to Use + +### 1. Set Environment Variables + +Configure authentication information according to the official E2B documentation. +##### 1.1.1 E2B Activation +Visit the E2B website to register and obtain credentials, then configure `E2B_API_KEY`: +https://e2b.dev + +Edit the .env.template file in the current directory or set environment variables: + +```bash +# E2B API Key +export E2B_API_KEY= +# Docker runtime socket, e.g. unix:///$home/.colima/default/docker.sock with $home replaced by your home directory; not needed when using the cloud sandbox directly +export DOCKER_HOST='' +``` + + +Dependency installation: + +```bash +# Install core dependencies +pip install agentscope-runtime + +# Install extensions +pip install "agentscope-runtime[ext]" +``` + + +### 2. 
Direct Usage of E2B Desktop Sandbox + +```python +import os +from agentscope_runtime.sandbox import E2bSandBox + +sandbox = E2bSandBox() + +# Run shell command +result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"}) +print(result["output"]) + +# Screenshot +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_path": f"{os.getcwd()}/screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` + + +### 3. Using via SandboxService + +```python +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.engine.services.sandbox import SandboxService + +sandbox_service = SandboxService() +sandboxes = sandbox_service.connect( + session_id="session1", + user_id="user1", + sandbox_types=[SandboxType.E2B] +) +``` + + +## Configuration Parameters + +### E2B Desktop Sandbox Configuration + +| Parameter | Type | Description | +|-----------|------|-------------| +| timeout | int | Operation timeout (seconds), default 600 | +| command_timeout | int | Command execution timeout (seconds), default 60 | + +## Notes + +1. Ensure E2B service is registered and configured before use +2. Need to properly configure corresponding environment variables +3. E2B service will incur corresponding resource costs + +## Running Demo + +```bash +# Sandbox demo +python examples/e2b_sandbox/e2b_sandbox_demo.py +``` diff --git a/cookbook/en/sandbox/sandbox.md b/cookbook/en/sandbox/sandbox.md index 7fccae2fe..aa0437d8b 100644 --- a/cookbook/en/sandbox/sandbox.md +++ b/cookbook/en/sandbox/sandbox.md @@ -246,6 +246,89 @@ with AgentbaySandbox( - Automatic session lifecycle management - Direct API communication with cloud service + + +**CloudApiSandbox (CloudComputerSandbox/CloudPhoneSandbox)**: A GUI sandbox environment built on Alibaba Cloud's Cloud Desktop and Cloud Phone API services, allowing users to remotely control cloud environments (currently only supports Windows desktop environment or Android environment). 
+* Note: Because this sandbox uses cloud resources, you need to create Cloud Desktop and Cloud Phone instances in the Alibaba Cloud console and configure the corresponding environment variables before use. +* Please refer to {doc}`cloud_api_sandbox` for details. +```{code-cell} +from agentscope_runtime.sandbox import CloudComputerSandbox + +sandbox = CloudComputerSandbox( + desktop_id="your_desktop_id" +) + +# Run PowerShell command +result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"}) +print(result["output"]) + +# Screenshot +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` +```{code-cell} +from agentscope_runtime.sandbox import CloudPhoneSandbox + +sandbox = CloudPhoneSandbox( + instance_id="your_instance_id" +) + +# Click screen coordinates +result = sandbox.call_tool( + "click", + { + "x1": 151, + "y1": 404, + "x2": 151, + "y2": 404, + "width": 716, + "height": 1280 + } + ) + +# Screenshot +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` + +**CloudApi Sandbox Features:** +- No local Docker required, fully cloud-based +- Supports multiple environment types (currently only Windows desktop and Android environments) +- Remote management of the cloud resource lifecycle +- Direct API communication with cloud services + + +**E2B Desktop Sandbox (E2bSandBox)**: A GUI sandbox environment built on E2B cloud desktop services, allowing users to remotely control desktop environments on the cloud (Linux). +* Note: Before use, you need to configure the E2B_API_KEY environment variable. +* Please refer to {doc}`e2b_sandbox` for details. 
+```{code-cell} +import os +from agentscope_runtime.sandbox import E2bSandBox + +sandbox = E2bSandBox() + +# Run shell command +result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"}) +print(result["output"]) + +# Screenshot +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_path": f"{os.getcwd()}/screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` +**E2B Sandbox Features:** +- No local Docker required, fully cloud-based +- Automatic resource lifecycle management +- Direct API communication with cloud services + ```{note} More sandbox types are under development—stay tuned! ``` diff --git a/cookbook/en/service/sandbox.md b/cookbook/en/service/sandbox.md index ab0050d13..fb3bb971b 100644 --- a/cookbook/en/service/sandbox.md +++ b/cookbook/en/service/sandbox.md @@ -24,7 +24,7 @@ In the course of agent execution, typical roles of the sandbox service include: - **Connecting to existing environments**: In multi-turn conversations, connect the agent to a previously created sandbox to continue operations. - **Tool invocation**: Provide callable methods (such as `browser_navigate`, `browser_take_screenshot`, etc.) that can be registered as tools in an agent. - **Releasing environments**: Release the corresponding environment resources when the session ends or requirements change. -- **Multi-type support**: Supports different types of sandboxes (`BASE`, `BROWSER`, `CODE`, `AGENTBAY`, etc.). +- **Multi-type support**: Supports different types of sandboxes (`BASE`, `BROWSER`, `CODE`, `AGENTBAY`, `CLOUD_COMPUTER`, `CLOUD_PHONE`, `E2B`, etc.). In different implementations, sandbox services mainly differ in: **running modes** (embedded/remote), **supported types**, **management methods**, and **extensibility**. 
@@ -87,17 +87,20 @@ for tool in [ ### Supported Sandbox Types -| Type Value | Description | Common Usage Examples | -| ------------ | -------------------------------------------- | ------------------------------------------------------------ | -| `DUMMY` | Null implementation / placeholder sandbox | Test workflows, simulate sandbox APIs without actual execution | -| `BASE` | Basic sandbox environment | General tool execution environment | -| `BROWSER` | Browser sandbox | Web navigation, screenshots, data crawling | -| `FILESYSTEM` | File system sandbox | Reading/writing files in a secure, isolated file system | -| `GUI` | Graphical interface sandbox | Interacting with GUI apps (clicking, typing, screenshots) | -| `MOBILE` | Mobile device emulation sandbox | Simulating mobile app operations and touch interactions | -| `APPWORLD` | App world emulation sandbox | Simulating cross-app interactions in a virtual environment | -| `BFCL` | BFCL (domain-specific execution environment) | Running business process scripts (depends on implementation) | -| `AGENTBAY` | Session-based AgentBay sandbox | Dedicated for multi-agent collaboration or complex task orchestration | +| Type Value | Description | Common Usage Examples | +|------------------|----------------------------------------------| ------------------------------------------------------------ | +| `DUMMY` | Null implementation / placeholder sandbox | Test workflows, simulate sandbox APIs without actual execution | +| `BASE` | Basic sandbox environment | General tool execution environment | +| `BROWSER` | Browser sandbox | Web navigation, screenshots, data crawling | +| `FILESYSTEM` | File system sandbox | Reading/writing files in a secure, isolated file system | +| `GUI` | Graphical interface sandbox | Interacting with GUI apps (clicking, typing, screenshots) | +| `MOBILE` | Mobile device emulation sandbox | Simulating mobile app operations and touch interactions | +| `APPWORLD` | App world emulation sandbox | 
Simulating cross-app interactions in a virtual environment | +| `BFCL` | BFCL (domain-specific execution environment) | Running business process scripts (depends on implementation) | +| `AGENTBAY` | Session-based AgentBay sandbox | Dedicated for multi-agent collaboration or complex task orchestration | +| `CLOUD_COMPUTER` | Wuying Cloud Desktop (Windows) sandbox | Remotely controlling a Windows cloud desktop (PowerShell commands, input simulation, screenshots) | +| `CLOUD_PHONE` | Wuying Cloud Phone (Android) sandbox | Remotely controlling an Android cloud phone (ADB shell commands, taps and swipes, screenshots) | +| `E2B` | E2B cloud desktop sandbox | Remotely controlling an E2B cloud desktop (shell commands, GUI input, screenshots) | ## Example: Switching Running Modes diff --git a/cookbook/zh/api/sandbox.md b/cookbook/zh/api/sandbox.md index 9ba3d5ced..aae12d5eb 100644 --- a/cookbook/zh/api/sandbox.md +++ b/cookbook/zh/api/sandbox.md @@ -78,6 +78,34 @@ Sandbox模块提供了隔离环境以安全地运行代码。 :no-index: ``` +### CloudPhoneSandbox +```{eval-rst} +.. autoclass:: agentscope_runtime.sandbox.CloudPhoneSandbox + :members: + :undoc-members: + :show-inheritance: + :no-index: +``` + +### CloudComputerSandbox +```{eval-rst} +.. autoclass:: agentscope_runtime.sandbox.CloudComputerSandbox + :members: + :undoc-members: + :show-inheritance: + :no-index: +``` + +### E2bSandBox +```{eval-rst} +.. autoclass:: agentscope_runtime.sandbox.E2bSandBox + :members: + :undoc-members: + :show-inheritance: + :no-index: +``` + + ## 自定义与示例 ```{eval-rst} .. 
automodule:: agentscope_runtime.sandbox.custom.custom_sandbox diff --git a/cookbook/zh/sandbox/cloud_api_sandbox.md b/cookbook/zh/sandbox/cloud_api_sandbox.md new file mode 100644 index 000000000..d77da3249 --- /dev/null +++ b/cookbook/zh/sandbox/cloud_api_sandbox.md @@ -0,0 +1,411 @@ +--- +jupytext: + formats: md:myst + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.11.5 +kernelspec: + display_name: Python 3.10 + language: python + name: python3 +--- + +# 云电脑/云手机沙箱 + +## 概述 + +Cloud Computer 和 Cloud Phone Sandbox 是基于阿里云无影云电脑和无影云手机API服务构建的 GUI 沙箱环境,允许用户远程控制云上的 Windows 桌面环境或 Android 手机环境。 + +## 功能特性 + +### 云电脑沙箱 (Cloud Computer Sandbox) + +- **环境类型**: Windows 桌面环境 +- **提供商**: 阿里云无影云电脑 +- **安全等级**: 高 +- **接入方式**: 无影云电脑企业版OpenAPI Python SDK调用 https://api.aliyun.com/document/ecd/2020-09-30/overview + +### 云手机沙箱 (Cloud Phone Sandbox) + +- **环境类型**: Android 手机环境 +- **提供商**: 阿里云无影云手机 +- **安全等级**: 高 +- **接入方式**: 无影云手机OpenAPI Python SDK调用 https://api.aliyun.com/document/eds-aic/2023-09-30/overview + +## 支持的操作 + +### 云电脑支持的工具操作 + +注意: 由于云电脑当前工具实现依赖于python3.10及以上环境,请确保你的云电脑环境已经安装了 Python 3.10 或更高版本,以及基础依赖包,和自定义依赖。 + 截图工具云电脑临时存放目录是在C盘下,需确保有该磁盘 + +#### 命令行工具 +- run_shell_command: 在 PowerShell 中运行命令 +- run_ipython_cell: 执行 Python 代码 +- write_file: 写入文件 +- read_file: 读取文件 +- remove_file: 删除文件 + +#### 输入模拟工具 +- press_key: 按键 +- click: 点击屏幕坐标 +- right_click: 右键点击 +- click_and_type: 点击并输入文本 +- append_text: 在指定位置追加文本 +- mouse_move: 鼠标移动 +- scroll: 滚动 +- scroll_pos: 在指定位置滚动 + +#### 系统控制工具 +- screenshot: 截图 +- go_home: 返回桌面 +- launch_app: 启动应用程序 + +### 云手机支持的工具操作 + +注意:当前输入文本工具是通过ADBKeyboard输入法结合粘贴板实现,所以请确保你的云手机已经安装ADBKeyboard.apk输入法。 + +#### 命令行工具 +- run_shell_command: 运行 ADB Shell 命令 + +#### 输入模拟工具 +- click: 点击屏幕坐标 +- type_text: 输入文本 +- slide: 滑动屏幕 + +#### 导航控制工具 +- go_home: 返回主屏幕 +- back: 返回按钮 +- menu: 菜单按钮 +- enter: 回车键 +- kill_front_app: 杀死前台应用 + +#### 系统工具 +- screenshot: 截图 +- send_file: 发送文件到云手机 +- remove_file: 删除云手机上的文件 + 
+#### 页面交互 +区别于Agentbay没有相关openapi可以查询远程页面链接,但是可以搭配无影客户端使用交互页面,或者参考无影WEBsdk,搭建一个前端html页面进行页面交互。 + +WEBsdk: https://wuying.aliyun.com/wuyingWebSdk/docs/intro/quick-start + +## Cloud Computer & Cloud Phone API Sandbox 集成进 Agentscope-Runtime: + +目前,Agentscope-Runtime 的沙箱容器基于 docker 实现,云上容器基于 k8s 实现;Cloud Computer & Cloud Phone API 集成进 AgentScope-Runtime,能够给使用 Agentscope-Runtime 提供另外一种云上沙箱环境的选择,可以使用除了 docker 容器沙箱之外,也可以选择使用无影云api沙箱; + +### 核心思路: + +核心思路是把 无影Cloud Computer & Cloud Phone API 封装成 Cloud Api Sandbox 集成进 AgentScope-Runtime,作为另外一种云沙箱的选择; +由于 Cloud Api Sandbox 并不依赖容器,所以创建 CloudSandbox 基类继承 Sandbox 类,这样就使得 Agentscope-Runtime 能够同时支持传统容器沙箱和云原生沙箱,在使用上与传统容器沙箱尽量保持一致; + +### 1. 核心架构集成 + +- **新增沙箱类型**: `SandboxType.CLOUD_COMPUTER`,`SandboxType.CLOUD_PHONE` 枚举,用于创建 Cloud Api Sandbox,支持动态枚举扩展; +- **CloudSandbox 基类**: 抽象基类,为云服务沙箱提供统一接口,不依赖容器管理,直接通过云 API 通信,可以支持不同云提供商扩展; +- **CloudComputerSandbox 实现**: 继承自 CloudSandbox,直接通过 WuYing Cloud Computer API 访问云端沙箱,实现完整的工具映射和错误处理; +- **CloudPhoneSandbox 实现**: 继承自 CloudSandbox,直接通过 WuYing Cloud Phone API 访问云端沙箱,实现完整的工具映射和错误处理; +- **SandboxService 支持**: 保持与原有 sandbox_service 调用方式的兼容性,特殊处理 Cloud Api 沙箱类型,资源清理; + +### 2. 类层次结构 + +``` +Sandbox (基类) +└── CloudSandbox (云沙箱基类) + ├── CloudComputerSandbox (云电脑实现) + └── CloudPhoneSandbox (云手机实现) +``` + +### 3. 文件结构 + +``` +src/agentscope_runtime/sandbox/ +├── enums.py # 新增 AGENTBAY 枚举 +├── box/ +│ ├── cloud/ +│ │ ├── __init__.py # 新增 +│ │ └── cloud_sandbox.py # 新增 CloudSandbox 基类 +│ └── cloud_api/ +│ ├── __init__.py # 新增 +│ └── cloud_computer_sandbox.py # 新增 CloudComputerSandbox 实现 +│ └── cloud_phone_sandbox.py # 新增 CloudPhoneSandbox 实现 +└── __init__.py # 更新导出 +``` + + +### 4. 服务层集成 + +- **注册机制**:使用 `@SandboxRegistry.register` 装饰器注册 +- **服务集成**:在 `SandboxService` 中特殊处理 CLOUD_COMPUTER,CLOUD_PHONE 类型 +- **兼容性**:保持与现有沙箱接口的完全兼容 +- **生命周期管理**: 支持创建、连接、释放 云资源 + +## 如何使用 + +### 1. 
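Set Environment Variables
+
+##### 1.1.1 Obtain the Alibaba Cloud Account AK and SK
+Documentation:
+https://help.aliyun.com/document_detail/53045.html?spm=5176.21213303.aillm.3.7df92f3d4XzQHZ&scm=20140722.S_%E9%98%BF%E9%87%8C%E4%BA%91sk._.RL_%E9%98%BF%E9%87%8C%E4%BA%91sk-LOC_aillm-OR_chat-V_3-RC_llm
+
+##### 1.1.2 Activate OSS
+Documentation:
+https://help.aliyun.com/zh/oss/?spm=5176.29463013.J_AHgvE-XDhTWrtotIBlDQQ.8.68b834deqSKlrh
+
+Note: After purchase, set the account credentials in the `EDS_OSS_*` environment variables below. `EDS_OSS_ACCESS_KEY_ID` and `EDS_OSS_ACCESS_KEY_SECRET` are the AK and SK of the Alibaba Cloud account that purchased OSS.
+
+##### 1.1.3 Activate Wuying Cloud Desktop
+Purchase a cloud desktop; the Enterprise Edition is recommended (for the Personal Edition, request the EndUserId from Wuying and use it for the `ECD_USERNAME` environment variable).
+Only Windows is currently supported.
+
+Personal Edition documentation:
+https://help.aliyun.com/zh/edsp?spm=a2c4g.11174283.d_help_search.i2
+Enterprise Edition documentation:
+https://help.aliyun.com/zh/wuying-workspace/product-overview/?spm=a2c4g.11186623.help-menu-68242.d_0.518d5bd7bpQxLq
+After purchase, set the cloud desktop details in the `ECD_*` environment variables below.
+`ECD_ALIBABA_CLOUD_ACCESS_KEY_ID` and `ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET` are the AK and SK of the Alibaba Cloud account that purchased the cloud desktop.
+
+##### 1.1.4 Activate Wuying Cloud Phone
+Only Android is currently supported.
+
+Console:
+https://wya.wuying.aliyun.com/instanceLayouts
+Help documentation:
+https://help.aliyun.com/zh/ecp/?spm=a2c4g.11186623.0.0.62dfe33avAMTwU
+After purchase, set the cloud phone details in the `EDS_*` environment variables below.
+`EDS_ALIBABA_CLOUD_ACCESS_KEY_ID` and `EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET` are the AK and SK of the Alibaba Cloud account that purchased the cloud phone.
+
+
+Edit the .env.template file in the current directory or set environment variables:
+
+```bash
+# Cloud computer related environment variables
+# Console authorized username
+export ECD_USERNAME=''
+export ECD_APP_STREAM_REGION_ID='cn-shanghai'
+export DESKTOP_ID=''
+export ECD_ALIBABA_CLOUD_REGION_ID='cn-hangzhou'
+export ECD_ALIBABA_CLOUD_ENDPOINT='ecd.cn-hangzhou.aliyuncs.com'
+export ECD_ALIBABA_CLOUD_ACCESS_KEY_ID=''
+export ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET=''
+
+# Cloud phone related environment variables
+export PHONE_INSTANCE_ID='' # Cloud phone instance ID
+export EDS_ALIBABA_CLOUD_ENDPOINT='eds-aic.cn-shanghai.aliyuncs.com'
+export EDS_ALIBABA_CLOUD_ACCESS_KEY_ID=''
+export EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET=''
+
+# OSS storage related environment variables
+export EDS_OSS_ACCESS_KEY_ID=''
+export EDS_OSS_ACCESS_KEY_SECRET=''
+export EDS_OSS_BUCKET_NAME=''
+export EDS_OSS_ENDPOINT=''
+export EDS_OSS_PATH=''
+
+
+# Docker runtime environment: replace $home with your home directory; not needed when using the cloud sandbox directly
+export DOCKER_HOST='unix:///$home/.colima/default/docker.sock'
+
+```
+
+Install dependencies:
+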
+```bash +# 安装核心依赖 +pip install agentscope-runtime + +# 安装拓展 +pip install "agentscope-runtime[ext]" +``` + + +### 2. 云电脑python,依赖安装 + +以下所有命令都是在云电脑上的 PowerShell 中执行,可以通过下载无影客户端登录到电脑上执行: + +```powershell +# 设置下载路径和版本 +$version = "3.10.11" +$installerName = "python-$version-amd64.exe" +$downloadUrl = "https://mirrors.aliyun.com/python-release/windows/$installerName" +$pythonInstaller = "$env:TEMP\$installerName" + +# 默认安装路径(Python 3.10 安装到 Program Files) +$installDir = "C:\Program Files\Python310" +$scriptsDir = "$installDir\Scripts" + +# 下载 Python 安装包(使用阿里云镜像) +Write-Host "正在从阿里云下载 $installerName ..." -ForegroundColor Green +Invoke-WebRequest -Uri $downloadUrl -OutFile $pythonInstaller + +# 静默安装 Python(所有用户 + 尝试添加 PATH) +Write-Host "正在安装 Python $version ..." -ForegroundColor Green +Start-Process -Wait -FilePath $pythonInstaller -ArgumentList "/quiet InstallAllUsers=1 PrependPath=0" # 我们自己加 PATH,所以关闭内置的 + +# 删除安装包 +Remove-Item -Force $pythonInstaller + +# ========== 主动添加 Python 到系统 PATH ========== +Write-Host "正在将 Python 添加到系统环境变量 PATH ..." 
-ForegroundColor Green
+
+# Get current system PATH (Machine level)
+$currentPath = [Environment]::GetEnvironmentVariable("Path", "Machine") -split ";"
+
+# Paths to add
+$pathsToAdd = @($installDir, $scriptsDir)
+
+# Check and add
+$updated = $false
+foreach ($path in $pathsToAdd) {
+    if (-not $currentPath.Contains($path) -and (Test-Path $path)) {
+        $currentPath += $path
+        $updated = $true
+        Write-Host "Added: $path" -ForegroundColor Cyan
+    }
+}
+
+# Write back to system PATH
+if ($updated) {
+    $newPath = $currentPath -join ";"
+    [Environment]::SetEnvironmentVariable("Path", $newPath, "Machine")
+    Write-Host "System PATH updated." -ForegroundColor Green
+} else {
+    Write-Host "Python path already exists in system PATH." -ForegroundColor Yellow
+}
+
+# ========== Update current PowerShell session PATH ==========
+# Otherwise the python command is not available in the current terminal
+$env:Path = [Environment]::GetEnvironmentVariable("Path", "Machine") + ";" + [Environment]::GetEnvironmentVariable("Path", "User")
+
+# ========== Check if installation was successful ==========
+Write-Host "`nChecking installation results:" -ForegroundColor Green
+try {
+    python --version
+} catch {
+    Write-Host "python command unavailable, please restart terminal." -ForegroundColor Red
+}
+
+try {
+    pip --version
+} catch {
+    Write-Host "pip command unavailable, please restart terminal." -ForegroundColor Red
+}
+
+# Install dependency packages
+python -m pip install pyautogui -i https://mirrors.aliyun.com/pypi/simple/
+python -m pip install requests -i https://mirrors.aliyun.com/pypi/simple/
+python -m pip install pyperclip -i https://mirrors.aliyun.com/pypi/simple/
+python -m pip install pynput -i https://mirrors.aliyun.com/pypi/simple/
+python -m pip install aiohttp -i https://mirrors.aliyun.com/pypi/simple/
+# asyncio is part of the Python standard library and needs no separate install
+
+```
+
+
+### 3. 
直接使用云电脑沙箱 + +注意:需要先在阿里云控制台创建云电脑桌面和云手机实例。 + + +```python +from agentscope_runtime.sandbox import CloudComputerSandbox + +sandbox = CloudComputerSandbox( + desktop_id="your_desktop_id" +) + +# 运行PowerShell命令 +result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"}) +print(result["output"]) + +# 截图 +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` + + +### 4. 直接使用云手机沙箱 + +```python +from agentscope_runtime.sandbox import CloudPhoneSandbox + +sandbox = CloudPhoneSandbox( + instance_id="your_instance_id" +) + +# 点击屏幕坐标 +result = sandbox.call_tool( + "click", + { + "x1": 151, + "y1": 404, + "x2": 151, + "y2": 404, + "width": 716, + "height": 1280 + } + ) + +# 截图 +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` + + +### 5. 通过 SandboxService 使用 + +```python +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.engine.services.sandbox import SandboxService + +sandbox_service = SandboxService() +sandboxes = sandbox_service.connect( + session_id="session1", + user_id="user1", + sandbox_types=[SandboxType.CLOUD_COMPUTER, SandboxType.CLOUD_PHONE] +) +``` + + +## 配置参数 + +### 云电脑沙箱配置 + +| 参数 | 类型 | 描述 | +|------|------|------| +| desktop_id | str | 云电脑桌面ID | +| timeout | int | 操作超时时间(秒),默认600 | +| auto_wakeup | bool | 是否自动唤醒云电脑,默认True | +| screenshot_dir | str | 截图保存目录 | +| command_timeout | int | 命令执行超时时间(秒),默认60 | + +### 云手机沙箱配置 + +| 参数 | 类型 | 描述 | +|------|------|------| +| instance_id | str | 云手机实例ID | +| timeout | int | 操作超时时间(秒),默认600 | +| auto_start | bool | 是否自动启动云手机,默认True | + +## 注意事项 + +1. 使用前需要确保已在阿里云开通无影云电脑/云手机服务 +2. 需要正确配置相应的环境变量 +3. 云电脑和云手机会产生相应的资源费用 +4. 
某些操作可能需要目标环境中安装特定软件或驱动才能正常工作 + + +## 运行演示 demo + +```bash +# 沙箱演示 +python examples/cloud_api_sandbox/cloud_api_sandbox_demo.py +``` diff --git a/cookbook/zh/sandbox/e2b_sandbox.md b/cookbook/zh/sandbox/e2b_sandbox.md new file mode 100644 index 000000000..47319d8c9 --- /dev/null +++ b/cookbook/zh/sandbox/e2b_sandbox.md @@ -0,0 +1,184 @@ +--- +jupytext: + formats: md:myst + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.11.5 +kernelspec: + display_name: Python 3.10 + language: python + name: python3 +--- + +# E2B 沙箱 + +## 概述 + +E2bSandBox 是基于 E2B 云桌面服务构建的 GUI 沙箱环境,允许用户远程控制云上的桌面环境。 + +## 功能特性 + +### E2B 桌面沙箱 (E2bSandBox) + +- **环境类型**: 云桌面环境 +- **提供商**: E2B Desktop +- **安全等级**: 高 +- **接入方式**: E2B Desktop Python SDK 调用 + +## 支持的操作 + +### 桌面控制工具 + +- click: 点击屏幕坐标 +- right_click: 右键点击 +- type_text: 输入文本 +- press_key: 按键 +- launch_app: 启动应用程序 +- click_and_type: 点击并输入文本 + +### 命令行工具 + +- run_shell_command: 运行 shell 命令 + +### 系统工具 + +- screenshot: 截图 + +## 集成到 Agentscope-Runtime + +E2B Desktop Sandbox 已经被集成到 Agentscope-Runtime 中,提供了与 Docker 沙箱类似的使用体验。 + +## E2B Sandbox 集成进 Agentscope-Runtime: + +目前,Agentscope-Runtime 的沙箱容器基于 docker 实现,云上容器基于 k8s 实现;E2B Sandbox 集成进 AgentScope-Runtime,能够给使用 Agentscope-Runtime 提供另外一种云上沙箱环境的选择,可以使用除了 docker 容器沙箱之外,也可以选择使用e2b沙箱; + +### 核心思路: + +核心思路是把 E2B Sandbox 封装成 Sandbox 集成进 AgentScope-Runtime,作为另外一种云沙箱的选择; +由于 E2B Sandbox 并不依赖容器,所以创建 CloudSandbox 基类继承 Sandbox 类,这样就使得 Agentscope-Runtime 能够同时支持传统容器沙箱和云原生沙箱,在使用上与传统容器沙箱尽量保持一致; + +### 1. 核心架构集成 + +- **新增沙箱类型**: `SandboxType.E2B` 枚举,用于创建 E2B Sandbox,支持动态枚举扩展; +- **CloudSandbox 基类**: 抽象基类,为云服务沙箱提供统一接口,不依赖容器管理,直接通过云 API 通信,可以支持不同云提供商扩展; +- **E2bSandBox 实现**: 继承自 CloudSandbox,直接通过 E2b sdk 访问云端沙箱,实现完整的工具映射和错误处理; +- **SandboxService 支持**: 保持与原有 sandbox_service 调用方式的兼容性,特殊处理 E2b 沙箱类型,资源清理; + +### 2. 类层次结构 + +``` +Sandbox (基类) +└── CloudSandbox (云沙箱基类) + └── E2bSandBox (E2B桌面实现) +``` + +### 3. 
File Structure
+
+```
+src/agentscope_runtime/sandbox/
+├── enums.py                      # Added E2B enumeration
+├── box/
+│   ├── cloud/
+│   │   ├── __init__.py           # Added
+│   │   └── cloud_sandbox.py      # Added CloudSandbox base class
+│   └── e2b/
+│       ├── __init__.py           # Added
+│       └── e2b_sandbox.py        # Added E2bSandBox implementation
+└── __init__.py                   # Updated exports
+```
+
+
+### 4. Service Layer Integration
+
+- **Registration Mechanism**: Registered with the `@SandboxRegistry.register` decorator
+- **Service Integration**: `SandboxService` handles the E2B type as a special case
+- **Compatibility**: Fully compatible with the existing sandbox interfaces
+- **Lifecycle Management**: Supports creating, connecting to, and releasing cloud resources
+
+## How to Use
+
+### 1. Set Environment Variables
+
+Configure the authentication information as described in the official E2B documentation.
+##### 1.1.1 Activate E2B
+Register on the E2B website to obtain an API key, then set it in E2B_API_KEY
+https://e2b.dev
+
+Edit the .env.template file in the current directory or set environment variables:
+
+```bash
+# E2B API Key
+export E2B_API_KEY=
+# Docker runtime environment: replace $home with your home directory, e.g. unix:///$home/.colima/default/docker.sock; not needed when using the cloud sandbox directly
+export DOCKER_HOST=''
+
+```
+
+Install dependencies:
+
+```bash
+# Install core dependencies
+pip install agentscope-runtime
+
+# Install extensions
+pip install "agentscope-runtime[ext]"
+```
+
+
+### 2. Direct Usage of E2B Desktop Sandbox
+
+```python
+import os
+from agentscope_runtime.sandbox import E2bSandBox
+
+sandbox = E2bSandBox()
+
+# Run a shell command
+result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"})
+print(result["output"])
+
+# Screenshot
+result_screenshot = sandbox.call_tool(
+    "screenshot",
+    {"file_path": f"{os.getcwd()}/screenshot.png"},
+)
+print(f"screenshot result: {result_screenshot}")
+```
+### 3. Usage via SandboxService
+
+```python
+from agentscope_runtime.sandbox.enums import SandboxType
+from agentscope_runtime.engine.services.sandbox import SandboxService
+
+sandbox_service = SandboxService()
+sandboxes = sandbox_service.connect(
+    session_id="session1",
+    user_id="user1",
+    sandbox_types=[SandboxType.E2B]
+)
+```
+## Configuration Parameters
+
+### E2B Desktop Sandbox Configuration
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| timeout | int | Operation timeout (seconds), default 600 |
+| command_timeout | int | Command execution timeout (seconds), default 60 |
+
+## Notes
+
+1. Make sure you have registered for and configured the E2B service before use
+2. Configure the corresponding environment variables correctly
+3. 
The E2B service incurs corresponding resource costs
+
+
+## Running Demo
+
+```bash
+# Sandbox demo
+python examples/e2b_sandbox/e2b_sandbox_demo.py
+```
diff --git a/cookbook/zh/sandbox/sandbox.md b/cookbook/zh/sandbox/sandbox.md
index 9e0367446..1d2a38948 100644
--- a/cookbook/zh/sandbox/sandbox.md
+++ b/cookbook/zh/sandbox/sandbox.md
@@ -278,6 +278,88 @@ with AgentbaySandbox(
 - Session lifecycle is managed automatically
 - Communicates with the cloud service directly through the API
 
+
+**CloudApi Sandbox (CloudComputerSandbox/CloudPhoneSandbox)**: A GUI sandbox environment built on Alibaba Cloud's Wuying Cloud Desktop and Wuying Cloud Phone API services that lets users remotely control cloud environments (currently only Windows desktop and Android environments are supported).
+* Note: Because cloud resources are involved, first create the Wuying cloud desktop and cloud phone instances in the Alibaba Cloud console and configure the corresponding environment variables.
+* For details, see {doc}`cloud_api_sandbox`
+```{code-cell}
+from agentscope_runtime.sandbox import CloudComputerSandbox
+
+sandbox = CloudComputerSandbox(
+    desktop_id="your_desktop_id"
+)
+
+# Run a PowerShell command
+result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"})
+print(result["output"])
+
+# Screenshot
+result_screenshot = sandbox.call_tool(
+    "screenshot",
+    {"file_name": "screenshot.png"},
+)
+print(f"screenshot result: {result_screenshot}")
+```
+```{code-cell}
+from agentscope_runtime.sandbox import CloudPhoneSandbox
+
+sandbox = CloudPhoneSandbox(
+    instance_id="your_instance_id"
+)
+
+# Click screen coordinates
+result = sandbox.call_tool(
+    "click",
+    {
+        "x1": 151,
+        "y1": 404,
+        "x2": 151,
+        "y2": 404,
+        "width": 716,
+        "height": 1280
+    }
+)
+
+# Screenshot
+result_screenshot = sandbox.call_tool(
+    "screenshot",
+    {"file_name": "screenshot.png"},
+)
+print(f"screenshot result: {result_screenshot}")
+```
+
+**CloudApi Sandbox Features**:
+- No local Docker required; fully cloud-based
+- Supports multiple environment types (currently Windows desktop and Android environments)
+- The lifecycle of cloud resources can be managed remotely
+- Communicates with the cloud service directly through the API
+
+
+**E2B Desktop Sandbox (E2bSandBox)**: A GUI sandbox environment built on the E2B cloud desktop service that lets users remotely control a cloud desktop environment (Linux).
+* Note: Set the E2B_API_KEY environment variable before use. For details, see {doc}`e2b_sandbox`
+```{code-cell}
+import os
+from agentscope_runtime.sandbox import E2bSandBox
+
+sandbox = E2bSandBox()
+
+# Run a shell command
+result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"})
+print(result["output"])
+
+# Screenshot
+result_screenshot = 
sandbox.call_tool( + "screenshot", + {"file_path": f"{os.getcwd()}/screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` +**E2B沙箱特性**: +- 无需本地 Docker,完全基于云服务 +- 自动管理资源生命周期 +- 通过 API 直接与云服务通信 + + ```{note} 更多沙箱类型正在开发中,敬请期待! ``` diff --git a/cookbook/zh/service/sandbox.md b/cookbook/zh/service/sandbox.md index a38ac19e6..abdfa8e57 100644 --- a/cookbook/zh/service/sandbox.md +++ b/cookbook/zh/service/sandbox.md @@ -24,7 +24,7 @@ kernelspec: - **连接已有环境**:在对话多轮执行过程中,智能体连接到之前的沙箱继续操作。 - **工具调用**:提供可调用的方法(如 `browser_navigate`、`browser_take_screenshot` 等),可在 Agent 内注册为工具。 - **释放环境**:会话结束或需求变化时,释放对应环境资源。 -- **多类型支持**:支持不同类型的沙箱(`BASE`、`BROWSER`、`CODE`、`AGENTBAY` 等)。 +- **多类型支持**:支持不同类型的沙箱(`BASE`、`BROWSER`、`CODE`、`AGENTBAY`、`CLOUD_COMPUTER`、`CLOUD_PHONE`、`E2B` 等)。 沙箱服务在不同实现中,差异主要体现在: **运行模式**(嵌入式/远程)、**支持的类型**、**管理方式**以及**可扩展性**。 @@ -87,17 +87,20 @@ for tool in [ ### 支持的沙箱类型 -| 类型值 | 功能描述 | 常见用途示例 | -| ------------ | ---------------------------- | ----------------------------------------- | -| `DUMMY` | 空实现/占位沙箱 | 测试流程,模拟沙箱接口但不执行实际操作 | -| `BASE` | 基础沙箱环境 | 通用工具运行环境 | -| `BROWSER` | 浏览器沙箱 | 网页导航、截图、数据抓取 | -| `FILESYSTEM` | 文件系统沙箱 | 在安全隔离的文件系统中读写文件 | -| `GUI` | 图形界面沙箱 | 与 GUI 应用交互(点击、输入、截屏) | -| `MOBILE` | 移动设备仿真沙箱 | 模拟手机应用操作、触控交互 | -| `APPWORLD` | 应用世界仿真沙箱 | 在虚拟环境中模拟跨应用交互 | -| `BFCL` | BFCL(特定业务领域执行环境) | 运行业务流程脚本(具体取决于实现) | -| `AGENTBAY` | AgentBay会话型沙箱 | 专用于多Agent协作或复杂任务编排的持久环境 | +| 类型值 | 功能描述 | 常见用途示例 | +|------------------|------------------|---------------------| +| `DUMMY` | 空实现/占位沙箱 | 测试流程,模拟沙箱接口但不执行实际操作 | +| `BASE` | 基础沙箱环境 | 通用工具运行环境 | +| `BROWSER` | 浏览器沙箱 | 网页导航、截图、数据抓取 | +| `FILESYSTEM` | 文件系统沙箱 | 在安全隔离的文件系统中读写文件 | +| `GUI` | 图形界面沙箱 | 与 GUI 应用交互(点击、输入、截屏) | +| `MOBILE` | 移动设备仿真沙箱 | 模拟手机应用操作、触控交互 | +| `APPWORLD` | 应用世界仿真沙箱 | 在虚拟环境中模拟跨应用交互 | +| `BFCL` | BFCL(特定业务领域执行环境) | 运行业务流程脚本(具体取决于实现) | +| `AGENTBAY` | AgentBay会话型沙箱 | 专用于多Agent协作或复杂任务编排的持久环境 | +| `CLOUD_COMPUTER` | 无影云电脑沙箱 | 专用于多Agent协作或复杂任务编排的持久环境 | +| `CLOUD_PHONE` | 无影云手机沙箱 | 
专用于多Agent协作或复杂任务编排的持久环境 | +| `E2B` | E2B沙箱 | 专用于多Agent协作或复杂任务编排的持久环境 | ## 切换运行模式示例 diff --git a/examples/cloud_api_sandbox/.env.template b/examples/cloud_api_sandbox/.env.template new file mode 100644 index 000000000..26f9ecb3a --- /dev/null +++ b/examples/cloud_api_sandbox/.env.template @@ -0,0 +1,24 @@ +# 云电脑管控授权的用户名 +ECD_USERNAME= +ECD_APP_STREAM_REGION_ID=cn-shanghai + +DESKTOP_ID=ecd- +ECD_ALIBABA_CLOUD_REGION_ID=cn-hangzhou +ECD_ALIBABA_CLOUD_ENDPOINT=ecd.cn-hangzhou.aliyuncs.com +ECD_ALIBABA_CLOUD_ACCESS_KEY_ID= +ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET= + + +PHONE_INSTANCE_ID=acp- +EDS_ALIBABA_CLOUD_ENDPOINT=eds-aic.cn-shanghai.aliyuncs.com +EDS_ALIBABA_CLOUD_ACCESS_KEY_ID= +EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET= + +EDS_OSS_ACCESS_KEY_ID= +EDS_OSS_ACCESS_KEY_SECRET= +EDS_OSS_BUCKET_NAME= +EDS_OSS_ENDPOINT= +EDS_OSS_PATH= + +# 使用service方式调用需要配置 docker 运行环境 例如,替换$home为实际地址 unix://$home/.colima/default/docker.sock +DOCKER_HOST= \ No newline at end of file diff --git a/examples/cloud_api_sandbox/README.md b/examples/cloud_api_sandbox/README.md new file mode 100644 index 000000000..ac77b00e8 --- /dev/null +++ b/examples/cloud_api_sandbox/README.md @@ -0,0 +1,391 @@ +# Cloud Computer & Cloud Phone API Sandbox + +## Overview + +Cloud Computer and Cloud Phone Sandbox are GUI sandbox environments built on Alibaba Cloud's Wuying Cloud Desktop and Wuying Cloud Phone API services, allowing users to remotely control Windows desktop or Android phone environments in the cloud. 
+
+## Features
+
+### Cloud Computer Sandbox
+
+- **Environment Type**: Windows desktop environment
+- **Provider**: Alibaba Cloud Wuying Cloud Desktop
+- **Security Level**: High
+- **Access Method**: Python SDK for the Wuying Cloud Desktop Enterprise Edition OpenAPI (https://api.aliyun.com/document/ecd/2020-09-30/overview)
+
+### Cloud Phone Sandbox
+
+- **Environment Type**: Android phone environment
+- **Provider**: Alibaba Cloud Wuying Cloud Phone
+- **Security Level**: High
+- **Access Method**: Python SDK for the Wuying Cloud Phone OpenAPI (https://api.aliyun.com/document/eds-aic/2023-09-30/overview)
+
+## Supported Operations
+
+### Tools Supported by Cloud Computer
+
+Note: The cloud computer tools currently require Python 3.10 or later. Make sure Python 3.10+ is installed on the cloud computer, together with the basic dependency packages and any custom dependencies.
+The screenshot tool stores its temporary files on the C drive of the cloud computer, so make sure that drive exists.
+
+#### Command Line Tools
+- `run_shell_command`: Run commands in PowerShell
+- `run_ipython_cell`: Execute Python code
+- `write_file`: Write files
+- `read_file`: Read files
+- `remove_file`: Delete files
+
+#### Input Simulation Tools
+- `press_key`: Press keys
+- `click`: Click screen coordinates
+- `right_click`: Right-click
+- `click_and_type`: Click and input text
+- `append_text`: Append text at specified position
+- `mouse_move`: Mouse movement
+- `scroll`: Scroll
+- `scroll_pos`: Scroll at specified position
+
+#### System Control Tools
+- `screenshot`: Screenshot
+- `go_home`: Return to desktop
+- `launch_app`: Launch applications
+
+### Tools Supported by Cloud Phone
+
+Note: The text input tool currently works through the ADBKeyboard input method combined with the clipboard, so make sure ADBKeyboard.apk is installed on the cloud phone.
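The ADBKeyboard requirement can be illustrated with a small sketch of the broadcast interface that the ADBKeyboard project documents. How `type_text` is implemented inside the sandbox is not shown in this README, so treat this purely as an illustration: the helper names `ime_setup_commands` and `text_input_command` are hypothetical, and passing the resulting strings to `run_shell_command` is an assumption.

```python
import base64
import shlex

# IME identifier published by the ADBKeyboard project.
ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"


def ime_setup_commands() -> list[str]:
    """Shell commands that enable ADBKeyboard and select it as the active IME."""
    return [
        f"ime enable {ADB_KEYBOARD_IME}",
        f"ime set {ADB_KEYBOARD_IME}",
    ]


def text_input_command(text: str) -> str:
    """Build a broadcast that asks ADBKeyboard to type `text`.

    The text is base64-encoded so that spaces, quotes, and non-ASCII
    characters survive shell parsing (hypothetical helper, not sandbox API).
    """
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"am broadcast -a ADB_INPUT_B64 --es msg {shlex.quote(payload)}"


if __name__ == "__main__":
    for cmd in ime_setup_commands():
        print(cmd)
    print(text_input_command("Hello World"))
```

The sketch uses the `ADB_INPUT_B64` action rather than `ADB_INPUT_TEXT` so that arbitrary text needs no shell escaping; either action only works once ADBKeyboard is installed and selected as the input method.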
+
+#### Command Line Tools
+- `run_shell_command`: Run ADB Shell commands
+
+#### Input Simulation Tools
+- `click`: Click screen coordinates
+- `type_text`: Input text
+- `slide`: Slide screen
+
+#### Navigation Control Tools
+- `go_home`: Return to home screen
+- `back`: Back button
+- `menu`: Menu button
+- `enter`: Enter key
+- `kill_front_app`: Kill foreground application
+
+#### System Tools
+- `screenshot`: Screenshot
+- `send_file`: Send file to cloud phone
+- `remove_file`: Delete files on cloud phone
+
+#### Page Interaction
+Unlike AgentBay, these services expose no OpenAPI for querying a remote page link. To interact with the screen, use the Wuying client, or build a front-end HTML page with the Wuying Web SDK.
+
+Web SDK: https://wuying.aliyun.com/wuyingWebSdk/docs/intro/quick-start
+
+## Integration of Cloud Computer & Cloud Phone API Sandbox into Agentscope-Runtime:
+
+Agentscope-Runtime currently implements sandbox containers with Docker and cloud containers with Kubernetes. Integrating the Cloud Computer & Cloud Phone API gives Agentscope-Runtime users another cloud sandbox option: they can use the Wuying Cloud API sandbox instead of a Docker container sandbox.
+
+### Core Idea:
+
+The core idea is to wrap the Wuying Cloud Computer & Cloud Phone API as a Cloud API Sandbox and integrate it into Agentscope-Runtime as another cloud sandbox option. Because the Cloud API Sandbox does not depend on containers, a CloudSandbox base class is created that inherits from the Sandbox class. Agentscope-Runtime can then support both traditional container sandboxes and cloud-native sandboxes, with usage kept as close as possible to that of container sandboxes.
+
+### 1. Core Architecture Integration
+
+- **New Sandbox Types**: The `SandboxType.CLOUD_COMPUTER` and `SandboxType.CLOUD_PHONE` enumerations create Cloud API sandboxes and support dynamic enumeration extension.
+- **CloudSandbox Base Class**: An abstract base class that provides a unified interface for cloud service sandboxes. It does not depend on container management, communicates directly through cloud APIs, and can be extended to other cloud providers.
+- **CloudComputerSandbox Implementation**: Inherits from CloudSandbox and accesses the cloud sandbox directly through the WuYing Cloud Computer API, with complete tool mapping and error handling.
+- **CloudPhoneSandbox Implementation**: Inherits from CloudSandbox and accesses the cloud sandbox directly through the WuYing Cloud Phone API, with complete tool mapping and error handling.
+- **SandboxService Support**: Remains compatible with the existing `sandbox_service` calling convention, with special handling and resource cleanup for Cloud API sandbox types.
+
+### 2. Class Hierarchy Structure
+
+```
+Sandbox (Base Class)
+└── CloudSandbox (Cloud Sandbox Base Class)
+    ├── CloudComputerSandbox (Cloud Computer Implementation)
+    └── CloudPhoneSandbox (Cloud Phone Implementation)
+```
+
+
+### 3. File Structure
+
+```
+src/agentscope_runtime/sandbox/
+├── enums.py                          # Added CLOUD_COMPUTER and CLOUD_PHONE enumerations
+├── box/
+│   ├── cloud/
+│   │   ├── __init__.py               # Added
+│   │   └── cloud_sandbox.py          # Added CloudSandbox base class
+│   └── cloud_api/
+│       ├── __init__.py               # Added
+│       ├── cloud_computer_sandbox.py # Added CloudComputerSandbox implementation
+│       └── cloud_phone_sandbox.py    # Added CloudPhoneSandbox implementation
+└── __init__.py                       # Updated exports
+```
+
+
+### 4. Service Layer Integration
+
+- **Registration Mechanism**: Registered with the `@SandboxRegistry.register` decorator
+- **Service Integration**: `SandboxService` handles the `CLOUD_COMPUTER` and `CLOUD_PHONE` types as special cases
+- **Compatibility**: Fully compatible with the existing sandbox interfaces
+- **Lifecycle Management**: Supports creating, connecting to, and releasing cloud resources
+
+## How to Use
+
+### 1. Setting Environment Variables
+
+##### 1.1.1 Obtain Alibaba Cloud Account AK, SK
+Documentation:
+https://help.aliyun.com/document_detail/53045.html?spm=5176.21213303.aillm.3.7df92f3d4XzQHZ&scm=20140722.S_%E9%98%BF%E9%87%8C%E4%BA%91sk._.RL_%E9%98%BF%E9%87%8C%E4%BA%91sk-LOC_aillm-OR_chat-V_3-RC_llm
+
+##### 1.1.2 Activate OSS
+Documentation:
+https://help.aliyun.com/zh/oss/?spm=5176.29463013.J_AHgvE-XDhTWrtotIBlDQQ.8.68b834deqSKlrh
+
+Note: After purchase, set the account credentials in the `EDS_OSS_*` environment variables below. `EDS_OSS_ACCESS_KEY_ID` and `EDS_OSS_ACCESS_KEY_SECRET` are the AK and SK of the Alibaba Cloud account that purchased OSS.
+
+##### 1.1.3 Activate Wuying Cloud Desktop
+Purchase a cloud desktop; the Enterprise Edition is recommended (for the Personal Edition, request the EndUserId from Wuying and use it for the `ECD_USERNAME` environment variable). Only Windows is currently supported.
+
+Personal edition documentation:
+https://help.aliyun.com/zh/edsp?spm=a2c4g.11174283.d_help_search.i2
+Enterprise edition documentation:
+https://help.aliyun.com/zh/wuying-workspace/product-overview/?spm=a2c4g.11186623.help-menu-68242.d_0.518d5bd7bpQxLq
+
+After purchase, set the cloud desktop details in the `ECD_*` environment variables below. `ECD_ALIBABA_CLOUD_ACCESS_KEY_ID` and `ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET` are the AK and SK of the Alibaba Cloud account that purchased the cloud desktop.
+
+##### 1.1.4 Activate Wuying Cloud Phone
+Only Android is currently supported.
+
+Console:
+https://wya.wuying.aliyun.com/instanceLayouts
+Help documentation:
+https://help.aliyun.com/zh/ecp/?spm=a2c4g.11186623.0.0.62dfe33avAMTwU
+
+After purchase, set the cloud phone details in the `EDS_*` environment variables below. `EDS_ALIBABA_CLOUD_ACCESS_KEY_ID` and `EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET` are the AK and SK of the Alibaba Cloud account that purchased the cloud phone.
+
+Edit the .env.template file in the current directory or set environment variables:
+
+```bash
+# Cloud computer related environment variables
+# Console authorized username
+export ECD_USERNAME=''
+export ECD_APP_STREAM_REGION_ID='cn-shanghai'
+export DESKTOP_ID=''
+export ECD_ALIBABA_CLOUD_REGION_ID='cn-hangzhou'
+export ECD_ALIBABA_CLOUD_ENDPOINT='ecd.cn-hangzhou.aliyuncs.com'
+export ECD_ALIBABA_CLOUD_ACCESS_KEY_ID=''
+export ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET=''
+
+# Cloud phone related environment variables
+export PHONE_INSTANCE_ID='' # Cloud phone instance ID
+export EDS_ALIBABA_CLOUD_ENDPOINT='eds-aic.cn-shanghai.aliyuncs.com'
+export EDS_ALIBABA_CLOUD_ACCESS_KEY_ID=''
+export EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET=''
+
+# OSS storage related environment variables
+export EDS_OSS_ACCESS_KEY_ID=''
+export EDS_OSS_ACCESS_KEY_SECRET=''
+export EDS_OSS_BUCKET_NAME=''
+export EDS_OSS_ENDPOINT=''
+export EDS_OSS_PATH=''
+
+# Docker runtime environment: replace $home with your home directory; not needed when using the cloud sandbox directly
+export DOCKER_HOST='unix:///$home/.colima/default/docker.sock'
+```
+
+
+Dependency installation:
+
+```bash
+# Install core dependencies
+pip install agentscope-runtime
+
+# Install extensions
+pip install "agentscope-runtime[ext]"
+```
+
+
+### 2. 
Cloud Computer Python Dependency Installation + +All the following commands are executed in PowerShell on the cloud computer, which can be accessed by downloading the Wuying client and logging into the computer: + +```powershell +# Set download path and version +$version = "3.10.11" +$installerName = "python-$version-amd64.exe" +$downloadUrl = "https://mirrors.aliyun.com/python-release/windows/$installerName" +$pythonInstaller = "$env:TEMP\$installerName" + +# Default installation path (Python 3.10 installed to Program Files) +$installDir = "C:\Program Files\Python310" +$scriptsDir = "$installDir\Scripts" + +# Download Python installer (using Alibaba Cloud mirror) +Write-Host "Downloading $installerName from Alibaba Cloud..." -ForegroundColor Green +Invoke-WebRequest -Uri $downloadUrl -OutFile $pythonInstaller + +# Silent installation of Python (all users + attempt to add PATH) +Write-Host "Installing Python $version ..." -ForegroundColor Green +Start-Process -Wait -FilePath $pythonInstaller -ArgumentList "/quiet InstallAllUsers=1 PrependPath=0" # We add PATH ourselves, so disable built-in one + +# Delete installer package +Remove-Item -Force $pythonInstaller + +# ========== Manually add Python to system PATH ========== +Write-Host "Adding Python to system environment variable PATH..." -ForegroundColor Green + +# Get current system PATH (Machine level) +$currentPath = [Environment]::GetEnvironmentVariable("Path", "Machine") -split ";" + +# Paths to add +$pathsToAdd = @($installDir, $scriptsDir) + +# Check and add +$updated = $false +foreach ($path in $pathsToAdd) { + if (-not $currentPath.Contains($path) -and (Test-Path $path)) { + $currentPath += $path + $updated = $true + Write-Host "Added: $path" -ForegroundColor Cyan + } +} + +# Write back to system PATH +if ($updated) { + $newPath = $currentPath -join ";" + [Environment]::SetEnvironmentVariable("Path", $newPath, "Machine") + Write-Host "System PATH updated." 
-ForegroundColor Green
+} else {
+    Write-Host "Python path already exists in system PATH." -ForegroundColor Yellow
+}
+
+# ========== Update current PowerShell session PATH ==========
+# Otherwise the python command is not available in the current terminal
+$env:Path = [Environment]::GetEnvironmentVariable("Path", "Machine") + ";" + [Environment]::GetEnvironmentVariable("Path", "User")
+
+# ========== Check if installation was successful ==========
+Write-Host "`nChecking installation results:" -ForegroundColor Green
+try {
+    python --version
+} catch {
+    Write-Host "python command unavailable, please restart terminal." -ForegroundColor Red
+}
+
+try {
+    pip --version
+} catch {
+    Write-Host "pip command unavailable, please restart terminal." -ForegroundColor Red
+}
+
+# Install dependency packages
+python -m pip install pyautogui -i https://mirrors.aliyun.com/pypi/simple/
+python -m pip install requests -i https://mirrors.aliyun.com/pypi/simple/
+python -m pip install pyperclip -i https://mirrors.aliyun.com/pypi/simple/
+python -m pip install pynput -i https://mirrors.aliyun.com/pypi/simple/
+python -m pip install aiohttp -i https://mirrors.aliyun.com/pypi/simple/
+# asyncio is part of the Python standard library and needs no separate install
+```
+
+
+### 3. Direct Usage of Cloud Computer Sandbox
+
+Note: Create the cloud desktop and cloud phone instances in the Alibaba Cloud console first.
+
+```python
+from agentscope_runtime.sandbox import CloudComputerSandbox
+
+sandbox = CloudComputerSandbox(
+    desktop_id="your_desktop_id"
+)
+
+# Run PowerShell command
+result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"})
+print(result["output"])
+
+# Screenshot
+result_screenshot = sandbox.call_tool(
+    "screenshot",
+    {"file_name": "screenshot.png"},
+)
+print(f"screenshot result: {result_screenshot}")
+```
+
+
+### 4. 
Direct Usage of Cloud Phone Sandbox + +```python +from agentscope_runtime.sandbox import CloudPhoneSandbox + +sandbox = CloudPhoneSandbox( + instance_id="your_instance_id" +) + +# Click screen coordinates +result = sandbox.call_tool( + "click", + { + "x1": 151, + "y1": 404, + "x2": 151, + "y2": 404, + "width": 716, + "height": 1280 + } + ) + +# Screenshot +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` + + +### 5. Usage via SandboxService + +```python +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.engine.services.sandbox import SandboxService + +sandbox_service = SandboxService() +sandboxes = sandbox_service.connect( + session_id="session1", + user_id="user1", + sandbox_types=[SandboxType.CLOUD_COMPUTER, SandboxType.CLOUD_PHONE] +) +``` + + +## Configuration Parameters + +### Cloud Computer Sandbox Configuration + +| Parameter | Type | Description | +|-----------|------|-------------| +| desktop_id | str | Cloud desktop ID | +| timeout | int | Operation timeout (seconds), default 600 | +| auto_wakeup | bool | Whether to automatically wake up cloud computer, default True | +| screenshot_dir | str | Screenshot save directory | +| command_timeout | int | Command execution timeout (seconds), default 60 | + +### Cloud Phone Sandbox Configuration + +| Parameter | Type | Description | +|-----------|------|-------------| +| instance_id | str | Cloud phone instance ID | +| timeout | int | Operation timeout (seconds), default 600 | +| auto_start | bool | Whether to automatically start cloud phone, default True | + +## Notes + +1. Ensure that Wuying Cloud Desktop/Cloud Phone service has been activated on Alibaba Cloud before use +2. Need to correctly configure corresponding environment variables +3. Cloud computer and cloud phone will incur corresponding resource costs +4. 
Some operations may require specific software or drivers to be installed in the target environment to function properly + +## Running Demo + +```bash +# Sandbox demo +python examples/cloud_api_sandbox/cloud_api_sandbox_demo.py +``` diff --git a/examples/cloud_api_sandbox/README_zh.md b/examples/cloud_api_sandbox/README_zh.md new file mode 100644 index 000000000..1a55e6dd6 --- /dev/null +++ b/examples/cloud_api_sandbox/README_zh.md @@ -0,0 +1,397 @@ +# Cloud Computer & Cloud Phone API Sandbox 文档 + +## 概述 + +Cloud Computer 和 Cloud Phone Sandbox 是基于阿里云无影云电脑和无影云手机API服务构建的 GUI 沙箱环境,允许用户远程控制云上的 Windows 桌面环境或 Android 手机环境。 + +## 功能特性 + +### 云电脑沙箱 (Cloud Computer Sandbox) + +- **环境类型**: Windows 桌面环境 +- **提供商**: 阿里云无影云电脑 +- **安全等级**: 高 +- **接入方式**: 无影云电脑企业版OpenAPI Python SDK调用 https://api.aliyun.com/document/ecd/2020-09-30/overview + +### 云手机沙箱 (Cloud Phone Sandbox) + +- **环境类型**: Android 手机环境 +- **提供商**: 阿里云无影云手机 +- **安全等级**: 高 +- **接入方式**: 无影云手机OpenAPI Python SDK调用 https://api.aliyun.com/document/eds-aic/2023-09-30/overview + +## 支持的操作 + +### 云电脑支持的工具操作 + +注意: 由于云电脑当前工具实现依赖于python3.10及以上环境,请确保你的云电脑环境已经安装了 Python 3.10 或更高版本,以及基础依赖包,和自定义依赖。 + 截图工具云电脑临时存放目录是在C盘下,需确保有该磁盘 + +#### 命令行工具 +- run_shell_command: 在 PowerShell 中运行命令 +- run_ipython_cell: 执行 Python 代码 +- write_file: 写入文件 +- read_file: 读取文件 +- remove_file: 删除文件 + +#### 输入模拟工具 +- press_key: 按键 +- click: 点击屏幕坐标 +- right_click: 右键点击 +- click_and_type: 点击并输入文本 +- append_text: 在指定位置追加文本 +- mouse_move: 鼠标移动 +- scroll: 滚动 +- scroll_pos: 在指定位置滚动 + +#### 系统控制工具 +- screenshot: 截图 +- go_home: 返回桌面 +- launch_app: 启动应用程序 + +### 云手机支持的工具操作 + +注意:当前输入文本工具是通过ADBKeyboard输入法结合粘贴板实现,所以请确保你的云手机已经安装ADBKeyboard.apk输入法。 + +#### 命令行工具 +- run_shell_command: 运行 ADB Shell 命令 + +#### 输入模拟工具 +- click: 点击屏幕坐标 +- type_text: 输入文本 +- slide: 滑动屏幕 + +#### 导航控制工具 +- go_home: 返回主屏幕 +- back: 返回按钮 +- menu: 菜单按钮 +- enter: 回车键 +- kill_front_app: 杀死前台应用 + +#### 系统工具 +- screenshot: 截图 +- send_file: 发送文件到云手机 +- remove_file: 删除云手机上的文件 + +#### 页面交互 
+区别于Agentbay没有相关openapi可以查询远程页面链接,但是可以搭配无影客户端使用交互页面,或者参考无影WEBsdk,搭建一个前端html页面进行页面交互。 + +WEBsdk: https://wuying.aliyun.com/wuyingWebSdk/docs/intro/quick-start + +## Cloud Computer & Cloud Phone API Sandbox 集成进 Agentscope-Runtime: + +目前,Agentscope-Runtime 的沙箱容器基于 docker 实现,云上容器基于 k8s 实现;Cloud Computer & Cloud Phone API 集成进 AgentScope-Runtime,能够给使用 Agentscope-Runtime 提供另外一种云上沙箱环境的选择,可以使用除了 docker 容器沙箱之外,也可以选择使用无影云api沙箱; + +### 核心思路: + +核心思路是把 无影Cloud Computer & Cloud Phone API 封装成 Cloud Api Sandbox 集成进 AgentScope-Runtime,作为另外一种云沙箱的选择; +由于 Cloud Api Sandbox 并不依赖容器,所以创建 CloudSandbox 基类继承 Sandbox 类,这样就使得 Agentscope-Runtime 能够同时支持传统容器沙箱和云原生沙箱,在使用上与传统容器沙箱尽量保持一致; + +### 1. 核心架构集成 + +- **新增沙箱类型**: `SandboxType.CLOUD_COMPUTER`,`SandboxType.CLOUD_PHONE` 枚举,用于创建 Cloud Api Sandbox,支持动态枚举扩展; +- **CloudSandbox 基类**: 抽象基类,为云服务沙箱提供统一接口,不依赖容器管理,直接通过云 API 通信,可以支持不同云提供商扩展; +- **CloudComputerSandbox 实现**: 继承自 CloudSandbox,直接通过 WuYing Cloud Computer API 访问云端沙箱,实现完整的工具映射和错误处理; +- **CloudPhoneSandbox 实现**: 继承自 CloudSandbox,直接通过 WuYing Cloud Phone API 访问云端沙箱,实现完整的工具映射和错误处理; +- **SandboxService 支持**: 保持与原有 sandbox_service 调用方式的兼容性,特殊处理 Cloud Api 沙箱类型,资源清理; + +### 2. 类层次结构 + +``` +Sandbox (基类) +└── CloudSandbox (云沙箱基类) + ├── CloudComputerSandbox (云电脑实现) + └── CloudPhoneSandbox (云手机实现) +``` + +### 3. 文件结构 + +``` +src/agentscope_runtime/sandbox/ +├── enums.py # 新增 AGENTBAY 枚举 +├── box/ +│ ├── cloud/ +│ │ ├── __init__.py # 新增 +│ │ └── cloud_sandbox.py # 新增 CloudSandbox 基类 +│ └── cloud_api/ +│ ├── __init__.py # 新增 +│ └── cloud_computer_sandbox.py # 新增 CloudComputerSandbox 实现 +│ └── cloud_phone_sandbox.py # 新增 CloudPhoneSandbox 实现 +└── __init__.py # 更新导出 +``` + + +### 4. 服务层集成 + +- **注册机制**:使用 `@SandboxRegistry.register` 装饰器注册 +- **服务集成**:在 `SandboxService` 中特殊处理 CLOUD_COMPUTER,CLOUD_PHONE 类型 +- **兼容性**:保持与现有沙箱接口的完全兼容 +- **生命周期管理**: 支持创建、连接、释放 云资源 + +## 如何使用 + +### 1. 
设置环境变量 + +##### 1.1.1 阿里云账号ak ,sk 获取 + 介绍文档: + https://help.aliyun.com/document_detail/53045.html?spm=5176.21213303.aillm.3.7df92f3d4XzQHZ&scm=20140722.S_%E9%98%BF%E9%87%8C%E4%BA%91sk._.RL_%E9%98%BF%E9%87%8C%E4%BA%91sk-LOC_aillm-OR_chat-V_3-RC_llm + +##### 1.1.2 oss开通 + 介绍文档: + https://help.aliyun.com/zh/oss/?spm=5176.29463013.J_AHgvE-XDhTWrtotIBlDQQ.8.68b834deqSKlrh + +备注:购买完后将账号凭证信息配置到下面环境变量中,也就是EDS_OSS_ 的配置 EDS_OSS_ACCESS_KEY相关的信息就是购买OSS的阿里云账号的ak,sk + +##### 1.1.3 无影云电脑开通 + 购买云电脑,建议企业版(个人版需要跟无影要一下EndUserId,用于配置环境变量ECD_USERNAME) +目前仅支持windos + + 无影个人版文档: + https://help.aliyun.com/zh/edsp?spm=a2c4g.11174283.d_help_search.i2 + 无影企业版文档: + https://help.aliyun.com/zh/wuying-workspace/product-overview/?spm=a2c4g.11186623.help-menu-68242.d_0.518d5bd7bpQxLq +购买完后将云电脑需要的信息配置到下面环境变量中,也就是ECD_ 的配置 + ALIBABA_CLOUD_ACCESS_KEY相关的信息就是购买云电脑的阿里云账号的ak,sk + +##### 1.1.4 无影云手机开通 +目前仅支持安卓系统 + + 控制台: + https://wya.wuying.aliyun.com/instanceLayouts + 帮助文档: + https://help.aliyun.com/zh/ecp/?spm=a2c4g.11186623.0.0.62dfe33avAMTwU + 购买完后将云电脑需要的信息配置到下面环境变量中,也就是EDS_ 的配置 + ALIBABA_CLOUD_ACCESS_KEY相关的信息就是购买云手机的阿里云账号的ak,sk + + +编辑当前目录下的.env.template文件或者设置环境变量 + +```bash +# 云电脑相关环境变量 +# 管控台授权用户名 +export ECD_USERNAME='' +export ECD_APP_STREAM_REGION_ID='cn-shanghai' +export DESKTOP_ID='' +export ECD_ALIBABA_CLOUD_REGION_ID='cn-hangzhou' +export ECD_ALIBABA_CLOUD_ENDPOINT='ecd.cn-hangzhou.aliyuncs.com' +export ECD_ALIBABA_CLOUD_ACCESS_KEY_ID='' +export ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET='' + +# 云手机相关环境变量 +export PHONE_INSTANCE_ID='' # 云手机实例ID +export EDS_ALIBABA_CLOUD_ENDPOINT='eds-aic.cn-shanghai.aliyuncs.com' +export EDS_ALIBABA_CLOUD_ACCESS_KEY_ID='' +export EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET='' + +# OSS存储相关环境变量 +export EDS_OSS_ACCESS_KEY_ID='' +export EDS_OSS_ACCESS_KEY_SECRET='' +export EDS_OSS_BUCKET_NAME='' +export EDS_OSS_ENDPOINT='' +export EDS_OSS_PATH='' + + +# docker 运行环境 $home 替换为用户主目录,直接使用云沙箱的方式下无需配置, +export DOCKER_HOST='unix:///$home/.colima/default/docker.sock' + +``` + +依赖安装 + 
+```bash +# 安装核心依赖 +pip install agentscope-runtime + +# 安装拓展 +pip install "agentscope-runtime[ext]" +``` + + +### 2. 云电脑python,依赖安装 + +以下所有命令都是在云电脑上的 PowerShell 中执行,可以通过下载无影客户端登录到电脑上执行: + +```powershell +# 设置下载路径和版本 +$version = "3.10.11" +$installerName = "python-$version-amd64.exe" +$downloadUrl = "https://mirrors.aliyun.com/python-release/windows/$installerName" +$pythonInstaller = "$env:TEMP\$installerName" + +# 默认安装路径(Python 3.10 安装到 Program Files) +$installDir = "C:\Program Files\Python310" +$scriptsDir = "$installDir\Scripts" + +# 下载 Python 安装包(使用阿里云镜像) +Write-Host "正在从阿里云下载 $installerName ..." -ForegroundColor Green +Invoke-WebRequest -Uri $downloadUrl -OutFile $pythonInstaller + +# 静默安装 Python(所有用户 + 尝试添加 PATH) +Write-Host "正在安装 Python $version ..." -ForegroundColor Green +Start-Process -Wait -FilePath $pythonInstaller -ArgumentList "/quiet InstallAllUsers=1 PrependPath=0" # 我们自己加 PATH,所以关闭内置的 + +# 删除安装包 +Remove-Item -Force $pythonInstaller + +# ========== 主动添加 Python 到系统 PATH ========== +Write-Host "正在将 Python 添加到系统环境变量 PATH ..." 
-ForegroundColor Green + +# 获取当前系统 PATH(Machine 级别) +$currentPath = [Environment]::GetEnvironmentVariable("Path", "Machine") -split ";" + +# 要添加的路径 +$pathsToAdd = @($installDir, $scriptsDir) + +# 检查并添加 +$updated = $false +foreach ($path in $pathsToAdd) { + if (-not $currentPath.Contains($path) -and (Test-Path $path)) { + $currentPath += $path + $updated = $true + Write-Host "已添加: $path" -ForegroundColor Cyan + } +} + +# 写回系统 PATH +if ($updated) { + $newPath = $currentPath -join ";" + [Environment]::SetEnvironmentVariable("Path", $newPath, "Machine") + Write-Host "系统 PATH 已更新。" -ForegroundColor Green +} else { + Write-Host "Python 路径已存在于系统 PATH 中。" -ForegroundColor Yellow +} + +# ========== 更新当前 PowerShell 会话的 PATH ========== +# 否则当前终端还不能使用 python 命令 +$env:Path = [Environment]::GetEnvironmentVariable("Path", "Machine") + ";" + [Environment]::GetEnvironmentVariable("Path", "User") + +# ========== 检查是否安装成功 ========== +Write-Host "`n检查安装结果:" -ForegroundColor Green +try { + python --version +} catch { + Write-Host "python 命令不可用,请重启终端。" -ForegroundColor Red +} + +try { + pip --version +} catch { + Write-Host "pip 命令不可用,请重启终端。" -ForegroundColor Red +} + +# 安装依赖包 +python -m pip install pyautogui -i https://mirrors.aliyun.com/pypi/simple/ +python -m pip install requests -i https://mirrors.aliyun.com/pypi/simple/ +python -m pip install pyperclip -i https://mirrors.aliyun.com/pypi/simple/ +python -m pip install pynput -i https://mirrors.aliyun.com/pypi/simple/ +python -m pip install aiohttp -i https://mirrors.aliyun.com/pypi/simple/ +python -m pip install asyncio -i https://mirrors.aliyun.com/pypi/simple/ + +``` + + +### 3. 
直接使用云电脑沙箱 + +注意:需要先在阿里云控制台创建云电脑桌面和云手机实例。 + + +```python +from agentscope_runtime.sandbox import CloudComputerSandbox + +sandbox = CloudComputerSandbox( + desktop_id="your_desktop_id" +) + +# 运行PowerShell命令 +result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"}) +print(result["output"]) + +# 截图 +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` + + +### 4. 直接使用云手机沙箱 + +```python +from agentscope_runtime.sandbox import CloudPhoneSandbox + +sandbox = CloudPhoneSandbox( + instance_id="your_instance_id" +) + +# 点击屏幕坐标 +result = sandbox.call_tool( + "click", + { + "x1": 151, + "y1": 404, + "x2": 151, + "y2": 404, + "width": 716, + "height": 1280 + } + ) + +# 截图 +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` + + +### 5. 通过 SandboxService 使用 + +```python +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.engine.services.sandbox import SandboxService + +sandbox_service = SandboxService() +sandboxes = sandbox_service.connect( + session_id="session1", + user_id="user1", + sandbox_types=[SandboxType.CLOUD_COMPUTER, SandboxType.CLOUD_PHONE] +) +``` + + +## 配置参数 + +### 云电脑沙箱配置 + +| 参数 | 类型 | 描述 | +|------|------|------| +| desktop_id | str | 云电脑桌面ID | +| timeout | int | 操作超时时间(秒),默认600 | +| auto_wakeup | bool | 是否自动唤醒云电脑,默认True | +| screenshot_dir | str | 截图保存目录 | +| command_timeout | int | 命令执行超时时间(秒),默认60 | + +### 云手机沙箱配置 + +| 参数 | 类型 | 描述 | +|------|------|------| +| instance_id | str | 云手机实例ID | +| timeout | int | 操作超时时间(秒),默认600 | +| auto_start | bool | 是否自动启动云手机,默认True | + +## 注意事项 + +1. 使用前需要确保已在阿里云开通无影云电脑/云手机服务 +2. 需要正确配置相应的环境变量 +3. 云电脑和云手机会产生相应的资源费用 +4. 
某些操作可能需要目标环境中安装特定软件或驱动才能正常工作 + + +## 运行演示 demo + +```bash +# 沙箱演示 +python examples/cloud_api_sandbox/cloud_api_sandbox_demo.py +``` diff --git a/examples/cloud_api_sandbox/cloud_api_sandbox_demo.py b/examples/cloud_api_sandbox/cloud_api_sandbox_demo.py new file mode 100644 index 000000000..82f2cc906 --- /dev/null +++ b/examples/cloud_api_sandbox/cloud_api_sandbox_demo.py @@ -0,0 +1,496 @@ +# -*- coding: utf-8 -*- +import os +import asyncio +import logging +from pathlib import Path +from dotenv import load_dotenv +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.sandbox import ( + CloudComputerSandbox, +) +from agentscope_runtime.sandbox import ( + CloudPhoneSandbox, +) +from agentscope_runtime.engine.services.sandbox import SandboxService + +# Configure logging +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + + +def load_env_variables() -> None: + """ + Load environment variables from .env file into system environment. + + This function loads all variables from .env file in the current directory + into the system environment variables, making them accessible via + os.getenv(). + Variables already present in system environment are not overridden. + """ + current_dir = Path(__file__).parent + env_file = current_dir / ".env.template" + + if env_file.exists(): + # Load environment variables from .env file into system environment + load_dotenv(env_file, override=False) + + from dotenv import dotenv_values + + env_vars = dotenv_values(env_file) + for k, v in env_vars.items(): + os.environ[k] = v + + +def test_cloud_pc_api_sandbox_direct(): + """ + Test Cloud api sandbox directly without sandbox service. 
+ """ + + try: + load_env_variables() + try: + desktop_id = os.getenv("DESKTOP_ID") + sandbox = CloudComputerSandbox( + desktop_id=desktop_id, + ) + + logger.info( + f"Created sandbox with ID: {sandbox.desktop_id}", + ) + + # Test basic operations + result = sandbox.call_tool( + "run_shell_command", + {"command": "echo 'Hello from Cloud PC Api!'"}, + ) + logger.info(f"Command result: {result}") + + result = sandbox.call_tool( + "run_ipython_cell", + {"code": "print('hello!')"}, + ) + logger.info(f"run_ipython_cell result: {result}") + + result = sandbox.call_tool( + "launch_app", + {"name": "File Explorer"}, + ) + logger.info(f"launch_app result: {result}") + + result = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) + logger.info(f"screenshot result: {result}") + + result = sandbox.call_tool( + "write_file", + { + "file_path": "C:/test.txt", + "content": "welcome cloud api test !", + }, + ) + logger.info(f"write_file result: {result}") + + result = sandbox.call_tool( + "read_file", + { + "file_path": "C:/test.txt", + }, + ) + logger.info(f"read_file result: {result}") + + result = sandbox.call_tool( + "remove_file", + { + "file_path": "C:/test.txt", + }, + ) + logger.info(f"remove_file result: {result}") + + result = sandbox.call_tool( + "go_home", + {}, + ) + logger.info(f"go_home result: {result}") + result = sandbox.call_tool( + "press_key", + { + "key": "home", + }, + ) + logger.info(f"press_key result: {result}") + + result = sandbox.call_tool( + "click", + { + "x": 151, + "y": 404, + "count": 2, + }, + ) + logger.info(f"click result: {result}") + + result = sandbox.call_tool( + "right_click", + { + "x": 151, + "y": 404, + "count": 1, + }, + ) + logger.info(f"right_click result: {result}") + + result = sandbox.call_tool( + "click_and_type", + { + "x": 151, + "y": 404, + "text": "你好", + }, + ) + logger.info(f"click_and_type result: {result}") + + result = sandbox.call_tool( + "append_text", + { + "x": 151, + "y": 404, + "text": "你好", 
+ }, + ) + 
logger.info(f"append_text result: {result}") + result = sandbox.call_tool( + "mouse_move", + { + "x": 151, + "y": 404, + }, + ) + logger.info(f"mouse_move result: {result}") + + result = sandbox.call_tool( + "scroll", + { + "pixels": -5, + }, + ) + logger.info(f"scroll result: {result}") + + result = sandbox.call_tool( + "scroll_pos", + { + "x": 954, + "y": 537, + "pixels": -5, + }, + ) + logger.info(f"scroll_pos result: {result}") + + # Cleanup + sandbox._cleanup() # pylint: disable=protected-access + logger.info("Cloud PC Api sandbox test completed successfully") + return True + + except ImportError as e: + logger.warning(f"Cloud PC Api not installed: {e}") + logger.info("This is expected if Cloud Api PC is not available") + return True # Consider this a pass since integration is correct + + except Exception as e: + logger.error(f"Cloud PC Api sandbox test failed: {e}") + return False + + +def test_cloud_phone_api_sandbox_direct(): + """ + Test the Cloud Phone Api sandbox directly, without the sandbox service. 
+ """ + + try: + load_env_variables() + try: + instance_id = os.getenv("PHONE_INSTANCE_ID") + + sandbox = CloudPhoneSandbox( + instance_id=instance_id, + ) + + logger.info( + f"Created sandbox with ID: {sandbox.instance_id}", + ) + + # Test basic operations + result = sandbox.call_tool( + "run_shell_command", + {"command": "echo 'Hello from Cloud Phone Api!'"}, + ) + logger.info(f"Command result: {result}") + + result = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) + logger.info(f"screenshot result: {result}") + + result = sandbox.call_tool( + "send_file", + { + "source_file_path": "/sdcard/Download/dog_and_girl.jpeg", + "upload_url": "https://help-static-aliyun-doc.aliyuncs.com" + "/file-manage-files/" + "zh-CN/20241022/emyrja/dog_and_girl.jpeg", + }, + ) + logger.info(f"send_file result: {result}") + + result = sandbox.call_tool( + "remove_file", + {"file_path": "/sdcard/Download/dog_and_girl.jpeg"}, + ) + logger.info(f"remove_file result: {result}") + + result = sandbox.call_tool( + "click", + { + "x1": 151, + "y1": 404, + "x2": 151, + "y2": 404, + "width": 716, + "height": 1280, + }, + ) + logger.info(f"click result: {result}") + + result = sandbox.call_tool( + "slide", + { + "x1": 366, + "y1": 1123, + "x2": 366, + "y2": 330, + "width": 716, + "height": 1280, + }, + ) + logger.info(f"slide result: {result}") + # Text input currently relies on the ADBKeyboard input method, which must be installed beforehand + result = sandbox.call_tool( + "type_text", + { + "text": "阿里巴巴", + }, + ) + logger.info(f"type_text result: {result}") + + result = sandbox.call_tool( + "enter", + {}, + ) + logger.info(f"enter result: {result}") + + result = sandbox.call_tool( + "back", + {}, + ) + logger.info(f"back result: {result}") + + result = sandbox.call_tool( + "kill_front_app", + {}, + ) + logger.info(f"kill_front_app result: {result}") + + result = sandbox.call_tool( + "menu", + {}, + ) + logger.info(f"menu result: {result}") + + result = 
sandbox.call_tool( + "go_home", + {}, + ) + logger.info(f"go_home result: 
{result}") + + # Cleanup + sandbox._cleanup() # pylint: disable=protected-access + logger.info("Cloud Phone Api sandbox test completed successfully") + return True + + except ImportError as e: + logger.warning(f"Cloud Phone Api not installed: {e}") + logger.info("This is expected if Cloud Phone Api is not available") + return True # Consider this a pass since integration is correct + + except Exception as e: + logger.error(f"Cloud Phone Api sandbox test failed: {e}") + return False + + +async def test_cloud_pc_api_sandbox_service(): + """ + Test Cloud PC Api sandbox via SandboxService. + """ + try: + load_env_variables() + + # Open the sandbox service context + async with SandboxService() as service: + sandboxes = service.connect( + session_id="demo_service_session", + user_id="demo_user", + sandbox_types=[SandboxType.CLOUD_COMPUTER], + ) + + if not sandboxes: + print("No sandboxes returned by SandboxService") + logger.error("No sandboxes returned by SandboxService") + return False + + sandbox = sandboxes[0] + print( + "Connected Cloud Api PC sandbox via service" + f": {sandbox.sandbox_id} ", + ) + logger.info( + f"Connected Cloud Api PC sandbox via service: " + f"{sandbox.sandbox_id}", + ) + + # Test basic operations + result = sandbox.call_tool( + "run_shell_command", + {"command": "echo 'Hello from Cloud PC Api!'"}, + ) + print("Command result:", result) + logger.info(f"Command result: {result}") + + result = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) + print("screenshot result:", result) + logger.info(f"screenshot result: {result}") + + logger.info("Cloud PC Api sandbox service test completed successfully") + return True + except ImportError as e: + logger.warning(f"Cloud PC Api not installed: {e}") + logger.info("This is expected if Cloud Api PC is not available") + return True + except Exception as e: + logger.error(f"Cloud PC Api sandbox service test failed: {e}") + return False + + +async def 
test_cloud_phone_api_sandbox_service(): + """ + Test Cloud Phone Api sandbox via SandboxService. + """ + logger.info("Testing Cloud Phone API sandbox via SandboxService...") + + try: + load_env_variables() + + # Open the sandbox service context + async with SandboxService() as service: + sandboxes = service.connect( + session_id="demo_service_session", + user_id="demo_user", + sandbox_types=[SandboxType.CLOUD_PHONE], + ) + + if not sandboxes: + print("No sandboxes returned by SandboxService") + logger.error("No sandboxes returned by SandboxService") + return False + + sandbox = sandboxes[0] + print( + "Connected Cloud Phone Api sandbox via service" + f": {sandbox.sandbox_id} ", + ) + logger.info( + f"Connected Cloud Phone Api sandbox via service: " + f"{sandbox.sandbox_id}", + ) + + # Test basic operations + result = sandbox.call_tool( + "run_shell_command", + {"command": "echo 'Hello from Cloud Phone Api!'"}, + ) + print("Command result:", result) + logger.info(f"Command result: {result}") + + result = sandbox.call_tool( + "screenshot", + {"file_name": "screenshot.png"}, + ) + print("screenshot result:", result) + logger.info(f"screenshot result: {result}") + + logger.info( + "Cloud Phone Api sandbox service test completed successfully", + ) + return True + except ImportError as e: + logger.warning(f"Cloud Phone Api not installed: {e}") + logger.info("This is expected if Cloud Phone Api is not available") + return True + except Exception as e: + logger.error(f"Cloud Phone Api sandbox service test failed: {e}") + return False + + +async def main(): + """ + Run all tests. 
+ """ + logger.info("Starting Cloud Api sandbox integration tests...") + + tests = [ + ("Cloud Computer Sandbox Service", test_cloud_pc_api_sandbox_service), + ("Cloud Phone Sandbox Service", test_cloud_phone_api_sandbox_service), + ("Cloud Computer Sandbox Direct", test_cloud_pc_api_sandbox_direct), + ("Cloud Phone Sandbox Direct", test_cloud_phone_api_sandbox_direct), + ] + + results = [] + for test_name, test_func in tests: + logger.info(f"\n--- Running {test_name} ---") + try: + if asyncio.iscoroutinefunction(test_func): + result = await test_func() + else: + result = test_func() + results.append((test_name, result)) + except Exception as e: + logger.error(f"Test {test_name} failed with exception: {e}") + results.append((test_name, False)) + + # Summary + logger.info("\n--- Test Results Summary ---") + passed = 0 + for test_name, result in results: + status = "PASSED" if result else "FAILED" + logger.info(f"{test_name}: {status}") + if result: + passed += 1 + + logger.info(f"\nTotal: {passed}/{len(results)} tests passed") + + if passed == len(results): + logger.info( + "🎉 All tests passed! Cloud Api sandbox integration is working correctly.", + ) + else: + logger.warning( + "⚠️ Some tests failed. 
Check the logs above for details.", + ) + + +if __name__ == "__main__": + asyncio.run(main()) diff --git a/examples/e2b_sandbox/.env.template b/examples/e2b_sandbox/.env.template new file mode 100644 index 000000000..cbc0c96ee --- /dev/null +++ b/examples/e2b_sandbox/.env.template @@ -0,0 +1,4 @@ +# E2B API Key +E2B_API_KEY= +# docker 运行环境 $home 替换为用户主目录,直接使用云沙箱的方式下无需配置,unix:///$home/.colima/default/docker.sock +DOCKER_HOST='' \ No newline at end of file diff --git a/examples/e2b_sandbox/README.md b/examples/e2b_sandbox/README.md new file mode 100644 index 000000000..ac87d018a --- /dev/null +++ b/examples/e2b_sandbox/README.md @@ -0,0 +1,172 @@ +# E2B Desktop Sandbox Documentation + +## Overview + +E2bSandBox is a GUI sandbox environment built on the E2B cloud desktop service that allows users to remotely control desktop environments in the cloud. + +## Features + +### E2B Desktop Sandbox (E2bSandBox) + +- **Environment Type**: Cloud desktop environment +- **Provider**: E2B Desktop +- **Security Level**: High +- **Access Method**: E2B Desktop Python SDK invocation + +## Supported Operations + +### Desktop Control Tools + +- click: Click screen coordinates +- right_click: Right-click +- type_text: Input text +- press_key: Press key +- launch_app: Launch application +- click_and_type: Click and input text + +### Command Line Tools + +- run_shell_command: Run shell commands + +### System Tools + +- screenshot: Take screenshots + +## Integration with Agentscope-Runtime + +The E2B Desktop Sandbox has been integrated into Agentscope-Runtime, providing a similar user experience to Docker sandboxes. + +## E2B Sandbox Integration into Agentscope-Runtime: + +Currently, Agentscope-Runtime's sandbox containers are implemented based on Docker, and cloud containers are implemented based on Kubernetes. 
Integrating E2B Sandbox into AgentScope-Runtime gives users another option for cloud sandbox environments: they can use E2B sandboxes in addition to Docker container sandboxes. + +### Core Concept: + +The core idea is to wrap the E2B Sandbox as a Sandbox implementation in AgentScope-Runtime, so that it serves as another cloud sandbox option. Because the E2B Sandbox does not depend on containers, a `CloudSandbox` base class is created that inherits from the `Sandbox` class. This lets Agentscope-Runtime support both traditional container sandboxes and cloud-native sandboxes, while keeping usage as consistent as possible with container sandboxes. + +### 1. Core Architecture Integration + +- **New Sandbox Type**: `SandboxType.E2B` enumeration for creating E2B Sandboxes, supporting dynamic enumeration extension +- **CloudSandbox Base Class**: Abstract base class that provides a unified interface for cloud service sandboxes; it does not depend on container management, communicates directly through cloud APIs, and can be extended to different cloud providers +- **E2bSandBox Implementation**: Inherits from `CloudSandbox`, accesses cloud sandboxes directly through the E2B SDK, and implements complete tool mapping and error handling +- **SandboxService Support**: Stays compatible with the existing `sandbox_service` calling conventions, with special handling for the E2B sandbox type and resource cleanup + +### 2. 
Class Hierarchy Structure + +``` +Sandbox (Base Class) +└── CloudSandbox (Cloud Sandbox Base Class) + └── E2bSandBox (E2B Desktop Implementation) +``` + + +### 3. File Structure + +``` +src/agentscope_runtime/sandbox/ +├── enums.py # Added E2B enumeration +├── box/ +│ ├── cloud/ +│ │ ├── __init__.py # Added +│ │ └── cloud_sandbox.py # Added CloudSandbox base class +│ └── e2b/ +│ ├── __init__.py # Added +│ └── e2b_sandbox.py # Added E2bSandBox implementation +└── __init__.py # Updated exports +``` + + +### 4. Service Layer Integration + +- **Registration Mechanism**: Register using the `@SandboxRegistry.register` decorator +- **Service Integration**: Special handling of the E2B type in `SandboxService` +- **Compatibility**: Full compatibility with existing sandbox interfaces +- **Lifecycle Management**: Supports creation, connection, and release of cloud resources + +## How to Use + +### 1. Set Environment Variables + +Configure authentication information according to the official E2B documentation. +##### 1.1.1 E2B Activation +Visit the E2B website to register and obtain an API key, then set it as E2B_API_KEY +https://e2b.dev + +Edit the .env.template file in the current directory or set environment variables + +```bash +# E2B API Key +export E2B_API_KEY= +# Docker runtime socket: replace $home with your home directory, e.g. unix:///$home/.colima/default/docker.sock; not needed when using the cloud sandbox directly +export DOCKER_HOST='' +``` + + +Dependency Installation + +```bash +# Install core dependencies +pip install agentscope-runtime + +# Install extensions +pip install "agentscope-runtime[ext]" +``` + + +### 2. 
Direct Usage of E2B Desktop Sandbox + +```python +import os +from agentscope_runtime.sandbox import E2bSandBox + +sandbox = E2bSandBox() + +# Run shell command +result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"}) +print(result["output"]) + +# Screenshot +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_path": f"{os.getcwd()}/screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` + + +### 3. Usage via SandboxService + +```python +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.engine.services.sandbox import SandboxService + +sandbox_service = SandboxService() +sandboxes = sandbox_service.connect( + session_id="session1", + user_id="user1", + sandbox_types=[SandboxType.E2B] +) +``` + + +## Configuration Parameters + +### E2B Desktop Sandbox Configuration + +| Parameter | Type | Description | +|-----------|------|-------------| +| timeout | int | Operation timeout (seconds), default 600 | +| command_timeout | int | Command execution timeout (seconds), default 60 | + +## Notes + +1. Register for the E2B service and configure it before use +2. Configure the environment variables listed above correctly +3. 
E2B service will incur corresponding resource costs + +## Running Demo + +```bash +# Sandbox demo +python examples/e2b_sandbox/e2b_sandbox_demo.py +``` diff --git a/examples/e2b_sandbox/README_zh.md b/examples/e2b_sandbox/README_zh.md new file mode 100644 index 000000000..7e5df4a8f --- /dev/null +++ b/examples/e2b_sandbox/README_zh.md @@ -0,0 +1,170 @@ +# E2B Desktop Sandbox 文档 + +## 概述 + +E2bSandBox 是基于 E2B 云桌面服务构建的 GUI 沙箱环境,允许用户远程控制云上的桌面环境。 + +## 功能特性 + +### E2B 桌面沙箱 (E2bSandBox) + +- **环境类型**: 云桌面环境 +- **提供商**: E2B Desktop +- **安全等级**: 高 +- **接入方式**: E2B Desktop Python SDK 调用 + +## 支持的操作 + +### 桌面控制工具 + +- click: 点击屏幕坐标 +- right_click: 右键点击 +- type_text: 输入文本 +- press_key: 按键 +- launch_app: 启动应用程序 +- click_and_type: 点击并输入文本 + +### 命令行工具 + +- run_shell_command: 运行 shell 命令 + +### 系统工具 + +- screenshot: 截图 + +## 集成到 Agentscope-Runtime + +E2B Desktop Sandbox 已经被集成到 Agentscope-Runtime 中,提供了与 Docker 沙箱类似的使用体验。 + +## E2B Sandbox 集成进 Agentscope-Runtime: + +目前,Agentscope-Runtime 的沙箱容器基于 docker 实现,云上容器基于 k8s 实现;E2B Sandbox 集成进 AgentScope-Runtime,能够给使用 Agentscope-Runtime 提供另外一种云上沙箱环境的选择,可以使用除了 docker 容器沙箱之外,也可以选择使用e2b沙箱; + +### 核心思路: + +核心思路是把 E2B Sandbox 封装成 Sandbox 集成进 AgentScope-Runtime,作为另外一种云沙箱的选择; +由于 E2B Sandbox 并不依赖容器,所以创建 CloudSandbox 基类继承 Sandbox 类,这样就使得 Agentscope-Runtime 能够同时支持传统容器沙箱和云原生沙箱,在使用上与传统容器沙箱尽量保持一致; + +### 1. 核心架构集成 + +- **新增沙箱类型**: `SandboxType.E2B` 枚举,用于创建 E2B Sandbox,支持动态枚举扩展; +- **CloudSandbox 基类**: 抽象基类,为云服务沙箱提供统一接口,不依赖容器管理,直接通过云 API 通信,可以支持不同云提供商扩展; +- **E2bSandBox 实现**: 继承自 CloudSandbox,直接通过 E2b sdk 访问云端沙箱,实现完整的工具映射和错误处理; +- **SandboxService 支持**: 保持与原有 sandbox_service 调用方式的兼容性,特殊处理 E2b 沙箱类型,资源清理; + +### 2. 类层次结构 + +``` +Sandbox (基类) +└── CloudSandbox (云沙箱基类) + └── E2bSandBox (E2B桌面实现) +``` + +### 3. 
文件结构 + +``` +src/agentscope_runtime/sandbox/ +├── enums.py # 新增 AGENTBAY 枚举 +├── box/ +│ ├── cloud/ +│ │ ├── __init__.py # 新增 +│ │ └── cloud_sandbox.py # 新增 CloudSandbox 基类 +│ └── e2b/ +│ ├── __init__.py # 新增 +│ └── e2b_sandbox.py # 新增 E2bSandBox 实现 +└── __init__.py # 更新导出 +``` + + +### 4. 服务层集成 + +- **注册机制**:使用 `@SandboxRegistry.register` 装饰器注册 +- **服务集成**:在 `SandboxService` 中特殊处理 E2B 类型 +- **兼容性**:保持与现有沙箱接口的完全兼容 +- **生命周期管理**: 支持创建、连接、释放 云资源 + +## 如何使用 + +### 1. 设置环境变量 + +根据 E2B 官方文档配置相应的认证信息。 +##### 1.1.1 E2B 开通 + 访问E2B官网注册并获取,然后配置到E2B_API_KEY + https://e2b.dev + +编辑当前目录下的.env.template文件或者设置环境变量 + +```bash +# E2B API Key +export E2B_API_KEY= +# docker 运行环境 $home 替换为用户主目录,直接使用云沙箱的方式下无需配置,unix:///$home/.colima/default/docker.sock +export DOCKER_HOST='' + +``` + +依赖安装 + +```bash +# 安装核心依赖 +pip install agentscope-runtime + +# 安装拓展 +pip install "agentscope-runtime[ext]" +``` + + +### 2. 直接使用 E2B 桌面沙箱 + +```python +import os +from agentscope_runtime.sandbox import E2bSandBox + +sandbox = E2bSandBox() + +# 运行shell命令 +result = sandbox.call_tool("run_shell_command", {"command": "echo Hello World"}) +print(result["output"]) + +# 截图 +result_screenshot = sandbox.call_tool( + "screenshot", + {"file_path": f"{os.getcwd()}/screenshot.png"}, + ) +print(f"screenshot result: {result_screenshot}") +``` +### 3. 通过 SandboxService 使用 + +```python +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.engine.services.sandbox import SandboxService + +sandbox_service = SandboxService() +sandboxes = sandbox_service.connect( + session_id="session1", + user_id="user1", + sandbox_types=[SandboxType.E2B] +) +``` +## 配置参数 + +### E2B 桌面沙箱配置 + +| 参数 | 类型 | 描述 | +|------|------|------| +| timeout | int | 操作超时时间(秒),默认600 | +| command_timeout | int | 命令执行超时时间(秒),默认60 | + +## 注意事项 + +1. 使用前需要确保已注册并配置好 E2B 服务 +2. 需要正确配置相应的环境变量 +3. 
E2B service will incur corresponding resource costs + + + +## Running Demo + +```bash +# Sandbox demo +python examples/e2b_sandbox/e2b_sandbox_demo.py +``` diff --git a/examples/e2b_sandbox/e2b_sandbox_demo.py b/examples/e2b_sandbox/e2b_sandbox_demo.py new file mode 100644 index 000000000..833721c14 --- /dev/null +++ b/examples/e2b_sandbox/e2b_sandbox_demo.py @@ -0,0 +1,179 @@ +# -*- coding: utf-8 -*- +import os +import asyncio +import logging +import time +from pathlib import Path +from dotenv import load_dotenv +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.sandbox import ( + E2bSandBox, +) +from agentscope_runtime.engine.services.sandbox import SandboxService + +# Configure logging +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + + +def load_env_variables() -> None: + """ + Load environment variables from .env.template into the system environment. + + This function loads all variables from the .env.template file next to this + script into the system environment variables, + making them accessible via os.getenv(). + Variables already present in system environment are not overridden. + """ + current_dir = Path(__file__).parent + env_file = current_dir / ".env.template" + + if env_file.exists(): + # Load variables from the file; override=False keeps existing values + load_dotenv(env_file, override=False) + + +def test_e2b_sandbox_direct(): + """ + Test e2b sandbox directly without sandbox service. 
+ """ + + try: + load_env_variables() + try: + sandbox = E2bSandBox() + # Wait for sandbox to be ready + time.sleep(5) + + # Test basic operations + result = sandbox.call_tool( + "run_shell_command", + {"command": "echo 'Hello from e2b sandbox!'"}, + ) + logger.info(f"Command result: {result}") + + result = sandbox.call_tool( + "screenshot", + {"file_path": f"{os.getcwd()}/screenshot.png"}, + ) + + logger.info(f"screenshot result: {result}") + + # Cleanup + sandbox._cleanup() # pylint: disable=protected-access + logger.info("E2B sandbox test completed successfully") + return True + + except ImportError as e: + logger.warning(f"E2B sandbox not installed: {e}") + logger.info("This is expected if E2B sandbox is not available") + return True # Consider this a pass since integration is correct + + except Exception as e: + logger.error(f"E2B sandbox test failed: {e}") + return False + + +async def test_e2b_sandbox_service(): + """ + Test E2B sandbox via SandboxService. + """ + try: + load_env_variables() + + # Initialize sandbox service + + # Create environment manager context + async with SandboxService() as service: + sandboxes = service.connect( + session_id="demo_service_session", + user_id="demo_user", + sandbox_types=[SandboxType.E2B], + ) + if not sandboxes: + print("No sandboxes returned by SandboxService") + logger.error("No sandboxes returned by SandboxService") + return False + + sandbox = sandboxes[0] + print(f"Connected E2B sandbox via service: {sandbox.sandbox_id} ") + logger.info( + f"Connected E2B sandbox via service: " f"{sandbox.sandbox_id}", + ) + + # Wait for sandbox to be ready without blocking the event loop + await asyncio.sleep(5) + + # Test basic operations + result = sandbox.call_tool( + "run_shell_command", + {"command": "echo 'Hello from E2B!'"}, + ) + logger.info(f"Command result: {result}") + + result = sandbox.call_tool( + "screenshot", + {"file_path": f"{os.getcwd()}/screenshot.png"}, + ) + + logger.info(f"screenshot result: {result}") + + logger.info("E2B 
sandbox service test completed successfully") + return True + except ImportError as e: + logger.warning(f"E2B sandbox not installed: {e}") + logger.info("This is expected if E2B sandbox is not available") + return True + except Exception as e: + logger.error(f"E2B sandbox service test failed: {e}") + return False + + +async def main(): + """ + Run all tests. + """ + logger.info("Starting E2B sandbox integration tests...") + + tests = [ + ("E2B Sandbox Service", test_e2b_sandbox_service), + ("E2B Sandbox Direct", test_e2b_sandbox_direct), + ] + + results = [] + for test_name, test_func in tests: + logger.info(f"\n--- Running {test_name} ---") + try: + if asyncio.iscoroutinefunction(test_func): + result = await test_func() + else: + result = test_func() + results.append((test_name, result)) + except Exception as e: + logger.error(f"Test {test_name} failed with exception: {e}") + results.append((test_name, False)) + + # Summary + logger.info("\n--- Test Results Summary ---") + passed = 0 + for test_name, result in results: + status = "PASSED" if result else "FAILED" + logger.info(f"{test_name}: {status}") + if result: + passed += 1 + + logger.info(f"\nTotal: {passed}/{len(results)} tests passed") + + if passed == len(results): + logger.info( + "🎉 All tests passed! E2B sandbox integration is working" + " correctly.", + ) + else: + logger.warning( + "⚠️ Some tests failed. 
Check the logs above for details.", + ) + + +if __name__ == "__main__": + asyncio.run(main()) diff --git a/pyproject.toml b/pyproject.toml index 9b367afa4..3cf36ced6 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -91,6 +91,12 @@ ext = [ "PyYAML", "agno>=2.3.8", "nacos-sdk-python>=3.0.0", + "aiohttp", + "alibabacloud_ecd20200930", + "alibabacloud_eds_aic20230930", + "alibabacloud_appstream_center20210218", + "e2b-desktop>=2.0.0", + "trio" ] [tool.pytest.ini_options] diff --git a/src/agentscope_runtime/engine/services/sandbox/sandbox_service.py b/src/agentscope_runtime/engine/services/sandbox/sandbox_service.py index c75487ba2..be507a26b 100644 --- a/src/agentscope_runtime/engine/services/sandbox/sandbox_service.py +++ b/src/agentscope_runtime/engine/services/sandbox/sandbox_service.py @@ -84,7 +84,12 @@ def _create_new_environment( box_type = SandboxType(env_type) - if box_type != SandboxType.AGENTBAY: + if box_type not in ( + SandboxType.AGENTBAY, + SandboxType.CLOUD_PHONE, + SandboxType.CLOUD_COMPUTER, + SandboxType.E2B, + ): box_id = self.manager_api.create_from_pool( sandbox_type=box_type.value, meta={"session_ctx_id": session_ctx_id}, diff --git a/src/agentscope_runtime/sandbox/__init__.py b/src/agentscope_runtime/sandbox/__init__.py index 744623b82..ed2efa7ac 100644 --- a/src/agentscope_runtime/sandbox/__init__.py +++ b/src/agentscope_runtime/sandbox/__init__.py @@ -11,6 +11,9 @@ from .box.cloud.cloud_sandbox import CloudSandbox from .box.mobile.mobile_sandbox import MobileSandbox from .box.agentbay.agentbay_sandbox import AgentbaySandbox +from .box.cloud_api.cloud_phone_sandbox import CloudPhoneSandbox +from .box.cloud_api.cloud_computer_sandbox import CloudComputerSandbox +from .box.e2b.e2b_sandbox import E2bSandBox __all__ = [ "BaseSandbox", @@ -21,4 +24,7 @@ "CloudSandbox", "MobileSandbox", "AgentbaySandbox", + "CloudPhoneSandbox", + "CloudComputerSandbox", + "E2bSandBox", ] diff --git a/src/agentscope_runtime/sandbox/box/cloud_api/__init__.py 
b/src/agentscope_runtime/sandbox/box/cloud_api/__init__.py new file mode 100644 index 000000000..bf43df099 --- /dev/null +++ b/src/agentscope_runtime/sandbox/box/cloud_api/__init__.py @@ -0,0 +1,5 @@ +# -*- coding: utf-8 -*- +from .cloud_phone_sandbox import CloudPhoneSandbox +from .cloud_computer_sandbox import CloudComputerSandbox + +__all__ = ["CloudPhoneSandbox", "CloudComputerSandbox"] diff --git a/src/agentscope_runtime/sandbox/box/cloud_api/client/__init__.py b/src/agentscope_runtime/sandbox/box/cloud_api/client/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/src/agentscope_runtime/sandbox/box/cloud_api/client/cloud_computer_wy.py b/src/agentscope_runtime/sandbox/box/cloud_api/client/cloud_computer_wy.py new file mode 100644 index 000000000..5df1af99e --- /dev/null +++ b/src/agentscope_runtime/sandbox/box/cloud_api/client/cloud_computer_wy.py @@ -0,0 +1,2410 @@ +# -*- coding: utf-8 -*- +import os +import time +import asyncio +import threading +from typing import List, Tuple, Any +import logging +from pydantic import BaseModel +from alibabacloud_tea_openapi import models as open_api_models +from alibabacloud_ecd20200930.client import Client as ecd20200930Client +from alibabacloud_ecd20200930 import models as ecd_20200930_models +from alibabacloud_appstream_center20210218 import ( + models as appstream_center_20210218_models, +) +from alibabacloud_appstream_center20210218.client import ( + Client as appstream_center20210218Client, +) +from alibabacloud_tea_util import models as util_models +from alibabacloud_tea_util.client import Client as UtilClient +from agentscope_runtime.sandbox.box.cloud_api.utils.oss_client import OSSClient +from ..utils.utils import ( + download_oss_image_and_save, + download_oss_image_and_save_async, +) + + +logger = logging.getLogger(__name__) + +execute_wait_time_: int = 3 + + +class CommandQueryError(Exception): + """Command query status error exception""" + + +class InitError(Exception): + 
"""Initialization exception""" + + +class ClientPool: + """Client pool manager - singleton pattern + managing shared client instances""" + + _instance = None + _lock = threading.Lock() + + def __new__(cls): + if cls._instance is None: + with cls._lock: + if cls._instance is None: + cls._instance = super().__new__(cls) + return cls._instance + + def __init__(self): + # Use double-checked locking pattern to ensure initialization only once + if not hasattr(self, "_initialized"): + with self._lock: + if not hasattr(self, "_initialized"): + self._ecd_client = None + self._oss_client = None + self._app_stream_client = None + self._instance_managers = ( + {} + ) # Cache EcdInstanceManager by desktop_id + # Use different locks to avoid deadlocks + self._ecd_lock = threading.Lock() + self._oss_lock = threading.Lock() + self._app_stream_lock = threading.Lock() + self._instance_manager_lock = threading.Lock() + self._initialized = True + + def get_ecd_client(self) -> "EcdClient": + """Get shared EcdClient instance""" + if self._ecd_client is None: + with self._ecd_lock: + if self._ecd_client is None: + self._ecd_client = EcdClient() + return self._ecd_client + + def get_oss_client(self) -> OSSClient: + """Get shared OSSClient instance""" + if self._oss_client is None: + with self._oss_lock: + if self._oss_client is None: + bucket_name = os.environ.get("EDS_OSS_BUCKET_NAME") + endpoint = os.environ.get("EDS_OSS_ENDPOINT") + self._oss_client = OSSClient(bucket_name, endpoint) + return self._oss_client + + def get_app_stream_client( + self, + ) -> "AppStreamClient": + """Get AppStreamClient instance, create new + instance on each call (non-shared mode)""" + # Create new AppStreamClient instance each + # time, do not use cache + return AppStreamClient() + + def get_instance_manager( + self, + desktop_id: str, + ) -> "EcdInstanceManager": + """Get EcdInstanceManager instance for + specified desktop_id""" + # Check if it already exists first to avoid + # unnecessary lock contention 
+ if desktop_id in self._instance_managers: + return self._instance_managers[desktop_id] + + # Pre-fetch clients outside lock to avoid deadlock + ecd_client = self.get_ecd_client() + oss_client = self.get_oss_client() + app_stream_client = self.get_app_stream_client() + + # Use dedicated lock to manage instance managers + with self._instance_manager_lock: + # Check again to prevent creation by another + # thread while waiting for lock + if desktop_id not in self._instance_managers: + # Create new instance manager and pass + # in shared clients + manager = EcdInstanceManager(desktop_id) + manager.ecd_client = ecd_client + manager.oss_client = oss_client + manager.app_stream_client = app_stream_client + self._instance_managers[desktop_id] = manager + return self._instance_managers[desktop_id] + + +class EcdDeviceInfo(BaseModel): + # Cloud computer device information returned by the + # desktop info query; every field is optional + connection_status: str | None = None + desktop_id: str | None = None + desktop_status: str | None = None + start_time: str | None = None + + +class CommandTimeoutError(Exception): + """Command execution timeout exception""" + + +class CommandExecutionError(Exception): + """Command execution error exception""" + + +class EcdClient: + def __init__(self) -> None: + config = open_api_models.Config( + access_key_id=os.environ.get( + "ECD_ALIBABA_CLOUD_ACCESS_KEY_ID", + ), + # Your AccessKey Secret + access_key_secret=os.environ.get( + "ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET", + ), + ) + config.endpoint = os.environ.get( + "ECD_ALIBABA_CLOUD_ENDPOINT", + ) + self.__client__ = ecd20200930Client(config) + + def execute_command( + self, + desktop_ids: List[str], + command: str, + timeout: int = 60, + ) -> Tuple[str, str]: + # Execute command + run_command_request = ecd_20200930_models.RunCommandRequest( + desktop_id=desktop_ids, + command_content=command, + type="RunPowerShellScript", + end_user_id=os.environ.get("ECD_USERNAME"), + content_encoding="PlainText", + timeout=timeout, + ) + runtime = 
util_models.RuntimeOptions() + try: + rsp = self.__client__.run_command_with_options( + run_command_request, + runtime, + ) + + assert rsp.status_code == 200 + invoke_id = rsp.body.invoke_id + request_id = rsp.body.request_id + # logging.info(invoke_id, request_id) + return invoke_id, request_id + except Exception as error: + logger.error(f"{desktop_ids} execute command failed:{error}") + return "", "" + + def query_execute_state( + self, + desktop_ids: List[str], + message_id: str, + ) -> Any: + # Query command execution result; returns None if the query fails + describe_invocations_request = ( + ecd_20200930_models.DescribeInvocationsRequest( + desktop_ids=desktop_ids, + invoke_id=message_id, + end_user_id=os.environ.get("ECD_USERNAME"), + command_type="RunPowerShellScript", + content_encoding="PlainText", + include_output=True, + ) + ) + runtime = util_models.RuntimeOptions() + try: + rsp = self.__client__.describe_invocations_with_options( + describe_invocations_request, + runtime, + ) + return rsp.body + except Exception as error: + logger.error(f"{desktop_ids} query message failed:{error}") + return None + + def run_command_with_wait( + self, + desktop_id: str, + command: str, + slot_time: float = None, + timeout: int = 60, + ) -> Tuple[bool, str]: + execute_id, request_id = self.execute_command( + [desktop_id], + command, + timeout=timeout, + ) + print(f"execute_id:{execute_id}, request_id:{request_id}") + start_time = time.time() + if not slot_time: + if ( + "execute_wait_time_" in globals() + and execute_wait_time_ is not None + ): + slot_time = execute_wait_time_ + else: + slot_time = 3 # Default value + slot_time = max(0.5, slot_time) + timeout = slot_time + timeout + if execute_id: + while timeout > 0: + logger.info("start wait execution") + time.sleep(slot_time) + logger.info("execution end") + msgs = self.query_execute_state( + [desktop_id], + execute_id, + ) + if msgs is None: + raise CommandQueryError("query execute state failed") + for msg in msgs.invocations: + if msg.invocation_status in [ + 
"Success", + "Failed", + "Timeout", + ]: + logger.info( + f"command cost time: {time.time() - start_time}", + ) + return ( + msg.invocation_status == "Success", + msg.invoke_desktops[0].output, + ) + timeout -= slot_time + raise CommandTimeoutError("Command execution timeout") + + async def run_command_with_wait_async( + self, + desktop_id: str, + command: str, + slot_time: float = None, + timeout: int = 60, + ) -> Tuple[bool, str]: + execute_id, request_id = self.execute_command( + [desktop_id], + command, + timeout=timeout, + ) + print(f"execute_id:{execute_id}, request_id:{request_id}") + start_time = time.time() + if not slot_time: + if ( + "execute_wait_time_" in globals() + and execute_wait_time_ is not None + ): + slot_time = execute_wait_time_ + else: + slot_time = 3 # Default value + slot_time = max(0.5, slot_time) + timeout = slot_time + timeout + if execute_id: + while timeout > 0: + logger.info("start wait execution") + await asyncio.sleep(slot_time) # Use asyncio.sleep + logger.info("execution end") + msgs = self.query_execute_state( + [desktop_id], + execute_id, + ) + if msgs is None: + raise CommandQueryError("query execute state failed") + + for msg in msgs.invocations: + if msg.invocation_status in [ + "Success", + "Failed", + "Timeout", + ]: + logger.info( + f"command cost time: {time.time() - start_time}", + ) + return ( + msg.invocation_status == "Success", + ( + msg.invoke_desktops[0].output + if msg.invoke_desktops + else "" + ), + ) + timeout -= slot_time + raise CommandTimeoutError("Command execution timeout") + + def search_desktop_info( + self, + desktop_ids: List[str], + ) -> List[EcdDeviceInfo]: + describe_desktop_info_request = ( + ecd_20200930_models.DescribeDesktopInfoRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + desktop_id=desktop_ids, + ) + ) + + runtime = util_models.RuntimeOptions() + try: + rsp = self.__client__.describe_desktop_info_with_options( + describe_desktop_info_request, + runtime, + ) + 
devices_info = [ + EcdDeviceInfo(**inst.__dict__) for inst in rsp.body.desktops + ] + return devices_info + except Exception as error: + logger.error(f"search wuying desktop failed:{error}") + return [] + + def start_desktops(self, desktop_ids: List[str]) -> int: + start_desktops_request = ecd_20200930_models.StartDesktopsRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + desktop_id=desktop_ids, + ) + + runtime = util_models.RuntimeOptions() + try: + e_c = self.__client__ + rsp = e_c.start_desktops_with_options( + start_desktops_request, + runtime, + ) + logger.info( + f"[{desktop_ids}]: start instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"start_desktops failed:{error}") + return 400 + + async def start_desktops_async(self, desktop_ids: List[str]) -> int: + start_desktops_request = ecd_20200930_models.StartDesktopsRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + desktop_id=desktop_ids, + ) + + runtime = util_models.RuntimeOptions() + try: + e_c = self.__client__ + method = e_c.start_desktops_with_options_async + rsp = await method( + start_desktops_request, + runtime, + ) + logger.info( + f"[{desktop_ids}]: start instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"start_desktops failed:{error}") + return 400 + + def wakeup_desktops(self, desktop_ids: List[str]) -> int: + wakeup_desktops_request = ecd_20200930_models.WakeupDesktopsRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + desktop_id=desktop_ids, + ) + runtime = util_models.RuntimeOptions() + try: + e_c = self.__client__ + rsp = e_c.wakeup_desktops_with_options( + wakeup_desktops_request, + runtime, + ) + logger.info( + f"[{desktop_ids}]: wakeup instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"wakeup_desktops failed:{error}") + 
return 400 + + def hibernate_desktops(self, desktop_ids: List[str]) -> int: + hibernate_desktops_request = ( + ecd_20200930_models.HibernateDesktopsRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + desktop_id=desktop_ids, + ) + ) + runtime = util_models.RuntimeOptions() + try: + e_c = self.__client__ + rsp = e_c.hibernate_desktops_with_options( + hibernate_desktops_request, + runtime, + ) + logger.info( + f"[{desktop_ids}]: hibernate instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"hibernate_desktops failed:{error}") + return 400 + + async def wakeup_desktops_async(self, desktop_ids: List[str]) -> int: + wakeup_desktops_request = ecd_20200930_models.WakeupDesktopsRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + desktop_id=desktop_ids, + ) + runtime = util_models.RuntimeOptions() + try: + e_c = self.__client__ + method = e_c.wakeup_desktops_with_options_async + rsp = await method( + wakeup_desktops_request, + runtime, + ) + logger.info( + f"[{desktop_ids}]: wakeup instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"wakeup_desktops failed:{error}") + return 400 + + async def hibernate_desktops_async(self, desktop_ids: List[str]) -> int: + hibernate_desktops_request = ( + ecd_20200930_models.HibernateDesktopsRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + desktop_id=desktop_ids, + ) + ) + runtime = util_models.RuntimeOptions() + try: + e_c = self.__client__ + method = e_c.hibernate_desktops_with_options_async + rsp = await method( + hibernate_desktops_request, + runtime, + ) + logger.info( + f"[{desktop_ids}]: hibernate instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"hibernate_desktops failed:{error}") + return 400 + + async def restart_equipment(self, desktop_id: str) -> int: + 
reboot_desktops_request = ecd_20200930_models.RebootDesktopsRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + desktop_id=[desktop_id], + ) + runtime = util_models.RuntimeOptions() + try: + rsp = await self.__client__.reboot_desktops_with_options_async( + reboot_desktops_request, + runtime, + ) + return rsp.status_code + except Exception as error: + logger.error(f"restart equipment failed:{error}") + return 400 + + def stop_desktops(self, desktop_ids: List[str]) -> int: + stop_desktops_request = ecd_20200930_models.StopDesktopsRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + desktop_id=desktop_ids, + ) + + runtime = util_models.RuntimeOptions() + try: + rsp = self.__client__.stop_desktops_with_options( + stop_desktops_request, + runtime, + ) + return rsp.status_code + except Exception as error: + logger.error(f"stop_desktops failed:{error}") + return 400 + + async def stop_desktops_async(self, desktop_ids: List[str]) -> int: + stop_desktops_request = ecd_20200930_models.StopDesktopsRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + desktop_id=desktop_ids, + ) + + runtime = util_models.RuntimeOptions() + try: + e_c = self.__client__ + method = e_c.stop_desktops_with_options_async + rsp = await method( + stop_desktops_request, + runtime, + ) + logger.info( + f"[{desktop_ids}]: stop instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"stop_desktops failed:{error}") + return 400 + + async def rebuild_equipment_image( + self, + desktop_id: str, + image_id: str, + ) -> int: + rebuild_request = ecd_20200930_models.RebuildDesktopsRequest( + region_id=os.environ.get("ECD_ALIBABA_CLOUD_REGION_ID"), + image_id=image_id, + desktop_id=[ + desktop_id, + ], + ) + runtime = util_models.RuntimeOptions() + try: + rsp = await self.__client__.rebuild_desktops_with_options_async( + rebuild_request, + runtime, + ) + return rsp.status_code + except 
Exception as error: + logger.error(f"rebuild equipment failed:{error}") + return 400 + + +# pylint: disable=too-many-public-methods +class EcdInstanceManager: + def __init__(self, desktop_id: str = None) -> None: + self.desktop_id = desktop_id + self.ctrl_key = "Ctrl" + self.ratio = 1 + self.oss_sk = None + self.oss_ak = None + self.endpoint = None + self.oss_client = None + self.ecd_client = None + self._initialized = False + self._init_error = None + self.app_stream_client = None + self.auth_code = None + + def init_resources(self) -> bool: + if self._initialized: + # Get new auth_code + return self.refresh_aurh_code() + try: + # If no preset clients (set via ClientPool), create new ones + if self.ecd_client is None: + self.ecd_client = EcdClient() + if self.app_stream_client is None: + # Create new AppStreamClient instance + # each time (non-shared mode) + self.app_stream_client = AppStreamClient() + if self.oss_client is None: + bucket_name = os.environ.get("EDS_OSS_BUCKET_NAME") + endpoint = os.environ.get("EDS_OSS_ENDPOINT") + self.oss_client = OSSClient(bucket_name, endpoint) + + # Get auth_code + self.auth_code = self.app_stream_client.search_auth_code() + + # Verify desktop_id is valid (optional) + if self.desktop_id and self.ecd_client: + # Verify device exists and is accessible + desktop_info = self.ecd_client.search_desktop_info( + [self.desktop_id], + ) + if not desktop_info: + raise InitError( + f"Desktop {self.desktop_id} not found " + f"or not accessible", + ) + + # Set OSS endpoint + self.endpoint = os.environ.get("EDS_OSS_ENDPOINT") + + # Configuration parameters + self.oss_ak = os.environ.get("EDS_OSS_ACCESS_KEY_ID") + self.oss_sk = os.environ.get("EDS_OSS_ACCESS_KEY_SECRET") + self.ratio = 1 + self.ctrl_key = "Ctrl" + + self._initialized = True + return True + except Exception as e: + self._init_error = e + logger.error(f"Initialization failed: {e}") + return False + + def refresh_aurh_code(self) -> bool: + # Get new auth_code + self.auth_code = 
self.app_stream_client.search_auth_code() + # Return False if auth_code is empty, otherwise return True + return bool(self.auth_code) + + async def refresh_aurh_code_async(self) -> bool: + # Get new auth_code + self.auth_code = await self.app_stream_client.search_auth_code_async() + return bool(self.auth_code) + + async def init_resources_async(self) -> bool: + if self._initialized: + # Get new auth_code + self.auth_code = ( + await self.app_stream_client.search_auth_code_async() + ) + return True + try: + # If no preset clients (set via ClientPool), create new ones + if self.ecd_client is None: + self.ecd_client = EcdClient() + if self.app_stream_client is None: + # Create new AppStreamClient instance + # each time (non-shared mode) + self.app_stream_client = AppStreamClient() + if self.oss_client is None: + bucket_name = os.environ.get("EDS_OSS_BUCKET_NAME") + endpoint = os.environ.get("EDS_OSS_ENDPOINT") + self.oss_client = OSSClient(bucket_name, endpoint) + + # Get auth_code + self.auth_code = ( + await self.app_stream_client.search_auth_code_async() + ) + + # Verify desktop_id is valid (optional) + if self.desktop_id and self.ecd_client: + # Verify device exists and is accessible + desktop_info = self.ecd_client.search_desktop_info( + [self.desktop_id], + ) + if not desktop_info: + raise InitError( + f"Desktop {self.desktop_id} not found " + f"or not accessible", + ) + + # Set OSS endpoint + self.endpoint = os.environ.get("EDS_OSS_ENDPOINT") + + # Configuration parameters + self.oss_ak = os.environ.get("EDS_OSS_ACCESS_KEY_ID") + self.oss_sk = os.environ.get("EDS_OSS_ACCESS_KEY_SECRET") + self.ratio = 1 + self.ctrl_key = "Ctrl" + + self._initialized = True + return True + except Exception as e: + self._init_error = e + logger.error(f"Initialization failed: {e}") + return False + + def get_screenshot( + self, + local_file_name: str, + local_save_path: str, + ) -> str: + # local_file_name = f"{uuid.uuid4().hex}__screenshot" + logger.info("Starting screenshot") + 
save_path = f"C:/file/{local_file_name}" + file_save_path = f"{local_file_name}.png" + file_local_save_path = f"{save_path}.png" + retry = 2 + while retry > 0: + try: + # Take screenshot + # Get OSS presigned URL for upload + oss_signed_url = self.oss_client.get_signal_url( + f"{file_save_path}", + ) + status, file_oss = self.get_screenshot_oss( + file_local_save_path, + oss_signed_url, + ) + logger.debug(f"File output: {file_oss}") + if "Traceback" in file_oss: + return "" + base64_image = "" + file_oss_down = self.oss_client.get_download_url( + f"{file_save_path}", + ) + if status and file_oss: + base64_image = download_oss_image_and_save( + file_oss_down, + local_save_path, + ) + if base64_image: + logger.info( + "Successfully obtained Base64 image data", + ) + return base64_image + + return f"data:image/png;base64,{base64_image}" + + except Exception as e: + retry -= 1 + logger.warning( + f"Screenshot failed, retrying... {retry} " + f"attempts remaining, error: {e}", + ) + time.sleep(2) + + return "" + + async def get_screenshot_async( + self, + local_file_name: str, + local_save_path: str, + ) -> str: + # local_file_name = f"{uuid.uuid4().hex}__screenshot" + logger.info("Starting screenshot") + save_path = f"C:/file/{local_file_name}" + file_save_path = f"{local_file_name}.png" + file_local_save_path = f"{save_path}.png" + retry = 2 + while retry > 0: + try: + # Take screenshot + # Get OSS presigned URL for upload + oss_signed_url = await self.oss_client.get_signal_url_async( + f"{file_save_path}", + ) + status, file_oss = await self.get_screenshot_oss_async( + file_local_save_path, + oss_signed_url, + ) + logger.debug(f"File output: {file_oss}") + if "Traceback" in file_oss: + return "" + base64_image = "" + file_oss_down = await self.oss_client.get_download_url_async( + f"{file_save_path}", + ) + if status and file_oss: + base64_image = await download_oss_image_and_save_async( + file_oss_down, + local_save_path, + ) + if base64_image: + 
logger.info("Successfully obtained Base64 image data") + return base64_image + + return f"data:image/png;base64,{base64_image}" + + except Exception as e: + retry -= 1 + logger.warning( + f"Screenshot failed, retrying... {retry} " + f"attempts remaining, error: {e}", + ) + await asyncio.sleep(2) + + return "" + + def get_screenshot_oss_url( + self, + local_file_name: str, + local_save_path: str, + ) -> str: + # local_file_name = f"{uuid.uuid4().hex}__screenshot" + save_dir = "C:/file/" + save_path = f"{save_dir}{local_file_name}" + file_save_path = f"{local_file_name}.png" + file_local_save_path = f"{save_path}.png" + retry = 3 + while retry > 0: + try: + # Take screenshot + # Get OSS presigned URL for upload + oss_signed_url = self.oss_client.get_signal_url( + f"{file_save_path}", + ) + status, file_oss = self.get_screenshot_oss( + file_local_save_path, + oss_signed_url, + ) + if "Traceback" in file_oss: + return "" + file_oss_down = self.oss_client.get_download_url( + f"{file_save_path}", + ) + if status and file_oss: + download_oss_image_and_save( + file_oss_down, + local_save_path, + ) + if file_oss_down: + logger.info("Successfully obtained image data") + return file_oss_down + except Exception as e: + retry -= 1 + logger.warning( + f"Screenshot failed, retrying... 
{retry}" + f" attempts remaining, error: {e}", + ) + time.sleep(2) + + return "" + + async def get_screenshot_oss_url_async( + self, + local_file_name: str, + local_save_path: str, + ) -> str: + # local_file_name = f"{uuid.uuid4().hex}__screenshot" + save_path = f"C:/file/{local_file_name}" + file_save_path = f"{local_file_name}.png" + file_local_save_path = f"{save_path}.png" + retry = 3 + while retry > 0: + try: + # Take screenshot + # Get OSS presigned URL for upload + oss_signed_url = await self.oss_client.get_signal_url_async( + f"{file_save_path}", + ) + status, file_oss = await self.get_screenshot_oss_async( + file_local_save_path, + oss_signed_url, + ) + if "Traceback" in file_oss: + return "" + file_oss_down = await self.oss_client.get_download_url_async( + f"{file_save_path}", + ) + if status and file_oss: + await download_oss_image_and_save_async( + file_oss_down, + local_save_path, + ) + if file_oss_down: + logger.info("Successfully obtained image data") + return file_oss_down + except Exception as e: + retry -= 1 + logger.warning( + f"Screenshot failed, retrying... {retry} " + f"attempts remaining, error: {e}", + ) + await asyncio.sleep(2) + + return "" + + def get_screenshot_oss_down(self, local_file_name: str) -> str: + # local_file_name = f"{uuid.uuid4().hex}__screenshot" + save_path = f"C:/file/{local_file_name}" + file_save_path = f"{local_file_name}.png" + file_local_save_path = f"{save_path}.png" + retry = 3 + while retry > 0: + try: + # Take screenshot + # Get OSS presigned URL for upload + oss_signed_url = self.oss_client.get_signal_url( + f"{file_save_path}", + ) + self.get_screenshot_oss( + file_local_save_path, + oss_signed_url, + ) + file_oss_down = self.oss_client.get_download_url( + f"{file_save_path}", + ) + return file_oss_down + + except Exception as e: + retry -= 1 + logger.warning( + f"Screenshot failed, retrying... 
{retry} " + f"attempts remaining, error: {e}", + ) + time.sleep(2) + + return "" + + async def get_screenshot_oss_down_async(self, local_file_name: str) -> str: + # local_file_name = f"{uuid.uuid4().hex}__screenshot" + save_path = f"C:/file/{local_file_name}" + file_save_path = f"{local_file_name}.png" + file_local_save_path = f"{save_path}.png" + retry = 3 + while retry > 0: + try: + # Take screenshot + # Get OSS presigned URL for upload + oss_signed_url = await self.oss_client.get_signal_url_async( + f"{file_save_path}", + ) + await self.get_screenshot_oss_async( + file_local_save_path, + oss_signed_url, + ) + file_oss_down = await self.oss_client.get_download_url_async( + f"{file_save_path}", + ) + return file_oss_down + + except Exception as e: + retry -= 1 + logger.warning( + f"Screenshot failed, retrying... {retry} " + f"attempts remaining, error: {e}", + ) + await asyncio.sleep(2) + + return "" + + async def run_command_power_shell_async( + self, + command: str, + slot_time: float = None, + timeout: int = 30, + ) -> Tuple[str, str]: + return await self.ecd_client.run_command_with_wait_async( + self.desktop_id, + command, + slot_time, + timeout, + ) + + def run_command_power_shell( + self, + command: str, + slot_time: float = None, + timeout: int = 30, + ) -> Tuple[str, str]: + return self.ecd_client.run_command_with_wait( + self.desktop_id, + command, + slot_time, + timeout, + ) + + def run_code( + self, + code: str, + slot_time: float = None, + timeout: int = 30, + ) -> Tuple[str, str]: + # Build Python command and Base64 encode (using utf-16le) + full_python_command = f'\npython -c "{code}"' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.ecd_client.run_command_with_wait( + self.desktop_id, + command, + slot_time, + timeout, + ) + + async def run_code_async( + self, + code: str, + slot_time: float = None, + timeout: int = 30, + ) -> Tuple[str, str]: + # Build 
Python command and Base64 encode (using utf-16le) + full_python_command = f'\npython -c "{code}"' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.ecd_client.run_command_with_wait_async( + self.desktop_id, + command, + slot_time, + timeout, + ) + + def write_file( + self, + file_path: str, + content: str, + encoding: str = "utf-8", + ) -> Tuple[str, str]: + # Use repr() to process content, ensuring all special + # characters are properly escaped + content_repr = repr(content) + + # Use triple quotes to wrap print statement + # to avoid quote conflicts + script = f""" +import os +file_path = r'{file_path}' +content = {content_repr} +encoding = '{encoding}' +# Create directory if it doesn't exist +directory = os.path.dirname(file_path) +if directory and not os.path.exists(directory): + os.makedirs(directory) +# Write file +with open(file_path, 'w', encoding=encoding) as f: + f.write(content) +print('File written successfully') +""" + + # Use @' '@ syntax to wrap script to support multi-line content + full_python_command = f"\npython -c @'{script}'@" + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def read_file( + self, + file_path: str, + encoding: str = "utf-8", + ) -> Tuple[str, str]: + script = f""" +import os +import base64 + +file_path = r'{file_path}' +encoding = '{encoding}' + +if not os.path.exists(file_path): + print(f'Error: File not found - {{file_path}}') + exit(1) + +try: + with open(file_path, 'r', encoding=encoding) as f: + content = f.read() + print(content) +except Exception as e: + print(f'Error reading file: {{e}}') + exit(1) + """ + + full_python_command = f'\npython -c "{script}"' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return 
self.run_command_power_shell(command) + + def remove_file(self, file_path: str) -> Tuple[str, str]: + script = f""" +import os +import shutil + +file_path = r'{file_path}' + +try: + if os.path.isfile(file_path): + os.remove(file_path) + print(f'File {{file_path}} removed successfully') + elif os.path.isdir(file_path): + shutil.rmtree(file_path) + print(f'Directory {{file_path}} removed successfully') + else: + print(f'Path {{file_path}} does not exist') + exit(1) +except Exception as e: + print(f'Error removing {{file_path}}: {{e}}') + exit(1) +""" + + full_python_command = f'\npython -c "{script}"' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def get_screenshot_base64( + self, + screenshot_file: str, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import os +import base64 +screenshot_file = r'{screenshot_file}' +if os.path.exists(screenshot_file): + os.remove(screenshot_file) +screenshot = pyautogui.screenshot() +screenshot.save(screenshot_file) +with open(screenshot_file, 'rb') as img_file: + image_data = img_file.read() +encoded_bytes = base64.b64encode(image_data).decode('utf-8') +print(encoded_bytes) +os.remove(screenshot_file) + """.format( + screenshot_file=screenshot_file, + ) + + # Escape double quotes + # escaped_script = script.replace('"', '""') + + # Build Python command and Base64 encode (using utf-16le) + full_python_command = f'\npython -c "{script}"' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + async def get_screenshot_base64_async( + self, + screenshot_file: str, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import os +import base64 +screenshot_file = r'{screenshot_file}' +if os.path.exists(screenshot_file): + os.remove(screenshot_file) +screenshot = 
pyautogui.screenshot() +screenshot.save(screenshot_file) +with open(screenshot_file, 'rb') as img_file: + image_data = img_file.read() +encoded_bytes = base64.b64encode(image_data).decode('utf-8') +print(encoded_bytes) +os.remove(screenshot_file) + """.format( + screenshot_file=screenshot_file, + ) + + # Escape double quotes + # escaped_script = script.replace('"', '""') + + # Build Python command and Base64 encode (using utf-16le) + full_python_command = f'\npython -c "{script}"' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + def get_screenshot_oss( + self, + file_save_path: str, + oss_signal_url: str, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import os +import base64 +import requests +oss_signal_url = r'{oss_signal_url}' +file_save_path = r'{file_save_path}' +def upload_file(signed_url, file_path): + try: + with open(file_path, 'rb') as file: + response = requests.put(signed_url, data=file) + print(response.status_code) + except Exception as e: + print(e) +# Ensure directory exists +directory = os.path.dirname(file_save_path) +if directory and not os.path.exists(directory): + os.makedirs(directory) +if os.path.exists(file_save_path): + os.remove(file_save_path) +screenshot = pyautogui.screenshot() +screenshot.save(file_save_path) +upload_file(oss_signal_url, file_save_path) +print(oss_signal_url) +os.remove(file_save_path) + """.format( + oss_signal_url=oss_signal_url, + file_save_path=file_save_path, + ) + + # Escape double quotes + # escaped_script = script.replace('"', '""') + + # Build Python command and Base64 encode (using utf-16le) + full_python_command = f'\npython -c "{script}"' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + async def get_screenshot_oss_async( + 
self, + file_save_path: str, + oss_signal_url: str, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import os +import base64 +import requests +oss_signal_url = r'{oss_signal_url}' +file_save_path = r'{file_save_path}' +def upload_file(signed_url, file_path): + try: + with open(file_path, 'rb') as file: + response = requests.put(signed_url, data=file) + print(response.status_code) + except Exception as e: + print(e) +# Ensure directory exists +directory = os.path.dirname(file_save_path) +if directory and not os.path.exists(directory): + os.makedirs(directory) +if os.path.exists(file_save_path): + os.remove(file_save_path) +screenshot = pyautogui.screenshot() +screenshot.save(file_save_path) +upload_file(oss_signal_url, file_save_path) +print(oss_signal_url) +os.remove(file_save_path) + """.format( + oss_signal_url=oss_signal_url, + file_save_path=file_save_path, + ) + + # Escape double quotes + # escaped_script = script.replace('"', '""') + + # Build Python command and Base64 encode (using utf-16le) + full_python_command = f'\npython -c "{script}"' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + def open_app(self, name: str) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +import time +import re +pyautogui.FAILSAFE = False + +# Define shortcut key +ctrl_key = '{self.ctrl_key}' + +def contains_chinese(text): + return bool(re.search(r'[\u4e00-\u9fff]', text)) + +name = '{name}' +if 'Outlook' in name: + name = name.replace('Outlook', 'Outlook new') + +print(f'Action: open {name}') + +# Open Windows search bar +pyautogui.press('win') # Press Win key +time.sleep(0.3) +pyperclip.copy(name) +time.sleep(0.3) +pyautogui.keyDown(ctrl_key) +pyautogui.press('v') +pyautogui.keyUp(ctrl_key) +# Press Enter to confirm +time.sleep(0.3) +pyautogui.press('enter') +""" + full_python_command = f'\npython -c 
@"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + async def open_app_async(self, name: str) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +import time +import re +pyautogui.FAILSAFE = False + +# Define shortcut key +ctrl_key = '{self.ctrl_key}' + +def contains_chinese(text): + return bool(re.search(r'[\u4e00-\u9fff]', text)) + +name = '{name}' +if 'Outlook' in name: + name = name.replace('Outlook', 'Outlook new') + +print(f'Action: open {name}') + +# Open Windows search bar +pyautogui.press('win') # Press Win key +time.sleep(0.3) +pyperclip.copy(name) +time.sleep(0.3) +pyautogui.keyDown(ctrl_key) +pyautogui.press('v') +pyautogui.keyUp(ctrl_key) +# Press Enter to confirm +time.sleep(0.3) +pyautogui.press('enter') +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + def home(self) -> Tuple[str, str]: + # Show desktop + script = """ +import pyautogui +pyautogui.FAILSAFE = False +key1 = 'win' +key2 = 'd' +pyautogui.keyDown(key1) +pyautogui.keyDown(key2) +pyautogui.keyUp(key2) +pyautogui.keyUp(key1) + """ + full_python_command = f'\npython -c "{script}"' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def tap(self, x: int, y: int, count: int = 1) -> Tuple[str, str]: + script = f""" +import pyautogui +from pynput.mouse import Button, Controller +pyautogui.FAILSAFE = False +ratio = {self.ratio} +x = {x} +y = {y} +count = {count} +x, y = x//ratio, y//ratio +print('Action: click (%d, %d) %d times' % (x, y, count)) +mouse = Controller() +pyautogui.moveTo(x,y) 
+mouse.click(Button.left, count=count) +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def right_tap( + self, + x: int, + y: int, + count: int = 1, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +from pynput.mouse import Button, Controller +pyautogui.FAILSAFE = False +ratio = {self.ratio} +x = {x} +y = {y} +count = {count} +x, y = x//ratio, y//ratio +print('Action: right click (%d, %d) %d times' % (x, y, count)) +pyautogui.rightClick(x, y) +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def shortcut(self, key1: str, key2: str) -> Tuple[str, str]: + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +key1 = '{key1}' +key2 = '{key2}' +ctrl_key = '{self.ctrl_key}' +if key1 == 'command' or key1 == 'ctrl': + key1 = ctrl_key +print('Action: shortcut %s + %s' % (key1, key2)) +pyautogui.keyDown(key1) +pyautogui.keyDown(key2) +pyautogui.keyUp(key2) +pyautogui.keyUp(key1) +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def hotkey(self, key_list: List[str]) -> Tuple[str, str]: + """ + Execute hotkey operation remotely + (e.g., ['ctrl', 'c'], ['alt', 'f4'], etc.) 
+ :param key_list: Hotkey list, + e.g., ['ctrl', 'a'], ['alt', 'f4'] + """ + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +pyautogui.hotkey(*{key_list!r}) +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def press_key(self, key: str) -> Tuple[str, str]: + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +pyautogui.press('{key}') +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def tap_type_enter( + self, + x: int, + y: int, + text: str, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +import time +pyautogui.FAILSAFE = False +ratio = {self.ratio} +ctrl_key = '{self.ctrl_key}' +x = {x} +y = {y} +text = '{text}' +x, y = x//ratio, y//ratio +print('Action: click (%d, %d), enter %s and press Enter' % (x, y, text)) +pyautogui.click(x=x, y=y) +time.sleep(0.5) +pyperclip.copy(text) +pyautogui.keyDown(ctrl_key) +pyautogui.keyDown('v') +pyautogui.keyUp('v') +pyautogui.keyUp(ctrl_key) +time.sleep(0.5) +pyautogui.press('enter') +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def drag( + self, + x1: int, + y1: int, + x2: int, + y2: int, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +ratio = {self.ratio} +x1 = {x1} +y1 = {y1} +x2 = {x2} +y2 = {y2} +x1, y1 = x1//ratio, y1//ratio +x2, y2 = x2//ratio, y2//ratio +pyautogui.moveTo(x1,y1) +pyautogui.mouseDown() +pyautogui.moveTo(x2,y2,duration=0.5)
+pyautogui.mouseUp() +print('Action: drag from (%d, %d) to (%d, %d)' % (x1, y1, x2, y2)) +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def replace(self, x: int, y: int, text: str) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +from pynput.mouse import Button, Controller +import re +import time +pyautogui.FAILSAFE = False +ratio = {self.ratio} +ctrl_key = '{self.ctrl_key}' +x = {x} +y = {y} +text = '{text}' +x, y = x//ratio, y//ratio +print('Action: replace the content at (%d, %d) ' + 'with %s and press Enter' % (x, y, text)) +mouse = Controller() +pyautogui.moveTo(x,y) +mouse.click(Button.left, count=2) +pyautogui.hotkey(ctrl_key, 'a') +pyperclip.copy(text) +pyautogui.keyDown(ctrl_key) +pyautogui.keyDown('v') +pyautogui.keyUp('v') +pyautogui.keyUp(ctrl_key) +time.sleep(0.5) +pyautogui.press('enter') +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def append(self, x: int, y: int, text: str) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +import re +import time +from pynput.mouse import Button, Controller +pyautogui.FAILSAFE = False +def contains_chinese(text): + return bool(re.search(r'[\u4e00-\u9fff]', text)) +def shortcut(key1, key2): + # if key1 == 'command' and args.pc_type != "mac": + # key1 = 'ctrl' + if key1 == 'command' or key1 == 'ctrl': + key1 = ctrl_key + print('Action: shortcut %s + %s' % (key1, key2)) + pyautogui.keyDown(key1) + pyautogui.keyDown(key2) + pyautogui.keyUp(key2) + pyautogui.keyUp(key1) + return +x = {x} +y = {y} +text = '{text}' +ctrl_key = '{self.ctrl_key}' +ratio = {self.ratio} +x, y = x//ratio, y//ratio +print('Action: append the content at
(%d, %d) ' + 'with %s and press Enter' % (x, y, text)) +mouse = Controller() +pyautogui.moveTo(x,y) +mouse.click(Button.left, count=1) +shortcut('command', 'a') +pyautogui.press('down') +if contains_chinese(text): + pyperclip.copy(text) + pyautogui.keyDown(ctrl_key) + pyautogui.keyDown('v') + pyautogui.keyUp('v') + pyautogui.keyUp(ctrl_key) +else: + pyautogui.typewrite(text) +time.sleep(1) +pyautogui.press('enter') +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def mouse_move(self, x: int, y: int) -> Tuple[str, str]: + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +ratio = {self.ratio} +x = {x} +y = {y} +x, y = x//ratio, y//ratio +pyautogui.moveTo(x,y) +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def middle_click(self, x: int, y: int) -> Tuple[str, str]: + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +ratio = {self.ratio} +x = {x} +y = {y} +pyautogui.middleClick(x, y) +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def type_with_clear_enter( + self, + text: str, + clear: int, + enter: int, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +import time +ratio = {self.ratio} +ctrl_key = '{self.ctrl_key}' +text = '{text}' +clear = {clear} +enter = {enter} +if clear == 1: + pyautogui.keyDown(ctrl_key) + pyautogui.keyDown('a') + pyautogui.keyUp('a') + pyautogui.keyUp(ctrl_key) + pyautogui.press('backspace') + time.sleep(0.5) +pyperclip.copy(text) 
+pyautogui.keyDown(ctrl_key) +pyautogui.keyDown('v') +pyautogui.keyUp('v') +pyautogui.keyUp(ctrl_key) +time.sleep(0.5) +if enter == 1: + pyautogui.press('enter') +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def type_with_clear_enter_pos( + self, + text: str, + x: int, + y: int, + clear: int, + enter: int, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +import time +ratio = {self.ratio} +ctrl_key = '{self.ctrl_key}' +text = '{text}' +x = {x} +y = {y} +clear = {clear} +enter = {enter} +x, y = x/ratio, y/ratio +pyautogui.click(x=x, y=y) +time.sleep(0.5) +if clear == 1: + pyautogui.keyDown(ctrl_key) + pyautogui.keyDown('a') + pyautogui.keyUp('a') + pyautogui.keyUp(ctrl_key) + pyautogui.press('backspace') + time.sleep(0.5) + +pyperclip.copy(text) +pyautogui.keyDown(ctrl_key) +pyautogui.keyDown('v') +pyautogui.keyUp('v') +pyautogui.keyUp(ctrl_key) +time.sleep(0.5) +if enter == 1: + pyautogui.press('enter') +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def scroll_pos(self, x: int, y: int, pixels: int) -> Tuple[str, str]: + script = f""" +import pyautogui +import time +ratio = {self.ratio} +x = {x} +y = {y} +pixels = {pixels}*150 +x, y = x//ratio, y//ratio +pyautogui.moveTo(x, y) +time.sleep(0.5) +pyautogui.scroll(pixels) +print('scroll_pos') +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + def scroll(self, pixels: int) -> Tuple[str, str]: + script = f""" +import pyautogui +pixels 
= {pixels}*150 +pyautogui.scroll(pixels) +print('scroll') +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return self.run_command_power_shell(command) + + async def home_async(self) -> Tuple[str, str]: + # Show desktop + script = """ +import pyautogui +pyautogui.FAILSAFE = False +key1 = 'win' +key2 = 'd' +pyautogui.keyDown(key1) +pyautogui.keyDown(key2) +pyautogui.keyUp(key2) +pyautogui.keyUp(key1) + """ + full_python_command = f'\npython -c "{script}"' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def tap_async( + self, + x: int, + y: int, + count: int = 1, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +from pynput.mouse import Button, Controller +pyautogui.FAILSAFE = False +ratio = {self.ratio} +x = {x} +y = {y} +count = {count} +x, y = x//ratio, y//ratio +print('Action: click (%d, %d) %d times' % (x, y, count)) +mouse = Controller() +pyautogui.moveTo(x,y) +mouse.click(Button.left, count=count) +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def right_tap_async( + self, + x: int, + y: int, + count: int = 1, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +from pynput.mouse import Button, Controller +pyautogui.FAILSAFE = False +ratio = {self.ratio} +x = {x} +y = {y} +count = {count} +x, y = x//ratio, y//ratio +print('Action: right click (%d, %d) %d times' % (x, y, count)) +pyautogui.rightClick(x, y) +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' 
+ f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def shortcut_async(self, key1: str, key2: str) -> Tuple[str, str]: + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +key1 = '{key1}' +key2 = '{key2}' +ctrl_key = '{self.ctrl_key}' +if key1 == 'command' or key1 == 'ctrl': + key1 = ctrl_key +print('Action: shortcut %s + %s' % (key1, key2)) +pyautogui.keyDown(key1) +pyautogui.keyDown(key2) +pyautogui.keyUp(key2) +pyautogui.keyUp(key1) +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def hotkey_async(self, key_list: List[str]) -> Tuple[str, str]: + """ + Execute hotkey operation remotely + (e.g., ['ctrl', 'c'], ['alt', 'f4'], etc.) + :param key_list: Hotkey list, + e.g., ['ctrl', 'a'], ['alt', 'f4'] + """ + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +pyautogui.hotkey(*{key_list!r}) +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def press_key_async(self, key: str) -> Tuple[str, str]: + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +pyautogui.press('{key}') +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def tap_type_enter_async( + self, + x: int, + y: int, + text: str, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +import time +pyautogui.FAILSAFE = False +ratio = {self.ratio} +ctrl_key = '{self.ctrl_key}' +x =
{x} +y = {y} +text = '{text}' +x, y = x//ratio, y//ratio +print('Action: click (%d, %d), enter %s and press Enter' % (x, y, text)) +pyautogui.click(x=x, y=y) +time.sleep(0.5) +pyperclip.copy(text) +pyautogui.keyDown(ctrl_key) +pyautogui.keyDown('v') +pyautogui.keyUp('v') +pyautogui.keyUp(ctrl_key) +time.sleep(0.5) +pyautogui.press('enter') +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def drag_async( + self, + x1: int, + y1: int, + x2: int, + y2: int, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +ratio = {self.ratio} +x1 = {x1} +y1 = {y1} +x2 = {x2} +y2 = {y2} +x1, y1 = x1//ratio, y1//ratio +x2, y2 = x2//ratio, y2//ratio +pyautogui.moveTo(x1,y1) +pyautogui.mouseDown() +pyautogui.moveTo(x2,y2,duration=0.5) +pyautogui.mouseUp() +print('Action: drag from (%d, %d) to (%d, %d)' % (x1, y1, x2, y2)) +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def replace_async( + self, + x: int, + y: int, + text: str, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +from pynput.mouse import Button, Controller +import re +import time +pyautogui.FAILSAFE = False +ratio = {self.ratio} +ctrl_key = '{self.ctrl_key}' +x = {x} +y = {y} +text = '{text}' +x, y = x//ratio, y//ratio +print('Action: replace the content at (%d, %d) ' + 'with %s and press Enter' % (x, y, text)) +mouse = Controller() +pyautogui.moveTo(x,y) +mouse.click(Button.left, count=2) +pyautogui.hotkey(ctrl_key, 'a') +pyperclip.copy(text) +pyautogui.keyDown(ctrl_key) +pyautogui.keyDown('v') +pyautogui.keyUp('v') +pyautogui.keyUp(ctrl_key) +time.sleep(0.5)
+pyautogui.press('enter') +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def append_async(self, x: int, y: int, text: str) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +import re +import time +from pynput.mouse import Button, Controller +pyautogui.FAILSAFE = False +def contains_chinese(text): + return bool(re.search(r'[\u4e00-\u9fff]', text)) +def shortcut(key1, key2): + # if key1 == 'command' and args.pc_type != "mac": + # key1 = 'ctrl' + if key1 == 'command' or key1 == 'ctrl': + key1 = ctrl_key + print('Action: shortcut %s + %s' % (key1, key2)) + pyautogui.keyDown(key1) + pyautogui.keyDown(key2) + pyautogui.keyUp(key2) + pyautogui.keyUp(key1) + return +x = {x} +y = {y} +text = '{text}' +ctrl_key = '{self.ctrl_key}' +ratio = {self.ratio} +x, y = x//ratio, y//ratio +print('Action: append the content at (%d, %d) ' + 'with %s and press Enter' % (x, y, text)) +mouse = Controller() +pyautogui.moveTo(x,y) +mouse.click(Button.left, count=1) +shortcut('command', 'a') +pyautogui.press('down') +if contains_chinese(text): + pyperclip.copy(text) + pyautogui.keyDown(ctrl_key) + pyautogui.keyDown('v') + pyautogui.keyUp('v') + pyautogui.keyUp(ctrl_key) +else: + pyautogui.typewrite(text) +time.sleep(1) +pyautogui.press('enter') +""" + + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def mouse_move_async(self, x: int, y: int) -> Tuple[str, str]: + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +ratio = {self.ratio} +x = {x} +y = {y} +x, y = x//ratio, y//ratio +pyautogui.moveTo(x,y) +""" + full_python_command = f'\npython -c @"{script}"@' + + #
Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def middle_click_async(self, x: int, y: int) -> Tuple[str, str]: + script = f""" +import pyautogui +pyautogui.FAILSAFE = False +ratio = {self.ratio} +x = {x} +y = {y} +pyautogui.middleClick(x, y) +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def type_with_clear_enter_async( + self, + text: str, + clear: int, + enter: int, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +import time +ratio = {self.ratio} +ctrl_key = '{self.ctrl_key}' +text = '{text}' +clear = {clear} +enter = {enter} +if clear == 1: + pyautogui.keyDown(ctrl_key) + pyautogui.keyDown('a') + pyautogui.keyUp('a') + pyautogui.keyUp(ctrl_key) + pyautogui.press('backspace') + time.sleep(0.5) +pyperclip.copy(text) +pyautogui.keyDown(ctrl_key) +pyautogui.keyDown('v') +pyautogui.keyUp('v') +pyautogui.keyUp(ctrl_key) +time.sleep(0.5) +if enter == 1: + pyautogui.press('enter') +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def type_with_clear_enter_pos_async( + self, + text: str, + x: int, + y: int, + clear: int, + enter: int, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import pyperclip +import time +ratio = {self.ratio} +ctrl_key = '{self.ctrl_key}' +text = '{text}' +x = {x} +y = {y} +clear = {clear} +enter = {enter} +x, y = x/ratio, y/ratio +pyautogui.click(x=x, y=y) +time.sleep(0.5) +if clear == 1: + pyautogui.keyDown(ctrl_key) + pyautogui.keyDown('a') + 
pyautogui.keyUp('a') + pyautogui.keyUp(ctrl_key) + pyautogui.press('backspace') + time.sleep(0.5) + +pyperclip.copy(text) +pyautogui.keyDown(ctrl_key) +pyautogui.keyDown('v') +pyautogui.keyUp('v') +pyautogui.keyUp(ctrl_key) +time.sleep(0.5) +if enter == 1: + pyautogui.press('enter') +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def scroll_pos_async( + self, + x: int, + y: int, + pixels: int, + ) -> Tuple[str, str]: + script = f""" +import pyautogui +import time +ratio = {self.ratio} +x = {x} +y = {y} +pixels = {pixels}*150 +x, y = x//ratio, y//ratio +pyautogui.moveTo(x, y) +time.sleep(0.5) +pyautogui.scroll(pixels) +print('scroll_pos') +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + async def scroll_async(self, pixels: int) -> Tuple[str, str]: + script = f""" +import pyautogui +pixels = {pixels}*150 +pyautogui.scroll(pixels) +print('scroll') +""" + full_python_command = f'\npython -c @"{script}"@' + + # Construct PowerShell command + command = ( + r'$env:Path += ";C:\Program Files\Python310"' + f"{full_python_command}" + ) + + return await self.run_command_power_shell_async(command) + + +class AppStreamClient: + def __init__(self) -> None: + config = open_api_models.Config( + access_key_id=os.environ.get("ECD_ALIBABA_CLOUD_ACCESS_KEY_ID"), + # Your AccessKey Secret + access_key_secret=os.environ.get( + "ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET", + ), + ) + # Endpoint reference: https://api.aliyun.com/product/eds-aic + config.endpoint = ( + f"appstream-center." 
+ f'{os.environ.get("ECD_APP_STREAM_REGION_ID")}.aliyuncs.com' + ) + self.__client__ = appstream_center20210218Client(config) + + async def search_auth_code_async(self) -> str: + """Get new auth_code, generates new authentication code on each call""" + get_auth_code_request = ( + appstream_center_20210218_models.GetAuthCodeRequest( + end_user_id=os.environ.get("ECD_USERNAME"), + ) + ) + runtime = util_models.RuntimeOptions() + try: + # Copy code and run, print API return value yourself + rep = await self.__client__.get_auth_code_with_options_async( + get_auth_code_request, + runtime, + ) + auth_code = rep.body.auth_model.auth_code + logger.info( + f"Successfully obtained new auth_code: {auth_code[:20]}...", + ) + return auth_code + except Exception as error: + logger.error(f"search authcode failed:{error}") + return "" + + def search_auth_code(self) -> str: + """Get new auth_code, generates new authentication code on each call""" + get_auth_code_request = ( + appstream_center_20210218_models.GetAuthCodeRequest( + end_user_id=os.environ.get("ECD_USERNAME"), + ) + ) + runtime = util_models.RuntimeOptions() + try: + # Copy code and run, print API return value yourself + rep = self.__client__.get_auth_code_with_options( + get_auth_code_request, + runtime, + ) + auth_code = rep.body.auth_model.auth_code + logger.info( + f"Successfully obtained new auth_code: {auth_code[:20]}...", + ) + return auth_code + except Exception as error: + logger.error(f"search authcode failed:{error}") + return "" diff --git a/src/agentscope_runtime/sandbox/box/cloud_api/client/cloud_phone_wy.py b/src/agentscope_runtime/sandbox/box/cloud_api/client/cloud_phone_wy.py new file mode 100644 index 000000000..7ad670a5b --- /dev/null +++ b/src/agentscope_runtime/sandbox/box/cloud_api/client/cloud_phone_wy.py @@ -0,0 +1,1559 @@ +# -*- coding: utf-8 -*- +import os +import threading +import asyncio +import time +import uuid +import logging +from typing import Tuple, Optional, Any, List +from pydantic 
import BaseModel +from alibabacloud_tea_openapi import models as open_api_models +from alibabacloud_eds_aic20230930.client import Client as eds_aic20230930Client +from alibabacloud_eds_aic20230930 import models as eds_aic_20230930_models +from alibabacloud_tea_util import models as util_models +from alibabacloud_tea_util.client import Client as UtilClient + +from agentscope_runtime.sandbox.box.cloud_api.utils.oss_client import OSSClient + +logger = logging.getLogger(__name__) + + +execute_wait_time_: int = 5 + + +class ScreenshotError(Exception): + """Screenshot related operation exception""" + + +class InitError(Exception): + """Initialization exception""" + + +class ClientPool: + """Client pool manager - singleton pattern managing + shared client instances""" + + _instance = None + _lock = threading.Lock() + + def __new__(cls): + if cls._instance is None: + with cls._lock: + if cls._instance is None: + cls._instance = super().__new__(cls) + cls._instance._initialized = False + return cls._instance + + def __init__(self): + # Use hasattr to ensure attribute exists + if not getattr(self, "_initialized", False): + self._eds_client = None + self._oss_client = None + self._client_lock = threading.Lock() + self._instance_managers = {} + # Use different locks to avoid deadlocks + self._eds_lock = threading.Lock() + self._oss_lock = threading.Lock() + self._instance_manager_lock = threading.Lock() + self._initialized = True + + def get_eds_client(self) -> "EdsClient": + """Get shared EdsClient instance""" + if self._eds_client is None: + with self._eds_lock: + if self._eds_client is None: + self._eds_client = EdsClient() + return self._eds_client + + def get_oss_client(self) -> OSSClient: + """Get shared OSSClient instance""" + if self._oss_client is None: + with self._oss_lock: + if self._oss_client is None: + bucket_name = os.environ.get("EDS_OSS_BUCKET_NAME") + endpoint = os.environ.get("EDS_OSS_ENDPOINT") + self._oss_client = OSSClient(bucket_name, endpoint) + return 
self._oss_client + + def get_instance_manager( + self, + instance_id: str, + ) -> "EdsInstanceManager": + """Get EdsInstanceManager instance for + specified instance_id""" + # Check if it already exists first to avoid + # unnecessary lock contention + if instance_id in self._instance_managers: + return self._instance_managers[instance_id] + + # Pre-fetch clients outside lock to avoid deadlock + eds_client = self.get_eds_client() + oss_client = self.get_oss_client() + + # Use dedicated lock to manage instance managers + with self._instance_manager_lock: + # Check again to prevent creation by another thread + # while waiting for lock + if instance_id not in self._instance_managers: + # Create new instance manager and pass + # in shared clients + manager = EdsInstanceManager(instance_id) + manager.eds_client = eds_client + manager.oss_client = oss_client + self._instance_managers[instance_id] = manager + return self._instance_managers[instance_id] + + +class EdsDeviceInfo(BaseModel): + # Cloud phone device information query field return class + android_instance_name: str + android_instance_id: str + network_interface_ip: str + android_instance_status: str + + +class CommandTimeoutError(Exception): + """Exception raised when command execution times out""" + + +# pylint: disable=too-many-public-methods +class EdsClient: + def __init__(self) -> None: + config = open_api_models.Config( + access_key_id=os.environ.get( + "EDS_ALIBABA_CLOUD_ACCESS_KEY_ID", + ), + # Your AccessKey Secret + access_key_secret=os.environ.get( + "EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET", + ), + ) + # Endpoint reference: https://api.aliyun.com/product/eds-aic + config.endpoint = os.environ.get( + "EDS_ALIBABA_CLOUD_ENDPOINT", + ) + config.read_timeout = 6000 + self._client = eds_aic20230930Client(config) + + def client_ticket_create( + self, + instance_id: str, + ) -> Tuple[str, str, str]: + logger.info(f"[{instance_id}]: create ticket") + batch_get_acp_connection_ticket_request = ( + 
eds_aic_20230930_models.BatchGetAcpConnectionTicketRequest( + instance_ids=[ + instance_id, + ], + ) + ) + runtime = util_models.RuntimeOptions() + try: + # Copy code and run, print API return value yourself + rsp = self._client.batch_get_acp_connection_ticket_with_options( + batch_get_acp_connection_ticket_request, + runtime, + ) + info = rsp.body.instance_connection_models[0] + logger.info( + f"[{instance_id}]: create ticket success", + ) + return ( + info.ticket, + info.persistent_app_instance_id, + info.app_instance_id, + ) + except Exception as error: + logger.error( + f"[{instance_id}]: error when create ticket error:{error}", + ) + return "", "", "" + + async def client_ticket_create_async( + self, + instance_id: str, + ) -> Tuple[str, str, str]: + logger.info(f"[{instance_id}]: start to create ticket") + batch_get_acp_connection_ticket_request = ( + eds_aic_20230930_models.BatchGetAcpConnectionTicketRequest( + instance_ids=[ + instance_id, + ], + ) + ) + runtime = util_models.RuntimeOptions() + try: + method = ( + self._client.batch_get_acp_connection_ticket_with_options_async + ) + rsp = await method( + batch_get_acp_connection_ticket_request, + runtime, + ) + info = rsp.body.instance_connection_models[0] + logger.info(f"[{instance_id}]: create ticket success") + return ( + info.ticket, + info.persistent_app_instance_id, + info.app_instance_id, + ) + except Exception as error: + logger.error( + f"[{instance_id}]: error when create ticket error:{error}", + ) + return "", "", "" + + def execute_command( + self, + instance_ids: List[str], + command: str, + timeout: int = 60, + ) -> tuple[str, str | None]: + logger.info(f"[{instance_ids}]: start to execute command: {command}") + # Execute command + run_command_request = eds_aic_20230930_models.RunCommandRequest( + instance_ids=instance_ids, + command_content=command, + timeout=timeout, + ) + runtime = util_models.RuntimeOptions() + try: + rsp = self._client.run_command_with_options( + run_command_request, + 
runtime, + ) + assert rsp.status_code == 200 + logger.info( + f"[{instance_ids}]: execute command success", + ) + invoke_id = rsp.body.invoke_id + request_id = rsp.body.request_id + # logging.info(invoke_id, request_id) + return invoke_id, request_id + except Exception as error: + logger.error( + f"[{instance_ids}]: error when execute command:" + f" {command}, error:{error}", + ) + return "", "" + + async def execute_command_async( + self, + instance_ids: List[str], + command: str, + timeout: int = 60, + ) -> tuple[str, str | None]: + # Execute command + run_command_request = eds_aic_20230930_models.RunCommandRequest( + instance_ids=instance_ids, + command_content=command, + timeout=timeout, + ) + runtime = util_models.RuntimeOptions() + try: + rsp = await self._client.run_command_with_options_async( + run_command_request, + runtime, + ) + + assert rsp.status_code == 200 + invoke_id = rsp.body.invoke_id + request_id = rsp.body.request_id + # logging.info(invoke_id, request_id) + return invoke_id, request_id + except Exception as error: + logger.error( + f"[{instance_ids}]: error when execute command:" + f" {command}, error:{error}", + ) + return "", "" + + def query_execute_state( + self, + instance_ids: List[str], + message_id: str, + ) -> Any: + # Query command execution result + describe_invocations_request = ( + eds_aic_20230930_models.DescribeInvocationsRequest( + instance_ids=instance_ids, + invocation_id=message_id, + ) + ) + runtime = util_models.RuntimeOptions() + try: + rsp = self._client.describe_invocations_with_options( + describe_invocations_request, + runtime, + ) + # print(rsp.body) + return rsp.body + except Exception as error: + UtilClient.assert_as_string(error) + logger.error( + f"[{instance_ids}]: error when query message:" + f" {message_id}, error:{error}", + ) + return None + + def run_command_with_wait( + self, + instances_id: str, + command: str, + slot_time: float = None, + timeout: int = 60, + ) -> tuple[bool, str | None]: +
logger.info(f"[{instances_id}]: start to run command async:{command}") + execute_id, request_id = self.execute_command( + [instances_id], + command, + timeout=timeout, + ) + logger.info(f"[{request_id}{instances_id}]: start to wait command") + start_time = time.time() + if not slot_time: + if ( + "execute_wait_time_" in globals() + and execute_wait_time_ is not None + ): + slot_time = execute_wait_time_ + else: + slot_time = 3 # Default value + slot_time = max(0.5, slot_time) + timeout = slot_time + timeout + if execute_id: + while timeout > 0: + time.sleep(slot_time) + msgs = self.query_execute_state( + [instances_id], + execute_id, + ) + for msg in msgs.data: + if msg.invocation_status in [ + "Success", + "Failed", + "Timeout", + ]: + print( + f"command cost time: " + f"{time.time() - start_time}", + ) + logger.info( + f"[{instances_id}]: command status:" + f" {msg.invocation_status}", + ) + return ( + msg.invocation_status == "Success", + msg.output, + ) + timeout -= slot_time + logger.error(f"[{instances_id}]: command timeout") + raise CommandTimeoutError("command timeout") + + async def run_command_with_wait_async( + self, + instances_id: str, + command: str, + slot_time: float = None, + timeout: int = 60, + ) -> tuple[bool, str | None]: + logger.info(f"[{instances_id}]: start to run command async:{command}") + execute_id, request_id = await self.execute_command_async( + [instances_id], + command, + timeout=timeout, + ) + logger.info(f"[{request_id}{instances_id}]: start to wait command") + start_time = time.time() + if not slot_time: + if ( + "execute_wait_time_" in globals() + and execute_wait_time_ is not None + ): + slot_time = execute_wait_time_ + else: + slot_time = 3 # Default value + slot_time = max(0.5, slot_time) + timeout = slot_time + timeout + if execute_id: + while timeout > 0: + await asyncio.sleep(slot_time) + msgs = self.query_execute_state( + [instances_id], + execute_id, + ) + for msg in msgs.data: + if msg.invocation_status in [ + 
"Success", + "Failed", + "Timeout", + ]: + print( + f"command cost time: " + f"{time.time() - start_time}", + ) + logger.info( + f"[{instances_id}]: command status:" + f" {msg.invocation_status}", + ) + return ( + msg.invocation_status == "Success", + msg.output, + ) + timeout -= slot_time + logger.error(f"[{instances_id}]: command timeout") + raise CommandTimeoutError("command timeout") + + async def create_screenshot_async(self, instances_id: str) -> str: + logger.info(f"[{instances_id}]: start to ask api to do screenshot") + create_screenshot_request = ( + eds_aic_20230930_models.CreateScreenshotRequest( + android_instance_id_list=[ + instances_id, + ], + ) + ) + runtime = util_models.RuntimeOptions() + try: + # Copy code and run, print API return value yourself + rsp = await self._client.create_screenshot_with_options_async( + create_screenshot_request, + runtime, + ) + logger.info( + f"[{instances_id}]: start to ask api to do screenshot success", + ) + return rsp.body.tasks[0].task_id + except Exception as error: + logger.error( + f"[{instances_id}]: error when ask api to do screenshot:" + f" {error}", + ) + return "" + + def create_screenshot(self, instances_id: str) -> str: + logger.info(f"[{instances_id}]: start to ask api to do screenshot") + create_screenshot_request = ( + eds_aic_20230930_models.CreateScreenshotRequest( + android_instance_id_list=[ + instances_id, + ], + ) + ) + runtime = util_models.RuntimeOptions() + try: + # Copy code and run, print API return value yourself + rsp = self._client.create_screenshot_with_options( + create_screenshot_request, + runtime, + ) + logger.info( + f"[{instances_id}]: start to ask api to do screenshot success", + ) + return rsp.body.tasks[0].task_id + except Exception as error: + logger.error( + f"[{instances_id}]: error when ask api to do screenshot:" + f" {error}", + ) + return "" + + async def describe_tasks_async(self, task_ids: List[str]) -> str: + logger.info(f"[{task_ids}]: start to wait task") + 
describe_tasks_request = eds_aic_20230930_models.DescribeTasksRequest( + task_ids=task_ids, + ) + runtime = util_models.RuntimeOptions() + retry = 3 + while retry > 0: + try: + await asyncio.sleep(1) + # Copy code and run, print API return value yourself + rsp = await self._client.describe_tasks_with_options_async( + describe_tasks_request, + runtime, + ) + result = rsp.body.data[0].result + logger.info(f"[{task_ids}]: task result: {result}") + if not result: + logger.error( + f"[{task_ids}]: task result is empty and retry", + ) + retry += 1 + continue + return result + except Exception as error: + retry -= 1 + logger.error(f"[{task_ids}]: task result error: {error}") + return "" + + def describe_tasks(self, task_ids: List[str]) -> str: + logger.info(f"[{task_ids}]: start to wait task") + describe_tasks_request = eds_aic_20230930_models.DescribeTasksRequest( + task_ids=task_ids, + ) + runtime = util_models.RuntimeOptions() + retry = 3 + while retry > 0: + try: + time.sleep(1) + # Copy code and run, print API return value yourself + rsp = self._client.describe_tasks_with_options( + describe_tasks_request, + runtime, + ) + result = rsp.body.data[0].result + logger.info(f"[{task_ids}]: task result: {result}") + if not result: + logger.error( + f"[{task_ids}]: task result is empty and retry", + ) + retry += 1 + continue + return result + except Exception as error: + retry -= 1 + logger.error(f"[{task_ids}]: task result error: {error}") + return "" + + def list_instance( + self, + page_size: Optional[int] = 10, + next_token: Optional[int] = None, + status: Optional[int] = None, + instance_ids: List[str] = None, + ) -> Any: + logger.info(f"start to list instances {instance_ids}") + describe_android_instances_request = ( + eds_aic_20230930_models.DescribeAndroidInstancesRequest( + max_results=page_size, + next_token=next_token, + status=status, + android_instance_ids=instance_ids, + ) + ) + + runtime = util_models.RuntimeOptions() + try: + rsp = 
self._client.describe_android_instances_with_options( + describe_android_instances_request, + runtime, + ) + devices_info = [ + EdsDeviceInfo(**inst.__dict__) + for inst in rsp.body.instance_model + ] + logger.info(f"list instances success: {devices_info}") + return rsp.body.total_count, rsp.body.next_token, devices_info + except Exception as error: + logger.error(f"list wuying mobile failed: {error}") + return 0, None, [] + + def list_all_instance( + self, + page_size: int = 5, + ) -> List[EdsDeviceInfo]: + instances = [] + count, next_token, page_instances = self.list_instance( + page_size=page_size, + next_token=None, + ) + print("count:", count) + instances += page_instances + while next_token is not None: + _, next_token, page_instances = self.list_instance( + page_size=page_size, + next_token=next_token, + ) + instances += page_instances + # print("------", next_token) + return instances + + def restart_equipment(self, instance_ids: List[str]) -> None: + logger.info(f"{instance_ids}: start to restart equipment") + reboot_android_instances_in_group_request = ( + eds_aic_20230930_models.RebootAndroidInstancesInGroupRequest( + android_instance_ids=instance_ids, + force_stop=True, + ) + ) + runtime = util_models.RuntimeOptions() + try: + e_c = self._client + method = e_c.reboot_android_instances_in_group_with_options + rsp = method( + reboot_android_instances_in_group_request, + runtime, + ) + logger.info( + f"{instance_ids}: restart equipment ask api success," + " and wait finish", + ) + print(rsp) + except Exception as error: + logger.error( + f"restart equipment failed:{error}", + ) + + async def restart_equipment_async(self, instance_ids: List[str]) -> None: + logger.info(f"{instance_ids}: start to restart equipment") + reboot_android_instances_in_group_request = ( + eds_aic_20230930_models.RebootAndroidInstancesInGroupRequest( + android_instance_ids=instance_ids, + force_stop=True, + ) + ) + runtime = util_models.RuntimeOptions() + try: + e_c = self._client +
method = e_c.reboot_android_instances_in_group_with_options_async + rsp = await method( + reboot_android_instances_in_group_request, + runtime, + ) + logger.info( + f"{instance_ids}: restart equipment ask api success," + " and wait finish", + ) + print(rsp) + except Exception as error: + logger.error( + f"restart equipment failed:{error}", + ) + + async def start_equipment_async(self, instance_ids: List[str]) -> int: + logger.info(f"{instance_ids}: start to start instance") + start_android_instance_request = ( + eds_aic_20230930_models.StartAndroidInstanceRequest( + android_instance_ids=instance_ids, + ) + ) + + runtime = util_models.RuntimeOptions() + try: + e_c = self._client + method = e_c.start_android_instance_with_options_async + rsp = await method( + start_android_instance_request, + runtime, + ) + logger.info( + f"{instance_ids}: start instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"start instance failed:{error}") + return 400 + + def start_equipment(self, instance_ids: List[str]) -> int: + logger.info(f"{instance_ids}: start to start instance") + start_android_instance_request = ( + eds_aic_20230930_models.StartAndroidInstanceRequest( + android_instance_ids=instance_ids, + ) + ) + + runtime = util_models.RuntimeOptions() + try: + e_c = self._client + method = e_c.start_android_instance_with_options + rsp = method( + start_android_instance_request, + runtime, + ) + logger.info( + f"{instance_ids}: start instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"start instance failed:{error}") + return 400 + + def stop_equipment(self, instance_ids: List[str]) -> int: + logger.info(f"{instance_ids}: start to stop instance") + stop_android_instance_request = ( + eds_aic_20230930_models.StopAndroidInstanceRequest( + android_instance_ids=instance_ids, + ) + ) + + runtime = util_models.RuntimeOptions() + try: + rsp = 
self._client.stop_android_instance_with_options( + stop_android_instance_request, + runtime, + ) + logger.info( + f"{instance_ids}: stop instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"stop_equipment failed:{error}") + return 400 + + async def stop_equipment_async(self, instance_ids: List[str]) -> int: + logger.info(f"{instance_ids}: start to stop instance") + stop_android_instance_request = ( + eds_aic_20230930_models.StopAndroidInstanceRequest( + android_instance_ids=instance_ids, + ) + ) + + runtime = util_models.RuntimeOptions() + try: + e_c = self._client + method = e_c.stop_android_instance_with_options_async + rsp = await method( + stop_android_instance_request, + runtime, + ) + logger.info( + f"{instance_ids}: stop instance ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"stop instance failed:{error}") + return 400 + + async def reset_equipment_async(self, instance_ids: List[str]) -> int: + logger.info(f"{instance_ids}: start to reset equipment") + reset_android_instances_in_group_request = ( + eds_aic_20230930_models.ResetAndroidInstancesInGroupRequest( + android_instance_ids=instance_ids, + ) + ) + runtime = util_models.RuntimeOptions() + try: + e_c = self._client + method = e_c.reset_android_instances_in_group_with_options_async + rsp = await method( + reset_android_instances_in_group_request, + runtime, + ) + logger.info( + f"{instance_ids}: reset equipment ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"reset_equipment failed:{error}") + return 400 + + def reset_equipment(self, instance_ids: List[str]) -> int: + logger.info(f"{instance_ids}: start to reset equipment") + reset_android_instances_in_group_request = ( + eds_aic_20230930_models.ResetAndroidInstancesInGroupRequest( + android_instance_ids=instance_ids, + ) + ) + runtime = 
util_models.RuntimeOptions() + try: + e_c = self._client + method = e_c.reset_android_instances_in_group_with_options + rsp = method( + reset_android_instances_in_group_request, + runtime, + ) + logger.info( + f"{instance_ids}: reset equipment ask api success," + f" and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"reset_equipment failed:{error}") + return 400 + + def rebuild_equipment_image( + self, + instance_ids: List[str], + image_id: str, + ) -> int: + logger.info(f"{instance_ids}: start to rebuild equipment image") + update_instance_image_request = ( + eds_aic_20230930_models.UpdateInstanceImageRequest( + instance_id_list=instance_ids, + image_id=image_id, + ) + ) + runtime = util_models.RuntimeOptions() + try: + rsp = self._client.update_instance_image_with_options( + update_instance_image_request, + runtime, + ) + logger.info( + f"{instance_ids}: rebuild equipment image ask api " + f"success, and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"rebuild_equipment_image failed:{error}") + return 400 + + async def rebuild_equipment_image_async( + self, + instance_ids: List[str], + image_id: str, + ) -> int: + logger.info(f"{instance_ids}: start to rebuild equipment image") + update_instance_image_request = ( + eds_aic_20230930_models.UpdateInstanceImageRequest( + instance_id_list=instance_ids, + image_id=image_id, + ) + ) + runtime = util_models.RuntimeOptions() + try: + rsp = await self._client.update_instance_image_with_options_async( + update_instance_image_request, + runtime, + ) + logger.info( + f"{instance_ids}: rebuild equipment image ask api " + f"success, and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"rebuild_equipment_image failed:{error}") + return 400 + + def send_file( + self, + instance_ids: List[str], + source_file_path: str, + upload_url: str, + ) -> int: +
logger.info(f"{instance_ids}: start to send file") + send_file_request = eds_aic_20230930_models.SendFileRequest( + android_instance_id_list=instance_ids, + source_file_path=source_file_path, + upload_type="DOWNLOAD_URL", + upload_url=upload_url, + ) + runtime = util_models.RuntimeOptions() + try: + rsp = self._client.send_file_with_options( + send_file_request, + runtime, + ) + logger.info( + f"{instance_ids}: send file ask api " + f"success, and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"send file failed:{error}") + return 400 + + async def send_file_async( + self, + instance_ids: List[str], + source_file_path: str, + upload_url: str, + ) -> int: + logger.info(f"{instance_ids}: start to send file") + send_file_request = eds_aic_20230930_models.SendFileRequest( + android_instance_id_list=instance_ids, + source_file_path=source_file_path, + upload_type="DOWNLOAD_URL", + upload_url=upload_url, + ) + runtime = util_models.RuntimeOptions() + try: + rsp = await self._client.send_file_with_options_async( + send_file_request, + runtime, + ) + logger.info( + f"{instance_ids}: send file ask api " + f"success, and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"send file failed:{error}") + return 400 + + def fetch_file( + self, + instance_ids: List[str], + source_file_path: str, + upload_endpoint: str, + upload_url: str, + ) -> int: + logger.info(f"{instance_ids}: start to fetch file") + fetch_file_request = eds_aic_20230930_models.FetchFileRequest( + android_instance_id_list=instance_ids, + source_file_path=source_file_path, + upload_type="OSS", + upload_endpoint=upload_endpoint, + upload_url=upload_url, + ) + runtime = util_models.RuntimeOptions() + try: + rsp = self._client.fetch_file_with_options( + fetch_file_request, + runtime, + ) + logger.info( + f"{instance_ids}: fetch file ask api " + f"success, and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"fetch 
file failed:{error}") + return 400 + + async def fetch_file_async( + self, + instance_ids: List[str], + source_file_path: str, + upload_endpoint: str, + upload_url: str, + ) -> int: + logger.info(f"{instance_ids}: start to fetch file") + fetch_file_request = eds_aic_20230930_models.FetchFileRequest( + android_instance_id_list=instance_ids, + source_file_path=source_file_path, + upload_type="OSS", + upload_endpoint=upload_endpoint, + upload_url=upload_url, + ) + runtime = util_models.RuntimeOptions() + try: + rsp = await self._client.fetch_file_with_options_async( + fetch_file_request, + runtime, + ) + logger.info( + f"{instance_ids}: fetch file ask api " + f"success, and wait finish", + ) + return rsp.status_code + except Exception as error: + logger.error(f"fetch file failed:{error}") + return 400 + + +# pylint: disable=too-many-public-methods +class EdsInstanceManager: + def __init__(self, instance_id: str = ""): + # Directly use the passed instance_id, remove local cache logic + if not instance_id: + logger.error( + "instance_id is required for " + "EdsInstanceManager initialization", + ) + raise InitError( + "instance_id is required for " + "EdsInstanceManager initialization", + ) + + self.instance_id = instance_id + self.client_pool = ClientPool() + self.eds_client = self.client_pool.get_eds_client() + self.oss_client = self.client_pool.get_oss_client() + bucket_name = os.environ.get("EDS_OSS_BUCKET_NAME") + endpoint = os.environ.get("EDS_OSS_ENDPOINT") + # self.oss_client = OSSClient(bucket_name, endpoint) + self.endpoint = endpoint + self.des_oss_dir = f"oss://{bucket_name}/__mPLUG__/{self.instance_id}/" + # Plain string, not a 1-tuple: it is interpolated into the ossutil "-i" flag + self.oss_ak = os.environ.get("EDS_OSS_ACCESS_KEY_ID") + self.oss_sk = os.environ.get("EDS_OSS_ACCESS_KEY_SECRET") + self._initialized = False + self.ticket = None + self.person_app_id = None + self.app_instance_id = None + + def refresh_ticket(self): + logger.info(f"Instance {self.instance_id}: refreshing ticket...") + ( + self.ticket, + 
self.person_app_id, + self.app_instance_id, + ) = self.eds_client.client_ticket_create( + self.instance_id, + ) + self._initialized = True + logger.info(f"Instance {self.instance_id}: refresh_ticket succeeded") + + async def refresh_ticket_async(self): + logger.info(f"Instance {self.instance_id}: refreshing ticket...") + ( + self.ticket, + self.person_app_id, + self.app_instance_id, + ) = await self.eds_client.client_ticket_create_async( + self.instance_id, + ) + self._initialized = True + logger.info(f"Instance {self.instance_id}: refresh_ticket succeeded") + + def _ensure_initialized(self): + if not self._initialized: + logger.warning( + f"Instance {self.instance_id}: please initialize first", + ) + raise InitError( + "Manager not initialized. Call await initialize() first.", + ) + + # run_list_instance function has been removed, + # device allocation is now managed by backend.py + + async def get_screenshot_sdk_async(self) -> str: + logger.info(f"Instance {self.instance_id}: getting screenshot") + task_id = await self.eds_client.create_screenshot_async( + self.instance_id, + ) + logger.info( + f"Instance {self.instance_id}: screenshot " + f"task created successfully, task_id:{task_id}", + ) + result = await self.eds_client.describe_tasks_async([task_id]) + return result + + def get_screenshot_sdk(self) -> str: + logger.info(f"Instance {self.instance_id}: getting screenshot") + task_id = self.eds_client.create_screenshot(self.instance_id) + logger.info( + f"Instance {self.instance_id}: screenshot " + f"task created successfully, task_id:{task_id}", + ) + result = self.eds_client.describe_tasks([task_id]) + return result + + async def get_screenshot_async(self) -> str: + local_file_name = f"{uuid.uuid4().hex}__screenshot.png" + mobile_screen_file_path = f"/sdcard/{local_file_name}" + des_oss_sub_path = f"__mPLUG__/{self.instance_id}/{local_file_name}" + print( + f"mobile path: {mobile_screen_file_path} , " + f"des_oss_sub_path: {des_oss_sub_path}", + ) + 
logger.info( + f"mobile path: {mobile_screen_file_path}" + f"des_oss_sub_path: {des_oss_sub_path}", + ) + retry = 3 + while retry > 0: + try: + logger.info(f"Instance {self.instance_id}: getting screenshot") + ( + status, + output, + ) = await self.eds_client.run_command_with_wait_async( + self.instance_id, + f"screencap {mobile_screen_file_path} " + f"&& md5sum {mobile_screen_file_path}", + ) + logger.info( + f"Instance {self.instance_id}: " + f"screenshot {status}{output}," + f" starting OSS upload", + ) + await self.eds_client.run_command_with_wait_async( + self.instance_id, + f"ossutil cp {mobile_screen_file_path} {self.des_oss_dir}" + f" -i {self.oss_ak} -k {self.oss_sk} -e {self.endpoint}", + ) + + screen_url = await self.oss_client.get_url_async( + des_oss_sub_path, + ) + logger.info( + f"Instance {self.instance_id}: screenshot" + f" succeeded {screen_url}" + f", starting to delete phone file", + ) + await self.eds_client.execute_command_async( + [self.instance_id], + f"rm {mobile_screen_file_path}", + ) + if screen_url is None: + logger.error("screen_shot is None") + raise ScreenshotError("screen_shot is None") + return screen_url + except Exception as e: + retry -= 1 + logger.error( + f"screen_shot error {e}" f", retrying: remain {retry}", + ) + continue + return "" + + def get_screenshot(self) -> str: + local_file_name = f"{uuid.uuid4().hex}__screenshot.png" + mobile_screen_file_path = f"/sdcard/{local_file_name}" + des_oss_sub_path = f"__mPLUG__/{self.instance_id}/{local_file_name}" + print( + f"mobile path: {mobile_screen_file_path} , " + f"des_oss_sub_path: {des_oss_sub_path}", + ) + logger.info( + f"mobile path: {mobile_screen_file_path}" + f"des_oss_sub_path: {des_oss_sub_path}", + ) + retry = 3 + while retry > 0: + try: + logger.info(f"Instance {self.instance_id}: getting screenshot") + status, output = self.eds_client.run_command_with_wait( + self.instance_id, + f"screencap {mobile_screen_file_path} " + f"&& md5sum {mobile_screen_file_path}", + ) + 
logger.info( + f"Instance {self.instance_id}: " + f"screenshot {status}{output}," + f" starting OSS upload", + ) + self.eds_client.run_command_with_wait( + self.instance_id, + f"ossutil cp {mobile_screen_file_path} {self.des_oss_dir}" + f" -i {self.oss_ak} -k {self.oss_sk} -e {self.endpoint}", + ) + + screen_url = self.oss_client.get_url( + des_oss_sub_path, + ) + logger.info( + f"Instance {self.instance_id}: screenshot succeeded" + f" {screen_url}, starting to delete phone file", + ) + self.eds_client.execute_command( + [self.instance_id], + f"rm {mobile_screen_file_path}", + ) + if screen_url is None: + logger.error("screen_shot is None") + raise ScreenshotError("Failed to get screenshot URL") + return screen_url + except Exception as e: + retry -= 1 + logger.error( + f"screen_shot error {e}" f", retrying: remain {retry}", + ) + continue + return "" + + def tab( + self, + x1: int, + y1: int, + x2: int, + y2: int, + width: int, + height: int, + ) -> tuple[bool, str | None]: + x, y = int((x1 + x2) / 2), int((y1 + y2) / 2) + input_x = int(x / 1000 * width) + input_y = int(y / 1000 * height) + return self.eds_client.run_command_with_wait( + self.instance_id, + f"input tap {input_x} {input_y}", + ) + + def long_press( + self, + x: int, + y: int, + press_time: str, + ) -> tuple[bool, str | None]: + time_ms = int(press_time) * 1000 + return self.eds_client.run_command_with_wait( + self.instance_id, + f"input swipe {x} {y} {x} {y} {time_ms}", + ) + + def download_set_apk( + self, + oss_url: str, + apk_name: str, + ) -> tuple[bool, str]: + """ + Download APK file from OSS URL and install + + Args: + oss_url (str): OSS download URL of APK file + apk_name (str): APK file name + + Returns: + tuple: (status, response) Installation + status and response information + """ + # Download APK file to cloud phone + download_path = f"/data/local/tmp/{apk_name}" + # download_command = f"curl -o {download_path} {oss_url}" + # Combine download and install commands, separated by 
`&&` + combined_command = ( + f"curl -o {download_path} {oss_url} && pm install {download_path}" + ) + + try: + status, rsp = self.eds_client.run_command_with_wait( + self.instance_id, + combined_command, + ) + + if not status: + return ( + False, + f"Download or installation failed" + f": {rsp or 'Unknown error'}", + ) + + # Check if installation succeeded + # (check if output contains Success) + if rsp and "Success" in rsp: + return True, rsp + else: + return False, f"Installation failed: {rsp or 'Unknown error'}" + + except Exception as e: + return False, f"Error downloading and installing APK: {str(e)}" + + def check_and_setup_app( + self, + internal_oss_url: str, + app_name: str, + ) -> tuple[bool, str | None]: + if internal_oss_url is None or app_name is None: + return False, "param is empty" + + # Download and install the APK, mirroring check_and_setup_app_async + return self.download_set_apk( + internal_oss_url, + app_name, + ) + + def type(self, text: str) -> str | None: + time_start = time.time() + # Escape text content + escaped_text = text.replace('"', '\\"') + escaped_text = escaped_text.replace("'", "\\'") + + # Combine complete command: check input method -> + # install ADBKeyboard (if needed) -> + # enable and set ADBKeyboard -> send text -> + # disable ADBKeyboard + # Note: Simplified handling here, assuming ADBKeyboard + # is already installed + combined_command = ( + f"ime enable com.android.adbkeyboard/.AdbIME && " + f"ime set com.android.adbkeyboard/.AdbIME && " + f"sleep 0.3 && " + f'am broadcast -a ADB_INPUT_TEXT --es msg "{escaped_text}" && ' + f"sleep 0.2 && " + f"ime disable com.android.adbkeyboard/.AdbIME" + ) + + status, rsp = self.eds_client.run_command_with_wait( + self.instance_id, + combined_command, + slot_time=0.5, + ) + print(f"{status}{rsp}") + print(f"Text input time: {time.time() - time_start}") + return rsp + + def slide( + self, + x1: int, + y1: int, + x2: int, + y2: int, + ) -> tuple[bool, str | None]: + return self.eds_client.run_command_with_wait( + self.instance_id, + 
f"input swipe {x1} {y1} {x2} {y2} 500", + ) + + def back(self) -> tuple[bool, str | None]: + return self.eds_client.run_command_with_wait( + self.instance_id, + "input keyevent KEYCODE_BACK", + ) + + def home(self) -> tuple[bool, str | None]: + return self.eds_client.run_command_with_wait( + self.instance_id, + "am start -a android.intent.action.MAIN" + " -c android.intent.category.HOME", + ) + + def menu(self) -> tuple[bool, str | None]: + return self.eds_client.run_command_with_wait( + self.instance_id, + "input keyevent 82", + ) + + def enter(self) -> tuple[bool, str | None]: + return self.eds_client.run_command_with_wait( + self.instance_id, + "input keyevent 66", + ) + + def kill_the_front_app(self) -> tuple[bool, str | None]: + command = ( + "am force-stop $(dumpsys activity activities | " + "grep mResumedActivity" + " | awk '{print $4}' | cut -d " + "'/' -f 1)" + ) + return self.eds_client.run_command_with_wait( + self.instance_id, + command, + ) + + def run_command(self, command: str) -> tuple[bool, str | None]: + return self.eds_client.run_command_with_wait( + self.instance_id, + command, + ) + + async def tab_async( + self, + x1: int, + y1: int, + x2: int, + y2: int, + width: int, + height: int, + ) -> tuple[bool, str | None]: + x, y = int((x1 + x2) / 2), int((y1 + y2) / 2) + input_x = int(x / 1000 * width) + input_y = int(y / 1000 * height) + return await self.eds_client.run_command_with_wait_async( + self.instance_id, + f"input tap {input_x} {input_y}", + ) + + async def long_press_async( + self, + x: int, + y: int, + press_time: str, + ) -> tuple[bool, str | None]: + time_ms = int(press_time) * 1000 + return await self.eds_client.run_command_with_wait_async( + self.instance_id, + f"input swipe {x} {y} {x} {y} {time_ms}", + ) + + async def download_set_apk_async( + self, + oss_url: str, + apk_name: str, + ) -> tuple[bool, str]: + """ + Download APK file from OSS URL and install + + Args: + oss_url (str): OSS download URL of APK file + apk_name (str): 
APK file name + + Returns: + tuple: (status, response) Installation status and + response information + """ + # Download APK file to cloud phone + download_path = f"/data/local/tmp/{apk_name}" + # download_command = f"curl -o {download_path} {oss_url}" + # Combine download and install commands, separated by semicolon + combined_command = ( + f"curl -o {download_path} {oss_url} && pm install {download_path}" + ) + + try: + status, rsp = await self.eds_client.run_command_with_wait_async( + self.instance_id, + combined_command, + ) + + if not status: + return ( + False, + f"Download or installation failed:" + f" {rsp or 'Unknown error'}", + ) + + # Check if installation succeeded + # (check if output contains Success) + if rsp and "Success" in rsp: + return True, rsp + else: + return False, f"Installation failed: {rsp or 'Unknown error'}" + + except Exception as e: + return False, f"Error downloading and installing APK: {str(e)}" + + async def check_and_setup_app_async( + self, + internal_oss_url: str, + app_name: str, + ) -> tuple[bool, str | None]: + if internal_oss_url is None or app_name is None: + return False, "param is empty" + + status_in, rsp_in = await self.download_set_apk_async( + internal_oss_url, + app_name, + ) + + # Return original input method ID for later restoration + return status_in, rsp_in + + async def type_async(self, text: str) -> str | None: + time_start = time.time() + # Escape text content + escaped_text = text.replace('"', '\\"') + escaped_text = escaped_text.replace("'", "\\'") + + # Combine complete command: check input method -> + # install ADBKeyboard (if needed) -> + # enable and set ADBKeyboard -> send text -> + # disable ADBKeyboard + # Note: Simplified handling here, assuming ADBKeyboard + # is already installed + combined_command = ( + f"ime enable com.android.adbkeyboard/.AdbIME && " + f"ime set com.android.adbkeyboard/.AdbIME && " + f"sleep 0.3 && " + f'am broadcast -a ADB_INPUT_TEXT --es msg "{escaped_text}" && ' + f"sleep 0.2 
&& " + f"ime disable com.android.adbkeyboard/.AdbIME" + ) + + status, rsp = await self.eds_client.run_command_with_wait_async( + self.instance_id, + combined_command, + slot_time=0.5, + ) + print(f"{status}{rsp}") + print(f"Text input time: {time.time() - time_start}") + return rsp + + async def slide_async( + self, + x1: int, + y1: int, + x2: int, + y2: int, + ) -> tuple[bool, str | None]: + return await self.eds_client.run_command_with_wait_async( + self.instance_id, + f"input swipe {x1} {y1} {x2} {y2} 500", + ) + + async def back_async(self) -> tuple[bool, str | None]: + return await self.eds_client.run_command_with_wait_async( + self.instance_id, + "input keyevent KEYCODE_BACK", + ) + + async def home_async(self) -> tuple[bool, str | None]: + return await self.eds_client.run_command_with_wait_async( + self.instance_id, + "am start -a android.intent.action.MAIN" + " -c android.intent.category.HOME", + ) + + async def menu_async(self) -> tuple[bool, str | None]: + return await self.eds_client.run_command_with_wait_async( + self.instance_id, + "input keyevent 82", + ) + + async def enter_async(self) -> tuple[bool, str | None]: + return await self.eds_client.run_command_with_wait_async( + self.instance_id, + "input keyevent 66", + ) + + async def kill_the_front_app_async(self) -> tuple[bool, str | None]: + command = ( + "am force-stop $(dumpsys activity activities | " + "grep mResumedActivity" + " | awk '{print $4}' | cut -d " + "'/' -f 1)" + ) + return await self.eds_client.run_command_with_wait_async( + self.instance_id, + command, + ) + + async def run_command_async(self, command: str) -> tuple[bool, str | None]: + return await self.eds_client.run_command_with_wait_async( + self.instance_id, + command, + ) + + def send_file(self, source_file_path: str, upload_url: str) -> int: + return self.eds_client.send_file( + [self.instance_id], + source_file_path, + upload_url, + ) + + async def send_file_async( + self, + source_file_path: str, + upload_url: str, + ) -> 
int: + return await self.eds_client.send_file_async( + [self.instance_id], + source_file_path, + upload_url, + ) + + def fetch_file( + self, + source_file_path: str, + upload_endpoint: str, + upload_url: str, + ) -> int: + return self.eds_client.fetch_file( + [self.instance_id], + source_file_path, + upload_endpoint, + upload_url, + ) + + async def fetch_file_async( + self, + source_file_path: str, + upload_endpoint: str, + upload_url: str, + ) -> int: + return await self.eds_client.fetch_file_async( + [self.instance_id], + source_file_path, + upload_endpoint, + upload_url, + ) + + def remove_file(self, file_path: str) -> tuple[bool, str | None]: + # Use rm command to delete file, -r recursively + # delete directory, -f force delete + command = f"rm -rf '{file_path}'" + + return self.eds_client.run_command_with_wait( + self.instance_id, + command, + ) + + async def remove_file_async( + self, + file_path: str, + ) -> tuple[bool, str | None]: + # Use rm command to delete file, -r recursively + # delete directory, -f force delete + command = f"rm -rf '{file_path}'" + + return await self.eds_client.run_command_with_wait_async( + self.instance_id, + command, + ) diff --git a/src/agentscope_runtime/sandbox/box/cloud_api/cloud_computer_sandbox.py b/src/agentscope_runtime/sandbox/box/cloud_api/cloud_computer_sandbox.py new file mode 100644 index 000000000..c2efa8e26 --- /dev/null +++ b/src/agentscope_runtime/sandbox/box/cloud_api/cloud_computer_sandbox.py @@ -0,0 +1,761 @@ +# -*- coding: utf-8 -*- +import os +import uuid +import logging +import time +from typing import Callable, List, Any, Optional, Dict +from fastapi import HTTPException +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.sandbox.registry import SandboxRegistry +from .client.cloud_computer_wy import ClientPool +from ..cloud.cloud_sandbox import CloudSandbox + + +logger = logging.getLogger(__name__) + + +@SandboxRegistry.register( + "aliyun-cloud-computer", + 
sandbox_type=SandboxType.CLOUD_COMPUTER, + security_level="high", + timeout=600, + description="Alibaba Cloud Wuying Cloud Computer Sandbox", +) +class CloudComputerSandbox(CloudSandbox): + def __init__( + self, + *, + desktop_id: Optional[str] = None, + timeout: int = 600, + sandbox_type: SandboxType = SandboxType.CLOUD_COMPUTER, + auto_wakeup: bool = True, + screenshot_dir: Optional[str] = None, + command_timeout: int = 60, + **kwargs, + ) -> None: + resolved_desktop_id = desktop_id or os.environ.get("DESKTOP_ID") + + if not resolved_desktop_id: + raise ValueError( + "desktop_id is required. Provide desktop_id, sandbox_id," + " or set DESKTOP_ID.", + ) + + self.desktop_id = resolved_desktop_id + self.auto_wakeup = auto_wakeup + if screenshot_dir: + self.screenshot_dir = screenshot_dir + elif os.environ.get("CLOUD_COMPUTER_SCREENSHOT_DIR"): + self.screenshot_dir = os.environ.get( + "CLOUD_COMPUTER_SCREENSHOT_DIR", + ) + else: + # Get the directory of the current file and create + # a screenshots subdirectory under it + current_dir = os.path.dirname(os.path.abspath(__file__)) + self.screenshot_dir = os.path.join( + current_dir, + "cloud_computer_screenshots", + ) + + os.makedirs(self.screenshot_dir, exist_ok=True) + self.command_timeout = command_timeout + + kwargs.pop("desktop_id", None) + + super().__init__( + timeout=timeout, + sandbox_type=sandbox_type, + **kwargs, + ) + + # ------------------------------------------------------------------ + # CloudSandbox abstract implementations + # ------------------------------------------------------------------ + def _initialize_cloud_client(self): # type: ignore[override] + self._client_pool = ClientPool() + instance_manager = self._client_pool.get_instance_manager( + self.desktop_id, + ) + if instance_manager is None: + raise RuntimeError( + "Failed to acquire EcdInstanceManager for cloud computer", + ) + + self.instance_manager = instance_manager + self.oss_client = self._client_pool.get_oss_client() + return 
instance_manager + + def _create_cloud_sandbox(self) -> Optional[str]: + try: + if self.auto_wakeup: + try: + ready_status = self._wait_for_pc_ready( + self.desktop_id, + ) + if not ready_status: + logger.warning( + "Wakeup desktop returned non-success " + "status %s for %s", + ready_status, + self.desktop_id, + ) + except Exception as wake_error: + logger.warning( + f"Wakeup desktop failed: {wake_error}", + ) + + self.instance_manager.refresh_aurh_code() + return self.desktop_id + except Exception as error: + logger.error( + f"Error preparing cloud computer sandbox: {error}", + ) + return None + + def _delete_cloud_sandbox(self, sandbox_id: str) -> bool: + try: + status = self.instance_manager.ecd_client.hibernate_desktops( + [sandbox_id], + ) + return status == 200 + except Exception as error: # pylint: disable=broad-except + logger.error( + "Failed to hibernate desktop %s: %s", + sandbox_id, + error, + ) + return False + + def _call_cloud_tool( + self, + tool_name: str, + arguments: Dict[str, Any], + ) -> Any: + tool_mapping: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = { + "run_shell_command": self._tool_run_shell_command, + "run_ipython_cell": self._tool_execute_code, + "screenshot": self._tool_screenshot, + "write_file": self._tool_write_file, + "read_file": self._tool_read_file, + "remove_file": self._tool_remove_file, + "press_key": self._tool_press_key, + "click": self._tool_click, + "right_click": self._tool_right_click, + "click_and_type": self._tool_click_and_type, + "append_text": self._tool_append_text, + "launch_app": self._tool_launch_app, + "go_home": self._tool_go_home, + "mouse_move": self._tool_mouse_move, + "scroll": self._tool_scroll, + "scroll_pos": self._tool_scroll_pos, + } + + handler: Callable[[Dict[str, Any]], Dict[str, Any]] = tool_mapping.get( + tool_name, + ) + + if handler is None: + return { + "success": False, + "error": f"Tool '{tool_name}' is not supported in " + f"CloudComputerSandbox", + "tool_name": tool_name, + } + 
+ try: + return handler(arguments or {}) + except Exception as error: # pylint: disable=broad-except + logger.error( + "Error executing tool %s: %s", + tool_name, + error, + ) + return { + "success": False, + "error": str(error), + "tool_name": tool_name, + "arguments": arguments, + } + + def _get_cloud_provider_name(self) -> str: # type: ignore[override] + return "Alibaba Cloud Wuying" + + def _handle_desktop_status( + self, + desktop_id: str, + current_status: str, + ) -> None: + """Handle logic for different desktop statuses""" + if current_status == "stopped": + self._start_desktop(desktop_id) + elif current_status == "hibernated": + self._wakeup_desktop(desktop_id) + else: + # Device status not found, wait a bit and query again + print( + f"Equipment for desktop_id {desktop_id} {current_status}," + " and wait", + ) + logger.info( + f"Equipment for desktop_id {desktop_id} {current_status}," + " and wait", + ) + time.sleep(2) + + def _start_desktop(self, desktop_id: str) -> None: + """Start desktop""" + print(f"Equipment restart for desktop_id {desktop_id}") + logger.info(f"Equipment restart for desktop_id {desktop_id}") + e_client = self.instance_manager.ecd_client + method = e_client.start_desktops + status = method([desktop_id]) + if status != 200: + raise HTTPException( + 503, + "Failed to start computer resource", + ) + + def _wakeup_desktop(self, desktop_id: str) -> None: + """Wake up desktop""" + print(f"Equipment wakeup for desktop_id {desktop_id}") + logger.info(f"Equipment wakeup for desktop_id {desktop_id}") + e_client = self.instance_manager.ecd_client + method = e_client.wakeup_desktops + status = method([desktop_id]) + if status != 200: + raise HTTPException( + 503, + "Failed to start computer resource", + ) + + def _wait_for_pc_ready( + self, + desktop_id: str, + max_wait_time: int = 300, + stability_check_duration: int = 3, + ): + """Asynchronously wait for PC device to be ready, + with stability check added""" + start_time = time.time() + 
stable_start_time = None + ready_status = False + while True: + try: + pc_info = self.instance_manager.ecd_client.search_desktop_info( + [desktop_id], + ) + + if pc_info and pc_info[0].desktop_status.lower() == "running": + # First time detecting running status, + # start stability check + if stable_start_time is None: + stable_start_time = time.time() + print( + f"PC {desktop_id} status: running, " + "starting stability check...", + ) + + # Check if device has been running + # stably for sufficient duration + stable_duration = time.time() - stable_start_time + if stable_duration >= stability_check_duration: + print( + f"✓ PC {desktop_id} is stable and ready" + f" (stable for {stable_duration:.1f}s)", + ) + ready_status = True + break + print( + f"PC {desktop_id} stability check: " + f"{stable_duration:.1f}" + f"s/{stability_check_duration}s", + ) + + else: + # Status is not running, reset stability check + if stable_start_time is not None: + print( + f"PC {desktop_id} status changed, resetting" + f" stability check", + ) + stable_start_time = None + + current_status = ( + pc_info[0].desktop_status.lower() + if pc_info + else "unknown" + ) + print( + f"PC {desktop_id} status: {current_status}, " + "waiting...", + ) + + # Handle different desktop statuses + self._handle_desktop_status(desktop_id, current_status) + + except Exception as e: + print(f"Error checking PC status for {desktop_id}: {e}") + # Reset stability check when exception occurs + stable_start_time = None + + # Check the deadline outside the try block so the TimeoutError + # is not swallowed by the broad handler above + if time.time() - start_time > max_wait_time: + raise TimeoutError( + f"PC {desktop_id} failed to become ready within" + f" {max_wait_time} seconds", + ) + + time.sleep(3) + return ready_status + + # ------------------------------------------------------------------ + # Tool handlers + # ------------------------------------------------------------------ + def _tool_run_shell_command( + self, + arguments: Dict[str, Any], + ) -> Dict[str, Any]: + command = 
arguments.get("command") + if not command: + return { + "success": False, + "error": "'command' argument is required", + } + + slot_time = arguments.get("slot_time") + timeout = arguments.get("timeout", self.command_timeout) + _sin = self.instance_manager + status, output = _sin.run_command_power_shell( + command, + slot_time, + timeout, + ) + return { + "success": bool(status), + "output": output, + } + + def _tool_execute_code( + self, + arguments: Dict[str, Any], + ) -> Dict[str, Any]: + """Execute Python code .""" + code = arguments.get("code") + if not code: + return {"success": False, "error": "'code' argument is required"} + + slot_time = arguments.get("slot_time") + timeout = arguments.get("timeout", self.command_timeout) + status, output = self.instance_manager.run_code( + code, + slot_time, + timeout, + ) + return { + "success": bool(status), + "output": output, + } + + def _tool_press_key(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + key = arguments.get("key") + if not key: + return {"success": False, "error": "'key' argument is required"} + + status, output = self.instance_manager.press_key(key) + return { + "success": bool(status), + "output": output, + } + + def _tool_click(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + x = arguments.get("x") + y = arguments.get("y") + if x is None or y is None: + return { + "success": False, + "error": "'x' and 'y' arguments are required", + } + count = arguments.get("count", 1) + + status, output = self.instance_manager.tap(int(x), int(y), int(count)) + return { + "success": bool(status), + "output": output, + } + + def _tool_right_click(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + x = arguments.get("x") + y = arguments.get("y") + if x is None or y is None: + return { + "success": False, + "error": "'x' and 'y' arguments are required", + } + count = arguments.get("count", 1) + + status, output = self.instance_manager.right_tap( + int(x), + int(y), + int(count), + ) + return { + "success": 
bool(status), + "output": output, + } + + def _tool_click_and_type( + self, + arguments: Dict[str, Any], + ) -> Dict[str, Any]: + x = arguments.get("x") + y = arguments.get("y") + text = arguments.get("text", "") + if x is None or y is None: + return { + "success": False, + "error": "'x' and 'y' arguments are required", + } + + status, output = self.instance_manager.tap_type_enter( + int(x), + int(y), + str(text), + ) + return { + "success": bool(status), + "output": output, + } + + def _tool_append_text(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + x = arguments.get("x") + y = arguments.get("y") + text = arguments.get("text", "") + if x is None or y is None: + return { + "success": False, + "error": "'x' and 'y' arguments are required", + } + + status, output = self.instance_manager.append( + int(x), + int(y), + str(text), + ) + return { + "success": bool(status), + "output": output, + } + + def _tool_launch_app(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + app = arguments.get("app") or arguments.get("name") + if not app: + return {"success": False, "error": "'app' argument is required"} + + status, output = self.instance_manager.open_app(str(app)) + return { + "success": bool(status), + "output": output, + } + + def _tool_go_home(self, _arguments: Dict[str, Any]) -> Dict[str, Any]: + status, output = self.instance_manager.home() + return { + "success": bool(status), + "output": output, + } + + def _tool_mouse_move(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + x = arguments.get("x") + y = arguments.get("y") + if x is None or y is None: + return { + "success": False, + "error": "'x' and 'y' arguments are required", + } + + status, output = self.instance_manager.mouse_move(int(x), int(y)) + return { + "success": bool(status), + "output": output, + } + + def _tool_scroll(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + pixels = int(arguments.get("pixels", 1)) + status, output = self.instance_manager.scroll(pixels) + return { + "success": 
bool(status), + "output": output, + } + + def _tool_scroll_pos(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + x = arguments.get("x") + y = arguments.get("y") + pixels = int(arguments.get("pixels", 1)) + if x is None or y is None: + return { + "success": False, + "error": "'x' and 'y' arguments are required", + } + + status, output = self.instance_manager.scroll_pos( + int(x), + int(y), + pixels, + ) + return { + "success": bool(status), + "output": output, + } + + def _tool_screenshot(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + file_name = arguments.get("file_name", uuid.uuid4().hex) + local_dir = arguments.get("local_dir", self.screenshot_dir) + os.makedirs(local_dir, exist_ok=True) + local_path = os.path.join(local_dir, f"{file_name}.png") + + result = self.get_screenshot_oss_save_local(file_name, local_path) + + # The helper returns an OSS URL on success and "Error" on failure + success = bool(result) and result != "Error" + return { + "success": success, + "output": result if success else None, + "error": None if success else "Failed to capture screenshot", + } + + # File tool handlers + + def _tool_write_file(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + file_path = arguments.get("file_path") + content = arguments.get("content", "") + encoding = arguments.get("encoding", "utf-8") + + if not file_path: + return { + "success": False, + "error": "'file_path' argument is required", + } + + try: + status, output = self.instance_manager.write_file( + file_path, + content, + encoding, + ) + return { + "success": bool(status), + "output": output, + "file_path": file_path, + } + except Exception as error: + logger.error("Error writing file %s: %s", file_path, error) + return { + "success": False, + "error": str(error), + "file_path": file_path, + } + + def _tool_read_file(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + file_path = arguments.get("file_path") + encoding = arguments.get("encoding", "utf-8") + + if not file_path: + return { + "success": False, + "error": 
"'file_path' argument is required", + } + + try: + status, output = self.instance_manager.read_file( + file_path, + encoding, + ) + return { + "success": bool(status), + "output": output if status else None, + "file_path": file_path, + } + except Exception as error: + logger.error("Error reading file %s: %s", file_path, error) + return { + "success": False, + "error": str(error), + "file_path": file_path, + } + + def _tool_remove_file(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + file_path = arguments.get("file_path") + + if not file_path: + return { + "success": False, + "error": "'file_path' argument is required", + } + + try: + status, output = self.instance_manager.remove_file(file_path) + return { + "success": bool(status), + "output": output, + "file_path": file_path, + } + except Exception as error: + logger.error("Error removing file %s: %s", file_path, error) + return { + "success": False, + "error": str(error), + "file_path": file_path, + } + + # ------------------------------------------------------------------ + # Sandbox metadata APIs + # ------------------------------------------------------------------ + def list_tools(self, tool_type: Optional[str] = None) -> Dict[str, Any]: + command_tools = [ + "run_shell_command", + "run_ipython_cell", + "write_file", + "read_file", + "remove_file", + ] + input_tools = [ + "press_key", + "click", + "right_click", + "click_and_type", + "append_text", + "mouse_move", + "scroll", + "scroll_pos", + ] + system_tools = [ + "screenshot", + "go_home", + "launch_app", + ] + + tools_by_type = { + "command": command_tools, + "input": input_tools, + "system": system_tools, + } + + if tool_type: + tools = tools_by_type.get(tool_type, []) + return { + "tools": tools, + "tool_type": tool_type, + "sandbox_id": self._sandbox_id, + "total_count": len(tools), + } + + all_tools: List[str] = [] + for group in tools_by_type.values(): + all_tools.extend(group) + + return { + "tools": all_tools, + "tools_by_type": tools_by_type, 
+ "tool_type": tool_type, + "sandbox_id": self._sandbox_id, + "total_count": len(all_tools), + } + + def get_screenshot_base64_save_local( + self, + local_file_name: str, + local_save_path: str, + max_retry: int = 5, + ) -> str: + try: + for _ in range(max_retry): + screen_base64 = self.instance_manager.get_screenshot( + local_file_name, + local_save_path, + ) + if screen_base64: + return screen_base64 + return "Error" # Return error after retry attempts are exhausted + except Exception as error: # pylint: disable=broad-except + logger.error("Failed to screenshot_base64 desktop %s", error) + return "Error" + + async def get_screenshot_base64_save_local_async( + self, + local_file_name: str, + local_save_path: str, + max_retry: int = 5, + ) -> str: + try: + for _ in range(max_retry): + screen_base64 = ( + await self.instance_manager.get_screenshot_async( + local_file_name, + local_save_path, + ) + ) + if screen_base64: + return screen_base64 + return "Error" + except Exception as error: # pylint: disable=broad-except + logger.error("Failed to screenshot_base64 desktop %s", error) + return "Error" + + def get_screenshot_oss_save_local( + self, + local_file_name: str, + local_save_path: str, + max_retry: int = 5, + ) -> str: + try: + for _ in range(max_retry): + screen_oss_url = self.instance_manager.get_screenshot_oss_url( + local_file_name, + local_save_path, + ) + if screen_oss_url: + return screen_oss_url + return "Error" + except Exception as error: # pylint: disable=broad-except + logger.error("Failed to screenshot_oss desktop %s", error) + return "Error" + + async def get_screenshot_oss_save_local_async( + self, + local_file_name: str, + local_save_path: str, + max_retry: int = 5, + ) -> str: + try: + for _ in range(max_retry): + screen_oss_url = ( + await self.instance_manager.get_screenshot_oss_url( + local_file_name, + local_save_path, + ) + ) + if screen_oss_url: + return screen_oss_url + return "Error" + except Exception as error: # pylint: 
disable=broad-except + logger.error("Failed to screenshot_oss desktop %s", error) + return "Error" + + def get_instance_manager(self, desktop_id: str) -> Any: + retry = 3 + while retry > 0: + try: + # Use ClientPool to get instance manager, + # avoid creating duplicate + # client connections + client_pool = getattr(self, "_client_pool", ClientPool()) + manager = client_pool.get_instance_manager(desktop_id) + manager.refresh_aurh_code() + return manager + except Exception as e: + retry -= 1 + logger.warning( + f"get manager error, retrying: remain {retry}, {e}", + ) + continue + return None diff --git a/src/agentscope_runtime/sandbox/box/cloud_api/cloud_phone_sandbox.py b/src/agentscope_runtime/sandbox/box/cloud_api/cloud_phone_sandbox.py new file mode 100644 index 000000000..5e812a99a --- /dev/null +++ b/src/agentscope_runtime/sandbox/box/cloud_api/cloud_phone_sandbox.py @@ -0,0 +1,627 @@ +# -*- coding: utf-8 -*- +import os +import asyncio +import logging +import time +from typing import Optional, Any, List, Dict +from typing import Callable +from fastapi import HTTPException +from agentscope_runtime.sandbox.registry import SandboxRegistry +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.sandbox.box.cloud.cloud_sandbox import CloudSandbox +from .client.cloud_phone_wy import ClientPool, EdsInstanceManager + +logger = logging.getLogger(__name__) + + +@SandboxRegistry.register( + "aliyun-cloud-phone", + sandbox_type=SandboxType.CLOUD_PHONE, + security_level="high", + timeout=600, + description="Alibaba Cloud EDS Cloud Phone Sandbox Environment", +) +class CloudPhoneSandbox(CloudSandbox): + def __init__( + self, + *, + instance_id: Optional[str] = None, + timeout: int = 600, + sandbox_type: SandboxType = SandboxType.CLOUD_PHONE, + auto_start: bool = True, + **kwargs, + ) -> None: + """ + Initialize the CloudPhone sandbox. 
+ + Args: + instance_id: Cloud phone instance ID (from environment + or parameter) + timeout: Timeout for operations in seconds + sandbox_type: Type of sandbox (default: CLOUD_PHONE) + auto_start: Whether to auto-start the instance if stopped + **kwargs: Additional configuration + """ + resolved_instance_id = instance_id or os.environ.get( + "PHONE_INSTANCE_ID", + ) + if not resolved_instance_id: + raise ValueError( + "instance_id is required. Provide instance_id" + " or set PHONE_INSTANCE_ID.", + ) + + self.instance_id = resolved_instance_id + self.auto_start = auto_start + + kwargs.pop("instance_id", None) + + super().__init__( + timeout=timeout, + sandbox_type=sandbox_type, + **kwargs, + ) + + # ------------------------------------------------------------------ + # CloudSandbox abstract implementations + # ------------------------------------------------------------------ + def _initialize_cloud_client(self): # type: ignore[override] + """Initialize EDS client via shared client pool.""" + self._client_pool = ClientPool() + instance_manager = self._client_pool.get_instance_manager( + self.instance_id, + ) + if instance_manager is None: + raise RuntimeError( + "Failed to acquire EdsInstanceManager for cloud phone", + ) + + self.instance_manager = instance_manager + self.eds_client = self._client_pool.get_eds_client() + self.oss_client = self._client_pool.get_oss_client() + return self.eds_client + + def _create_cloud_sandbox(self) -> Optional[str]: + """Ensure cloud phone instance is ready.""" + try: + # Auto-start instance if needed + if self.auto_start: + try: + ready_status = self._wait_for_phone_ready( + self.instance_id, + stability_check_duration=2, + ) + if not ready_status: + logger.warning( + "Start instance returned non-success" + " status %s for %s", + ready_status, + self.instance_id, + ) + except Exception as start_error: + logger.warning( + f"Start instance failed: {start_error}", + ) + self.instance_manager.refresh_ticket() + return self.instance_id + except Exception as error: 
+ logger.error( + f"Error preparing cloud phone sandbox: {error}", + ) + return None + + def _delete_cloud_sandbox(self, sandbox_id: str) -> bool: + """Stop cloud phone instance (optional cleanup).""" + try: + # Note: We don't delete the instance, just stop it + # The instance can be reused later + status = self.eds_client.stop_equipment([sandbox_id]) + return status == 200 + except Exception as error: # pylint: disable=broad-except + logger.error( + f"Failed to stop instance {sandbox_id}: {error}", + ) + return False + + def _call_cloud_tool( + self, + tool_name: str, + arguments: Dict[str, Any], + ) -> Any: + """ + Call a tool in the cloud phone environment. + + Args: + tool_name: Name of the tool to call + arguments: Arguments for the tool + + Returns: + Tool execution result + """ + tool_mapping: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = { + "run_shell_command": self._tool_run_shell_command, + "screenshot": self._tool_screenshot, + "send_file": self._tool_send_file, + "remove_file": self._tool_remove_file, + "click": self._tool_click, + "type_text": self._tool_type_text, + "slide": self._tool_slide, + "go_home": self._tool_go_home, + "back": self._tool_back, + "menu": self._tool_menu, + "enter": self._tool_enter, + "kill_front_app": self._tool_kill_front_app, + } + + handler: Callable[ + [Dict[str, Any]], + Dict[str, Any], + ] = tool_mapping.get(tool_name) + + if handler is None: + return { + "success": False, + "error": f"Tool '{tool_name}' is not supported in" + f" CloudPhoneSandbox", + "tool_name": tool_name, + } + + try: + return handler(arguments or {}) + except Exception as error: # pylint: disable=broad-except + logger.error( + "Error executing tool %s: %s", + tool_name, + error, + ) + return { + "success": False, + "error": str(error), + "tool_name": tool_name, + "arguments": arguments, + } + + def _get_cloud_provider_name(self) -> str: # type: ignore[override] + """Get the name of the cloud provider.""" + return "Alibaba Cloud EDS" + + # 
------------------------------------------------------------------ + # Tool handlers + # ------------------------------------------------------------------ + def _tool_run_shell_command( + self, + arguments: Dict[str, Any], + ) -> Dict[str, Any]: + """Execute ADB shell command in the cloud phone.""" + command = arguments.get("command") + if not command: + return { + "success": False, + "error": "'command' argument is required", + } + + status, output = self.instance_manager.run_command(str(command)) + return { + "success": bool(status), + "output": output or "", + } + + def _tool_click(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """Click at coordinates in the cloud phone.""" + x1 = arguments.get("x1", 0) + y1 = arguments.get("y1", 0) + x2 = arguments.get("x2", 0) + y2 = arguments.get("y2", 0) + width = arguments.get("width", 0) + height = arguments.get("height", 0) + + status, output = self.instance_manager.tab( + int(x1), + int(y1), + int(x2), + int(y2), + int(width), + int(height), + ) + return { + "success": bool(status), + "output": output or "", + } + + def _tool_type_text(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """Type text in the cloud phone. 
Requires ADBKeyboard to be installed.""" + text = arguments.get("text", "") + if not text: + return {"success": False, "error": "'text' argument is required"} + + output = self.instance_manager.type(str(text)) + return { + "success": True, + "output": output or "", + } + + def _tool_slide(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """Slide from one point to another.""" + x1 = arguments.get("x1") + y1 = arguments.get("y1") + x2 = arguments.get("x2") + y2 = arguments.get("y2") + if x1 is None or y1 is None or x2 is None or y2 is None: + return { + "success": False, + "error": "'x1', 'y1', 'x2', 'y2' arguments are required", + } + + status, output = self.instance_manager.slide( + int(x1), + int(y1), + int(x2), + int(y2), + ) + return { + "success": bool(status), + "output": output or "", + } + + def _tool_go_home(self, _arguments: Dict[str, Any]) -> Dict[str, Any]: + """Go to home screen.""" + status, output = self.instance_manager.home() + return { + "success": bool(status), + "output": output or "", + } + + def _tool_back(self, _arguments: Dict[str, Any]) -> Dict[str, Any]: + """Press back button.""" + status, output = self.instance_manager.back() + return { + "success": bool(status), + "output": output or "", + } + + def _tool_menu(self, _arguments: Dict[str, Any]) -> Dict[str, Any]: + """Press menu button.""" + status, output = self.instance_manager.menu() + return { + "success": bool(status), + "output": output or "", + } + + def _tool_enter(self, _arguments: Dict[str, Any]) -> Dict[str, Any]: + """Press enter button.""" + status, output = self.instance_manager.enter() + return { + "success": bool(status), + "output": output or "", + } + + def _tool_kill_front_app( + self, + _arguments: Dict[str, Any], + ) -> Dict[str, Any]: + """Kill the front app.""" + status, output = self.instance_manager.kill_the_front_app() + return { + "success": bool(status), + "output": output or "", + } + + def _tool_screenshot(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """Take 
a screenshot.""" + max_retry = arguments.get("max_retry", 5) + + result = self.get_screenshot_oss_phone(max_retry) + + success = bool(result) and result != "Error" + return { + "success": success, + "output": result if success else None, + "error": None if success else "Failed to capture screenshot", + } + + def _tool_send_file(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """Send file to the cloud phone.""" + # File path on cloud phone includes filename + source_file_path = arguments.get("source_file_path") + # Public download URL of the file + upload_url = arguments.get("upload_url") + + if not source_file_path or not upload_url: + return { + "success": False, + "error": "'source_file_path' and 'upload_url' " + "arguments are required", + } + + try: + status_code = self.instance_manager.send_file( + source_file_path, + upload_url, + ) + return { + "success": status_code == 200, + "status_code": status_code, + "output": upload_url, + } + except Exception as error: + logger.error("Error sending file: %s", error) + return { + "success": False, + "error": str(error), + "source_file_path": source_file_path, + "upload_url": upload_url, + } + + def _tool_remove_file(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """Remove file from the cloud phone.""" + file_path = arguments.get("file_path") + + if not file_path: + return { + "success": False, + "error": "'file_path' argument is required", + } + + try: + status, output = self.instance_manager.remove_file(file_path) + return { + "success": bool(status), + "output": output or "", + } + except Exception as error: + logger.error("Error removing file: %s", error) + return { + "success": False, + "error": str(error), + "file_path": file_path, + } + + # ------------------------------------------------------------------ + # Sandbox metadata APIs + # ------------------------------------------------------------------ + def list_tools(self, tool_type: Optional[str] = None) -> Dict[str, Any]: + """ + List available tools in 
the cloud phone sandbox. + + Args: + tool_type: Optional filter for tool type + (e.g., "input", "navigation", "command", "system") + + Returns: + Dictionary containing available tools organized by type + """ + input_tools = [ + "click", + "type_text", + "slide", + ] + navigation_tools = [ + "go_home", + "back", + "menu", + "enter", + ] + command_tools = [ + "run_shell_command", + "kill_front_app", + ] + system_tools = [ + "screenshot", + "send_file", + "remove_file", + ] + + tools_by_type = { + "input": input_tools, + "navigation": navigation_tools, + "command": command_tools, + "system": system_tools, + } + + if tool_type: + tools = tools_by_type.get(tool_type, []) + return { + "tools": tools, + "tool_type": tool_type, + "sandbox_id": self._sandbox_id, + "total_count": len(tools), + } + + all_tools: List[str] = [] + for group in tools_by_type.values(): + all_tools.extend(group) + + return { + "tools": all_tools, + "tools_by_type": tools_by_type, + "tool_type": tool_type, + "sandbox_id": self._sandbox_id, + "total_count": len(all_tools), + } + + # ------------------------------------------------------------------ + # Internal helpers + # ------------------------------------------------------------------ + async def _get_instance_manager_async(self, instance_id: str) -> Any: + """Get or create instance manager for the cloud phone.""" + retry = 3 + while retry > 0: + try: + logger.info( + "Starting cloud phone instance initialization," + " attempt count: %s", + retry, + ) + manager = await asyncio.to_thread( + EdsInstanceManager, + instance_id, + ) + return manager + except Exception as e: # pylint: disable=broad-except + retry -= 1 + logger.error( + "get manager error, retrying: remain %s, %s", + retry, + e, + ) + await asyncio.sleep(5) + continue + return None + + def _ensure_initialized(self) -> None: + """Helper method to check if async initialization was called.""" + if ( + not hasattr(self, "instance_manager") + or self.instance_manager is None + ): + raise 
RuntimeError( + "CloudPhone not initialized. Call 'await cloud_phone." + "initialize()' first.", + ) + + def _wait_for_phone_ready( + self, + instance_id: str, + max_wait_time: int = 300, + stability_check_duration: int = 4, + ): + """Asynchronously wait for phone device to be ready""" + start_time = time.time() + stable_start_time = None + while True: + try: + # Execute synchronous status check operation in thread pool + method = self.instance_manager.eds_client.list_instance + total_count, next_token, devices_info = method( + instance_ids=[instance_id], + ) + print(f"{total_count}{next_token}") + if ( + devices_info + and devices_info[0].android_instance_status.lower() + == "running" + ): + # First time detecting running status, + # start stability check + if stable_start_time is None: + stable_start_time = time.time() + print( + f"Phone {instance_id} status: running, " + "starting stability check...", + ) + + # Check if device has been running stably + # for sufficient duration + stable_duration = time.time() - stable_start_time + if stable_duration >= stability_check_duration: + print( + f"✓ Phone {instance_id} is stable and ready" + f" (stable for {stable_duration:.1f}s)", + ) + ready_status = True + break + print( + f"Phone {instance_id} stability check: " + f"{stable_duration:.1f}" + f"s/{stability_check_duration}s", + ) + + else: + # Status is not running, reset stability check + if stable_start_time is not None: + print( + f"PHONE {instance_id} status changed, " + "resetting stability check", + ) + stable_start_time = None + current_status = ( + devices_info[0].android_instance_status.lower() + if devices_info + else "unknown" + ) + print( + f"PHONE {instance_id} status: " + f"{current_status}, waiting...", + ) + if current_status == "stopped": + # Start device + print( + f"Equipment restart for instance_id {instance_id}", + ) + logger.info( + f"Equipment restart for instance_id {instance_id}", + ) + e_client = self.instance_manager.eds_client + method = 
e_client.start_equipment + status = method( + [instance_id], + ) + if status != 200: + raise HTTPException( + 503, + "Failed to start computer resource", + ) + else: + # Device status not found, wait a bit and query again + print( + f"Equipment for instance_id {instance_id} unknown," + " and wait", + ) + logger.info( + f"Equipment for instance_id {instance_id} unknown," + " and wait", + ) + time.sleep(2) + + # Check if timeout occurred + if time.time() - start_time > max_wait_time: + raise TimeoutError( + f"Phone {instance_id} failed to become ready " + f"within {max_wait_time} seconds", + ) + + except Exception as e: + print(f"Error checking phone status for {instance_id}: {e}") + + time.sleep(5) + return ready_status + + async def get_screenshot_oss_phone_async( + self, + max_retry: int = 5, + ) -> str: + self._ensure_initialized() + for _ in range(max_retry): + screen_url = await self.instance_manager.get_screenshot_sdk_async() + if screen_url: + return screen_url + return "Error" + + def get_screenshot_oss_phone( + self, + max_retry: int = 5, + ) -> str: + self._ensure_initialized() + for _ in range(max_retry): + screen_url = self.instance_manager.get_screenshot_sdk() + if screen_url: + return screen_url + return "Error" + + def get_instance_manager(self, instance_id: str) -> Any: + """Get or create instance manager for the cloud phone.""" + retry = 3 + while retry > 0: + try: + # Use ClientPool to get instance manager, avoid + # creating duplicate client connections + client_pool = getattr(self, "_client_pool", ClientPool()) + manager = client_pool.get_instance_manager(instance_id) + manager.refresh_ticket() + return manager + except Exception as e: # pylint: disable=broad-except + retry -= 1 + logger.warning( + f"get manager error, retrying: remain {retry}, {e}", + ) + continue + return None diff --git a/src/agentscope_runtime/sandbox/box/cloud_api/utils/__init__.py b/src/agentscope_runtime/sandbox/box/cloud_api/utils/__init__.py new file mode 100644 index 
000000000..e69de29bb diff --git a/src/agentscope_runtime/sandbox/box/cloud_api/utils/oss_client.py b/src/agentscope_runtime/sandbox/box/cloud_api/utils/oss_client.py new file mode 100644 index 000000000..6ccbdbbeb --- /dev/null +++ b/src/agentscope_runtime/sandbox/box/cloud_api/utils/oss_client.py @@ -0,0 +1,324 @@ +# -*- coding: utf-8 -*- +import asyncio +import os +import time +from typing import Optional +import aiofiles + + +class OSSFileNotFoundError(Exception): + """Exception raised when specified file is not found in OSS""" + + +class OSSClient: + def __init__( + self, + bucket_name: Optional[str] = "", + endpoint: Optional[str] = "", + ) -> None: + import oss2 + + if not bucket_name: + bucket_name = os.environ.get("EDS_OSS_BUCKET_NAME") + if not endpoint: + endpoint = os.environ.get("EDS_OSS_ENDPOINT") + ak = os.environ.get("EDS_OSS_ACCESS_KEY_ID") + # Your AccessKey Secret + sk = os.environ.get("EDS_OSS_ACCESS_KEY_SECRET") + auth = oss2.Auth(ak, sk) + self.__bucket__ = oss2.Bucket(auth, endpoint, bucket_name) + self.oss_path = os.environ.get("EDS_OSS_PATH") + + def get_signal_url( + self, + file_name: str, + expire: int = 3600 * 24 * 1, + ) -> str: + signed_url = self.__bucket__.sign_url( + "PUT", + f"{self.oss_path}{file_name}", + expire, + slash_safe=True, + ) + return signed_url + + def get_download_url( + self, + file_name: str, + expire: int = 3600 * 24 * 7, + ) -> str: + """ + Generate presigned URL for download + :param file_name: File name (relative to bucket path) + :param expire: Expiration time (seconds) + :return: Presigned URL + """ + return self.__bucket__.sign_url( + "GET", + f"{self.oss_path}{file_name}", + expire, + ) + + def oss_upload_data_and_sign( + self, + data: bytes, + file_name: str, + expire: int = 3600 * 1 * 1, + ) -> str: + """ + Upload byte data to OSS and return signed URL + + Args: + data (bytes): File data to upload + file_name (str): File name + expire (int): Expiration time for signed URL + (seconds), default 1 hour. 
+ Returns: + str: Signed URL + """ + # Upload data + object_name = f"__mPLUG__/uploads/{file_name}" + self.__bucket__.put_object(object_name, data) + + # Generate signed URL + signed_url = self.__bucket__.sign_url("GET", object_name, expire) + return signed_url + + def upload_local_and_sign( + self, + file: bytes, + file_name: str, + expire: int = 3600 * 1 * 1, + ) -> str: + remote_path = f"{self.oss_path}{file_name}" + self.__bucket__.put_object(remote_path, file) + signed_url = self.__bucket__.sign_url("GET", remote_path, expire) + return signed_url + + def oss_upload_file_and_sign( + self, + filepath: str, + filename: str, + expire: int = 3600 * 1 * 1, + ) -> str: + """ + Upload local file to OSS and return signed URL. + + Args: + filepath (str): Full path of local file. + filename (str): File name to upload to OSS. + expire (int): Expiration time for signed URL + (seconds), default 1 hour. + + Returns: + str: Signed download URL of the file. + """ + remote_path = f"{self.oss_path}{filename}" + + # Open local file in binary read mode and upload + with open(filepath, "rb") as file_obj: + self.__bucket__.put_object(remote_path, file_obj) + + signed_url = self.__bucket__.sign_url("GET", remote_path, expire) + return signed_url + + def get_url(self, path: str, expire: int = 3600 * 90 * 24) -> str: + # Check if file exists + start_time = time.time() + while ( + not self.__bucket__.object_exists(path) + and time.time() - start_time < 20 + ): + print( + f"waiting for file to be uploaded, seconds:" + f" {time.time() - start_time}", + ) + time.sleep(1.5) + if not self.__bucket__.object_exists(path): + raise OSSFileNotFoundError(f"{path} does not exist") + signed_url = self.__bucket__.sign_url("GET", path, expire) + return signed_url + + async def get_signal_url_async( + self, + file_name: str, + expire: int = 3600 * 24 * 1, + ) -> str: + """Async version of get_signal_url method""" + loop = asyncio.get_event_loop() + signed_url = await loop.run_in_executor( + None, + 
self.__bucket__.sign_url, + "PUT", + f"{self.oss_path}{file_name}", + expire, + ) + return signed_url + + async def get_download_url_async( + self, + file_name: str, + expire: int = 3600 * 24 * 7, + ) -> str: + """ + Async version of generating presigned URL for download + :param file_name: File name (relative to bucket path) + :param expire: Expiration time (seconds) + :return: Presigned URL + """ + loop = asyncio.get_event_loop() + signed_url = await loop.run_in_executor( + None, + self.__bucket__.sign_url, + "GET", + f"{self.oss_path}{file_name}", + expire, + ) + return signed_url + + async def oss_upload_data_and_sign_async( + self, + data: bytes, + file_name: str, + expire: int = 3600 * 1 * 1, + ) -> str: + """ + Async version of uploading byte data to OSS + and returning signed URL + + Args: + data (bytes): File data to upload + file_name (str): File name + expire (int): Expiration time for signed URL + (seconds), default 1 hour. + Returns: + str: Signed URL + """ + # Upload data + object_name = f"__mPLUG__/uploads/{file_name}" + loop = asyncio.get_event_loop() + + await loop.run_in_executor( + None, + self.__bucket__.put_object, + object_name, + data, + ) + + # Generate signed URL + signed_url = await loop.run_in_executor( + None, + self.__bucket__.sign_url, + "GET", + object_name, + expire, + ) + return signed_url + + async def upload_local_and_sign_async( + self, + file: bytes, + file_name: str, + expire: int = 3600 * 1 * 1, + ) -> str: + """Async version of upload_local_and_sign method""" + remote_path = f"{self.oss_path}{file_name}" + loop = asyncio.get_event_loop() + + await loop.run_in_executor( + None, + self.__bucket__.put_object, + remote_path, + file, + ) + + signed_url = await loop.run_in_executor( + None, + self.__bucket__.sign_url, + "GET", + remote_path, + expire, + ) + return signed_url + + async def oss_upload_file_and_sign_async( + self, + filepath: str, + filename: str, + expire: int = 3600 * 1 * 1, + ) -> str: + """ + Async version of 
uploading local file to OSS and returning signed URL. + + Args: + filepath (str): Full path of local file. + filename (str): File name to upload to OSS. + expire (int): Expiration time for signed URL + (seconds), default 1 hour. + + Returns: + str: Signed download URL of the file. + """ + remote_path = f"{self.oss_path}{filename}" + loop = asyncio.get_event_loop() + + # Use aiofiles to read file asynchronously + async with aiofiles.open(filepath, "rb") as file_obj: + file_data = await file_obj.read() + + await loop.run_in_executor( + None, + self.__bucket__.put_object, + remote_path, + file_data, + ) + + signed_url = await loop.run_in_executor( + None, + self.__bucket__.sign_url, + "GET", + remote_path, + expire, + ) + return signed_url + + async def get_url_async( + self, + path: str, + expire: int = 3600 * 90 * 24, + ) -> str: + """Async version of get_url method""" + # Check if file exists + start_time = time.time() + loop = asyncio.get_event_loop() + + while time.time() - start_time < 20: + exists = await loop.run_in_executor( + None, + self.__bucket__.object_exists, + path, + ) + if exists: + break + print( + f"waiting for file to be uploaded, seconds:" + f" {time.time() - start_time}", + ) + await asyncio.sleep(1.5) + + exists = await loop.run_in_executor( + None, + self.__bucket__.object_exists, + path, + ) + if not exists: + raise OSSFileNotFoundError(f"{path} does not exist") + + signed_url = await loop.run_in_executor( + None, + self.__bucket__.sign_url, + "GET", + path, + expire, + ) + return signed_url diff --git a/src/agentscope_runtime/sandbox/box/cloud_api/utils/utils.py b/src/agentscope_runtime/sandbox/box/cloud_api/utils/utils.py new file mode 100644 index 000000000..d6806e1b8 --- /dev/null +++ b/src/agentscope_runtime/sandbox/box/cloud_api/utils/utils.py @@ -0,0 +1,146 @@ +# -*- coding: utf-8 -*- +import logging +import os +from typing import Optional +import base64 +from io import BytesIO +import requests +import aiohttp +from PIL import Image 
+from requests.exceptions import RequestException + +logger = logging.getLogger(__name__) + + +async def download_oss_image_and_save_return_base64( + oss_url: str, + local_save_path: str, +) -> Optional[str]: + """ + Download image from OSS presigned URL, save to local, + and return Base64 encoding + :param oss_url: str, Presigned URL of OSS image + :param local_save_path: str, Local save path + (including filename) + :return: str, Base64 encoded image data + """ + try: + # Download image + async with aiohttp.ClientSession() as session: + async with session.get(oss_url) as response: + if response.status != 200: + raise RequestException( + f"Download failed with status code {response.status}", + ) + + # Ensure directory exists + os.makedirs(os.path.dirname(local_save_path), exist_ok=True) + + # Save to local + content = await response.read() + with open(local_save_path, "wb") as f: + f.write(content) + print(f"Image saved to {local_save_path}") + + # Convert to Base64 + with open(local_save_path, "rb") as image_file: + encoded_str = base64.b64encode(image_file.read()).decode("utf-8") + + return f"data:image/png;base64,{encoded_str}" + + except Exception as e: + print(f"Error downloading or saving image: {e}") + return "" + + +async def get_image_size_from_url(image_url: str) -> tuple[int, int]: + async with aiohttp.ClientSession() as session: + async with session.get(image_url) as response: + response.raise_for_status() + content = await response.read() + image_data = BytesIO(content) + with Image.open(image_data) as img: + return img.size # Return (width, height) + + +async def download_oss_image_and_save_async( + oss_url: str, + local_save_path: str, +) -> str: + """ + Download image from OSS presigned URL, save to local, + and return Base64 encoding + :param oss_url: str, Presigned URL of OSS image + :param local_save_path: str, Local save path + (including filename) + :return: str, Base64 encoded image data + """ + try: + # Download image + async with 
aiohttp.ClientSession() as session: + async with session.get(oss_url) as response: + if response.status != 200: + raise RequestException( + f"Download failed with status code {response.status}", + ) + content = await response.read() + + # Ensure directory exists + os.makedirs(os.path.dirname(local_save_path), exist_ok=True) + + # Save to local + with open(local_save_path, "wb") as f: + f.write(content) + logger.info(f"Image saved to {local_save_path}") + + # Convert to Base64 + with open(local_save_path, "rb") as image_file: + encoded_str = base64.b64encode( + image_file.read(), + ).decode("utf-8") + + return f"data:image/png;base64,{encoded_str}" + + except Exception as e: + logger.error(f"Error downloading or saving image: {e}") + return "" + + +def download_oss_image_and_save( + oss_url: str, + local_save_path: str, +) -> str: + """ + Download image from OSS presigned URL, save to local, + and return Base64 + encoding (synchronous version) + :param oss_url: str, Presigned URL of OSS image + :param local_save_path: str, Local save path + (including filename) + :return: str, Base64 encoded image data + """ + try: + # Download image + response = requests.get(oss_url) + if response.status_code != 200: + raise RequestException( + f"Download failed with status code {response.status_code}", + ) + + # Ensure directory exists + os.makedirs(os.path.dirname(local_save_path), exist_ok=True) + + # Save to local + with open(local_save_path, "wb") as f: + f.write(response.content) + logger.info(f"Image saved to {local_save_path}") + + # Convert to Base64 + with open(local_save_path, "rb") as image_file: + encoded_str = base64.b64encode(image_file.read()).decode("utf-8") + + return f"data:image/png;base64,{encoded_str}" + + except Exception as e: + logger.error(f"Error downloading or saving image: {e}") + return "" diff --git a/src/agentscope_runtime/sandbox/box/e2b/__init__.py b/src/agentscope_runtime/sandbox/box/e2b/__init__.py new file mode 100644 index 000000000..68236562b 
--- /dev/null +++ b/src/agentscope_runtime/sandbox/box/e2b/__init__.py @@ -0,0 +1,4 @@ +# -*- coding: utf-8 -*- +from .e2b_sandbox import E2bSandBox + +__all__ = ["E2bSandBox"] diff --git a/src/agentscope_runtime/sandbox/box/e2b/e2b_sandbox.py b/src/agentscope_runtime/sandbox/box/e2b/e2b_sandbox.py new file mode 100644 index 000000000..999ffab9c --- /dev/null +++ b/src/agentscope_runtime/sandbox/box/e2b/e2b_sandbox.py @@ -0,0 +1,481 @@ +# -*- coding: utf-8 -*- +""" +E2BSandbox implementation for E2B cloud environment. + +This module provides a sandbox implementation that integrates with E2B, +a cloud-native sandbox environment service. +""" +import logging +from typing import Optional, Dict, Any +from e2b_desktop import Sandbox +from PIL import Image +from agentscope_runtime.sandbox.enums import ( + SandboxType, +) +from agentscope_runtime.sandbox.registry import SandboxRegistry +from agentscope_runtime.sandbox.box.cloud.cloud_sandbox import CloudSandbox +from .utils.grounding_utils import ( + perform_gui_grounding_with_api, +) + + +logger = logging.getLogger(__name__) + +execute_wait_time_: int = 5 + + +@SandboxRegistry.register( + "e2b-desktop", # Virtual image name indicating cloud service + sandbox_type=SandboxType.E2B, + security_level="high", + timeout=300, + description="E2B Desktop Sandbox Environment", +) +class E2bSandBox(CloudSandbox): + def __init__( + self, + *, + timeout: int = 600, + sandbox_type: SandboxType = SandboxType.E2B, + command_timeout: int = 60, + **kwargs, + ) -> None: + self.command_timeout = command_timeout + + super().__init__( + timeout=timeout, + sandbox_type=sandbox_type, + **kwargs, + ) + + # ------------------------------------------------------------------ + # CloudSandbox abstract implementations + # ------------------------------------------------------------------ + def _initialize_cloud_client(self): # type: ignore[override] + return "" + + def _create_cloud_sandbox(self, timeout: int = 600) -> Optional[str]: + try: + 
self.device = Sandbox.create(timeout=timeout) + self.device.stream.start() + logger.info( + f"E2B sandbox initialized with ID: {self.device.sandbox_id}", + ) + return self.device.sandbox_id + except Exception as error: # pylint: disable=broad-except + logger.error( + f"Error preparing E2B sandbox: {error}", + ) + return None + + def _delete_cloud_sandbox(self, sandbox_id: Optional[str] = None) -> bool: + """Stop the E2B desktop stream (optional cleanup).""" + try: + # Note: this only stops the desktop stream; + # the sandbox itself is not deleted here + logger.info("Stopping sandbox %s...", sandbox_id) + self.device.stream.stop() + return True + except Exception as error: # pylint: disable=broad-except + logger.error("Failed to stop sandbox %s: %s", sandbox_id, error) + return False + + def _call_cloud_tool( + self, + tool_name: str, + arguments: Dict[str, Any], + ) -> Any: + """ + Call a tool in the E2B environment. + + Args: + tool_name: Name of the tool to call + arguments: Arguments for the tool + + Returns: + Tool execution result + """ + try: + # Map tool names to E2B methods + tool_mapping = { + "run_shell_command": self._tool_run_command, + "screenshot": self._tool_screenshot, + "click": self._tool_click, + "right_click": self._tool_right_click, + "click_and_type": self._tool_click_and_type, + "type_text": self._tool_type_text, + "press_key": self._tool_press_key, + "launch_app": self._tool_launch_app, + } + + if tool_name in tool_mapping: + return tool_mapping[tool_name](arguments) + else: + logger.warning( + f"Tool {tool_name} not supported in E2B sandbox", + ) + return { + "success": False, + "error": f"Tool '{tool_name}' not supported", + "tool_name": tool_name, + } + + except Exception as e: + logger.error(f"Error calling tool {tool_name}: {e}") + return { + "success": False, + "error": str(e), + "tool_name": tool_name, + "arguments": arguments, + } + + def _get_cloud_provider_name(self) -> str: # type: ignore[override] + """Get the name of the cloud provider.""" + return "E2B 
DESKTOP" + + def list_tools(self, tool_type: Optional[str] = None) -> Dict[str, Any]: + """ + List available tools in the E2B sandbox. + + Args: + tool_type: Optional filter for tool type + + Returns: + Dictionary containing available tools + """ + # Define tool categories + desktop_tools = [ + "click", + "right_click", + "type_text", + "press_key", + "launch_app", + "click_and_type", + ] + command_tools = ["run_shell_command"] + system_tools = [ + "screenshot", + ] + # Organize tools by type + tools_by_type = { + "desktop": desktop_tools, + "command": command_tools, + "system": system_tools, + } + + # If tool_type is specified, return only that type + if tool_type: + tools = tools_by_type.get(tool_type, []) + return { + "tools": tools, + "tool_type": tool_type, + "sandbox_id": self.device.sandbox_id, + "total_count": len(tools), + } + + # Return all tools organized by type + all_tools = [] + for tool_list in tools_by_type.values(): + all_tools.extend(tool_list) + + return { + "tools": all_tools, + "tools_by_type": tools_by_type, + "tool_type": tool_type, + "sandbox_id": self.device.sandbox_id, + "total_count": len(all_tools), + } + + def _tool_run_command( + self, + arguments: Dict[str, Any], + ) -> Dict[str, Any]: + """Execute a shell command in E2B.""" + try: + command = arguments.get("command") + if not command: + return { + "success": False, + "error": "'command' argument is required", + } + + background = arguments.get("background") + timeout = arguments.get("timeout", self.command_timeout) + if background: + self.device.commands.run(command, background=True) + return { + "success": True, + "output": "The command has been started.", + } + else: + result = self.device.commands.run(command, timeout=timeout) + stdout, stderr = result.stdout, result.stderr + if stdout and stderr: + output = stdout + "\n" + stderr + elif stdout or stderr: + output = stdout + stderr + else: + output = "The command finished running." 
+ + return { + "success": True, + "output": output, + } + except Exception as e: + return { + "success": False, + "error": str(e), + } + + def _tool_press_key(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """ + Press a key or key combination. + + Args: + arguments: Dictionary containing 'key' or + 'key_combination' parameters + + Returns: + Execution result dictionary with success status and output or + error message + """ + try: + key = arguments.get("key") + key_combination = arguments.get("key_combination") + + if key and not key_combination: + self.device.press(key) + return { + "success": True, + "output": f"The key {key} has been pressed.", + } + elif key_combination and not key: + self.device.press(key_combination) + return { + "success": True, + "output": f"The key combination {key_combination} " + "has been pressed.", + } + else: + raise ValueError("Invalid key or key combination") + except Exception as e: + return { + "success": False, + "error": str(e), + } + + def _tool_type_text(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """ + Type text in the sandbox environment. + + Args: + arguments: Dictionary containing 'text' parameter + + Returns: + Execution result dictionary with success status and + output or error message + """ + try: + text = arguments.get("text") + if not text: + return { + "success": False, + "error": "'text' argument is required", + } + + self.device.write( + text, + chunk_size=50, + delay_in_ms=12, + ) + return { + "success": True, + "output": "The text has been typed.", + } + except Exception as e: + return { + "success": False, + "error": str(e), + } + + def _tool_click(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """ + Click at specific coordinates or based on visual query. 
+ + Args: + arguments: Dictionary containing click parameters + + Returns: + Execution result dictionary with success status and + output or error message + """ + try: + x = arguments.get("x", 0) + y = arguments.get("y", 0) + count = arguments.get("count", 1) + query = arguments.get("query", "") + if isinstance(count, str): + count = int(count) + if query: + # Visual query-based clicking + img_bytes = self.device.screenshot() + position = perform_gui_grounding_with_api( + min_pixels=4096, + screenshot=img_bytes, + user_query=query, + ) + x, y = position + + self.device.move_mouse(x, y) + if count == 1: + self.device.left_click() + elif count == 2: + self.device.double_click() + else: + raise ValueError( + f"Invalid count: {count}, only support 1 or 2", + ) + + return { + "success": True, + "output": f"The mouse has clicked {count} times " + f"at ({x}, {y}).", + } + except Exception as e: + return { + "success": False, + "error": str(e), + } + + def _tool_right_click(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """ + Right click at specific coordinates. + + Args: + arguments: Dictionary containing 'x' and 'y' parameters + + Returns: + Execution result dictionary with success status and + output or error message + """ + try: + x = arguments.get("x", 0) + y = arguments.get("y", 0) + + self.device.move_mouse(x, y) + self.device.right_click() + return { + "success": True, + "output": f"The mouse has right clicked at ({x}, {y}).", + } + except Exception as e: + return { + "success": False, + "error": str(e), + } + + def _tool_click_and_type( + self, + arguments: Dict[str, Any], + ) -> Dict[str, Any]: + """ + Click at coordinates and then type text. 
+ + Args: + arguments: Dictionary containing 'x', 'y', and 'text' parameters + + Returns: + Execution result dictionary with success status + and output or error message + """ + try: + x = arguments.get("x", 0) + y = arguments.get("y", 0) + text = arguments.get("text", "") + + if not text: + return { + "success": False, + "error": "'text' argument is required", + } + + self.device.move_mouse(x, y) + self.device.left_click() + self.device.write(text) + return { + "success": True, + "output": "The mouse has clicked and typed " + f"the text at ({x}, {y}).", + } + except Exception as e: + return { + "success": False, + "error": str(e), + } + + def _tool_launch_app(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """ + Launch an application. + + Args: + arguments: Dictionary containing 'app' parameter + + Returns: + Execution result dictionary with success status and + output or error message + """ + try: + app = arguments.get("app") + if not app: + return { + "success": False, + "error": "'app' argument is required", + } + + self.device.launch(app) + return { + "success": True, + "output": f"The application {app} has been launched.", + } + except Exception as e: + return { + "success": False, + "error": str(e), + } + + def _tool_screenshot(self, arguments: Dict[str, Any]) -> Dict[str, Any]: + """ + Take a screenshot and save it to a file. 
+ + Args: + arguments: Dictionary containing 'file_path' parameter + + Returns: + Execution result dictionary with success status + and output or error message + """ + try: + file = self.device.screenshot() + file_path = arguments.get("file_path") + + # Check that file_path was provided + if not file_path: + return { + "success": False, + "error": "'file_path' argument is required", + } + + if isinstance(file, Image.Image): + file.save(file_path) + else: + with open(file_path, "wb") as f: + f.write(file) + return { + "success": True, + "output": file_path, + } + except Exception as e: + return { + "success": False, + "error": str(e), + } diff --git a/src/agentscope_runtime/sandbox/box/e2b/utils/__init__.py b/src/agentscope_runtime/sandbox/box/e2b/utils/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/src/agentscope_runtime/sandbox/box/e2b/utils/grounding_utils.py b/src/agentscope_runtime/sandbox/box/e2b/utils/grounding_utils.py new file mode 100644 index 000000000..180da4da3 --- /dev/null +++ b/src/agentscope_runtime/sandbox/box/e2b/utils/grounding_utils.py @@ -0,0 +1,205 @@ +# -*- coding: utf-8 -*- +import base64 +import json +import math +import re +import os +import io +from typing import Union +from openai import OpenAI +from PIL import Image, ImageDraw, ImageColor + + +def encode_image(img_bytes: Union[bytes, Image.Image]) -> str: + if isinstance(img_bytes, Image.Image): + img_bytes = img_bytes.tobytes() + return base64.b64encode(img_bytes).decode("utf-8") + + +def smart_resize( + height: int, + width: int, + factor: int = 28, + min_pixels: int = 56 * 56, + max_pixels: int = 14 * 14 * 4 * 1280, +) -> tuple[int, int]: + """Rescales the image so that the following conditions are met: + + 1. Both dimensions (height and width) are divisible by 'factor'. + + 2. The total number of pixels is within the range + ['min_pixels', 'max_pixels']. + + 3. The aspect ratio of the image is maintained as closely as possible.
+ + """ + if height < factor or width < factor: + raise ValueError( + f"height:{height} and width: {width} " + f"must be larger than factor:{factor}", + ) + if max(height, width) / min(height, width) > 200: + raise ValueError( + f"absolute aspect ratio must be smaller than 200, " + f"got {max(height, width) / min(height, width)}", + ) + h_bar = round(height / factor) * factor + w_bar = round(width / factor) * factor + if h_bar * w_bar > max_pixels: + beta = math.sqrt((height * width) / max_pixels) + h_bar = math.floor(height / beta / factor) * factor + w_bar = math.floor(width / beta / factor) * factor + elif h_bar * w_bar < min_pixels: + beta = math.sqrt(min_pixels / (height * width)) + h_bar = math.ceil(height * beta / factor) * factor + w_bar = math.ceil(width * beta / factor) * factor + return h_bar, w_bar + + +def draw_point( + image: Image.Image, + point: list, + color: str = "red", +) -> Image.Image: + try: + color_code = ImageColor.getrgb(color) + color_code = color_code + (128,) + except ValueError: + color_code = (255, 0, 0, 128) + + overlay = Image.new("RGBA", image.size, (255, 255, 255, 0)) + overlay_draw = ImageDraw.Draw(overlay) + radius = min(image.size) * 0.05 + x, y = point + + overlay_draw.ellipse( + [(x - radius, y - radius), (x + radius, y + radius)], + fill=color_code, + ) + + center_radius = radius * 0.1 + overlay_draw.ellipse( + [ + (x - center_radius, y - center_radius), + (x + center_radius, y + center_radius), + ], + fill=(0, 255, 0, 255), + ) + + image = image.convert("RGBA") + combined = Image.alpha_composite(image, overlay) + + return combined.convert("RGB") + + +def parse_json_blobs(text: str) -> dict: + """Extract json block from the LLM's output. + + If a valid json block is passed, it returns it directly. 
+ """ + pattern = r"```(?:json)?\s*\n(.*?)\n```" + matches = re.findall(pattern, text, re.DOTALL) + if matches: + try: + return json.loads(matches[0].strip()) + except json.JSONDecodeError: + pass + # Maybe the LLM output a JSON blob directly + try: + return json.loads(text) + except json.JSONDecodeError: + return {} + + +def perform_gui_grounding_with_api( + screenshot: bytes, + user_query: str, + min_pixels: int = 3136, + max_pixels: int = 12845056, +) -> list: + """ + Perform GUI grounding to interpret user query. + + Args: + screenshot (bytes): Encoded screenshot image bytes + user_query (str): User's query/instruction + min_pixels: Minimum pixels for the image + max_pixels: Maximum pixels for the image + + Returns: + list: [x, y] click coordinates scaled to the original image size + """ + print(f"Performing GUI grounding with API for user query: {user_query}") + # process image + base64_image = encode_image(screenshot) + input_image = Image.open(io.BytesIO(screenshot)) + with OpenAI( + api_key=os.getenv("DASHSCOPE_API_KEY"), + base_url="https://dashscope.aliyuncs.com/compatible-mode/v1", + ) as client: + resized_height, resized_width = smart_resize( + input_image.height, + input_image.width, + min_pixels=min_pixels, + max_pixels=max_pixels, + ) + + messages = [ + { + "role": "system", + "content": [ + { + "type": "text", + "text": ( + "You are a helpful assistant. " + "Locate the element that the user wants to click. " + "Output its center coordinates using JSON format. " + "Only output the JSON object, " + "no other text. " + "Output format: {'coordinate': [x, y]}."
+ ), + }, + ], + }, + { + "role": "user", + "content": [ + { + "type": "image_url", + "min_pixels": min_pixels, + "max_pixels": max_pixels, + "image_url": { + "url": f"data:image/jpeg;base64,{base64_image}", + }, + }, + {"type": "text", "text": user_query}, + ], + }, + ] + # with open("messages.json", "w") as f: + # f.write(json.dumps(messages, indent=4)) + completion = client.chat.completions.create( + model="qwen-vl-max", + messages=messages, + stream=False, + ) + + print(f"completion: {completion}") + output_text = completion.choices[0].message.content + + # Parse the action and rescale its coordinates + action = parse_json_blobs(output_text.strip()) + print(f"action: {action}") + coordinate_normalized = action["coordinate"] + coordinate_absolute = [ + coordinate_normalized[0] / resized_width * input_image.width, + coordinate_normalized[1] / resized_height * input_image.height, + ] + return coordinate_absolute + + +if __name__ == "__main__": + img_path = "/Users/panrong/Downloads/screenshot.png" + img = Image.open(img_path) + print(type(img)) + print(encode_image(img)) diff --git a/src/agentscope_runtime/sandbox/enums.py b/src/agentscope_runtime/sandbox/enums.py index e2ea2a505..6db1248f1 100644 --- a/src/agentscope_runtime/sandbox/enums.py +++ b/src/agentscope_runtime/sandbox/enums.py @@ -70,3 +70,11 @@ class SandboxType(DynamicEnum): APPWORLD = "appworld" BFCL = "bfcl" AGENTBAY = "agentbay" + CLOUD_COMPUTER = "cloud_computer" + CLOUD_PHONE = "cloud_phone" + E2B = "e2b_desktop" + + +class OperationStatus(DynamicEnum): + DEVICE_UN_SUPPORTED_OPERATION = "Device does not support this operation!" + DEVICE_UN_SUPPORTED = "This device is not supported!"
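Note on the coordinate handling in `grounding_utils.py` above: `perform_gui_grounding_with_api` treats the model's answer as a point in the `smart_resize`d image and scales it back to the original screenshot size. The sketch below reproduces only that arithmetic so it can be checked without an API key. It is an illustration, not part of the PR: `to_screen_coordinates` is a hypothetical helper name, the input validation of `smart_resize` is omitted, and it assumes the model really does report coordinates in the resized space.

```python
import math


def smart_resize(height, width, factor=28, min_pixels=56 * 56,
                 max_pixels=14 * 14 * 4 * 1280):
    """Condensed copy of smart_resize from the diff (validation omitted)."""
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        # Too many pixels: shrink both sides by the same ratio.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: grow both sides by the same ratio.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def to_screen_coordinates(model_xy, screen_width, screen_height,
                          min_pixels=3136, max_pixels=12845056):
    """Hypothetical helper: map a point from the resized space to the screen.

    The pixel limits default to the values used by
    perform_gui_grounding_with_api in the diff.
    """
    resized_h, resized_w = smart_resize(
        screen_height,
        screen_width,
        min_pixels=min_pixels,
        max_pixels=max_pixels,
    )
    x, y = model_xy
    return [x / resized_w * screen_width, y / resized_h * screen_height]


if __name__ == "__main__":
    # A 1920x1080 screenshot is resized to 1932x1092 (multiples of 28),
    # so the centre of the resized image maps to the centre of the screen.
    print(to_screen_coordinates([966, 546], 1920, 1080))
```

Keeping the rescaling in a small pure function like this would let the unit tests cover it without mocking the OpenAI client.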
diff --git a/tests/sandbox/test_sandbox_cloud_api.py b/tests/sandbox/test_sandbox_cloud_api.py new file mode 100644 index 000000000..5d1ab160e --- /dev/null +++ b/tests/sandbox/test_sandbox_cloud_api.py @@ -0,0 +1,234 @@ +# -*- coding: utf-8 -*- +# pylint: disable=redefined-outer-name, protected-access, unused-argument +""" +Cloud API sandbox demo tests adapted to sandbox test style. +- Loads .env if present +- Skips gracefully when required environment variables are missing +""" +import os + +import pytest +from dotenv import load_dotenv +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.engine.services.sandbox import SandboxService + + +@pytest.fixture +def env(): + # Align with existing tests under tests/sandbox + if os.path.exists("../../.env"): + load_dotenv("../../.env") + + +def _has_cloud_api_dependencies() -> bool: + try: + __import__( + "agentscope_runtime.sandbox.box.cloud_api.cloud_computer_sandbox", + ) + __import__( + "agentscope_runtime.sandbox.box.cloud_api.cloud_phone_sandbox", + ) + return True + except ImportError: + return False + + +@pytest.mark.skipif( + not _has_cloud_api_dependencies() + or not os.getenv("DESKTOP_ID") + or not os.getenv("ECD_USERNAME") + or not os.getenv("ECD_APP_STREAM_REGION_ID") + or not os.getenv("ECD_ALIBABA_CLOUD_REGION_ID") + or not os.getenv("ECD_ALIBABA_CLOUD_ENDPOINT") + or not os.getenv("ECD_ALIBABA_CLOUD_ACCESS_KEY_ID") + or not os.getenv("ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET") + or not os.getenv("EDS_OSS_ACCESS_KEY_ID") + or not os.getenv("EDS_OSS_ACCESS_KEY_SECRET") + or not os.getenv("EDS_OSS_BUCKET_NAME") + or not os.getenv("EDS_OSS_ENDPOINT") + or not os.getenv("EDS_OSS_PATH"), + reason="Cloud Computer dependencies or required " + "environment variables not available", +) +def test_cloud_computer_sandbox_direct(env): # noqa: ARG001 + """Test CloudComputerSandbox directly with basic operations.""" + from agentscope_runtime.sandbox.box.cloud_api import ( + CloudComputerSandbox, + 
) + + desktop_id = os.getenv("DESKTOP_ID") + + # Basic happy path: create sandbox and run minimal commands + with CloudComputerSandbox(desktop_id=desktop_id) as box: + # List tools + tools = box.list_tools() + print("CloudComputer tools:", tools) + + # Run a trivial shell command + res_cmd = box.call_tool( + "run_shell_command", + {"command": "echo 'Hello from Cloud Computer!'"}, + ) + print("run_shell_command:", res_cmd) + + # Screenshot + res_screenshot = box.call_tool( + "screenshot", + {"file_name": "test_screenshot.png"}, + ) + print("screenshot:", res_screenshot) + + # File operations + res_write = box.call_tool( + "write_file", + { + "file_path": "C:/cloud_test.txt", + "content": "Hello from Cloud Computer sandbox!", + }, + ) + print("write_file:", res_write) + + res_read = box.call_tool( + "read_file", + {"file_path": "C:/cloud_test.txt"}, + ) + print("read_file:", res_read) + + # UI operations + res_home = box.call_tool("go_home", {}) + print("go_home:", res_home) + + +@pytest.mark.skipif( + not _has_cloud_api_dependencies() + or not os.getenv("PHONE_INSTANCE_ID") + or not os.getenv("EDS_ALIBABA_CLOUD_ENDPOINT") + or not os.getenv("EDS_ALIBABA_CLOUD_ACCESS_KEY_ID") + or not os.getenv("EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET") + or not os.getenv("EDS_OSS_ACCESS_KEY_ID") + or not os.getenv("EDS_OSS_ACCESS_KEY_SECRET") + or not os.getenv("EDS_OSS_BUCKET_NAME") + or not os.getenv("EDS_OSS_ENDPOINT") + or not os.getenv("EDS_OSS_PATH"), + reason="Cloud Phone dependencies or required " + "environment variables not available", +) +def test_cloud_phone_sandbox_direct(env): # noqa: ARG001 + """Test CloudPhoneSandbox directly with basic operations.""" + from agentscope_runtime.sandbox.box.cloud_api import ( + CloudPhoneSandbox, + ) + + instance_id = os.getenv("PHONE_INSTANCE_ID") + + with CloudPhoneSandbox(instance_id=instance_id) as box: + # List tools + tools = box.list_tools() + print("CloudPhone tools:", tools) + + # Run a trivial shell command + res_cmd = 
box.call_tool( + "run_shell_command", + {"command": "echo 'Hello from Cloud Phone!'"}, + ) + print("run_shell_command:", res_cmd) + + # Screenshot + res_screenshot = box.call_tool( + "screenshot", + {"file_name": "phone_screenshot.png"}, + ) + print("screenshot:", res_screenshot) + + # Navigation operations + res_home = box.call_tool("go_home", {}) + print("go_home:", res_home) + + +@pytest.mark.asyncio +@pytest.mark.skipif( + not _has_cloud_api_dependencies() + or not os.getenv("DESKTOP_ID") + or not os.getenv("ECD_USERNAME") + or not os.getenv("ECD_APP_STREAM_REGION_ID") + or not os.getenv("ECD_ALIBABA_CLOUD_REGION_ID") + or not os.getenv("ECD_ALIBABA_CLOUD_ENDPOINT") + or not os.getenv("ECD_ALIBABA_CLOUD_ACCESS_KEY_ID") + or not os.getenv("ECD_ALIBABA_CLOUD_ACCESS_KEY_SECRET") + or not os.getenv("EDS_OSS_ACCESS_KEY_ID") + or not os.getenv("EDS_OSS_ACCESS_KEY_SECRET") + or not os.getenv("EDS_OSS_BUCKET_NAME") + or not os.getenv("EDS_OSS_ENDPOINT") + or not os.getenv("EDS_OSS_PATH") + or not os.getenv("DOCKER_HOST"), + reason="Cloud Computer dependencies or required environment" + " variables not available", +) +async def test_cloud_computer_sandbox_via_service(env): # noqa: ARG001 + """Create CloudComputerSandbox via SandboxService + and run a tiny smoke test.""" + async with SandboxService() as service: + sandboxes = service.connect( + session_id="cloud_computer_demo_session", + user_id="cloud_computer_demo_user", + sandbox_types=[SandboxType.CLOUD_COMPUTER.value], + ) + assert sandboxes and len(sandboxes) > 0 + box = sandboxes[0] + + print("CloudComputer list_tools:", box.list_tools()) + + res_cmd = box.call_tool( + "run_shell_command", + {"command": "echo 'Cloud Computer Service path OK'"}, + ) + print("CloudComputer run_shell_command:", res_cmd) + + res_screenshot = box.call_tool( + "screenshot", + {"file_name": "service_screenshot.png"}, + ) + print("CloudComputer screenshot:", res_screenshot) + + +@pytest.mark.asyncio +@pytest.mark.skipif( + not 
_has_cloud_api_dependencies() + or not os.getenv("PHONE_INSTANCE_ID") + or not os.getenv("EDS_ALIBABA_CLOUD_ENDPOINT") + or not os.getenv("EDS_ALIBABA_CLOUD_ACCESS_KEY_ID") + or not os.getenv("EDS_ALIBABA_CLOUD_ACCESS_KEY_SECRET") + or not os.getenv("EDS_OSS_ACCESS_KEY_ID") + or not os.getenv("EDS_OSS_ACCESS_KEY_SECRET") + or not os.getenv("EDS_OSS_BUCKET_NAME") + or not os.getenv("EDS_OSS_ENDPOINT") + or not os.getenv("EDS_OSS_PATH") + or not os.getenv("DOCKER_HOST"), + reason="Cloud Phone dependencies or required environment" + " variables not available", +) +async def test_cloud_phone_sandbox_via_service(env): # noqa: ARG001 + """Create CloudPhoneSandbox via SandboxService and + run a tiny smoke test.""" + async with SandboxService() as service: + sandboxes = service.connect( + session_id="cloud_phone_demo_session", + user_id="cloud_phone_demo_user", + sandbox_types=[SandboxType.CLOUD_PHONE.value], + ) + assert sandboxes and len(sandboxes) > 0 + box = sandboxes[0] + + print("CloudPhone list_tools:", box.list_tools()) + + res_cmd = box.call_tool( + "run_shell_command", + {"command": "echo 'Cloud Phone Service path OK'"}, + ) + print("CloudPhone run_shell_command:", res_cmd) + + res_screenshot = box.call_tool( + "screenshot", + {"file_name": "phone_service_screenshot.png"}, + ) + print("CloudPhone screenshot:", res_screenshot) diff --git a/tests/sandbox/test_sandbox_e2b.py b/tests/sandbox/test_sandbox_e2b.py new file mode 100644 index 000000000..7eb7764dd --- /dev/null +++ b/tests/sandbox/test_sandbox_e2b.py @@ -0,0 +1,97 @@ +# -*- coding: utf-8 -*- +# pylint: disable=redefined-outer-name, protected-access, unused-argument +""" +E2B sandbox demo tests adapted to sandbox test style. 
+- Loads .env if present +- Skips gracefully when SDK or API key is missing +""" +import os + +import pytest +from dotenv import load_dotenv + +from agentscope_runtime.sandbox.box.e2b import ( + E2bSandBox, +) +from agentscope_runtime.sandbox.enums import SandboxType +from agentscope_runtime.engine.services.sandbox import SandboxService + + +@pytest.fixture +def env(): + # Align with existing tests under tests/sandbox + if os.path.exists("../../.env"): + load_dotenv("../../.env") + + +def _has_e2b_sdk() -> bool: + try: + import e2b # noqa: F401 # pylint: disable=unused-import + + return True + except Exception: + return False + + +@pytest.mark.skipif( + not _has_e2b_sdk() or not os.getenv("E2B_API_KEY"), + reason="E2B SDK or E2B_API_KEY not available", +) +def test_e2b_sandbox_direct(env): # noqa: ARG001 + """Test E2BSandbox directly with basic operations.""" + + # Basic happy path: create sandbox and run minimal commands + with E2bSandBox() as box: + # List tools + tools = box.list_tools() + print("E2B tools:", tools) + + # Run a trivial shell command + res_cmd = box.call_tool( + "run_shell_command", + {"command": "echo 'Hello from E2B!'"}, + ) + print("run_shell_command:", res_cmd) + + # screenshot + res_screenshot = box.call_tool( + "screenshot", + {"file_path": f"{os.getcwd()}/screenshot.png"}, + ) + print("screenshot:", res_screenshot) + + +@pytest.mark.asyncio +@pytest.mark.skipif( + not _has_e2b_sdk() + or not os.getenv("E2B_API_KEY") + or not os.getenv("DOCKER_HOST"), + reason="E2B SDK or E2B_API_KEY or DOCKER_HOST not available", +) +async def test_e2b_sandbox_via_service(env): # noqa: ARG001 + """Create E2B sandbox via SandboxService and run a tiny smoke test.""" + service = SandboxService() + + async with service: + sandboxes = service.connect( + session_id="e2b_demo_session", + user_id="e2b_demo_user", + sandbox_types=[SandboxType.E2B.value], + ) + assert sandboxes and len(sandboxes) > 0 + box = sandboxes[0] + + print("E2B list_tools:", 
box.list_tools()) + + res_cmd = box.call_tool( + "run_shell_command", + {"command": "echo 'Service path OK'"}, + ) + print("E2B run_shell_command:", res_cmd) + + # screenshot + res_screenshot = box.call_tool( + "screenshot", + {"file_path": f"{os.getcwd()}/screenshot.png"}, + ) + print("screenshot:", res_screenshot) diff --git a/tests/unit/test_cloud_computer_api_sandbox.py b/tests/unit/test_cloud_computer_api_sandbox.py new file mode 100644 index 000000000..65551ff99 --- /dev/null +++ b/tests/unit/test_cloud_computer_api_sandbox.py @@ -0,0 +1,548 @@ +# -*- coding: utf-8 -*- +# pylint: disable=redefined-outer-name, protected-access, unused-argument +# pylint: disable=too-many-public-methods +""" +Unit tests for CloudComputerSandbox implementation. +""" + +import os +from unittest.mock import MagicMock, patch +import pytest + +from agentscope_runtime.sandbox.box.cloud_api import ( + CloudComputerSandbox, +) +from agentscope_runtime.sandbox.enums import SandboxType + + +@pytest.fixture +def mock_instance_manager(): + """Create a mock instance manager.""" + manager = MagicMock() + + # Mock methods that will be called during tests + manager.ecd_client = MagicMock() + manager.ecd_client.search_desktop_info.return_value = [ + MagicMock(desktop_status="running"), + ] + manager.ecd_client.start_desktops.return_value = 200 + manager.ecd_client.wakeup_desktops.return_value = 200 + manager.ecd_client.hibernate_desktops.return_value = 200 + + manager.refresh_aurh_code = MagicMock() + manager.run_command_power_shell.return_value = (True, "command output") + manager.run_code.return_value = (True, "code execution result") + manager.press_key.return_value = (True, "key pressed") + manager.tap.return_value = (True, "clicked") + manager.right_tap.return_value = (True, "right clicked") + manager.tap_type_enter.return_value = (True, "typed and entered") + manager.append.return_value = (True, "text appended") + manager.open_app.return_value = (True, "app launched") + 
manager.home.return_value = (True, "went home") + manager.mouse_move.return_value = (True, "mouse moved") + manager.scroll.return_value = (True, "scrolled") + manager.scroll_pos.return_value = (True, "scrolled at position") + manager.write_file.return_value = (True, "file written") + manager.read_file.return_value = (True, "file content") + manager.remove_file.return_value = (True, "file removed") + manager.get_screenshot_oss_url.return_value = ( + "http://screenshot.url/image.png" + ) + + return manager + + +@pytest.fixture +def mock_client_pool(mock_instance_manager): + """Create a mock client pool.""" + pool = MagicMock() + pool.get_instance_manager.return_value = mock_instance_manager + pool.get_oss_client.return_value = MagicMock() + return pool + + +@pytest.fixture +def cloud_computer_sandbox(mock_client_pool, mock_instance_manager): + """Create a CloudComputerSandbox instance with mocked dependencies.""" + with patch( + "agentscope_runtime.sandbox.box.cloud_api." + "cloud_computer_sandbox.ClientPool", + ) as mock_client_pool_class: + mock_client_pool_class.return_value = mock_client_pool + + with patch.dict(os.environ, {"DESKTOP_ID": "test-desktop-id"}): + # Mock _create_cloud_sandbox to avoid actual API + # calls during initialization + with patch.object( + CloudComputerSandbox, + "_create_cloud_sandbox", + return_value="test-desktop-id", + ): + sandbox = CloudComputerSandbox() + sandbox.instance_manager = mock_instance_manager + return sandbox + + +class TestCloudComputerSandbox: + """Test cases for CloudComputerSandbox class.""" + + def test_init_with_desktop_id_from_env( + self, + mock_client_pool, + mock_instance_manager, + ): + """Test initialization with desktop_id from environment variable.""" + with patch( + "agentscope_runtime.sandbox.box.cloud_api." 
+ "cloud_computer_sandbox.ClientPool", + ) as mock_client_pool_class: + mock_client_pool_class.return_value = mock_client_pool + + with patch.dict(os.environ, {"DESKTOP_ID": "env-desktop-id"}): + # Mock _create_cloud_sandbox to avoid actual API calls + with patch.object( + CloudComputerSandbox, + "_create_cloud_sandbox", + return_value="env-desktop-id", + ): + sandbox = CloudComputerSandbox() + assert sandbox.desktop_id == "env-desktop-id" + assert sandbox.sandbox_type == SandboxType.CLOUD_COMPUTER + assert sandbox.auto_wakeup is True + + def test_init_with_explicit_desktop_id( + self, + mock_client_pool, + mock_instance_manager, + ): + """Test initialization with explicit desktop_id.""" + with patch( + "agentscope_runtime.sandbox.box.cloud_api." + "cloud_computer_sandbox.ClientPool", + ) as mock_client_pool_class: + mock_client_pool_class.return_value = mock_client_pool + + # Mock _create_cloud_sandbox to avoid actual API calls + with patch.object( + CloudComputerSandbox, + "_create_cloud_sandbox", + return_value="explicit-desktop-id", + ): + sandbox = CloudComputerSandbox( + desktop_id="explicit-desktop-id", + ) + assert sandbox.desktop_id == "explicit-desktop-id" + + def test_init_without_desktop_id_raises_error(self): + """Test initialization without desktop_id raises error.""" + with patch.dict(os.environ, {}, clear=True): + with pytest.raises(ValueError, match="desktop_id is required"): + CloudComputerSandbox() + + def test_screenshot_dir_creation( + self, + tmpdir, + mock_client_pool, + mock_instance_manager, + ): + """Test screenshot directory creation.""" + screenshot_dir = tmpdir.join("screenshots").strpath + with patch( + "agentscope_runtime.sandbox.box.cloud_api." 
+ "cloud_computer_sandbox.ClientPool", + ) as mock_client_pool_class: + mock_client_pool_class.return_value = mock_client_pool + + # Mock _create_cloud_sandbox to avoid actual API calls + with patch.object( + CloudComputerSandbox, + "_create_cloud_sandbox", + return_value="test-desktop", + ): + CloudComputerSandbox( + desktop_id="test-desktop", + screenshot_dir=screenshot_dir, + ) + assert os.path.exists(screenshot_dir) + + def test_initialize_cloud_client_success(self, mock_client_pool): + """Test successful cloud client initialization.""" + with patch( + "agentscope_runtime.sandbox.box.cloud_api." + "cloud_computer_sandbox.ClientPool", + ) as mock_client_pool_class: + mock_client_pool_class.return_value = mock_client_pool + + # Mock _create_cloud_sandbox to avoid actual API + # calls during initialization + # This prevents _initialize_cloud_client from being + # called during __init__ + with patch.object( + CloudComputerSandbox, + "_create_cloud_sandbox", + return_value="test-desktop", + ): + sandbox = CloudComputerSandbox(desktop_id="test-desktop") + + # Reset mock to only count the explicit call in the test + mock_client_pool.get_instance_manager.reset_mock() + mock_client_pool.get_oss_client.reset_mock() + + instance_manager = sandbox._initialize_cloud_client() + + assert instance_manager is not None + mock_client_pool.get_instance_manager.assert_called_once_with( + "test-desktop", + ) + mock_client_pool.get_oss_client.assert_called_once() + + def test_create_cloud_sandbox_success( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test successful cloud sandbox creation.""" + # Reset the call count since refresh_aurh_code may have + # been called during fixture initialization + mock_instance_manager.refresh_aurh_code.reset_mock() + + cloud_computer_sandbox.auto_wakeup = True + sandbox_id = cloud_computer_sandbox._create_cloud_sandbox() + + assert sandbox_id == "test-desktop-id" + mock_instance_manager.refresh_aurh_code.assert_called_once() + 
+ def test_create_cloud_sandbox_with_wakeup_failure( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test cloud sandbox creation when wakeup fails.""" + # Reset the call count since refresh_aurh_code may + # have been + # called during fixture initialization + mock_instance_manager.refresh_aurh_code.reset_mock() + + cloud_computer_sandbox.auto_wakeup = True + # Mock _wait_for_pc_ready to raise exception to + # simulate wakeup failure + # This avoids the long wait loop in _wait_for_pc_ready + with patch.object( + cloud_computer_sandbox, + "_wait_for_pc_ready", + side_effect=Exception("Connection error"), + ): + # Should still succeed even if wakeup fails + sandbox_id = cloud_computer_sandbox._create_cloud_sandbox() + + assert sandbox_id == "test-desktop-id" + mock_instance_manager.refresh_aurh_code.assert_called_once() + + def test_delete_cloud_sandbox_success( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test successful cloud sandbox deletion.""" + result = cloud_computer_sandbox._delete_cloud_sandbox( + "test-desktop-id", + ) + + assert result is True + m_ec = mock_instance_manager.ecd_client + m_ec.hibernate_desktops.assert_called_once_with( + ["test-desktop-id"], + ) + + def test_delete_cloud_sandbox_failure( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test cloud sandbox deletion failure.""" + m_ec = mock_instance_manager.ecd_client + m_ec.hibernate_desktops.side_effect = Exception("Network error") + + result = cloud_computer_sandbox._delete_cloud_sandbox( + "test-desktop-id", + ) + + assert result is False + + def test_call_cloud_tool_supported_tool( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test calling a supported tool.""" + result = cloud_computer_sandbox._call_cloud_tool( + "run_shell_command", + {"command": "ls -la"}, + ) + + assert result["success"] is True + assert result["output"] == "command output" + 
mock_instance_manager.run_command_power_shell.assert_called_once_with( + "ls -la", + None, + 60, + ) + + def test_call_cloud_tool_unsupported_tool(self, cloud_computer_sandbox): + """Test calling an unsupported tool.""" + result = cloud_computer_sandbox._call_cloud_tool( + "unsupported_tool", + {}, + ) + + assert result["success"] is False + assert "not supported" in result["error"] + + def test_call_cloud_tool_execution_exception( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test calling a tool that throws exception.""" + mock_instance_manager.run_command_power_shell.side_effect = Exception( + "Command failed", + ) + + result = cloud_computer_sandbox._call_cloud_tool( + "run_shell_command", + {"command": "ls -la"}, + ) + + assert result["success"] is False + assert "Command failed" in result["error"] + + def test_wait_for_pc_ready_success( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test waiting for PC ready successfully.""" + mock_instance_manager.ecd_client.search_desktop_info.return_value = [ + MagicMock(desktop_status="running"), + ] + + result = cloud_computer_sandbox._wait_for_pc_ready( + "test-desktop-id", + max_wait_time=5, + ) + + assert result is True + + def test_handle_desktop_status_stopped( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test handling stopped desktop status.""" + cloud_computer_sandbox._handle_desktop_status( + "test-desktop-id", + "stopped", + ) + m_ec = mock_instance_manager.ecd_client + m_ec.start_desktops.assert_called_once_with( + ["test-desktop-id"], + ) + + def test_handle_desktop_status_hibernated( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test handling hibernated desktop status.""" + cloud_computer_sandbox._handle_desktop_status( + "test-desktop-id", + "hibernated", + ) + m_ec = mock_instance_manager.ecd_client + m_ec.wakeup_desktops.assert_called_once_with( + ["test-desktop-id"], + ) + + def test_tool_handlers( + self, + 
cloud_computer_sandbox, + mock_instance_manager, + ): + """Test various tool handlers.""" + # Test run_shell_command + result = cloud_computer_sandbox._tool_run_shell_command( + {"command": "pwd"}, + ) + assert result["success"] is True + assert result["output"] == "command output" + + # Test execute_code + result = cloud_computer_sandbox._tool_execute_code( + {"code": "print('hello')"}, + ) + assert result["success"] is True + assert result["output"] == "code execution result" + + # Test press_key + result = cloud_computer_sandbox._tool_press_key({"key": "enter"}) + assert result["success"] is True + assert result["output"] == "key pressed" + + # Test click + result = cloud_computer_sandbox._tool_click({"x": 100, "y": 200}) + assert result["success"] is True + assert result["output"] == "clicked" + + # Test right_click + result = cloud_computer_sandbox._tool_right_click({"x": 100, "y": 200}) + assert result["success"] is True + assert result["output"] == "right clicked" + + # Test click_and_type + result = cloud_computer_sandbox._tool_click_and_type( + {"x": 100, "y": 200, "text": "hello"}, + ) + assert result["success"] is True + assert result["output"] == "typed and entered" + + # Test append_text + result = cloud_computer_sandbox._tool_append_text( + {"x": 100, "y": 200, "text": "world"}, + ) + assert result["success"] is True + assert result["output"] == "text appended" + + # Test launch_app + result = cloud_computer_sandbox._tool_launch_app({"app": "notepad"}) + assert result["success"] is True + assert result["output"] == "app launched" + + # Test go_home + result = cloud_computer_sandbox._tool_go_home({}) + assert result["success"] is True + assert result["output"] == "went home" + + # Test mouse_move + result = cloud_computer_sandbox._tool_mouse_move({"x": 100, "y": 200}) + assert result["success"] is True + assert result["output"] == "mouse moved" + + # Test scroll + result = cloud_computer_sandbox._tool_scroll({"pixels": 100}) + assert result["success"] 
is True + assert result["output"] == "scrolled" + + # Test scroll_pos + result = cloud_computer_sandbox._tool_scroll_pos( + {"x": 100, "y": 200, "pixels": 100}, + ) + assert result["success"] is True + assert result["output"] == "scrolled at position" + + def test_file_operation_tools( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test file operation tools.""" + # Test write_file + result = cloud_computer_sandbox._tool_write_file( + { + "file_path": "/test/file.txt", + "content": "test content", + }, + ) + assert result["success"] is True + assert result["file_path"] == "/test/file.txt" + + # Test read_file + result = cloud_computer_sandbox._tool_read_file( + { + "file_path": "/test/file.txt", + }, + ) + assert result["success"] is True + assert result["output"] == "file content" + assert result["file_path"] == "/test/file.txt" + + # Test remove_file + result = cloud_computer_sandbox._tool_remove_file( + { + "file_path": "/test/file.txt", + }, + ) + assert result["success"] is True + assert result["file_path"] == "/test/file.txt" + + def test_screenshot_tool( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test screenshot tool.""" + with patch("uuid.uuid4") as mock_uuid: + mock_uuid.return_value.hex = "test-uuid" + + result = cloud_computer_sandbox._tool_screenshot( + { + "file_name": "test-screenshot", + }, + ) + + assert result["success"] is True + assert result["output"] == "http://screenshot.url/image.png" + + def test_list_tools_all_types(self, cloud_computer_sandbox): + """Test listing all tools.""" + result = cloud_computer_sandbox.list_tools() + + assert "tools" in result + assert "tools_by_type" in result + assert result["total_count"] > 0 + + # Check that we have tools of different types + assert "run_shell_command" in result["tools"] # command tool + assert "click" in result["tools"] # input tool + assert "screenshot" in result["tools"] # system tool + + def test_list_tools_by_type(self, 
cloud_computer_sandbox): + """Test listing tools by specific type.""" + # Test command tools + result = cloud_computer_sandbox.list_tools("command") + assert result["tool_type"] == "command" + assert "run_shell_command" in result["tools"] + assert "run_ipython_cell" in result["tools"] + + # Test input tools + result = cloud_computer_sandbox.list_tools("input") + assert result["tool_type"] == "input" + assert "click" in result["tools"] + assert "press_key" in result["tools"] + + # Test system tools + result = cloud_computer_sandbox.list_tools("system") + assert result["tool_type"] == "system" + assert "screenshot" in result["tools"] + assert "go_home" in result["tools"] + + def test_get_screenshot_methods( + self, + cloud_computer_sandbox, + mock_instance_manager, + ): + """Test screenshot methods.""" + # Test get_screenshot_oss_save_local success + result = cloud_computer_sandbox.get_screenshot_oss_save_local( + "test-file", + "/tmp/test.png", + ) + assert result == "http://screenshot.url/image.png" + + # Test get_screenshot_oss_save_local failure + mock_instance_manager.get_screenshot_oss_url.return_value = None + result = cloud_computer_sandbox.get_screenshot_oss_save_local( + "test-file", + "/tmp/test.png", + max_retry=2, + ) + assert result == "Error" diff --git a/tests/unit/test_cloud_phone_api_sandbox.py b/tests/unit/test_cloud_phone_api_sandbox.py new file mode 100644 index 000000000..3f1d56145 --- /dev/null +++ b/tests/unit/test_cloud_phone_api_sandbox.py @@ -0,0 +1,468 @@ +# -*- coding: utf-8 -*- +# pylint: disable=redefined-outer-name, protected-access, unused-argument +# pylint: disable=too-many-public-methods +""" +Unit tests for CloudPhoneSandbox implementation. 
+""" + +import os +from unittest.mock import MagicMock, patch +import pytest + +from agentscope_runtime.sandbox.box.cloud_api import ( + CloudPhoneSandbox, +) +from agentscope_runtime.sandbox.enums import SandboxType + + +@pytest.fixture +def mock_instance_manager(): + """Create a mock instance manager.""" + manager = MagicMock() + + # Mock EDS client + manager.eds_client = MagicMock() + manager.eds_client.list_instance.return_value = ( + 1, + None, + [MagicMock(android_instance_status="running")], + ) + manager.eds_client.start_equipment.return_value = 200 + manager.eds_client.stop_equipment.return_value = 200 + + # Mock instance manager methods + manager.refresh_ticket = MagicMock() + manager.run_command = MagicMock(return_value=(True, "command output")) + manager.tab = MagicMock(return_value=(True, "clicked")) + manager.type = MagicMock(return_value="text typed") + manager.slide = MagicMock(return_value=(True, "slid")) + manager.home = MagicMock(return_value=(True, "went home")) + manager.back = MagicMock(return_value=(True, "pressed back")) + manager.menu = MagicMock(return_value=(True, "pressed menu")) + manager.enter = MagicMock(return_value=(True, "pressed enter")) + manager.kill_the_front_app = MagicMock(return_value=(True, "killed app")) + manager.get_screenshot_sdk = MagicMock( + return_value="http://screenshot.url/image.png", + ) + manager.send_file = MagicMock(return_value=200) + manager.remove_file = MagicMock(return_value=(True, "file removed")) + + return manager + + +@pytest.fixture +def mock_client_pool(mock_instance_manager): + """Create a mock client pool.""" + pool = MagicMock() + pool.get_instance_manager.return_value = mock_instance_manager + pool.get_eds_client.return_value = mock_instance_manager.eds_client + pool.get_oss_client.return_value = MagicMock() + return pool + + +@pytest.fixture +def cloud_phone_sandbox(mock_client_pool, mock_instance_manager): + """Create a CloudPhoneSandbox instance with mocked dependencies.""" + with patch( + 
"agentscope_runtime.sandbox.box.cloud_api" + ".cloud_phone_sandbox.ClientPool", + ) as mock_client_pool_class: + mock_client_pool_class.return_value = mock_client_pool + + with patch.dict( + os.environ, + {"PHONE_INSTANCE_ID": "test-instance-id"}, + ): + # Mock _create_cloud_sandbox to avoid actual API + # calls during initialization + with patch.object( + CloudPhoneSandbox, + "_create_cloud_sandbox", + return_value="test-instance-id", + ): + sandbox = CloudPhoneSandbox() + sandbox.instance_manager = mock_instance_manager + sandbox.eds_client = mock_instance_manager.eds_client + return sandbox + + +class TestCloudPhoneSandbox: + """Test cases for CloudPhoneSandbox class.""" + + def test_init_with_instance_id_from_env( + self, + mock_client_pool, + mock_instance_manager, + ): + """Test initialization with instance_id from + environment variable.""" + with patch( + "agentscope_runtime.sandbox.box.cloud_api." + "cloud_phone_sandbox.ClientPool", + ) as mock_client_pool_class: + mock_client_pool_class.return_value = mock_client_pool + + with patch.dict( + os.environ, + {"PHONE_INSTANCE_ID": "env-instance-id"}, + ): + # Mock _create_cloud_sandbox to avoid actual API calls + with patch.object( + CloudPhoneSandbox, + "_create_cloud_sandbox", + return_value="env-instance-id", + ): + sandbox = CloudPhoneSandbox() + assert sandbox.instance_id == "env-instance-id" + assert sandbox.sandbox_type == SandboxType.CLOUD_PHONE + assert sandbox.auto_start is True + + def test_init_with_explicit_instance_id( + self, + mock_client_pool, + mock_instance_manager, + ): + """Test initialization with explicit instance_id.""" + with patch( + "agentscope_runtime.sandbox.box.cloud_api" + ".cloud_phone_sandbox.ClientPool", + ) as mock_client_pool_class: + mock_client_pool_class.return_value = mock_client_pool + + # Mock _create_cloud_sandbox to avoid actual API calls + with patch.object( + CloudPhoneSandbox, + "_create_cloud_sandbox", + return_value="explicit-instance-id", + ): + sandbox = 
CloudPhoneSandbox(instance_id="explicit-instance-id") + assert sandbox.instance_id == "explicit-instance-id" + + def test_init_without_instance_id_raises_error(self): + """Test initialization without instance_id raises error.""" + with patch.dict(os.environ, {}, clear=True): + with pytest.raises(ValueError, match="instance_id is required"): + CloudPhoneSandbox() + + def test_initialize_cloud_client_success(self, mock_client_pool): + """Test successful cloud client initialization.""" + with patch( + "agentscope_runtime.sandbox.box.cloud_api." + "cloud_phone_sandbox.ClientPool", + ) as mock_client_pool_class: + mock_client_pool_class.return_value = mock_client_pool + + # Mock _create_cloud_sandbox to avoid actual + # API calls during initialization + # This prevents _initialize_cloud_client from + # being called during __init__ + with patch.object( + CloudPhoneSandbox, + "_create_cloud_sandbox", + return_value="test-instance", + ): + sandbox = CloudPhoneSandbox(instance_id="test-instance") + + # Reset mock to only count the explicit call in the test + mock_client_pool.get_instance_manager.reset_mock() + mock_client_pool.get_eds_client.reset_mock() + mock_client_pool.get_oss_client.reset_mock() + + eds_client = sandbox._initialize_cloud_client() + + assert eds_client is not None + mock_client_pool.get_instance_manager.assert_called_once_with( + "test-instance", + ) + mock_client_pool.get_eds_client.assert_called_once() + mock_client_pool.get_oss_client.assert_called_once() + + def test_create_cloud_sandbox_success( + self, + cloud_phone_sandbox, + mock_instance_manager, + ): + """Test successful cloud sandbox creation.""" + # Reset the call count since refresh_ticket may have been + # called during fixture initialization + mock_instance_manager.refresh_ticket.reset_mock() + + cloud_phone_sandbox.auto_start = True + sandbox_id = cloud_phone_sandbox._create_cloud_sandbox() + + assert sandbox_id == "test-instance-id" + 
mock_instance_manager.refresh_ticket.assert_called_once() + + def test_create_cloud_sandbox_with_start_failure( + self, + cloud_phone_sandbox, + mock_instance_manager, + ): + """Test cloud sandbox creation when start fails.""" + # Reset the call count since refresh_ticket may have been + # called during fixture initialization + mock_instance_manager.refresh_ticket.reset_mock() + + cloud_phone_sandbox.auto_start = True + # Mock _wait_for_phone_ready to raise exception to simulate + # start failure + # This avoids the long wait loop in _wait_for_phone_ready + with patch.object( + cloud_phone_sandbox, + "_wait_for_phone_ready", + side_effect=Exception("Connection error"), + ): + # Should still succeed even if start fails + sandbox_id = cloud_phone_sandbox._create_cloud_sandbox() + + assert sandbox_id == "test-instance-id" + mock_instance_manager.refresh_ticket.assert_called_once() + + def test_delete_cloud_sandbox_success( + self, + cloud_phone_sandbox, + mock_instance_manager, + ): + """Test successful cloud sandbox deletion.""" + result = cloud_phone_sandbox._delete_cloud_sandbox("test-instance-id") + + assert result is True + m_e = mock_instance_manager.eds_client + m_e.stop_equipment.assert_called_once_with( + ["test-instance-id"], + ) + + def test_delete_cloud_sandbox_failure( + self, + cloud_phone_sandbox, + mock_instance_manager, + ): + """Test cloud sandbox deletion failure.""" + m_e = mock_instance_manager.eds_client + m_e.stop_equipment.side_effect = Exception("Network error") + + result = cloud_phone_sandbox._delete_cloud_sandbox( + "test-instance-id", + ) + + assert result is False + + def test_call_cloud_tool_supported_tool( + self, + cloud_phone_sandbox, + mock_instance_manager, + ): + """Test calling a supported tool.""" + result = cloud_phone_sandbox._call_cloud_tool( + "run_shell_command", + {"command": "ls -la"}, + ) + + assert result["success"] is True + assert result["output"] == "command output" + 
mock_instance_manager.run_command.assert_called_once_with("ls -la") + + def test_call_cloud_tool_unsupported_tool(self, cloud_phone_sandbox): + """Test calling an unsupported tool.""" + result = cloud_phone_sandbox._call_cloud_tool( + "unsupported_tool", + {}, + ) + + assert result["success"] is False + assert "not supported" in result["error"] + + def test_call_cloud_tool_execution_exception( + self, + cloud_phone_sandbox, + mock_instance_manager, + ): + """Test calling a tool that throws exception.""" + mock_instance_manager.run_command.side_effect = Exception( + "Command failed", + ) + + result = cloud_phone_sandbox._call_cloud_tool( + "run_shell_command", + {"command": "ls -la"}, + ) + + assert result["success"] is False + assert "Command failed" in result["error"] + + def test_wait_for_phone_ready_success( + self, + cloud_phone_sandbox, + mock_instance_manager, + ): + """Test waiting for phone ready successfully.""" + mock_instance_manager.eds_client.list_instance.return_value = ( + 1, + None, + [MagicMock(android_instance_status="running")], + ) + + result = cloud_phone_sandbox._wait_for_phone_ready( + "test-instance-id", + max_wait_time=5, + stability_check_duration=1, + ) + + assert result is True + + def test_tool_handlers(self, cloud_phone_sandbox, mock_instance_manager): + """Test various tool handlers.""" + # Test run_shell_command + result = cloud_phone_sandbox._tool_run_shell_command( + {"command": "pwd"}, + ) + assert result["success"] is True + assert result["output"] == "command output" + + # Test click + result = cloud_phone_sandbox._tool_click( + { + "x1": 100, + "y1": 200, + "x2": 150, + "y2": 250, + "width": 1080, + "height": 1920, + }, + ) + assert result["success"] is True + assert result["output"] == "clicked" + + # Test type_text + result = cloud_phone_sandbox._tool_type_text({"text": "hello"}) + assert result["success"] is True + assert result["output"] == "text typed" + + # Test slide + result = cloud_phone_sandbox._tool_slide( + { + 
"x1": 100, + "y1": 200, + "x2": 150, + "y2": 250, + }, + ) + assert result["success"] is True + assert result["output"] == "slid" + + # Test go_home + result = cloud_phone_sandbox._tool_go_home({}) + assert result["success"] is True + assert result["output"] == "went home" + + # Test back + result = cloud_phone_sandbox._tool_back({}) + assert result["success"] is True + assert result["output"] == "pressed back" + + # Test menu + result = cloud_phone_sandbox._tool_menu({}) + assert result["success"] is True + assert result["output"] == "pressed menu" + + # Test enter + result = cloud_phone_sandbox._tool_enter({}) + assert result["success"] is True + assert result["output"] == "pressed enter" + + # Test kill_front_app + result = cloud_phone_sandbox._tool_kill_front_app({}) + assert result["success"] is True + assert result["output"] == "killed app" + + def test_file_operation_tools( + self, + cloud_phone_sandbox, + mock_instance_manager, + ): + """Test file operation tools.""" + # Test send_file + result = cloud_phone_sandbox._tool_send_file( + { + "source_file_path": "/sdcard/test.txt", + "upload_url": "http://example.com/file.txt", + }, + ) + assert result["success"] is True + assert result["status_code"] == 200 + + # Test remove_file + result = cloud_phone_sandbox._tool_remove_file( + { + "file_path": "/sdcard/test.txt", + }, + ) + assert result["success"] is True + assert result["output"] == "file removed" + + def test_screenshot_tool(self, cloud_phone_sandbox, mock_instance_manager): + """Test screenshot tool.""" + result = cloud_phone_sandbox._tool_screenshot( + { + "max_retry": 3, + }, + ) + + assert result["success"] is True + assert result["output"] == "http://screenshot.url/image.png" + + def test_list_tools_all_types(self, cloud_phone_sandbox): + """Test listing all tools.""" + result = cloud_phone_sandbox.list_tools() + + assert "tools" in result + assert "tools_by_type" in result + assert result["total_count"] > 0 + + # Check that we have tools of 
different types + assert "run_shell_command" in result["tools"] # command tool + assert "click" in result["tools"] # input tool + assert "screenshot" in result["tools"] # system tool + + def test_list_tools_by_type(self, cloud_phone_sandbox): + """Test listing tools by specific type.""" + # Test input tools + result = cloud_phone_sandbox.list_tools("input") + assert result["tool_type"] == "input" + assert "click" in result["tools"] + assert "type_text" in result["tools"] + + # Test navigation tools + result = cloud_phone_sandbox.list_tools("navigation") + assert result["tool_type"] == "navigation" + assert "go_home" in result["tools"] + assert "back" in result["tools"] + + # Test command tools + result = cloud_phone_sandbox.list_tools("command") + assert result["tool_type"] == "command" + assert "run_shell_command" in result["tools"] + assert "kill_front_app" in result["tools"] + + # Test system tools + result = cloud_phone_sandbox.list_tools("system") + assert result["tool_type"] == "system" + assert "screenshot" in result["tools"] + assert "send_file" in result["tools"] + + def test_get_screenshot_methods( + self, + cloud_phone_sandbox, + mock_instance_manager, + ): + """Test screenshot methods.""" + # Test get_screenshot_oss_phone success + result = cloud_phone_sandbox.get_screenshot_oss_phone(max_retry=2) + assert result == "http://screenshot.url/image.png" + + # Test get_screenshot_oss_phone failure + # After fix, if get_screenshot_sdk returns None for all retries, + # the method should return "Error" + mock_instance_manager.get_screenshot_sdk.return_value = None + result = cloud_phone_sandbox.get_screenshot_oss_phone(max_retry=2) + assert result == "Error" diff --git a/tests/unit/test_e2b_sandbox.py b/tests/unit/test_e2b_sandbox.py new file mode 100644 index 000000000..26771297e --- /dev/null +++ b/tests/unit/test_e2b_sandbox.py @@ -0,0 +1,406 @@ +# -*- coding: utf-8 -*- +# pylint: disable=redefined-outer-name, protected-access, unused-argument +# pylint: 
disable=too-many-public-methods +""" +Unit tests for E2bSandBox implementation. +""" + +import os +from unittest.mock import MagicMock, patch +import pytest +from PIL import Image + +from agentscope_runtime.sandbox.box.e2b.e2b_sandbox import ( + E2bSandBox, +) +from agentscope_runtime.sandbox.enums import SandboxType + + +@pytest.fixture +def mock_device(): + """Create a mock E2B device.""" + device = MagicMock() + device.sandbox_id = "test-sandbox-id" + device.id = "test-device-id" + + # Mock stream + device.stream = MagicMock() + device.stream.start = MagicMock() + device.stream.stop = MagicMock() + + # Mock commands + device.commands = MagicMock() + mock_command_result = MagicMock() + mock_command_result.stdout = "command output" + mock_command_result.stderr = "" + device.commands.run = MagicMock(return_value=mock_command_result) + + # Mock screenshot + mock_image = Image.new("RGB", (100, 100), color="red") + device.screenshot = MagicMock(return_value=mock_image) + + # Mock mouse and keyboard operations + device.move_mouse = MagicMock() + device.left_click = MagicMock() + device.right_click = MagicMock() + device.double_click = MagicMock() + device.press = MagicMock() + device.write = MagicMock() + device.launch = MagicMock() + + return device + + +@pytest.fixture +def e2b_sandbox(mock_device): + """Create an E2bSandBox instance with mocked dependencies.""" + with patch( + "agentscope_runtime.sandbox.box.e2b.e2b_sandbox.Sandbox", + ) as mock_sandbox_class: + mock_sandbox_class.create = MagicMock(return_value=mock_device) + + # Mock _create_cloud_sandbox to avoid actual API calls + # during initialization + with patch.object( + E2bSandBox, + "_create_cloud_sandbox", + return_value="test-sandbox-id", + ): + sandbox = E2bSandBox() + sandbox.device = mock_device + return sandbox + + +class TestE2bSandBox: + """Test cases for E2bSandBox class.""" + + def test_init(self, mock_device): + """Test initialization.""" + with patch( + 
"agentscope_runtime.sandbox.box.e2b.e2b_sandbox.Sandbox", + ) as mock_sandbox_class: + mock_sandbox_class.create = MagicMock(return_value=mock_device) + + # Mock _create_cloud_sandbox to avoid actual API calls + with patch.object( + E2bSandBox, + "_create_cloud_sandbox", + return_value="test-sandbox-id", + ): + sandbox = E2bSandBox(timeout=300, command_timeout=30) + assert sandbox.command_timeout == 30 + assert sandbox.sandbox_type == SandboxType.E2B + + def test_initialize_cloud_client(self, e2b_sandbox): + """Test cloud client initialization.""" + result = e2b_sandbox._initialize_cloud_client() + assert result == "" + + def test_create_cloud_sandbox_success(self, mock_device): + """Test successful cloud sandbox creation.""" + with patch( + "agentscope_runtime.sandbox.box.e2b.e2b_sandbox.Sandbox", + ) as mock_sandbox_class: + mock_sandbox_class.create = MagicMock(return_value=mock_device) + + # Mock _create_cloud_sandbox to avoid actual API + # calls during initialization + # This prevents Sandbox.create from being called during __init__ + with patch.object( + E2bSandBox, + "_create_cloud_sandbox", + return_value="test-sandbox-id", + ): + sandbox = E2bSandBox() + + # Reset mock to only count the explicit call in the test + mock_sandbox_class.create.reset_mock() + mock_device.stream.start.reset_mock() + + sandbox_id = sandbox._create_cloud_sandbox(timeout=300) + + assert sandbox_id == "test-sandbox-id" + mock_sandbox_class.create.assert_called_once_with(timeout=300) + mock_device.stream.start.assert_called_once() + + def test_create_cloud_sandbox_failure(self, mock_device): + """Test cloud sandbox creation failure.""" + with patch( + "agentscope_runtime.sandbox.box.e2b.e2b_sandbox.Sandbox", + ) as mock_sandbox_class: + mock_sandbox_class.create = MagicMock( + side_effect=Exception("Connection error"), + ) + + # Mock _create_cloud_sandbox to avoid RuntimeError + # during initialization + # We'll test the actual failure in the explicit call + with patch.object( + 
E2bSandBox, + "_create_cloud_sandbox", + return_value="test-sandbox-id", + ): + sandbox = E2bSandBox() + + # Now test the actual failure scenario + sandbox_id = sandbox._create_cloud_sandbox() + + assert sandbox_id is None + + def test_delete_cloud_sandbox_success(self, e2b_sandbox, mock_device): + """Test successful cloud sandbox deletion.""" + result = e2b_sandbox._delete_cloud_sandbox("test-sandbox-id") + + assert result is True + mock_device.stream.stop.assert_called_once() + + def test_delete_cloud_sandbox_failure(self, e2b_sandbox, mock_device): + """Test cloud sandbox deletion failure.""" + mock_device.stream.stop.side_effect = Exception("Stop error") + + result = e2b_sandbox._delete_cloud_sandbox("test-sandbox-id") + + assert result is False + + def test_call_cloud_tool_supported_tool(self, e2b_sandbox, mock_device): + """Test calling a supported tool.""" + result = e2b_sandbox._call_cloud_tool( + "run_shell_command", + {"command": "ls -la"}, + ) + + assert result["success"] is True + assert "output" in result + + def test_call_cloud_tool_unsupported_tool(self, e2b_sandbox): + """Test calling an unsupported tool.""" + result = e2b_sandbox._call_cloud_tool( + "unsupported_tool", + {}, + ) + + assert result["success"] is False + assert "not supported" in result["error"] + + def test_call_cloud_tool_execution_exception( + self, + e2b_sandbox, + mock_device, + ): + """Test calling a tool that throws exception.""" + mock_device.commands.run.side_effect = Exception("Command failed") + + result = e2b_sandbox._call_cloud_tool( + "run_shell_command", + {"command": "ls -la"}, + ) + + assert result["success"] is False + assert "Command failed" in result["error"] + + def test_tool_run_command(self, e2b_sandbox, mock_device): + """Test run command tool.""" + result = e2b_sandbox._tool_run_command({"command": "pwd"}) + + assert result["success"] is True + assert "output" in result + mock_device.commands.run.assert_called_once() + + def 
test_tool_run_command_background(self, e2b_sandbox, mock_device): + """Test run command tool in background.""" + result = e2b_sandbox._tool_run_command( + { + "command": "long-running-task", + "background": True, + }, + ) + + assert result["success"] is True + assert result["output"] == "The command has been started." + mock_device.commands.run.assert_called_once_with( + "long-running-task", + background=True, + ) + + def test_tool_run_command_missing_command(self, e2b_sandbox): + """Test run command tool without command.""" + result = e2b_sandbox._tool_run_command({}) + + assert result["success"] is False + assert "command" in result["error"] + + def test_tool_press_key(self, e2b_sandbox, mock_device): + """Test press key tool.""" + result = e2b_sandbox._tool_press_key({"key": "Enter"}) + + assert result["success"] is True + assert "Enter" in result["output"] + mock_device.press.assert_called_once_with("Enter") + + def test_tool_press_key_combination(self, e2b_sandbox, mock_device): + """Test press key combination tool.""" + result = e2b_sandbox._tool_press_key({"key_combination": "Ctrl+C"}) + + assert result["success"] is True + assert "Ctrl+C" in result["output"] + mock_device.press.assert_called_once_with("Ctrl+C") + + def test_tool_press_key_invalid(self, e2b_sandbox): + """Test press key with invalid arguments.""" + result = e2b_sandbox._tool_press_key( + {"key": "Enter", "key_combination": "Ctrl+C"}, + ) + + assert result["success"] is False + assert "Invalid" in result["error"] + + def test_tool_type_text(self, e2b_sandbox, mock_device): + """Test type text tool.""" + result = e2b_sandbox._tool_type_text({"text": "Hello World"}) + + assert result["success"] is True + assert "typed" in result["output"] + mock_device.write.assert_called_once() + + def test_tool_type_text_missing_text(self, e2b_sandbox): + """Test type text tool without text.""" + result = e2b_sandbox._tool_type_text({}) + + assert result["success"] is False + assert "text" in result["error"] 
+ + def test_tool_click(self, e2b_sandbox, mock_device): + """Test click tool.""" + result = e2b_sandbox._tool_click({"x": 100, "y": 200}) + + assert result["success"] is True + assert "100" in result["output"] + assert "200" in result["output"] + mock_device.move_mouse.assert_called_once_with(100, 200) + mock_device.left_click.assert_called_once() + + def test_tool_click_double_click(self, e2b_sandbox, mock_device): + """Test double click tool.""" + result = e2b_sandbox._tool_click({"x": 100, "y": 200, "count": 2}) + + assert result["success"] is True + assert "2 times" in result["output"] + mock_device.double_click.assert_called_once() + + def test_tool_click_with_query(self, e2b_sandbox, mock_device): + """Test click tool with visual query.""" + with patch( + "agentscope_runtime.sandbox.box.e2b." + "e2b_sandbox.perform_gui_grounding_with_api", + ) as mock_grounding: + mock_grounding.return_value = (150, 250) + + result = e2b_sandbox._tool_click({"query": "click button"}) + + assert result["success"] is True + mock_device.screenshot.assert_called_once() + mock_grounding.assert_called_once() + mock_device.move_mouse.assert_called_once_with(150, 250) + + def test_tool_click_invalid_count(self, e2b_sandbox): + """Test click tool with invalid count.""" + result = e2b_sandbox._tool_click({"x": 100, "y": 200, "count": 3}) + + assert result["success"] is False + assert "Invalid count" in result["error"] + + def test_tool_right_click(self, e2b_sandbox, mock_device): + """Test right click tool.""" + result = e2b_sandbox._tool_right_click({"x": 100, "y": 200}) + + assert result["success"] is True + assert "right clicked" in result["output"] + mock_device.move_mouse.assert_called_once_with(100, 200) + mock_device.right_click.assert_called_once() + + def test_tool_click_and_type(self, e2b_sandbox, mock_device): + """Test click and type tool.""" + result = e2b_sandbox._tool_click_and_type( + { + "x": 100, + "y": 200, + "text": "Hello", + }, + ) + + assert result["success"] is 
True + assert "clicked and typed" in result["output"] + mock_device.move_mouse.assert_called_once_with(100, 200) + mock_device.left_click.assert_called_once() + mock_device.write.assert_called_once_with("Hello") + + def test_tool_click_and_type_missing_text(self, e2b_sandbox): + """Test click and type tool without text.""" + result = e2b_sandbox._tool_click_and_type({"x": 100, "y": 200}) + + assert result["success"] is False + assert "text" in result["error"] + + def test_tool_launch_app(self, e2b_sandbox, mock_device): + """Test launch app tool.""" + result = e2b_sandbox._tool_launch_app({"app": "notepad"}) + + assert result["success"] is True + assert "notepad" in result["output"] + mock_device.launch.assert_called_once_with("notepad") + + def test_tool_launch_app_missing_app(self, e2b_sandbox): + """Test launch app tool without app.""" + result = e2b_sandbox._tool_launch_app({}) + + assert result["success"] is False + assert "app" in result["error"] + + def test_tool_screenshot(self, e2b_sandbox, mock_device, tmpdir): + """Test screenshot tool.""" + file_path = tmpdir.join("screenshot.png").strpath + + result = e2b_sandbox._tool_screenshot({"file_path": file_path}) + + assert result["success"] is True + assert result["output"] == file_path + mock_device.screenshot.assert_called_once() + assert os.path.exists(file_path) + + def test_tool_screenshot_missing_file_path(self, e2b_sandbox): + """Test screenshot tool without file_path.""" + result = e2b_sandbox._tool_screenshot({}) + + assert result["success"] is False + assert "file_path" in result["error"] + + def test_list_tools_all_types(self, e2b_sandbox): + """Test listing all tools.""" + result = e2b_sandbox.list_tools() + + assert "tools" in result + assert "tools_by_type" in result + assert result["total_count"] > 0 + + # Check that we have tools of different types + assert "run_shell_command" in result["tools"] # command tool + assert "click" in result["tools"] # desktop tool + assert "screenshot" in 
result["tools"] # system tool + + def test_list_tools_by_type(self, e2b_sandbox): + """Test listing tools by specific type.""" + # Test desktop tools + result = e2b_sandbox.list_tools("desktop") + assert result["tool_type"] == "desktop" + assert "click" in result["tools"] + assert "press_key" in result["tools"] + + # Test command tools + result = e2b_sandbox.list_tools("command") + assert result["tool_type"] == "command" + assert "run_shell_command" in result["tools"] + + # Test system tools + result = e2b_sandbox.list_tools("system") + assert result["tool_type"] == "system" + assert "screenshot" in result["tools"]
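The three test modules above share one mocking pattern: the provisioning method is stubbed with `patch.object` while the sandbox is constructed, the mocks are reset, and then the real method is called explicitly so that only that call is counted. The sketch below isolates the pattern under stated assumptions. `DemoSandbox`, `build_sandbox`, and the `client.create` call are hypothetical stand-ins invented for illustration, not part of `agentscope_runtime`; only the `unittest.mock` calls are real library APIs.

```python
from unittest.mock import MagicMock, patch


class DemoSandbox:
    """Hypothetical stand-in for a cloud sandbox; not the real class."""

    def __init__(self, client):
        self.client = client
        # The constructor provisions a sandbox, which would hit the network.
        self.sandbox_id = self._create_cloud_sandbox()

    def _create_cloud_sandbox(self):
        try:
            return self.client.create()
        except Exception:
            # Assumed behaviour for this sketch: a failed creation yields None.
            return None


def build_sandbox(client):
    """Construct DemoSandbox with provisioning stubbed out."""
    with patch.object(
        DemoSandbox,
        "_create_cloud_sandbox",
        return_value="stub-id",
    ):
        sandbox = DemoSandbox(client)
    # Outside the context manager the real method is restored.
    return sandbox


client = MagicMock()
client.create.return_value = "real-id"

sandbox = build_sandbox(client)
stubbed_id = sandbox.sandbox_id
calls_during_init = client.create.call_count

# Reset so that only the explicit call below is counted.
# reset_mock() keeps the configured return_value by default.
client.create.reset_mock()
explicit_id = sandbox._create_cloud_sandbox()
client.create.assert_called_once()

# Simulate a provider failure on the next call.
client.create.side_effect = Exception("Connection error")
failed_id = sandbox._create_cloud_sandbox()
```

Stubbing at the class level keeps the constructor free of network calls, while the later explicit call still exercises the real method body against the mocked client. That is why the tests above can assert on `assert_called_once` without counting calls made during fixture setup.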