Description
Required prerequisites
- I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- Consider asking first in a Discussion.
Motivation
Refer to:
- https://github.com/camel-ai/crab and other computer-use or mobile-use projects.
- https://github.com/camel-ai/camel/blob/master/camel/runtimes/ubuntu_docker_runtime.py
Summary
Add full Ubuntu and Android execution environments, including toolkits, VMs/emulators, sandboxes, action/observation spaces, and orchestration, so CAMEL agents can operate across desktop and mobile ecosystems through GUI, API, or hybrid interaction modes.
This upgrade will enable richer multi-device autonomy and real-world agent behaviors.
Note: some of these features are already supported in https://github.com/camel-ai/camel/blob/master/camel/runtimes/ubuntu_docker_runtime.py and https://github.com/camel-ai/crab
🎯 Motivation
To build general autonomous agents, CAMEL needs platform-level environments where agents can interact with real operating systems, software, GUIs, and devices.
Right now, CAMEL lacks:
- OS-specific toolkits for Ubuntu and Android
- Full execution sandboxes
- Standardized action/observation spaces for GUI, API, or hybrid control
- Runtime orchestration across heterogeneous platforms
- VNC/noVNC-based graphical access for agent visualization and debugging
Adding Ubuntu and Android support unlocks significant new research and practical applications.
📦 Proposed Additions
1. Ubuntu Toolkit + Execution Environment
Ubuntu Toolkit Capabilities
- Standardized command execution
- File system operations (read/write/search)
- GUI automation (via pyautogui, X11, Wayland, or browser-based toolkit)
- Package management (APT)
- Networking tools
- Optional agent MCP integrations
- Execution of preinstalled software
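To make the command-execution capability concrete, here is a minimal sketch of what such a toolkit wrapper could look like. The class name, whitelist behaviour, and method signature are illustrative assumptions, not part of the existing CAMEL API:

```python
import shlex
import subprocess


class UbuntuToolkit:
    """Sketch of a command-execution tool for an Ubuntu runtime.

    Assumes commands run inside an already-sandboxed environment; the
    whitelist here is a second, defence-in-depth layer.
    """

    def __init__(self, allowed=("echo", "ls", "cat", "apt-get")):
        self.allowed = set(allowed)

    def exec(self, command: str, timeout: int = 30) -> str:
        """Run a whitelisted command and return its stdout."""
        argv = shlex.split(command)
        if not argv or argv[0] not in self.allowed:
            raise PermissionError(f"command not allowed: {command}")
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
        if result.returncode != 0:
            raise RuntimeError(result.stderr.strip())
        return result.stdout.strip()
```

File-system operations, APT management, and GUI automation could be exposed as further methods on the same class, all sharing the whitelist and timeout machinery.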
Ubuntu VM / Sandbox / Runtime
- Based on Ubuntu 22.04 LTS
- Runs either as:
  - VM
  - Docker sandbox
  - Agent-safe runtime
- With optional GUI stack using:
  - VNC server
  - noVNC browser-based access
Preinstalled Ubuntu Software
Potential defaults:
- Python, Node, Java toolchains
- Browsers (Firefox / Chromium)
- Developer tools (git, curl, build-essential)
- Automation packages (xdotool, wmctrl)
- Optional AI/ML toolchains
- Any desired MCPs or agent toolkits
2. Android Toolkit + Execution Environment
Android Toolkit Capabilities
- ADB commands
- App installation/removal
- Input simulation: tap, swipe, long press
- Typing/text events
- Screenshot + screen recording
- UI hierarchy extraction
- Intent launching and permission control
- Optional UI automation via:
  - uiautomator2
  - Appium
  - Espresso (advanced)
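Most of these capabilities map directly onto `adb shell input` and related ADB subcommands. A sketch of a thin wrapper that builds the ADB argument lists (class and method names are assumptions; the `%s` escaping for `input text` follows ADB's documented behaviour):

```python
class AndroidToolkit:
    """Sketch of an ADB-backed Android tool targeting one device serial."""

    def __init__(self, serial: str = "emulator-5554"):
        self.serial = serial

    def _adb(self, *args: str) -> list:
        # Prefix every call with the target device serial.
        return ["adb", "-s", self.serial, *args]

    def tap(self, x: int, y: int) -> list:
        return self._adb("shell", "input", "tap", str(x), str(y))

    def swipe(self, x1, y1, x2, y2, duration_ms=300) -> list:
        return self._adb("shell", "input", "swipe",
                         str(x1), str(y1), str(x2), str(y2), str(duration_ms))

    def type_text(self, text: str) -> list:
        # `input text` requires spaces to be escaped as %s.
        return self._adb("shell", "input", "text", text.replace(" ", "%s"))

    def screenshot(self, remote: str = "/sdcard/screen.png") -> list:
        return self._adb("shell", "screencap", "-p", remote)
```

Returning argument lists (rather than executing directly) keeps the wrapper testable without a device; a runtime layer would pass them to `subprocess.run`.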
Android Execution Environment
- Android Emulator (x86/ARM)
- Sandbox with ADB bridge
- Optional GUI access via:
  - VNC server in emulator
  - noVNC in browser
- Configurable:
  - Android version
  - Screen resolution
  - Device profile
  - Preinstalled apps
3. Action and Observation Spaces (GUI / API / Hybrid)
Action Spaces
Agents should be able to choose actions across different modalities:
GUI-Based Actions
- Mouse movement/click
- Keyboard events
- Touch gestures (Android)
- Window focus / switching
API-Based Actions
- System commands
- API calls exposed by toolkits
- ADB commands
- High-level task actions (e.g., “open browser”, “install package”)
Hybrid Actions
- GUI fallback when API fails
- API introspection + GUI execution
- Multi-step execution chains across OS boundaries
Observation Spaces
- Full-screen screenshots
- Bounding-box detected UI elements
- OCR text extraction
- System logs
- Output of terminal/ADB commands
- Telemetry (CPU, RAM, network)
- File system state
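One way to standardize these spaces is a small shared schema that both platforms emit and consume. The dataclass names and fields below are a proposal sketch, not an existing CAMEL schema:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class Modality(Enum):
    GUI = "gui"
    API = "api"
    HYBRID = "hybrid"


@dataclass
class Action:
    """A single agent action, tagged with its interaction modality."""
    modality: Modality
    name: str                              # e.g. "click", "run_command", "adb_tap"
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Observation:
    """What the environment returns after each step."""
    screenshot: Optional[bytes] = None     # full-screen capture
    ui_elements: List[dict] = field(default_factory=list)  # bounding boxes + labels
    terminal_output: Optional[str] = None  # stdout of shell/ADB commands
    logs: List[str] = field(default_factory=list)          # system logs / telemetry
```

A hybrid policy could then express "try the API, fall back to GUI" as two `Action` objects with the same intent but different `modality` tags.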
4. Orchestration Layer for Runtimes and Emulators
Centralized orchestration for:
- Managing Ubuntu VMs/sandboxes
- Launching Android emulators
- Starting/stopping runtimes
- Maintaining lifecycle of multiple environments
- Synchronizing agent interactions
- Logging, replay, and deterministic stepping
Possible orchestrator modes:
- Local multi-runtime
- Cluster/distributed runtimes
- Dockerized
- CI-friendly headless mode
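The lifecycle-management part of the orchestrator could be as simple as a registry over a common runtime interface. A sketch (interface and class names are assumptions; concrete runtimes would wrap Docker, a VM manager, or the Android emulator):

```python
from abc import ABC, abstractmethod


class Runtime(ABC):
    """Common interface for any managed environment
    (Ubuntu sandbox, Android emulator, ...)."""

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...


class Orchestrator:
    """Hypothetical local multi-runtime orchestrator."""

    def __init__(self):
        self._runtimes = {}

    def register(self, name: str, runtime: Runtime) -> None:
        self._runtimes[name] = runtime

    def start_all(self) -> None:
        for rt in self._runtimes.values():
            rt.start()

    def stop_all(self) -> None:
        # Tear down in reverse registration order, so dependents
        # (e.g. an emulator using a shared network) stop first.
        for rt in reversed(list(self._runtimes.values())):
            rt.stop()
```

Cluster/distributed and CI-headless modes would swap in different `Runtime` implementations behind the same registry.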
5. Integration of VNC / noVNC
Why
To give agents and developers GUI visibility.
What to integrate
- VNC servers for Ubuntu GUI
- VNC embedded in Android Emulator
- noVNC to expose GUI in browser
- Agent-accessible screenshot + OCR utilities
- Ability to switch between screen rendering modes
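A typical headless GUI stack is Xvfb for the virtual display, x11vnc to serve it, and websockify/noVNC to expose it in a browser. A sketch of the process invocations involved (display number, ports, and the noVNC web path are illustrative defaults, not fixed choices):

```python
def vnc_stack_argv(display=":1", vnc_port=5900, web_port=6080):
    """Build the three commands for a headless GUI stack:
    Xvfb (virtual X display) -> x11vnc (VNC server) -> websockify (noVNC bridge).
    Returns argv lists; a runtime would launch each with subprocess.Popen."""
    return [
        ["Xvfb", display, "-screen", "0", "1280x800x24"],
        ["x11vnc", "-display", display, "-rfbport", str(vnc_port),
         "-forever", "-nopw"],
        ["websockify", "--web", "/usr/share/novnc",
         str(web_port), f"localhost:{vnc_port}"],
    ]
```

With this in place, a developer watches the agent at `http://<host>:6080/vnc.html` while the agent itself consumes screenshots of the same display.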
Development + Debugging
- Human-in-the-loop control
- Replay and time-travel debugging
- Parallel monitoring of multiple device screens
🔒 Security Model
- Container-level sandboxing
- Command whitelisting
- File-system isolation
- Resource limits (CPU, RAM, GPU)
- Network policies
- Debug mode vs. locked-down mode
- Resilience to untrusted agent actions
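Several of these controls map directly onto `docker run` flags. A sketch of a helper that builds a locked-down invocation (the function name and default limits are assumptions; the flags themselves are standard Docker options):

```python
def sandboxed_docker_argv(image, cmd, memory="512m", cpus="1.0"):
    """Build a locked-down `docker run` invocation for untrusted agent code."""
    return [
        "docker", "run", "--rm",
        "--memory", memory,          # RAM cap
        "--cpus", cpus,              # CPU cap
        "--network", "none",         # no network access (locked-down mode)
        "--read-only",               # read-only root filesystem
        "--pids-limit", "128",       # bound process count (fork-bomb guard)
        "--cap-drop", "ALL",         # drop all Linux capabilities
        image, *cmd,
    ]
```

Debug mode would relax individual flags (e.g. restore networking) rather than bypassing the sandbox entirely.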
🧩 Integration Points with CAMEL
- Standard tool interface for both Ubuntu and Android
- Unified API for actions/observations
- Compatibility with current multi-agent workflows
- Optional plugin system for additional OS/toolkits
- Shared schema for tool results
📈 Expected Impact
- Agents can operate full computers and mobile devices
- Enables experiments in GUI agents, multimodal autonomy, tool learning, device automation, and multi-device coordination
- Bridges the gap between purely text-based agents and real-world embodied software agents
- Supports the long-term goal of building universal, general-purpose agents
🙋 Request for Feedback
Seeking input on:
- Environment packaging (VMs vs. containers vs. hybrid)
- What should be included by default in Ubuntu/Android runtimes
- Standardization of GUI/API action schemas
- Orchestrator design
- Security model and execution safety boundaries
- How to align this with future MCP/toolkit ecosystems
If helpful, I can:
👉 Write scaffolding code for Ubuntu/Android tool wrappers
👉 Design the action/observation schema
👉 Architect the multi-runtime orchestrator
👉 Prepare a companion PR
👉 Build a roadmap or RFC for the whole system
Just tell me!
Solution
No response
Alternatives
No response
Additional context
No response