[Feature Request] Add ubuntu and android toolkits and environments for computer and mobile use #3561

@lightaime

Description

Required prerequisites

Motivation

Refer to:

Summary

Add full Ubuntu and Android execution environments (toolkits, VMs/emulators, sandboxes, action/observation spaces, and orchestration) so that CAMEL agents can operate across desktop and mobile ecosystems through GUI, API, or hybrid interaction modes.

This upgrade will enable richer multi-device autonomy and real-world agent behaviors.

Note: some of these features are already supported in https://github.com/camel-ai/camel/blob/master/camel/runtimes/ubuntu_docker_runtime.py and https://github.com/camel-ai/crab


🎯 Motivation

To build general autonomous agents, CAMEL needs platform-level environments where agents can interact with real operating systems, software, GUIs, and devices.
Right now, CAMEL lacks complete support for:

  • OS-specific toolkits for Ubuntu and Android
  • Full execution sandboxes
  • Standardized action/observation spaces for GUI, API, or hybrid control
  • Runtime orchestration across heterogeneous platforms
  • VNC/noVNC-based graphical access for agent visualization and debugging

Adding Ubuntu and Android support unlocks significant new research and practical applications.


📦 Proposed Additions


1. Ubuntu Toolkit + Execution Environment

Ubuntu Toolkit Capabilities

  • Standardized command execution
  • File system operations (read/write/search)
  • GUI automation (via pyautogui, X11, Wayland, or browser-based toolkit)
  • Package management (APT)
  • Networking tools
  • Optional agent MCP integrations
  • Execution of preinstalled software
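
As a rough illustration of the "standardized command execution" item, a wrapper might look something like the sketch below. `CommandResult` and `run_command` are hypothetical names, not existing CAMEL APIs, and the exit codes used for "missing binary" (127) and "timeout" (124) are assumptions borrowed from common shell conventions.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Hypothetical result record (not an existing CAMEL type)."""
    command: str
    exit_code: int
    stdout: str
    stderr: str


def run_command(command: str, timeout: float = 30.0) -> CommandResult:
    """Run one command without a shell and capture its output."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return CommandResult(command, 2, "", str(exc))
    if not argv:
        return CommandResult(command, 2, "", "empty command")
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError as exc:
        # 127 mirrors the shell's "command not found" convention.
        return CommandResult(command, 127, "", str(exc))
    except subprocess.TimeoutExpired:
        # 124 mirrors the exit code used by coreutils `timeout`.
        return CommandResult(command, 124, "", f"timed out after {timeout}s")
    return CommandResult(command, proc.returncode, proc.stdout, proc.stderr)
```

Returning a uniform record for failures as well as successes means the agent never has to handle Python exceptions, only inspect `exit_code` and `stderr`.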

Ubuntu VM / Sandbox / Runtime

  • Based on Ubuntu 22.04 LTS

  • Runs either as:

    • VM
    • Docker sandbox
    • Agent-safe runtime
  • With optional GUI stack using:

    • VNC server
    • noVNC browser-based access

Preinstalled Ubuntu Software

Potential defaults:

  • Python, Node, Java toolchains
  • Browsers (Firefox / Chromium)
  • Developer tools (git, curl, build-essential)
  • Automation packages (xdotool, wmctrl)
  • Optional AI/ML toolchains
  • Any desired MCPs or agent toolkits

2. Android Toolkit + Execution Environment

Android Toolkit Capabilities

  • ADB commands

  • App installation/removal

  • Input simulation: tap, swipe, long press

  • Typing/text events

  • Screenshot + screen recording

  • UI hierarchy extraction

  • Intent launching and permission control

  • Optional UI automation via:

    • uiautomator2
    • Appium
    • Espresso (advanced)
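
To make the ADB-backed capabilities a bit more concrete, here is a minimal sketch of a helper that only builds `adb` argument lists (it does not execute them). `AdbCommandBuilder` is a hypothetical name, and modelling a long press as a zero-distance swipe is an assumption about how one might implement it, not something this proposal prescribes.

```python
from typing import List, Optional


class AdbCommandBuilder:
    """Hypothetical helper that builds adb argv lists; it never runs them."""

    def __init__(self, serial: Optional[str] = None, adb_path: str = "adb") -> None:
        # `-s SERIAL` targets one specific device or emulator.
        self._prefix = [adb_path] + (["-s", serial] if serial else [])

    def tap(self, x: int, y: int) -> List[str]:
        return self._prefix + ["shell", "input", "tap", str(x), str(y)]

    def swipe(self, x1: int, y1: int, x2: int, y2: int,
              duration_ms: int = 300) -> List[str]:
        coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
        return self._prefix + ["shell", "input", "swipe"] + coords

    def long_press(self, x: int, y: int, duration_ms: int = 1000) -> List[str]:
        # Assumption: a swipe that starts and ends at the same point.
        return self.swipe(x, y, x, y, duration_ms)

    def type_text(self, text: str) -> List[str]:
        # `input text` expects spaces encoded as %s; other shell-special
        # characters are NOT escaped in this sketch.
        return self._prefix + ["shell", "input", "text", text.replace(" ", "%s")]

    def install(self, apk_path: str, replace: bool = True) -> List[str]:
        return self._prefix + ["install"] + (["-r"] if replace else []) + [apk_path]

    def uninstall(self, package: str) -> List[str]:
        return self._prefix + ["uninstall", package]

    def screenshot(self) -> List[str]:
        # Writes PNG bytes to stdout of the adb process.
        return self._prefix + ["exec-out", "screencap", "-p"]

    def dump_ui_hierarchy(self, remote_path: str = "/sdcard/window_dump.xml") -> List[str]:
        return self._prefix + ["shell", "uiautomator", "dump", remote_path]
```

Keeping command construction separate from execution makes the toolkit easy to unit-test without a device attached.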

Android Execution Environment

  • Android Emulator (x86/ARM)

  • Sandbox with ADB bridge

  • Optional GUI access via:

    • VNC server in emulator
    • noVNC in browser
  • Configurable:

    • Android version
    • Screen resolution
    • Device profile
    • Preinstalled apps
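
The configurable items above could be captured in a small config object. The sketch below is only one possible shape: `EmulatorConfig`, its field names, its defaults, and the choice of the `google_apis` image tag are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EmulatorConfig:
    """Hypothetical config mirroring the configurable items listed above."""
    api_level: int = 33                          # Android version as an API level
    resolution: Tuple[int, int] = (1080, 2400)   # width, height in pixels
    device_profile: str = "pixel_6"              # assumed device definition id
    abi: str = "x86_64"
    preinstalled_apks: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.api_level < 1:
            raise ValueError("api_level must be positive")
        width, height = self.resolution
        if width <= 0 or height <= 0:
            raise ValueError("resolution must be positive")
        if self.abi not in ("x86", "x86_64", "armeabi-v7a", "arm64-v8a"):
            raise ValueError(f"unsupported ABI: {self.abi}")

    @property
    def system_image(self) -> str:
        """SDK package id in the style used by sdkmanager."""
        return f"system-images;android-{self.api_level};google_apis;{self.abi}"
```

Validating in `__post_init__` lets a bad configuration fail before any emulator is launched.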

3. Action and Observation Spaces (GUI / API / Hybrid)

Action Spaces

Agents should be able to choose actions across different modalities:

GUI-Based Actions

  • Mouse movement/click
  • Keyboard events
  • Touch gestures (Android)
  • Window focus / switching

API-Based Actions

  • System commands
  • API calls exposed by toolkits
  • ADB commands
  • High-level task actions (e.g., “open browser”, “install package”)

Hybrid Actions

  • GUI fallback when API fails
  • API introspection + GUI execution
  • Multi-step execution chains across OS boundaries
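
One way to read "GUI fallback when API fails" is sketched below: try the API-level action first and only drive the GUI if that reports failure. `Modality`, `Action`, and `execute_hybrid` are hypothetical names, and the `api_only_fails` executor is a stand-in that exists purely to show the fallback path.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, Optional


class Modality(str, Enum):
    GUI = "gui"
    API = "api"


@dataclass
class Action:
    """Hypothetical action record shared by both modalities."""
    modality: Modality
    name: str
    params: Dict[str, Any] = field(default_factory=dict)


# An executor returns True on success and False (or raises) on failure.
Executor = Callable[[Action], bool]


def execute_hybrid(api_action: Action, gui_action: Action,
                   execute: Executor) -> Optional[Modality]:
    """Try the API action first, then the GUI action; report which one worked."""
    for action in (api_action, gui_action):
        try:
            if execute(action):
                return action.modality
        except Exception:
            continue  # treat an exception like a failed attempt
    return None


def api_only_fails(action: Action) -> bool:
    # Stand-in executor: pretends every API call fails and every GUI call works.
    return action.modality is Modality.GUI


used = execute_hybrid(
    Action(Modality.API, "open_browser"),
    Action(Modality.GUI, "click", {"x": 40, "y": 12}),
    api_only_fails,
)
```

Returning the modality that succeeded gives the orchestrator something to log, which helps when comparing GUI-heavy and API-heavy runs.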

Observation Spaces

  • Full-screen screenshots
  • Bounding-box detected UI elements
  • OCR text extraction
  • System logs
  • Output of terminal/ADB commands
  • Telemetry (CPU, RAM, network)
  • File system state
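
A possible container for these observations is sketched below. `UIElement` and `Observation` are hypothetical names; the `(left, top, right, bottom)` bounding-box convention is an assumption, and only a subset of the listed signals is modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class UIElement:
    """A detected UI element with a pixel bounding box."""
    label: str
    bbox: Tuple[int, int, int, int]  # assumed order: left, top, right, bottom

    def center(self) -> Tuple[int, int]:
        left, top, right, bottom = self.bbox
        return ((left + right) // 2, (top + bottom) // 2)


@dataclass
class Observation:
    """Hypothetical observation bundle returned after each step."""
    screenshot_png: Optional[bytes] = None
    ui_elements: List[UIElement] = field(default_factory=list)
    ocr_text: str = ""
    command_output: str = ""
    telemetry: Dict[str, float] = field(default_factory=dict)  # e.g. cpu, ram

    def find(self, label: str) -> Optional[UIElement]:
        """Return the first element whose label matches, ignoring case."""
        wanted = label.lower()
        for element in self.ui_elements:
            if element.label.lower() == wanted:
                return element
        return None
```

An agent could then turn a detected element into a click or tap target with `obs.find("Submit").center()`.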

4. Orchestration Layer for Runtimes and Emulators

Centralized orchestration for:

  • Managing Ubuntu VMs/sandboxes
  • Launching Android emulators
  • Starting/stopping runtimes
  • Maintaining lifecycle of multiple environments
  • Synchronizing agent interactions
  • Logging, replay, and deterministic stepping

Possible orchestrator modes:

  • Local multi-runtime
  • Cluster/distributed runtimes
  • Dockerized
  • CI-friendly headless mode
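
The lifecycle part of the orchestrator could start from something as small as the sketch below. `Runtime`, `InMemoryRuntime`, and `Orchestrator` are hypothetical names; the in-memory runtime is a stand-in for a real VM, container, or emulator, and stopping in reverse start order is a design assumption.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Runtime(ABC):
    """Hypothetical base class for an Ubuntu sandbox or Android emulator."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...


class InMemoryRuntime(Runtime):
    """Stand-in runtime that only flips a flag."""

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


class Orchestrator:
    """Starts runtimes in registration order and stops them in reverse."""

    def __init__(self) -> None:
        self._runtimes: Dict[str, Runtime] = {}
        self.events: List[str] = []  # simple log usable for replay

    def register(self, runtime: Runtime) -> None:
        if runtime.name in self._runtimes:
            raise ValueError(f"duplicate runtime name: {runtime.name}")
        self._runtimes[runtime.name] = runtime

    def start_all(self) -> None:
        for runtime in self._runtimes.values():
            runtime.start()
            self.events.append(f"start:{runtime.name}")

    def stop_all(self) -> None:
        for runtime in reversed(list(self._runtimes.values())):
            runtime.stop()
            self.events.append(f"stop:{runtime.name}")

    def __enter__(self) -> "Orchestrator":
        self.start_all()
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.stop_all()
```

Using it as a context manager (`with orchestrator: ...`) ensures environments are torn down even if the agent loop raises.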

5. Integration of VNC / noVNC

Why

To give agents and developers GUI visibility.

What to integrate

  • VNC servers for Ubuntu GUI
  • VNC embedded in Android Emulator
  • noVNC to expose GUI in browser
  • Agent-accessible screenshot + OCR utilities
  • Ability to switch between screen rendering modes
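
For wiring the pieces together, the orchestrator needs to know which ports to expose. The sketch below assumes the usual VNC convention that display `:N` listens on TCP port 5900 + N, and assumes noVNC is served on port 6080 with a `vnc.html` page that accepts `host`, `port`, and `autoconnect` query parameters; treat those noVNC details as assumptions to verify against the version actually shipped.

```python
from urllib.parse import urlencode

VNC_BASE_PORT = 5900  # VNC convention: display :N listens on 5900 + N


def vnc_port(display: int) -> int:
    """Map an X display number to its conventional VNC TCP port."""
    if display < 0:
        raise ValueError("display must be non-negative")
    return VNC_BASE_PORT + display


def novnc_url(host: str, web_port: int = 6080, autoconnect: bool = True) -> str:
    """Build a browser URL for a noVNC page (assumed layout, see lead-in)."""
    query = urlencode({
        "host": host,
        "port": web_port,
        "autoconnect": str(autoconnect).lower(),
    })
    return f"http://{host}:{web_port}/vnc.html?{query}"
```

With helpers like these, each runtime can report a clickable debugging URL when it starts.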

Development + Debugging

  • Human-in-the-loop control
  • Replay and time-travel debugging
  • Parallel monitoring of multiple device screens

🔒 Security Model

  • Container-level sandboxing
  • Command whitelisting
  • File-system isolation
  • Resource limits (CPU, RAM, GPU)
  • Network policies
  • Debug mode vs. locked-down mode
  • Resilience against untrusted agent actions
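
As an example of what command whitelisting could mean in locked-down mode, here is a minimal sketch. The allowlist contents and the names `DEFAULT_ALLOWED`, `SHELL_OPERATORS`, and `is_allowed` are hypothetical, and a check like this would complement container isolation rather than replace it.

```python
import shlex
from typing import FrozenSet

# Hypothetical default allowlist; a real deployment would load this from config.
DEFAULT_ALLOWED: FrozenSet[str] = frozenset({"ls", "cat", "git", "python3"})
SHELL_OPERATORS: FrozenSet[str] = frozenset({"&&", "||", ";", "|", ">", ">>", "<", "&"})


def is_allowed(command: str, allowed: FrozenSet[str] = DEFAULT_ALLOWED) -> bool:
    """Return True only if the command's binary is allowlisted and unchained."""
    try:
        argv = shlex.split(command)
    except ValueError:  # malformed quoting
        return False
    if not argv:
        return False
    # Reject command substitution outright.
    if "`" in command or "$(" in command:
        return False
    # Reject standalone shell operators such as `&&` or `|`.
    if any(token in SHELL_OPERATORS for token in argv):
        return False
    # Exact-name match: explicit paths like /bin/ls are rejected on purpose.
    return argv[0] in allowed
```

For instance, `is_allowed("git status")` passes, while `is_allowed("ls; rm -rf /")` fails because the first token parses as `ls;`, which is not on the list.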

🧩 Integration Points with CAMEL

  • Standard tool interface for both Ubuntu and Android
  • Unified API for actions/observations
  • Compatibility with current multi-agent workflows
  • Optional plugin system for additional OS/toolkits
  • Shared schema for tool results
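
A shared result schema plus a common toolkit interface might look roughly like the sketch below. `ToolResult`, `PlatformToolkit`, `EchoToolkit`, and `run_everywhere` are hypothetical names and do not reflect CAMEL's existing toolkit classes; `EchoToolkit` is a stand-in used only to exercise the interface.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Iterable, Optional, Protocol


@dataclass
class ToolResult:
    """Hypothetical shared schema for results from any platform."""
    platform: str
    tool: str
    ok: bool
    output: str = ""
    error: Optional[str] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


class PlatformToolkit(Protocol):
    """Structural interface both Ubuntu and Android toolkits could satisfy."""
    platform: str

    def call(self, tool: str, **kwargs: Any) -> ToolResult: ...


class EchoToolkit:
    """Stand-in toolkit that echoes its arguments back as the output."""

    def __init__(self, platform: str) -> None:
        self.platform = platform

    def call(self, tool: str, **kwargs: Any) -> ToolResult:
        return ToolResult(self.platform, tool, ok=True,
                          output=json.dumps(kwargs, sort_keys=True))


def run_everywhere(toolkits: Iterable[PlatformToolkit], tool: str,
                   **kwargs: Any) -> Dict[str, ToolResult]:
    """Call the same tool on every platform and key results by platform."""
    return {tk.platform: tk.call(tool, **kwargs) for tk in toolkits}
```

Because `PlatformToolkit` is a `Protocol`, additional OS toolkits could plug in without inheriting from a common base class.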

📈 Expected Impact

  • Agents can operate full computers and mobile devices
  • Enables experiments in GUI agents, multimodal autonomy, tool learning, device automation, and multi-device coordination
  • Bridges the gap between purely text-based agents and real-world embodied software agents
  • Supports the long-term goal of building universal, general-purpose agents

🙋 Request for Feedback

Seeking input on:

  • Environment packaging (VMs vs. containers vs. hybrid)
  • What should be included by default in Ubuntu/Android runtimes
  • Standardization of GUI/API action schemas
  • Orchestrator design
  • Security model and execution safety boundaries
  • How to align this with future MCP/toolkit ecosystems

Possible next steps:

  • Write scaffolding code for the Ubuntu/Android tool wrappers
  • Design the action/observation schema
  • Architect the multi-runtime orchestrator
  • Prepare a companion PR
  • Build a roadmap or RFC for the whole system

Solution

No response

Alternatives

No response

Additional context

No response
