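AgentDroid - AI Mobile Device Automation Server

An AI-powered platform for intelligent mobile device control and automation

📖 Overview

AgentDroid is an AI-driven mobile device automation agent server built on FastAPI. It controls Android devices through natural-language instructions and carries out a wide range of operations on them. The project uses a dual-engine architecture that scales from simple single-agent control to complex multi-agent collaborative tasks.

🌟 Key Features

🤖 Dual-engine architecture: V1 for simple, direct execution + V4 for multi-agent collaboration
📱 Mobile device control: full control of Android devices via ADB
🧠 Multimodal AI: combines visual understanding with natural language processing
⚡ Asynchronous processing: background task execution with a callback mechanism
🔌 RESTful API: standard web API endpoints
🐳 Containerized deployment: one-command Docker deployment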

🏗️ System Architecture

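 graph TB
    %% User interaction layer
    subgraph "User Interaction Layer"
        User[User / client application]
        WebUI[Web UI]
        RESTAPI[REST API client]
    end

    %% API gateway layer
    subgraph "API Gateway Layer (main.py)"
        FastAPI[FastAPI server<br/>port: 9777]
        
        subgraph "API Endpoints"
            SyncV1["/run-agent<br/>synchronous V1 engine"]
            AsyncV1["/run-agent-async<br/>asynchronous V1 engine"]
            SyncV4["/run-agent-v4<br/>synchronous V4 engine"]
            AsyncV4["/run-agent-v4-async<br/>asynchronous V4 engine"]
            CallbackAPI["/callback-test<br/>callback test"]
        end
        
        subgraph "Middleware"
            TaskLock[Task execution lock<br/>prevents concurrent conflicts]
            BackgroundTasks[Background task manager]
            CallbackHandler[Callback handler]
        end
    end

    %% Core engine layer
    subgraph "Core Engine Layer"
        subgraph "V1 Engine (agent_core.py)"
            V1Core[run_mobile_agent<br/>V1 core executor]
            V1Device[ADB device connector]
            V1Screenshot[Screenshot processing]
            V1SystemMsg[System message builder]
            V1ActionExec[Action executor]
        end
        
        subgraph "V4 Engine (agent_core_v4.py)"
            V4Core[MobileAgentV4Runner<br/>V4 runner]
            V4Async[Asynchronous V4 executor]
            V4Env[SimpleAdbEnv<br/>environment adapter]
        end
    end

    %% Multi-agent layer (V4 only)
    subgraph "Multi-Agent Collaboration Layer"
        InfoPool[InfoPool<br/>information pool<br/>global state management]
        
        subgraph "Agent Components"
            Manager[Manager<br/>task-planning agent<br/>high-level decision making]
            Executor[Executor<br/>action-execution agent<br/>concrete operations]
            Reflector[ActionReflector<br/>reflection agent<br/>error detection and correction]
            Notetaker[Notetaker<br/>note-taking agent<br/>knowledge accumulation]
        end
        
        V4Agent[MobileAgentV4_Optimized<br/>optimized V4 agent]
    end

    %% Tool service layer
    subgraph "Tool Service Layer"
        subgraph "Device Control Tools"
            MobileUse[MobileUse<br/>mobile device control<br/>tap/swipe/type]
            ComputerUse[ComputerUse<br/>computer control]
        end
        
        subgraph "Action Processing"
            JSONAction[JSONAction<br/>standardized action format]
            ActionConverter[Action converter<br/>format normalization]
        end
        
        subgraph "Helper Utilities"
            ImageUtils[Image utilities<br/>screenshot/resize/encode]
            CommonUtils[Common utilities<br/>message parsing/tag handling]
            FileUtils[File utilities<br/>file read/write]
            ContactUtils[Contact utilities<br/>contact management]
            FuzzyMatch[Fuzzy matching library<br/>text similarity]
        end
    end

    %% External service layer
    subgraph "External Service Layer"
        subgraph "AI Model Services"
            OpenAI[OpenAI API<br/>GUI-OWL-7B]
            vLLM[vLLM service<br/>locally deployed models]
            QwenAgent[Qwen Agent<br/>Alibaba Tongyi Qianwen]
        end
        
        subgraph "Device Connection"
            ADBServer[ADB server<br/>port: 5037]
            AndroidDevice[Android device<br/>USB/Wi-Fi connection]
        end
        
        subgraph "External Callbacks"
            CallbackURL[Callback receiver<br/>external system integration]
        end
    end

    %% Data storage layer
    subgraph "Data Storage Layer"
        subgraph "In-Memory Storage"
            CallbackLogs[Callback logs<br/>in-memory cache]
            TaskState[Task state<br/>runtime state]
        end
        
        subgraph "File Storage"
            AgentOutputs[agent_outputs/<br/>execution log output]
            Screenshots[screenshots/<br/>screenshot storage]
            ActionLogs[action_logs/<br/>action execution records]
        end
    end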

    %% Connections: user layer to API layer
    User --> FastAPI
    WebUI --> FastAPI
    RESTAPI --> FastAPI
    
    %% Internal API-layer connections
    FastAPI --> SyncV1
    FastAPI --> AsyncV1
    FastAPI --> SyncV4
    FastAPI --> AsyncV4
    FastAPI --> CallbackAPI
    
    %% API layer to engine layer
    SyncV1 --> V1Core
    AsyncV1 --> BackgroundTasks
    SyncV4 --> V4Core
    AsyncV4 --> BackgroundTasks
    
    BackgroundTasks --> V1Core
    BackgroundTasks --> V4Async
    BackgroundTasks --> CallbackHandler
    
    %% Internal V1 engine connections
    V1Core --> V1Device
    V1Core --> V1Screenshot
    V1Core --> V1SystemMsg
    V1Core --> V1ActionExec
    
    %% Internal V4 engine connections
    V4Core --> V4Env
    V4Async --> V4Core
    V4Core --> V4Agent
    
    %% V4 multi-agent connections
    V4Agent --> InfoPool
    V4Agent --> Manager
    V4Agent --> Executor
    V4Agent --> Reflector
    V4Agent --> Notetaker
    
    InfoPool --> Manager
    InfoPool --> Executor
    InfoPool --> Reflector
    InfoPool --> Notetaker
    
    %% Tool layer connections
    V1Core --> MobileUse
    Executor --> MobileUse
    V1ActionExec --> JSONAction
    ActionConverter --> JSONAction
    
    V1Screenshot --> ImageUtils
    V4Env --> ImageUtils
    V1SystemMsg --> CommonUtils
    
    %% External service connections
    Manager --> OpenAI
    Executor --> OpenAI
    Reflector --> OpenAI
    Notetaker --> OpenAI
    V1SystemMsg --> OpenAI
    
    V1Device --> ADBServer
    V4Env --> ADBServer 
    ADBServer --> AndroidDevice
    
    CallbackHandler --> CallbackURL
    
    %% Storage connections
    V1Core --> AgentOutputs
    V4Agent --> AgentOutputs
    V1Screenshot --> Screenshots
    V4Env --> Screenshots
    CallbackAPI --> CallbackLogs
    
    %% Style definitions
    classDef userLayer fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef apiLayer fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef engineLayer fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
    classDef agentLayer fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef toolLayer fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    classDef externalLayer fill:#ffebee,stroke:#b71c1c,stroke-width:2px
    classDef storageLayer fill:#f1f8e9,stroke:#33691e,stroke-width:2px
    
    class User,WebUI,RESTAPI userLayer
    class FastAPI,SyncV1,AsyncV1,SyncV4,AsyncV4,CallbackAPI,TaskLock,BackgroundTasks,CallbackHandler apiLayer
    class V1Core,V1Device,V1Screenshot,V1SystemMsg,V1ActionExec,V4Core,V4Async,V4Env engineLayer
    class InfoPool,Manager,Executor,Reflector,Notetaker,V4Agent agentLayer
    class MobileUse,ComputerUse,JSONAction,ActionConverter,ImageUtils,CommonUtils,FileUtils,ContactUtils,FuzzyMatch toolLayer
    class OpenAI,vLLM,QwenAgent,ADBServer,AndroidDevice,CallbackURL externalLayer
    class CallbackLogs,TaskState,AgentOutputs,Screenshots,ActionLogs storageLayer
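The V4 path in the diagram can be read as a loop in which the Manager, Executor, ActionReflector, and Notetaker take turns reading and writing a shared InfoPool. The sketch below illustrates only that control flow, under stated assumptions: the agents are plain callables (in AgentDroid they call the model API), and the field names, the "DONE" sentinel, and run_v4_loop itself are hypothetical, not the project's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class InfoPool:
    """Hypothetical shared state, standing in for the InfoPool node in the diagram."""
    instruction: str
    plan: str = ""                       # latest plan/subgoal from the Manager
    last_action: Optional[str] = None    # latest action chosen by the Executor
    last_outcome: Optional[str] = None   # "success" or "failure", set by the reflector
    notes: List[str] = field(default_factory=list)  # knowledge kept by the Notetaker
    finished: bool = False


Agent = Callable[[InfoPool], str]


def run_v4_loop(pool: InfoPool, manager: Agent, executor: Agent,
                reflector: Agent, notetaker: Agent, max_steps: int = 50) -> int:
    """Run plan -> act -> reflect -> note until the manager returns "DONE".

    Returns the number of steps taken (hypothetical sketch, not AgentDroid's API).
    """
    for step in range(max_steps):
        pool.plan = manager(pool)            # high-level planning
        if pool.plan == "DONE":              # assumed completion sentinel
            pool.finished = True
            return step
        pool.last_action = executor(pool)    # concrete device operation
        pool.last_outcome = reflector(pool)  # did the action achieve its goal?
        if pool.last_outcome == "success":
            note = notetaker(pool)           # record what was learned
            if note:
                pool.notes.append(note)
        # On failure nothing is recorded; the Manager sees last_outcome and re-plans.
    return max_steps
```

Because every agent reads from and writes to the same pool, a failure reported by the reflector is visible to the Manager on the next iteration, which is where the re-planning behavior comes from.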

🚀 Quick Start

Requirements

Python 3.8+
An Android device with USB debugging enabled
ADB (Android Debug Bridge)
An OpenAI API key or a local vLLM service
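Before starting the server it helps to confirm that ADB can see an authorized device. The following Python sketch wraps `adb devices` with subprocess and parses its output; parse_adb_devices and ready_devices are hypothetical helper names, not part of AgentDroid. Only devices in the `device` state are usable; `unauthorized` means the USB-debugging prompt on the phone has not been accepted yet.

```python
import shutil
import subprocess
from typing import Dict, List


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each serial to its state ("device", "unauthorized", "offline", ...)."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and daemon start-up messages ("* daemon ...").
        if not line or line.startswith("*") or line.startswith("List of devices"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices() -> List[str]:
    """Return serials that are connected and authorized (hypothetical helper)."""
    if shutil.which("adb") is None:
        raise RuntimeError("adb was not found on PATH; install Android platform-tools")
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    return [serial for serial, state in parse_adb_devices(result.stdout).items()
            if state == "device"]
```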

Installation

Clone the repository

git clone https://github.com/sav7ng/AgentDroid.git
cd AgentDroid

Install dependencies

pip install -r requirements.txt

Configure the environment

# Connect the Android device
adb devices

# Make sure the device is listed and authorized

Start the service

python main.py

The service starts at http://localhost:9777
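To check which endpoints the running server exposes, you can read its OpenAPI document. This sketch assumes main.py keeps FastAPI's default `openapi_url` (`/openapi.json`); if the app disables it, the request will fail. It uses only the standard library, and list_endpoints and fetch_openapi are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict, List

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(spec: Dict[str, Any]) -> List[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI document."""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            if method.lower() in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                endpoints.append(f"{method.upper()} {path}")
    return sorted(endpoints)


def fetch_openapi(base_url: str = "http://localhost:9777", timeout: float = 5.0) -> Dict[str, Any]:
    """Download the OpenAPI document from a running FastAPI server."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)
```

After `python main.py` is up, `list_endpoints(fetch_openapi())` should include the /run-agent* routes. FastAPI also serves interactive docs at /docs by default unless the app turns them off.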

Docker deployment

# Build the image
docker build -t agentdroid .

# Run the container
docker run -p 9777:9777 -v /dev/bus/usb:/dev/bus/usb --privileged agentdroid

📚 API Usage Guide

Basic API calls

Synchronous execution (V1 engine)

curl -X POST "http://localhost:9777/run-agent" \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "Open WeChat and send a message to Zhang San saying hello",
    "max_steps": 50,
    "api_key": "your-openai-api-key",
    "base_url": "https://api.openai.com/v1",
    "model_name": "gpt-4-vision-preview"
  }'
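The same synchronous request can be sent from Python. This sketch uses only the standard library and, as an assumption for illustration, reads the key from an OPENAI_API_KEY environment variable instead of hard-coding it; the server itself only sees the `api_key` field of the request body. build_agent_payload and post_json are hypothetical helper names. A synchronous call blocks until the task finishes, so allow a generous timeout.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional


def build_agent_payload(instruction: str, max_steps: int = 50,
                        model_name: str = "gpt-4-vision-preview",
                        base_url: str = "https://api.openai.com/v1",
                        api_key: Optional[str] = None, **extra: Any) -> Dict[str, Any]:
    """Assemble the JSON body used by the /run-agent* endpoints."""
    key = api_key or os.environ.get("OPENAI_API_KEY")  # env var name is an assumption
    if not key:
        raise ValueError("pass api_key or set OPENAI_API_KEY")
    payload: Dict[str, Any] = {
        "instruction": instruction,
        "max_steps": max_steps,
        "api_key": key,
        "base_url": base_url,
        "model_name": model_name,
    }
    payload.update(extra)  # optional fields such as output_path or callback_url
    return payload


def post_json(url: str, payload: Dict[str, Any], timeout: float = 600.0) -> Any:
    """POST a JSON body and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(url, data=data, method="POST",
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example: `post_json("http://localhost:9777/run-agent", build_agent_payload("Open WeChat and send a message to Zhang San saying hello"))`.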

Asynchronous execution (V4 engine)

curl -X POST "http://localhost:9777/run-agent-v4-async" \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "Search for an iPhone on Taobao and add it to the cart",
    "max_steps": 100,
    "api_key": "your-openai-api-key",
    "base_url": "https://api.openai.com/v1",
    "model_name": "gpt-4-vision-preview",
    "output_path": "./agent_outputs",
    "callback_url": "http://your-server.com/callback"
  }'
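`callback_url` needs an HTTP endpoint that accepts a POST from the server when the task finishes. The sketch below is a minimal standard-library receiver; it assumes the callback body is a JSON object, and the field names used in testing it (`task_id`, `status`) are assumptions, since the callback payload schema is not documented here. record_callback and CallbackReceiver are hypothetical names. For production use, authenticate the sender and persist results instead of keeping them in memory.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List

received: List[Dict[str, Any]] = []  # in-memory store, like the diagram's callback log


def record_callback(body: bytes) -> Dict[str, Any]:
    """Parse a callback body and store it; raises ValueError on bad input."""
    payload = json.loads(body.decode("utf-8") or "{}")
    if not isinstance(payload, dict):
        raise ValueError("callback body must be a JSON object")
    received.append(payload)
    return payload


class CallbackReceiver(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            record_callback(self.rfile.read(length))
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            self.send_response(400)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"ok": true}')


def serve(port: int = 8000) -> None:
    """Block forever, serving callbacks on the given port."""
    HTTPServer(("0.0.0.0", port), CallbackReceiver).serve_forever()
```

Run `serve(8000)` on a host the AgentDroid server can reach and pass that host's URL as `callback_url`.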

Python example (using requests)

import requests

# Create a task
response = requests.post("http://localhost:9777/run-agent-v4-async", json={
    "instruction": "Order a takeout meal for me on Meituan",
    "max_steps": 80,
    "api_key": "your-api-key",
    "base_url": "https://api.openai.com/v1",
    "model_name": "gpt-4-vision-preview",
    "callback_url": "http://your-callback-url.com/webhook"
}, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors

task_info = response.json()
print(f"Task ID: {task_info['task_id']}")
print(f"Status: {task_info['status']}")
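The async endpoints return immediately while the agent keeps running in the background, and the diagram's task execution lock keeps two tasks from driving the same device at once. A minimal asyncio sketch of that idea follows; DeviceTaskRunner, its status strings, and the on_done hook are hypothetical, not AgentDroid's implementation, which wires this through its background task manager. Create the runner inside the running event loop.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Job = Callable[[], Awaitable[None]]


class DeviceTaskRunner:
    """Serialize agent tasks on one device (hypothetical sketch of the TaskLock idea)."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.status: Dict[str, str] = {}

    def busy(self) -> bool:
        return self._lock.locked()

    async def run(self, task_id: str, job: Job,
                  on_done: Optional[Callable[[str, str], None]] = None) -> None:
        self.status[task_id] = "queued"
        async with self._lock:               # only one task touches the device at a time
            self.status[task_id] = "running"
            try:
                await job()
                self.status[task_id] = "completed"
            except Exception:                # record the failure instead of crashing the loop
                self.status[task_id] = "failed"
        if on_done is not None:              # e.g. POST the result to callback_url
            on_done(task_id, self.status[task_id])
```

Tasks that arrive while the lock is held wait in FIFO order rather than being rejected; a server could instead check `busy()` and refuse new work.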

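🔧 Configuration

Supported AI models

OpenAI GPT-4V: recommended for complex tasks
Claude-3: supports multimodal understanding
Local vLLM: for deploying open-source models
Other OpenAI-API-compatible services

Engine selection guide

Feature	V1 engine	V4 engine
Best for	Simple operations	Complex tasks
Execution speed	Fast	Moderate
Accuracy	Moderate	High
Error recovery	Basic	Intelligent
Resource usage	Low	Moderate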
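Client code can encode this table together with the four run endpoints from the architecture diagram. The rule in suggest_engine (V4 when the task is complex or needs error recovery, V1 otherwise) is a simplification of the table above, and both function names are hypothetical.

```python
from typing import Dict, Tuple

# (engine, asynchronous) -> endpoint, taken from the architecture diagram
ENDPOINTS: Dict[Tuple[str, bool], str] = {
    ("v1", False): "/run-agent",
    ("v1", True): "/run-agent-async",
    ("v4", False): "/run-agent-v4",
    ("v4", True): "/run-agent-v4-async",
}


def suggest_engine(complex_task: bool, needs_error_recovery: bool = False) -> str:
    """Pick V4 for complex or recovery-sensitive tasks, V1 for simple, fast ones."""
    return "v4" if complex_task or needs_error_recovery else "v1"


def choose_endpoint(engine: str, asynchronous: bool) -> str:
    """Map an engine name and sync/async choice to its endpoint path."""
    key = (engine.lower(), asynchronous)
    if key not in ENDPOINTS:
        raise ValueError(f"unknown engine {engine!r}; expected 'v1' or 'v4'")
    return ENDPOINTS[key]
```

For example, `choose_endpoint(suggest_engine(complex_task=True), asynchronous=True)` yields "/run-agent-v4-async".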

📁 Project Structure

AgentDroid/
├── main.py                 # FastAPI main service
├── agent_core.py           # V1 engine core
├── agent_core_v4.py        # V4 engine core
├── agents/                 # Agent modules
│   ├── base_agent.py       # Base agent class
│   ├── mobile_agent_v4.py  # Optimized V4 agent
│   ├── mobile_agent_v4_agent.py  # V4 agent definitions
│   └── mobile_agent_utils_v4.py  # V4 utility functions
├── utils/                  # Utility modules
│   ├── mobile_use.py       # Mobile device control
│   ├── computer_use.py     # Computer control
│   └── common.py           # Common utilities
├── scripts/                # Deployment scripts
├── tests/                  # Tests
└── agent_outputs/          # Output directory

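🎯 Use Cases

E-commerce automation

Product search and price comparison
Automated ordering and payment
Order status tracking

Social media management

Automated content posting
Message replies and engagement
Data collection and analysis

App testing

Automated UI testing
Functional regression testing
Performance monitoring

Everyday task automation

Scheduled reminders and notifications
File management and organization
System settings configuration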

🔍 Monitoring and Debugging

Viewing logs

# Follow the log in real time
tail -f logs/agent.log

# Show error lines
grep "ERROR" logs/agent.log
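On systems without grep, or when you want to post-process the results, the same filter is easy to do in Python. This sketch assumes only what the grep command above assumes, that error lines contain the literal text `ERROR`; the log path comes from the commands above, and error_lines and summarize_errors are hypothetical names.

```python
from typing import Iterable, List, Tuple


def error_lines(lines: Iterable[str]) -> List[str]:
    """Keep lines containing 'ERROR', like grep "ERROR" (case-sensitive)."""
    return [line.rstrip("\n") for line in lines if "ERROR" in line]


def summarize_errors(path: str = "logs/agent.log", last: int = 5) -> Tuple[int, List[str]]:
    """Return the total error count and the most recent `last` error lines."""
    with open(path, encoding="utf-8", errors="replace") as log:
        errors = error_lines(log)
    return len(errors), (errors[-last:] if last > 0 else [])
```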

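Performance monitoring

import requests

# Fetch execution statistics
response = requests.get("http://localhost:9777/stats", timeout=10)
response.raise_for_status()
stats = response.json()
print(f"Success rate: {stats['success_rate']:.2%}")

Debug mode

# Enable debug mode
export DEBUG=true
python main.py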
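How main.py interprets DEBUG is not documented here. A common pattern, shown below as an assumption rather than the project's code, is to treat values such as `true` or `1` as enabled and lower the logging level to DEBUG; debug_enabled and configure_logging are hypothetical names.

```python
import logging
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}


def debug_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when DEBUG is set to a truthy value (case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("DEBUG", "").strip().lower() in TRUTHY


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Configure root logging and return the level that was chosen."""
    level = logging.DEBUG if debug_enabled(env) else logging.INFO
    logging.basicConfig(level=level,
                        format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    return level
```

Note that `logging.basicConfig` has no effect if the root logger already has handlers, so call it once at startup.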

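🤝 Acknowledgements and License

Special thanks

This project makes extensive use of source code and design ideas from the MobileAgent project. MobileAgent is a pioneering AI framework for mobile device control developed by Alibaba's X-PLUG team, and it has made important contributions to the field of mobile device automation.

We thank the MobileAgent project for the following core techniques:

Multi-agent collaboration architecture
Mobile device control and interaction mechanisms
Visual understanding and action execution logic
Error handling and recovery strategies

MobileAgent project information:

Repository: https://github.com/X-PLUG/MobileAgent
Paper: Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Team: Alibaba X-PLUG

Building on MobileAgent, we made the following improvements and extensions:

Added a FastAPI web service interface
Implemented asynchronous task processing and a callback mechanism
Optimized the multi-agent collaboration workflow
Improved error handling and logging
Added support for more AI models and deployment options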

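License

This project is open source under the MIT License. Subject to the terms of the original project's license, you are welcome to use, modify, and distribute it.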

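🐛 Feedback

If you run into problems while using AgentDroid, please let us know through any of the following channels:

🐛 Open an issue
💬 Discussions
📧 Email: [email protected]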

📄 License

This project is released under the MIT License.

📑 Citation

If you find Mobile-Agent useful for your research and applications, please cite using this BibTeX:

@article{ye2025mobile,
  title={Mobile-Agent-v3: Foundamental Agents for GUI Automation},
  author={Ye, Jiabo and Zhang, Xi and Xu, Haiyang and Liu, Haowei and Wang, Junyang and Zhu, Zhaoqing and Zheng, Ziwei and Gao, Feiyu and Cao, Junjie and Lu, Zhengxi and others},
  journal={arXiv preprint arXiv:2508.15144},
  year={2025}
}

@article{wanyan2025look,
  title={Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation},
  author={Wanyan, Yuyang and Zhang, Xi and Xu, Haiyang and Liu, Haowei and Wang, Junyang and Ye, Jiabo and Kou, Yutong and Yan, Ming and Huang, Fei and Yang, Xiaoshan and others},
  journal={arXiv preprint arXiv:2506.04614},
  year={2025}
}

@article{liu2025pc,
  title={PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC},
  author={Liu, Haowei and Zhang, Xi and Xu, Haiyang and Wanyan, Yuyang and Wang, Junyang and Yan, Ming and Zhang, Ji and Yuan, Chunfeng and Xu, Changsheng and Hu, Weiming and Huang, Fei},
  journal={arXiv preprint arXiv:2502.14282},
  year={2025}
}

@article{wang2025mobile,
  title={Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks},
  author={Wang, Zhenhailong and Xu, Haiyang and Wang, Junyang and Zhang, Xi and Yan, Ming and Zhang, Ji and Huang, Fei and Ji, Heng},
  journal={arXiv preprint arXiv:2501.11733},
  year={2025}
}

@article{wang2024mobile2,
  title={Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration},
  author={Wang, Junyang and Xu, Haiyang and Jia, Haitao and Zhang, Xi and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
  journal={arXiv preprint arXiv:2406.01014},
  year={2024}
}

@article{wang2024mobile,
  title={Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception},
  author={Wang, Junyang and Xu, Haiyang and Ye, Jiabo and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
  journal={arXiv preprint arXiv:2401.16158},
  year={2024}
}

If this project helps you, please give us a ⭐ Star!

Made with ❤️ by AgentDroid Team
