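AgentDroid is an AI-driven mobile automation agent server built on FastAPI. It controls Android devices from natural-language instructions. The project uses a dual-engine architecture that covers everything from simple single-agent control to complex multi-agent collaborative tasks.
- 🤖 Dual-engine architecture: V1 for simple, direct execution; V4 for multi-agent collaboration
- 📱 Mobile device control: full control of Android devices over ADB
- 🧠 Multimodal AI: combines visual understanding with natural language processing
- ⚡ Asynchronous processing: background task execution with callbacks
- 🔌 RESTful API: a standardized web API
- 🐳 Containerized deployment: one-command deployment with Docker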
graph TB
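%% User interaction layer
subgraph "User interaction layer"
User[User or client application]
WebUI[Web UI]
RESTAPI[REST API client]
end
%% API gateway layer
subgraph "API gateway layer (main.py)"
FastAPI[FastAPI server<br/>port: 9777]
subgraph "API endpoints"
SyncV1["/run-agent<br/>synchronous V1 engine"]
AsyncV1["/run-agent-async<br/>asynchronous V1 engine"]
SyncV4["/run-agent-v4<br/>synchronous V4 engine"]
AsyncV4["/run-agent-v4-async<br/>asynchronous V4 engine"]
CallbackAPI["/callback-test<br/>callback test"]
end
subgraph "Middleware"
TaskLock[Task execution lock<br/>prevents concurrent conflicts]
BackgroundTasks[Background task manager]
CallbackHandler[Callback handler]
end
end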
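%% Core engine layer
subgraph "Core engine layer"
subgraph "V1 engine (agent_core.py)"
V1Core[run_mobile_agent<br/>V1 core executor]
V1Device[ADB device connector]
V1Screenshot[Screenshot processing]
V1SystemMsg[System message builder]
V1ActionExec[Action executor]
end
subgraph "V4 engine (agent_core_v4.py)"
V4Core[MobileAgentV4Runner<br/>V4 runner]
V4Async[Async V4 executor]
V4Env[SimpleAdbEnv<br/>Environment adapter]
end
end
%% Multi-agent layer (V4 only)
subgraph "Multi-agent collaboration layer"
InfoPool[InfoPool<br/>Information pool<br/>Global state management]
subgraph "Agent components"
Manager[Manager<br/>Task planning agent<br/>High-level decision making]
Executor[Executor<br/>Action execution agent<br/>Performs concrete operations]
Reflector[ActionReflector<br/>Reflection agent<br/>Error detection and correction]
Notetaker[Notetaker<br/>Note-taking agent<br/>Knowledge accumulation]
end
V4Agent[MobileAgentV4_Optimized<br/>Optimized V4 agent]
end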
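%% Tool service layer
subgraph "Tool service layer"
subgraph "Device control tools"
MobileUse[MobileUse<br/>Mobile device control<br/>tap/swipe/type]
ComputerUse[ComputerUse<br/>Computer control]
end
subgraph "Action handling"
JSONAction[JSONAction<br/>Standardized action format]
ActionConverter[Action converter<br/>Format normalization]
end
subgraph "Helper utilities"
ImageUtils[Image utilities<br/>screenshot/resize/encode]
CommonUtils[Common utilities<br/>message parsing/tag handling]
FileUtils[File utilities<br/>file read and write]
ContactUtils[Contact utilities<br/>contact management]
FuzzyMatch[Fuzzy matching library<br/>text similarity]
end
end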
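%% External service layer
subgraph "External service layer"
subgraph "AI model services"
OpenAI[OpenAI API<br/>GUI-OWL-7B]
vLLM[vLLM service<br/>Locally deployed models]
QwenAgent[Qwen Agent<br/>Alibaba Tongyi Qianwen]
end
subgraph "Device connection"
ADBServer[ADB server<br/>port: 5037]
AndroidDevice[Android device<br/>USB/WiFi connection]
end
subgraph "External callbacks"
CallbackURL[Callback receiver<br/>External system integration]
end
end
%% Data storage layer
subgraph "Data storage layer"
subgraph "In-memory storage"
CallbackLogs[Callback logs<br/>In-memory cache]
TaskState[Task state<br/>Runtime state]
end
subgraph "File storage"
AgentOutputs[agent_outputs/<br/>Execution log output]
Screenshots[screenshots/<br/>Screenshot storage]
ActionLogs[action_logs/<br/>Action execution records]
end
end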
%% Connections: user layer to API layer
User --> FastAPI
WebUI --> FastAPI
RESTAPI --> FastAPI
%% Connections inside the API layer
FastAPI --> SyncV1
FastAPI --> AsyncV1
FastAPI --> SyncV4
FastAPI --> AsyncV4
FastAPI --> CallbackAPI
%% API layer to engine layer
SyncV1 --> V1Core
AsyncV1 --> BackgroundTasks
SyncV4 --> V4Core
AsyncV4 --> BackgroundTasks
BackgroundTasks --> V1Core
BackgroundTasks --> V4Async
BackgroundTasks --> CallbackHandler
%% Connections inside the V1 engine
V1Core --> V1Device
V1Core --> V1Screenshot
V1Core --> V1SystemMsg
V1Core --> V1ActionExec
%% Connections inside the V4 engine
V4Core --> V4Env
V4Async --> V4Core
V4Core --> V4Agent
%% V4 multi-agent connections
V4Agent --> InfoPool
V4Agent --> Manager
V4Agent --> Executor
V4Agent --> Reflector
V4Agent --> Notetaker
InfoPool --> Manager
InfoPool --> Executor
InfoPool --> Reflector
InfoPool --> Notetaker
%% Tool layer connections
V1Core --> MobileUse
Executor --> MobileUse
V1ActionExec --> JSONAction
ActionConverter --> JSONAction
V1Screenshot --> ImageUtils
V4Env --> ImageUtils
V1SystemMsg --> CommonUtils
%% External service connections
Manager --> OpenAI
Executor --> OpenAI
Reflector --> OpenAI
Notetaker --> OpenAI
V1SystemMsg --> OpenAI
V1Device --> ADBServer
V4Env --> ADBServer
ADBServer --> AndroidDevice
CallbackHandler --> CallbackURL
%% Storage connections
V1Core --> AgentOutputs
V4Agent --> AgentOutputs
V1Screenshot --> Screenshots
V4Env --> Screenshots
CallbackAPI --> CallbackLogs
%% Style definitions
classDef userLayer fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef apiLayer fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef engineLayer fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
classDef agentLayer fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef toolLayer fill:#fce4ec,stroke:#880e4f,stroke-width:2px
classDef externalLayer fill:#ffebee,stroke:#b71c1c,stroke-width:2px
classDef storageLayer fill:#f1f8e9,stroke:#33691e,stroke-width:2px
class User,WebUI,RESTAPI userLayer
class FastAPI,SyncV1,AsyncV1,SyncV4,AsyncV4,CallbackAPI,TaskLock,BackgroundTasks,CallbackHandler apiLayer
class V1Core,V1Device,V1Screenshot,V1SystemMsg,V1ActionExec,V4Core,V4Async,V4Env engineLayer
class InfoPool,Manager,Executor,Reflector,Notetaker,V4Agent agentLayer
class MobileUse,ComputerUse,JSONAction,ActionConverter,ImageUtils,CommonUtils,FileUtils,ContactUtils,FuzzyMatch toolLayer
class OpenAI,vLLM,QwenAgent,ADBServer,AndroidDevice,CallbackURL externalLayer
class CallbackLogs,TaskState,AgentOutputs,Screenshots,ActionLogs storageLayer
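The diagram shows a task execution lock in the middleware that prevents concurrent runs from conflicting on the same device. The actual implementation in main.py is not shown here; the following is only a minimal sketch of the idea, assuming a single process and a hypothetical `TaskLock` helper built on `threading.Lock`.

```python
import threading


class TaskLock:
    """Hypothetical helper: allow at most one agent task to run at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def try_run(self, task, *args, **kwargs):
        """Run `task` if the lock is free.

        Returns (started, result). If another task already holds the lock,
        returns (False, None) immediately instead of waiting.
        """
        if not self._lock.acquire(blocking=False):
            return False, None  # a task is already in progress
        try:
            return True, task(*args, **kwargs)
        finally:
            self._lock.release()  # always release, even if the task raises
```

Rejecting the second request instead of queueing it keeps two instruction streams from interleaving on one device; whether AgentDroid rejects or queues is not stated in this document.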
- Python 3.8+
- An Android device with USB debugging enabled
- ADB tools
- An OpenAI API key or a local vLLM service
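The first and third requirements above can be checked before starting the server. This is a small sketch, not part of AgentDroid: `check_prerequisites` is a hypothetical helper that only verifies the Python version and that an `adb` executable is on `PATH`.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required")
    if shutil.which("adb") is None:
        problems.append("adb was not found on PATH")
    return problems
```

Calling `check_prerequisites()` and printing the returned list before `python main.py` gives an early, readable failure instead of an error on the first device call.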
- Clone the repository
git clone https://github.com/sav7ng/AgentDroid.git
cd AgentDroid
- Install dependencies
pip install -r requirements.txt
- Configure the environment
# Connect the Android device
adb devices
# Make sure the device is connected and authorized
- Start the service
python main.py
The service will start at http://localhost:9777.
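The `adb devices` step above asks you to confirm that the device is connected and authorized. That check can be scripted by parsing the command's output, where each device line holds a serial number and a state such as `device` or `unauthorized`. The helper names below are hypothetical and not part of AgentDroid.

```python
def parse_adb_devices(output: str) -> dict:
    """Map each serial number in `adb devices` output to its state."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and "* daemon ..." status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices(output: str) -> list:
    """Return serials whose state is 'device' (connected and authorized)."""
    return [serial for serial, state in parse_adb_devices(output).items()
            if state == "device"]
```

To use it against a real device, pass in the text from `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout`.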
# Build the image
docker build -t agentdroid .
# Run the container
docker run -p 9777:9777 -v /dev/bus/usb:/dev/bus/usb --privileged agentdroid
curl -X POST "http://localhost:9777/run-agent" \
-H "Content-Type: application/json" \
-d '{
"instruction": "Open WeChat and send the message Hello to Zhang San",
"max_steps": 50,
"api_key": "your-openai-api-key",
"base_url": "https://api.openai.com/v1",
"model_name": "gpt-4-vision-preview"
}'
curl -X POST "http://localhost:9777/run-agent-v4-async" \
-H "Content-Type: application/json" \
-d '{
"instruction": "Search for iPhone on Taobao and add it to the cart",
"max_steps": 100,
"api_key": "your-openai-api-key",
"base_url": "https://api.openai.com/v1",
"model_name": "gpt-4-vision-preview",
"output_path": "./agent_outputs",
"callback_url": "http://your-server.com/callback"
}'
import requests
# Create a task on the asynchronous V4 endpoint
response = requests.post("http://localhost:9777/run-agent-v4-async", json={
"instruction": "Order a takeout meal on Meituan",
"max_steps": 80,
"api_key": "your-api-key",
"base_url": "https://api.openai.com/v1",
"model_name": "gpt-4-vision-preview",
"callback_url": "http://your-callback-url.com/webhook"
})
# Fail early on HTTP errors before reading the body
response.raise_for_status()
task_info = response.json()
print(f"Task ID: {task_info['task_id']}")
print(f"Status: {task_info['status']}")
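The asynchronous endpoints take a `callback_url`, so something has to listen at that address. The sketch below is a minimal receiver using only the standard library. It assumes the server delivers results as an HTTP POST with a JSON body; the payload fields are not documented here, so the handler stores whatever JSON arrives. `CallbackHandler`, `received`, and `start_callback_server` are illustrative names.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # callback payloads, in arrival order


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body or b"{}"))
            self.send_response(200)
        except json.JSONDecodeError:
            self.send_response(400)  # body was not valid JSON
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_callback_server(host="127.0.0.1", port=0):
    """Start the receiver in a daemon thread; port 0 picks a free port."""
    server = HTTPServer((host, port), CallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `server = start_callback_server(port=8080)`, a `callback_url` such as `http://<your-host>:8080/webhook` would reach this handler, provided the AgentDroid server can route to that host.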
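- OpenAI GPT-4V: recommended for complex tasks
- Claude-3: supports multimodal understanding
- Local vLLM: supports deploying open-source models
- Other services compatible with the OpenAI API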
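Because every backend in the list above is reached through the same request fields (`base_url`, `model_name`, `api_key`), switching models amounts to swapping those values. The sketch below keeps them in a small preset table. The OpenAI values come from the request examples in this README; the vLLM URL and model name are placeholders that assume a local OpenAI-compatible server, and `build_request` is a hypothetical helper.

```python
# Presets for the request fields that differ per backend.
BACKENDS = {
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "model_name": "gpt-4-vision-preview",
    },
    "vllm": {
        # Assumption: a local vLLM OpenAI-compatible server; adjust both values.
        "base_url": "http://localhost:8000/v1",
        "model_name": "your-local-model",
    },
}


def build_request(instruction, backend, api_key, max_steps=50):
    """Build a JSON body for the /run-agent* endpoints from a preset."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    payload = {"instruction": instruction, "max_steps": max_steps, "api_key": api_key}
    payload.update(BACKENDS[backend])
    return payload
```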
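| Feature | V1 engine | V4 engine |
|---|---|---|
| Best suited for | Simple operations | Complex tasks |
| Execution speed | Fast | Medium |
| Accuracy | Medium | High |
| Error recovery | Basic | Intelligent |
| Resource usage | Low | Medium |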
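The table above, combined with the four endpoints in the architecture diagram, reduces to two choices: which engine, and whether to wait for the result. A minimal sketch of that mapping follows; `choose_endpoint` is a hypothetical client-side helper, not part of the server.

```python
def choose_endpoint(complex_task: bool, run_async: bool) -> str:
    """Pick the API path: V4 for complex tasks, '-async' for background runs."""
    path = "/run-agent-v4" if complex_task else "/run-agent"
    return path + "-async" if run_async else path
```

For example, a long multi-app task that reports back through a callback would use `choose_endpoint(True, True)`.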
AgentDroid/
├── main.py # FastAPI main service
├── agent_core.py # V1 engine core
├── agent_core_v4.py # V4 engine core
├── agents/ # Agent modules
│ ├── base_agent.py # Base agent class
│ ├── mobile_agent_v4.py # Optimized V4 agent
│ ├── mobile_agent_v4_agent.py # V4 agent definitions
│ └── mobile_agent_utils_v4.py # V4 utility functions
├── utils/ # Utility modules
│ ├── mobile_use.py # Mobile device control
│ ├── computer_use.py # Computer control
│ └── common.py # Common utilities
├── scripts/ # Deployment scripts
├── tests/ # Test files
└── agent_outputs/ # Output directory
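- Product search and price comparison
- Automated ordering and payment
- Order status tracking
- Automated content publishing
- Message replies and interaction
- Data collection and analysis
- UI automation testing
- Functional regression testing
- Performance monitoring
- Scheduled reminders and notifications
- File management and organization
- System settings configuration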
# Follow the live log
tail -f logs/agent.log
# Show error entries
grep "ERROR" logs/agent.log
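The `grep` command above can be extended into a quick per-level summary. The log line format is not documented here, so this sketch only looks for the level name as a substring, exactly as `grep "ERROR"` does; `count_levels` is a hypothetical helper.

```python
from collections import Counter


def count_levels(lines, levels=("ERROR", "WARNING", "INFO")):
    """Count log lines per level, using plain substring matching."""
    counts = Counter()
    for line in lines:
        for level in levels:
            if level in line:
                counts[level] += 1
                break  # count each line once, under the first level that matches
    return counts
```

To summarize the real log, iterate over the open file: `with open("logs/agent.log") as f: print(count_levels(f))`.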
# Fetch execution statistics
import requests
response = requests.get("http://localhost:9777/stats")
stats = response.json()
print(f"Success rate: {stats['success_rate']:.2%}")
# Enable debug mode
export DEBUG=true
python main.py
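How main.py interprets the `DEBUG` variable is not shown in this README. As an illustration only, a service could read such a flag as follows; `debug_enabled` and the accepted values are assumptions, not AgentDroid's actual behavior.

```python
import os


def debug_enabled(env=None) -> bool:
    """Return True when DEBUG is set to a common truthy value."""
    env = os.environ if env is None else env
    return env.get("DEBUG", "").strip().lower() in {"1", "true", "yes", "on"}
```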
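This project makes extensive use of source code and design ideas from the MobileAgent project. MobileAgent, developed by Alibaba's X-PLUG team, is a pioneering framework for AI control of mobile devices and has made important contributions to mobile device automation.
We thank the MobileAgent project for the core technology it provides:
- Multi-agent collaboration architecture
- Mobile device control and interaction mechanisms
- Visual understanding and action execution logic
- Error handling and recovery strategies
About the MobileAgent project:
- Repository: https://github.com/X-PLUG/MobileAgent
- Paper: Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
- Team: Alibaba X-PLUG
On top of MobileAgent, we made the following improvements and extensions:
- Added a FastAPI web service interface
- Implemented asynchronous task processing with callbacks
- Optimized the multi-agent collaboration flow
- Strengthened error handling and logging
- Added support for more AI models and deployment options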
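This project is open source under the MIT license. Provided you also follow the original project's license, you are welcome to use, modify, and distribute it.
If you run into problems, please let us know through any of the following:
- 🐛 Open an issue
- 💬 Discussions
- 📧 Email: [email protected]
This project is released under the MIT License.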
If you find Mobile-Agent useful for your research and applications, please cite using this BibTeX:
@article{ye2025mobile,
title={Mobile-Agent-v3: Foundamental Agents for GUI Automation},
author={Ye, Jiabo and Zhang, Xi and Xu, Haiyang and Liu, Haowei and Wang, Junyang and Zhu, Zhaoqing and Zheng, Ziwei and Gao, Feiyu and Cao, Junjie and Lu, Zhengxi and others},
journal={arXiv preprint arXiv:2508.15144},
year={2025}
}
@article{wanyan2025look,
title={Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation},
author={Wanyan, Yuyang and Zhang, Xi and Xu, Haiyang and Liu, Haowei and Wang, Junyang and Ye, Jiabo and Kou, Yutong and Yan, Ming and Huang, Fei and Yang, Xiaoshan and others},
journal={arXiv preprint arXiv:2506.04614},
year={2025}
}
@article{liu2025pc,
title={PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC},
author={Liu, Haowei and Zhang, Xi and Xu, Haiyang and Wanyan, Yuyang and Wang, Junyang and Yan, Ming and Zhang, Ji and Yuan, Chunfeng and Xu, Changsheng and Hu, Weiming and Huang, Fei},
journal={arXiv preprint arXiv:2502.14282},
year={2025}
}
@article{wang2025mobile,
title={Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks},
author={Wang, Zhenhailong and Xu, Haiyang and Wang, Junyang and Zhang, Xi and Yan, Ming and Zhang, Ji and Huang, Fei and Ji, Heng},
journal={arXiv preprint arXiv:2501.11733},
year={2025}
}
@article{wang2024mobile2,
title={Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration},
author={Wang, Junyang and Xu, Haiyang and Jia, Haitao and Zhang, Xi and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
journal={arXiv preprint arXiv:2406.01014},
year={2024}
}
@article{wang2024mobile,
title={Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception},
author={Wang, Junyang and Xu, Haiyang and Ye, Jiabo and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
journal={arXiv preprint arXiv:2401.16158},
year={2024}
}
If this project helps you, please give us a ⭐ Star!
Made with ❤️ by AgentDroid Team