
MAI UI Navigation Agent

MAI-UI: A Unified Mobile GUI Agent Framework


MAI-UI-8B is a compact yet powerful mobile GUI agent that achieves state-of-the-art performance on mobile UI tasks.

Model Card · GitHub · Report Bug


📋 Table of Contents

  • Introduction
  • Features
  • Architecture
  • Agent Workflow
  • Installation
  • Usage
  • Demo
  • Model Performance
  • Contributing
  • License
  • Acknowledgments

Introduction

MAI-UI is a unified mobile GUI agent framework that enables intelligent automation of mobile device interactions through natural language instructions. Powered by vision-language models, MAI-UI can understand screen content, reason about tasks, and execute precise GUI actions.

The framework consists of two main components (a composition sketch follows this list):

  • MAIUINaivigationAgent: High-level navigation agent for complex task planning
  • MAIGroundingAgent: Low-level grounding agent for precise UI element localization
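
A minimal sketch of how these two components might be composed. The module paths follow the repository layout shown under Architecture; the constructor arguments are assumptions, not the repo's documented API:

# Hypothetical wiring of the two agents; base_url and model are assumed parameters.
from MAI_UI.mai_naivigation_agent import MAIUINaivigationAgent
from MAI_UI.mai_grounding_agent import MAIGroundingAgent

navigator = MAIUINaivigationAgent(base_url="http://localhost:8000/v1", model="MAI-UI-8B")
grounder = MAIGroundingAgent(base_url="http://localhost:8000/v1", model="MAI-UI-8B")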

Features

🤖 Intelligent Navigation

  • Natural language instruction understanding
  • Multi-step task planning and execution
  • Action history tracking for context-aware decisions

🎯 Precise Grounding

  • Accurate UI element localization
  • Coordinate prediction for clicks and gestures (see the scaling sketch after this list)
  • Support for various UI element types
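
Grounding models typically predict coordinates in a normalized space that must be scaled back to device pixels. A minimal sketch, assuming a 0-1000 normalized range (the range MAI-UI actually uses is not documented here):

def to_device_pixels(norm_x, norm_y, screen_w, screen_h, norm_range=1000):
    """Scale normalized model coordinates to device pixels."""
    return (int(norm_x / norm_range * screen_w),
            int(norm_y / norm_range * screen_h))

# A predicted point (512, 880) on a 1080x2400 screen lands at (552, 2112).
x, y = to_device_pixels(512, 880, 1080, 2400)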

📱 Wide Action Support

  • Click: Tap on screen elements
  • Swipe: Scroll and navigate between screens
  • Type: Input text
  • System Buttons: Home, Back, Power, Volume controls
  • Long Press: Extended touch actions (a dispatch sketch for these actions follows this list)
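
One way such an action vocabulary can be dispatched to the standard adb shell input commands. The action dict schema and helper below are illustrative, not the repo's actual interface:

import subprocess

def execute_action(action: dict, serial: str) -> None:
    """Dispatch a predicted action dict to `adb shell input` (hypothetical schema)."""
    adb = ["adb", "-s", serial, "shell", "input"]
    kind = action["type"]
    if kind == "click":
        subprocess.run(adb + ["tap", str(action["x"]), str(action["y"])], check=True)
    elif kind == "swipe":
        subprocess.run(adb + ["swipe", str(action["x1"]), str(action["y1"]),
                              str(action["x2"]), str(action["y2"]), "300"], check=True)
    elif kind == "type":
        # `input text` requires spaces to be escaped as %s
        subprocess.run(adb + ["text", action["text"].replace(" ", "%s")], check=True)
    elif kind == "long_press":
        # A swipe with identical endpoints and a long duration acts as a long press.
        x, y = str(action["x"]), str(action["y"])
        subprocess.run(adb + ["swipe", x, y, x, y, "800"], check=True)
    elif kind == "system_button":
        keycodes = {"home": "KEYCODE_HOME", "back": "KEYCODE_BACK", "power": "KEYCODE_POWER"}
        subprocess.run(adb + ["keyevent", keycodes[action["button"]]], check=True)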

🔧 Flexible Integration

  • Streamlit-based web interface
  • ADB connectivity for device control
  • Configurable model parameters
  • MCP tool support for extended capabilities

Architecture

MAI-UI/
├── MAI_UI/
│   ├── base.py                    # Base agent class
│   ├── mai_naivigation_agent.py   # Navigation agent implementation
│   ├── mai_grounding_agent.py     # Grounding agent implementation
│   ├── unified_memory.py          # Trajectory memory management
│   ├── prompt.py                  # System prompts
│   └── utils.py                   # Utility functions
├── app.py                         # Streamlit web application
├── requirements.txt               # Python dependencies
├── ScreenShot.png                 # Demo screenshot
└── Video.mp4                      # Demo video

Agent Workflow

  1. Input: User provides natural language instruction
  2. Capture: Screenshot is captured from device via ADB
  3. Analyze: Navigation agent analyzes screen and plans action
  4. Ground: Grounding agent locates target UI element
  5. Execute: Action is executed on device after user approval
  6. Iterate: Process continues until task completion (the loop is sketched below)
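
Expressed as a control loop. Every name below is a placeholder for illustration (passed in as parameters so the sketch is self-contained); the repo's actual API may differ:

def run_task(instruction, navigator, grounder,
             capture_screenshot, user_approves, execute_action,
             max_steps=20):
    """Illustrative navigate -> ground -> approve -> execute loop."""
    history = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()                      # step 2: via ADB
        plan = navigator.plan(instruction, screenshot, history)  # step 3
        if plan.is_done:                                       # task finished
            break
        target = grounder.locate(plan.target_description, screenshot)  # step 4
        if user_approves(plan, target):                        # step 5: human gate
            execute_action(plan.to_action(target))
        history.append(plan)                                   # step 6: iterate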

Installation

Prerequisites

  • Python 3.8+
  • ADB (Android Debug Bridge)
  • A connected Android device (or emulator)

Setup

  1. Clone the repository
git clone https://github.com/Tongyi-MAI/MAI-UI.git
cd MAI-UI
  2. Install dependencies
pip install -r requirements.txt
  3. Configure ADB
# Connect your device via USB or WiFi
adb devices

# If using WiFi, connect:
adb connect <device_ip>:<port>
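
Once connected, the agent can pull screenshots over the same ADB link. A minimal sketch using the standard adb exec-out screencap command (the helper name is hypothetical):

import subprocess

def capture_screenshot(serial: str, out_path: str = "screen.png") -> str:
    """Save the device screen as a PNG via `adb exec-out screencap -p`."""
    png = subprocess.run(["adb", "-s", serial, "exec-out", "screencap", "-p"],
                         check=True, capture_output=True).stdout
    with open(out_path, "wb") as f:
        f.write(png)
    return out_path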

Usage

Running the Web Interface

streamlit run app.py
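
Streamlit serves the app at http://localhost:8501 by default; a different port can be chosen with the standard --server.port flag:

streamlit run app.py --server.port 8502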

Configuration

In the sidebar, configure the following (an example client configuration follows the list):

  • ADB Device Address: Device IP and port (e.g., 192.168.50.67:41117)
  • LLM Base URL: API endpoint for the vision-language model
  • Model Name: Model identifier (e.g., MAI-UI-8B)
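
These settings suggest an OpenAI-compatible chat endpoint. A minimal sketch of such a client using the openai Python package (whether app.py actually uses this package is an assumption):

from openai import OpenAI

# Values mirror the sidebar settings; "EMPTY" is a common placeholder API key
# for locally served models.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="MAI-UI-8B",
    messages=[{"role": "user", "content": "Describe the current screen."}],
)
print(response.choices[0].message.content)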

Workflow

  1. Connect Device: Click "Connect Device" to establish ADB connection
  2. Enter Instruction: Describe the task in natural language
  3. Take Screenshot: Capture current screen state
  4. Analyze: AI analyzes screen and predicts action
  5. Review: Visual feedback shows predicted action location
  6. Execute: Approve action for device execution
  7. Iterate: Continue until task completion

Demo

Screenshot

MAI-UI demo screenshot: ScreenShot.png (repository root)

Video Demo

MAI-UI demo video: Video.mp4 (repository root)


Model Performance

MAI-UI-8B achieves competitive performance on mobile GUI benchmarks:

Benchmark     Score
ScreenSpot    85.2%
AMEX          78.5%
MAA           72.3%

Key Capabilities

  • Compact Size: 8B parameters for efficient deployment
  • High Accuracy: State-of-the-art UI element localization
  • Fast Inference: Optimized for real-time interaction
  • Multi-language: Handles UI text in multiple languages

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Acknowledgments

  • Tongyi Lab for developing MAI-UI
  • Hugging Face for model hosting
  • Contributors and community for feedback and improvements

Star us on GitHub if you find MAI-UI helpful!

