MAI-UI: A Unified Mobile GUI Agent Framework
MAI-UI-8B is a compact yet powerful mobile GUI agent that achieves state-of-the-art performance on mobile UI tasks.
MAI-UI is a unified mobile GUI agent framework that enables intelligent automation of mobile device interactions through natural language instructions. Powered by vision-language models, MAI-UI can understand screen content, reason about tasks, and execute precise GUI actions.
The framework consists of two main components (a usage sketch follows the lists below):
- MAIUINaivigationAgent: High-level navigation agent for complex task planning
  - Natural language instruction understanding
  - Multi-step task planning and execution
  - Action history tracking for context-aware decisions
- MAIGroundingAgent: Low-level grounding agent for precise UI element localization
  - Accurate UI element localization
  - Coordinate prediction for clicks and gestures
  - Support for various UI element types
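The sketch below illustrates this division of labor. It is a conceptual sketch only: `plan_next_step` and `ground_element` are hypothetical stand-ins for calls into MAIUINaivigationAgent and MAIGroundingAgent, and their names, arguments, and return shapes are assumptions rather than the repository's actual API.

```python
# Conceptual sketch only: plan_next_step and ground_element are hypothetical
# stand-ins for MAIUINaivigationAgent and MAIGroundingAgent; the real classes
# may expose different names and signatures.
from typing import Dict, List, Tuple


def plan_next_step(instruction: str, screenshot_png: bytes, history: List[Dict]) -> Dict:
    """Navigation-agent role: decide WHAT to do next,
    e.g. {"action": "click", "target": "the Settings icon"}."""
    raise NotImplementedError  # backed by a vision-language model in practice


def ground_element(target_description: str, screenshot_png: bytes) -> Tuple[int, int]:
    """Grounding-agent role: decide WHERE the described element is, as (x, y) pixels."""
    raise NotImplementedError  # backed by a vision-language model in practice


def next_action(instruction: str, screenshot_png: bytes, history: List[Dict]) -> Dict:
    """Compose the two roles: plan an abstract action, then ground it to coordinates."""
    plan = plan_next_step(instruction, screenshot_png, history)
    if plan.get("target"):  # clicks, long presses, etc. need screen coordinates
        plan["coordinates"] = ground_element(plan["target"], screenshot_png)
    return plan
```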
Supported device actions (a data-structure sketch follows the list):
- Click: Tap on screen elements
- Swipe: Scroll and navigate between screens
- Type: Input text into fields
- System Buttons: Home, Back, Power, and Volume controls
- Long Press: Extended touch actions
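One possible way to model this action space as plain Python data is sketched below; the `UIAction` class and its field names are illustrative assumptions, not the repository's actual schema.

```python
# Illustrative only: UIAction and its fields are assumptions, not MAI-UI's schema.
from dataclasses import dataclass
from typing import Literal, Optional, Tuple


@dataclass
class UIAction:
    kind: Literal["click", "swipe", "type", "system_button", "long_press"]
    point: Optional[Tuple[int, int]] = None    # target for click / long_press
    start: Optional[Tuple[int, int]] = None    # swipe start point
    end: Optional[Tuple[int, int]] = None      # swipe end point
    text: Optional[str] = None                 # text to type
    button: Optional[str] = None               # "home", "back", "power", "volume_up", ...
    duration_ms: int = 0                       # hold time for long_press


# Example: long-press at pixel (540, 1200) for one second.
press = UIAction(kind="long_press", point=(540, 1200), duration_ms=1000)
```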
Additional framework features:
- Streamlit-based web interface
- ADB connectivity for device control
- Configurable model parameters
- MCP tool support for extended capabilities
MAI-UI/
├── MAI_UI/
│ ├── base.py # Base agent class
│ ├── mai_naivigation_agent.py # Navigation agent implementation
│ ├── mai_grounding_agent.py # Grounding agent implementation
│ ├── unified_memory.py # Trajectory memory management
│ ├── prompt.py # System prompts
│ └── utils.py # Utility functions
├── app.py # Streamlit web application
├── requirements.txt # Python dependencies
├── ScreenShot.png # Demo screenshot
└── Video.mp4 # Demo video
How it works (a loop sketch follows the list):
- Input: User provides a natural language instruction
- Capture: A screenshot is captured from the device via ADB
- Analyze: The navigation agent analyzes the screen and plans the next action
- Ground: The grounding agent locates the target UI element
- Execute: The action is executed on the device after user approval
- Iterate: The process continues until the task is complete
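A minimal sketch of this loop is shown below. The five helper callables are injected rather than imported because their real names and signatures in MAI-UI are not shown here; only the control flow mirrors the steps above.

```python
# Control-flow sketch of the capture -> plan -> ground -> review -> execute loop.
# All five callables are hypothetical; only the loop structure mirrors the steps above.
from typing import Callable, Dict, List, Tuple


def run_task(
    instruction: str,
    capture_screenshot: Callable[[], bytes],
    plan_next_step: Callable[[str, bytes, List[Dict]], Dict],
    ground_element: Callable[[str, bytes], Tuple[int, int]],
    ask_user_approval: Callable[[Dict], bool],
    execute_on_device: Callable[[Dict], None],
    max_steps: int = 20,
) -> List[Dict]:
    """Run the interaction loop until the agent reports completion or max_steps is hit."""
    history: List[Dict] = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()                        # Capture: screenshot via ADB
        plan = plan_next_step(instruction, screenshot, history)  # Analyze: navigation agent
        if plan.get("action") == "done":                         # Iterate: stop when complete
            break
        if plan.get("target"):
            plan["coordinates"] = ground_element(plan["target"], screenshot)  # Ground
        if ask_user_approval(plan):                              # Execute: after user approval
            execute_on_device(plan)
            history.append(plan)
    return history
```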
Prerequisites:
- Python 3.8+
- ADB (Android Debug Bridge)
- A connected Android device (or emulator)
Installation and setup:
- Clone the repository:
  git clone https://github.com/Tongyi-MAI/MAI-UI.git
  cd MAI-UI
- Install dependencies:
  pip install -r requirements.txt
- Configure ADB:
  # Connect your device via USB or Wi-Fi
  adb devices
  # If using Wi-Fi, connect:
  adb connect <device_ip>:<port>
- Launch the web app:
  streamlit run app.py

In the sidebar, configure:
- ADB Device Address: Device IP and port (e.g., 192.168.50.67:41117)
- LLM Base URL: API endpoint for the vision-language model (see the client sketch after this list)
- Model Name: Model identifier (e.g., MAI-UI-8B)
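If the configured LLM Base URL is an OpenAI-compatible endpoint (as is common when serving such models with vLLM or similar tooling; this is an assumption, not something the project states), the sidebar values map onto a client call roughly as below. The URL, API key, image path, and prompt are placeholders.

```python
# Illustration only: assumes an OpenAI-compatible serving endpoint behind the
# configured "LLM Base URL"; the URL, api_key, and prompt below are placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://192.168.50.10:8000/v1", api_key="EMPTY")  # LLM Base URL

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="MAI-UI-8B",  # the "Model Name" configured in the sidebar
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Open the Settings app."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```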
Typical usage flow (an ADB helper sketch follows the list):
- Connect Device: Click "Connect Device" to establish the ADB connection
- Enter Instruction: Describe the task in natural language
- Take Screenshot: Capture the current screen state
- Analyze: The AI analyzes the screen and predicts the next action
- Review: Visual feedback shows the predicted action location
- Execute: Approve the action for device execution
- Iterate: Continue until the task is complete
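The screenshot and execution steps ultimately come down to ADB commands. The helpers below are a small sketch of how they can be issued from Python with the standard adb CLI; the device serial is just the example address from above, and these helpers are not part of the repository.

```python
# Sketch of the ADB side of "Take Screenshot" and "Execute"; not part of MAI-UI itself.
import subprocess

DEVICE = "192.168.50.67:41117"  # example ADB device address from the sidebar


def capture_screenshot(path: str = "screen.png") -> str:
    """Pull a PNG screenshot from the device using `adb exec-out screencap -p`."""
    png = subprocess.run(
        ["adb", "-s", DEVICE, "exec-out", "screencap", "-p"],
        check=True, capture_output=True,
    ).stdout
    with open(path, "wb") as f:
        f.write(png)
    return path


def tap(x: int, y: int) -> None:
    """Send a tap at pixel (x, y) using `adb shell input tap`."""
    subprocess.run(
        ["adb", "-s", DEVICE, "shell", "input", "tap", str(x), str(y)],
        check=True,
    )
```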
MAI-UI-8B achieves competitive performance on mobile GUI benchmarks:
| Benchmark | Score |
|---|---|
| ScreenSpot | 85.2% |
| AMEX | 78.5% |
| MAA | 72.3% |
Model highlights:
- Compact Size: 8B parameters for efficient deployment
- High Accuracy: State-of-the-art UI element localization
- Fast Inference: Optimized for real-time interaction
- Multi-language: Supports UI text in multiple languages
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Tongyi Lab for developing MAI-UI
- Hugging Face for model hosting
- Contributors and community for feedback and improvements
