Computer Control MCP

MCP server that provides computer control capabilities using PyAutoGUI.

Expected Issues

Most models are too slow when driving the MCP server and PyAutoGUI
Screenshots consume a lot of tokens
It is difficult for the model to click at the right position
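
One likely source of misplaced clicks is that the model sees a downscaled screenshot while PyAutoGUI expects full-resolution screen coordinates. The sketch below shows one way to map between the two. The helper name to_screen_coords is hypothetical and not part of this server, and it assumes the screenshot covers the whole screen.

```python
from typing import Tuple


def to_screen_coords(
    x_img: int,
    y_img: int,
    image_size: Tuple[int, int],
    screen_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a point picked on a (possibly downscaled) screenshot to screen pixels.

    Hypothetical helper: image_size is the size of the image the model saw,
    screen_size is what get_screen_size() would report.
    """
    img_w, img_h = image_size
    scr_w, scr_h = screen_size
    x = round(x_img * scr_w / img_w)
    y = round(y_img * scr_h / img_h)
    # Clamp so a point on the image border never lands off screen.
    return min(max(x, 0), scr_w - 1), min(max(y, 0), scr_h - 1)


if __name__ == "__main__":
    # A point at the centre of a half-size screenshot of a 1920x1080 screen.
    print(to_screen_coords(480, 270, (960, 540), (1920, 1080)))
```

The result could then be passed to click_screen(x, y).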

Potential

With this generic MCP server, a model can take all of its actions locally
The actuation layer is extremely thin

Quick Usage (MCP Setup)

Option 1: Using Local Development Version

{
  "mcpServers": {
    "computer-control-mcp": {
      "command": "python3",
      "args": ["-m", "computer_use_mcp"],
      "cwd": "/path/to/your/computer-control-mcp"
    }
  }
}

Option 2: Using Global Installation

First install globally:

pip3 install -e . --break-system-packages

Then use this configuration:

{
  "mcpServers": {
    "computer-control-mcp": {
      "command": "python3",
      "args": ["-m", "computer_use_mcp"]
    }
  }
}

Option 3: Using uvx (Remote Version)

{
  "mcpServers": {
    "computer-control-mcp": {
      "command": "uvx",
      "args": ["computer-control-mcp@latest"]
    }
  }
}
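
If the MCP client already has a configuration file with other servers in it, the entry can be merged in rather than pasted by hand. This is a minimal sketch. The function name add_server is hypothetical, and the location of the client's configuration file is an assumption you need to supply, since it differs between clients.

```python
import json
from pathlib import Path

SERVER_NAME = "computer-control-mcp"
# Entry from Option 3; swap in the Option 1 or Option 2 entry if preferred.
SERVER_ENTRY = {"command": "uvx", "args": ["computer-control-mcp@latest"]}


def add_server(config_path: Path, name: str = SERVER_NAME, entry: dict = SERVER_ENTRY) -> dict:
    """Insert or overwrite one server under "mcpServers", keeping other entries."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Assumed path for illustration only; use your client's real config file.
    print(json.dumps(add_server(Path("mcp_config.json")), indent=2))
```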

Features

Control mouse movements and clicks (left and right click)
Type text at the current cursor position
Take screenshots of the entire screen or specific windows with optional saving to downloads directory
Extract text from screenshots using OCR (Optical Character Recognition)
List and activate windows (adapted to newer pygetwindow API)
Press keyboard keys
Drag and drop operations
Image compression to handle large screenshots
Automatic cleanup of temporary files

Available Tools

Mouse Control

click_screen(x: int, y: int): Left click at specified screen coordinates
right_click(x: int, y: int): Right click at specified screen coordinates
move_mouse(x: int, y: int): Move mouse cursor to specified coordinates
drag_mouse(from_x: int, from_y: int, to_x: int, to_y: int, duration: float = 0.5): Drag mouse from one position to another
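
To illustrate how thin the actuation layer can be, the mouse tools above can be pictured as one-line wrappers over PyAutoGUI. This is a sketch, not the server's actual source. The gui parameter and the returned messages are assumptions added so the functions can be exercised without a display, and the PyAutoGUI calls used are click, moveTo and dragTo.

```python
from typing import Any, Optional


def _backend(gui: Optional[Any] = None) -> Any:
    """Return the GUI backend, importing PyAutoGUI lazily (it needs a display)."""
    if gui is not None:
        return gui
    import pyautogui

    return pyautogui


def click_screen(x: int, y: int, gui: Optional[Any] = None) -> str:
    _backend(gui).click(x=x, y=y)
    return f"Clicked at ({x}, {y})"


def right_click(x: int, y: int, gui: Optional[Any] = None) -> str:
    _backend(gui).click(x=x, y=y, button="right")
    return f"Right-clicked at ({x}, {y})"


def move_mouse(x: int, y: int, gui: Optional[Any] = None) -> str:
    _backend(gui).moveTo(x, y)
    return f"Moved to ({x}, {y})"


def drag_mouse(from_x: int, from_y: int, to_x: int, to_y: int,
               duration: float = 0.5, gui: Optional[Any] = None) -> str:
    backend = _backend(gui)
    # Move to the start point first, then drag with the left button held.
    backend.moveTo(from_x, from_y)
    backend.dragTo(to_x, to_y, duration=duration, button="left")
    return f"Dragged from ({from_x}, {from_y}) to ({to_x}, {to_y})"
```

Passing a fake gui object makes it possible to check the call sequence in tests. Leaving it as None uses the real PyAutoGUI module.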

Keyboard Control

type_text(text: str): Type the specified text at current cursor position
press_key(key: str): Press a specified keyboard key

Screen and Window Management

take_screenshot(title_pattern: str = None, use_regex: bool = False, threshold: int = 60, with_ocr_text_and_coords: bool = False, scale_percent_for_ocr: int = 100, save_to_downloads: bool = False): Capture screen or window with optional OCR
get_screen_size(): Get current screen resolution
list_windows(): List all open windows with active window detection
activate_window(title_pattern: str, use_regex: bool = False, threshold: int = 60): Find and report matching windows
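
The title_pattern, use_regex and threshold parameters suggest either regex matching or fuzzy matching with a similarity cutoff. The sketch below shows one plausible reading, under the assumption that threshold is a 0 to 100 similarity score. It uses difflib from the standard library, which may not be the matcher the server actually uses, and match_titles is a hypothetical name.

```python
import re
from difflib import SequenceMatcher
from typing import List


def match_titles(titles: List[str], title_pattern: str,
                 use_regex: bool = False, threshold: int = 60) -> List[str]:
    """Return the window titles that match the pattern.

    Regex mode: case-insensitive search.
    Fuzzy mode: a substring hit scores 100, otherwise the difflib similarity
    ratio scaled to 0-100 is compared against the threshold.
    """
    if use_regex:
        rx = re.compile(title_pattern, re.IGNORECASE)
        return [t for t in titles if rx.search(t)]

    needle = title_pattern.lower()
    matches = []
    for title in titles:
        hay = title.lower()
        if needle in hay:
            score = 100
        else:
            score = round(SequenceMatcher(None, needle, hay).ratio() * 100)
        if score >= threshold:
            matches.append(title)
    return matches


if __name__ == "__main__":
    windows = ["Visual Studio Code", "Terminal", "Google Chrome"]
    print(match_titles(windows, "chrome"))
    print(match_titles(windows, r"^Term", use_regex=True))
```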

Recent Improvements

✅ API Compatibility

Fixed pygetwindow API issues - Updated to work with newer version that returns strings instead of window objects
Removed deprecated functions - Eliminated getAllWindows() and VisRes usage
Improved error handling - Better exception handling for API limitations

✅ New Features

Right-click functionality - Added right_click() function for right mouse button clicks
Image compression - Automatic resizing and compression to stay within Claude Desktop's 1MB limit
Temporary file cleanup - Automatic cleanup of test files and temporary screenshots
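
The compression step can be thought of as a loop that re-encodes the screenshot at progressively smaller scales until it fits the size budget. This is a sketch of that idea only. The function name fit_under_limit, the step sizes and the exact byte budget are assumptions, and the encode callback stands in for whatever resize-and-save routine the server really uses.

```python
from typing import Callable, Tuple

MAX_BYTES = 1_000_000  # assumed budget, approximating the 1MB limit


def fit_under_limit(encode: Callable[[int], bytes], max_bytes: int = MAX_BYTES,
                    start_percent: int = 100, step: int = 10,
                    min_percent: int = 10) -> Tuple[int, bytes]:
    """Call encode(scale_percent) at shrinking scales until the result fits.

    Returns the scale that was used and the encoded bytes. If nothing fits,
    the result at min_percent is returned so the caller can decide what to do.
    """
    percent = start_percent
    while True:
        data = encode(percent)
        if len(data) <= max_bytes or percent <= min_percent:
            return percent, data
        percent = max(min_percent, percent - step)


if __name__ == "__main__":
    # Stand-in encoder whose output grows with the square of the scale.
    fake_encode = lambda p: b"x" * (p * p * 300)
    scale, blob = fit_under_limit(fake_encode)
    print(scale, len(blob))
```

In practice the encode callback would resize the captured image to the given percentage and return its compressed bytes.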

✅ Testing & Quality

Comprehensive test suite - 10/10 tests passing with full coverage
Proper test organization - Moved development files to tests directory
Performance optimization - Removed screenshots from window listing to reduce response size

Current Limitations

Due to changes in the pygetwindow API:

Window activation is limited (can detect but not fully control windows)
Window properties (position, size) are not available in newer API
Window manipulation is restricted to detection and reporting

Development

Setting up the Development Environment

# Clone the repository
git clone https://github.com/zhouyuf6741/computer-use-mcp.git
cd computer-use-mcp

# Install dependencies with dev tools
uv sync --extra dev

# Start server
python3 -m computer_use_mcp.core

Running Tests

# Run all tests
uv run pytest tests/ -v

# Run specific test categories
uv run pytest tests/test_basic.py -v
uv run pytest tests/test_screenshot.py -v
uv run pytest tests/test_window_functions.py -v
uv run pytest tests/test_ocr_standalone.py -v

Test Coverage

✅ Basic Functionality (4 tests)

Screen size detection
Tool version information
Mouse control functions
Keyboard control functions

✅ Screenshot & OCR (3 tests)

Basic screenshot functionality
Screenshot with download saving
OCR text extraction (89+ text elements detected)

✅ Window Management (3 tests)

Window listing (52+ windows detected)
Window activation detection
API compatibility verification

Troubleshooting

Common Issues

"No module named 'pyautogui'": Install the package globally with pip3 install -e . --break-system-packages
"module 'pygetwindow' has no attribute 'getAllWindows'": This is expected with newer versions. The code has been updated to handle this limitation.
"result exceeds maximum length": Screenshots are automatically compressed. If you still get this error, try using scale_percent_for_ocr=50 when taking screenshots.
Permission issues on macOS: You may need to grant accessibility permissions to your terminal/IDE in System Preferences > Security & Privacy > Privacy > Accessibility (System Settings > Privacy & Security > Accessibility on macOS 13 and later).
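
Before reinstalling, it can help to confirm which modules the interpreter launched by the MCP client can actually see. The check below is a small diagnostic sketch. The function name missing_modules is hypothetical, and the module list covers only the packages named in this README, so the real dependency list may be longer.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["pyautogui", "pygetwindow"])
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it with the same python3 that appears in the "command" field of the MCP configuration, since a different interpreter may have a different set of installed packages.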

Performance Tips

Use scale_percent_for_ocr=50 for faster OCR processing
Avoid taking screenshots with OCR unless necessary
The list_windows() function no longer includes screenshots to reduce response size
Temporary files are automatically cleaned up after processing

API Reference

See the API Reference for detailed information about the available functions and classes.

License

MIT

