Description
Summary
Add a WindowsApiToolsMixin class that provides native Windows API tools directly to GAIA agents, eliminating the need to route Windows system operations through an external MCP server subprocess. This mixin would follow the existing tool mixin pattern established by CLIToolsMixin (src/gaia/agents/code/tools/cli_tools.py) and integrate with the @tool decorator in src/gaia/agents/base/tools.py.
Motivation
Currently, Windows system operations (GUI automation, window management, system info, clipboard, etc.) require launching an external MCP server process (e.g., uvx windows-mcp) and communicating through MCPClientMixin (src/gaia/mcp/mixin.py). This works but introduces measurable overhead:
- Process startup cost: Spawning a subprocess for the MCP server
- IPC serialization: Every tool call crosses a process boundary via JSON-RPC over stdio
- Connection lifecycle: Initialize handshake, tool discovery, and teardown per session
A native mixin that calls Windows APIs directly from the agent process eliminates all of this. The tools register into _TOOL_REGISTRY the same way any other tool does, and the LLM can invoke them with zero IPC overhead.
The MCP server approach remains valuable for cross-machine scenarios and language-agnostic integrations, but for local single-machine usage the native mixin should be the faster default.
Proposed Location
src/gaia/agents/windows/
    __init__.py
    windows_api_tools.py    # WindowsApiToolsMixin class
This mirrors the pattern where CLIToolsMixin lives in src/gaia/agents/code/tools/cli_tools.py.
Tools to Implement
The mixin should expose at minimum the following tool categories, registered via the @tool decorator from src/gaia/agents/base/tools.py:
Window Management
- list_windows() - Enumerate visible windows (title, handle, position, size)
- focus_window(title_or_handle) - Bring a window to the foreground
- move_window(handle, x, y, width, height) - Reposition/resize a window
- minimize_window(handle) / maximize_window(handle) / close_window(handle)
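As a rough illustration of this category, list_windows() could be built on ctypes alone. This is a minimal sketch, not the proposed implementation: the _result helper, the exact field names in each window entry, and the choice of ctypes over pywin32 are assumptions made for the example. The user32 functions it calls (EnumWindows, IsWindowVisible, GetWindowTextW, GetWindowRect) are standard Win32 APIs.

```python
import ctypes
import sys


def _result(success, data=None, error=None):
    # Assumed result shape, based on the status/success/error/data fields named in this issue.
    return {"status": "ok" if success else "error", "success": success,
            "error": error, "data": data}


def list_windows():
    """Enumerate visible, titled top-level windows via user32 (Windows only)."""
    if sys.platform != "win32":
        return _result(False, data=[], error="list_windows requires Windows")

    from ctypes import wintypes

    user32 = ctypes.WinDLL("user32", use_last_error=True)
    enum_proc = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.HWND, wintypes.LPARAM)
    # Declare signatures so 64-bit window handles are not truncated.
    user32.EnumWindows.argtypes = [enum_proc, wintypes.LPARAM]
    user32.IsWindowVisible.argtypes = [wintypes.HWND]
    user32.GetWindowTextLengthW.argtypes = [wintypes.HWND]
    user32.GetWindowTextW.argtypes = [wintypes.HWND, wintypes.LPWSTR, ctypes.c_int]
    user32.GetWindowRect.argtypes = [wintypes.HWND, ctypes.POINTER(wintypes.RECT)]

    windows = []

    def on_window(hwnd, _lparam):
        length = user32.GetWindowTextLengthW(hwnd)
        if user32.IsWindowVisible(hwnd) and length > 0:
            title = ctypes.create_unicode_buffer(length + 1)
            user32.GetWindowTextW(hwnd, title, length + 1)
            rect = wintypes.RECT()
            user32.GetWindowRect(hwnd, ctypes.byref(rect))
            windows.append({
                "handle": hwnd,
                "title": title.value,
                "x": rect.left,
                "y": rect.top,
                "width": rect.right - rect.left,
                "height": rect.bottom - rect.top,
            })
        return True  # returning True continues the enumeration

    callback = enum_proc(on_window)  # keep a reference alive during the call
    user32.EnumWindows(callback, 0)
    return _result(True, data=windows)
```

On non-Windows platforms the function returns an error result instead of raising, which is one possible reading of the platform-guard requirement.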
System Information
- get_system_info() - OS version, hostname, CPU, RAM, GPU, NPU presence
- get_running_processes() - Process list with PID, name, memory usage
- get_disk_usage() - Drive letters, total/used/free space
- get_display_info() - Monitor count, resolutions, DPI scaling
- get_battery_status() - Charge level, AC/battery, estimated time remaining
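Part of this category can be covered by the Python standard library with no Windows-specific dependency. The sketch below is illustrative only: it assumes a status/success/error/data result shape, reports just the fields that platform, os, and shutil expose, and checks a single path rather than enumerating drive letters. RAM, GPU, and NPU detection would need other APIs and is not shown.

```python
import os
import platform
import shutil


def _result(success, data=None, error=None):
    # Assumed result shape, based on the fields named in this issue.
    return {"status": "ok" if success else "error", "success": success,
            "error": error, "data": data}


def get_system_info():
    """Report basic OS and CPU facts available from the standard library."""
    return _result(True, data={
        "os": platform.system(),
        "os_version": platform.version(),
        "hostname": platform.node(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
    })


def get_disk_usage(path=None):
    """Report total/used/free bytes for the volume containing `path`."""
    # Default to the root of the current drive (e.g. C:\ on Windows, / elsewhere).
    path = path or os.path.abspath(os.sep)
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return _result(False, error=str(exc))
    return _result(True, data={
        "path": path,
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
    })
```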
Clipboard Operations
- get_clipboard() - Read current clipboard text content
- set_clipboard(text) - Write text to clipboard
GUI Automation (basic)
- screenshot(region=None) - Capture full screen or a region, return file path
- click(x, y) - Simulate mouse click at coordinates
- type_text(text) - Simulate keyboard input
- send_keys(keys) - Send key combinations (e.g., Win+D, Alt+Tab)
System Settings
- get_dark_mode_status() - Check if dark mode is enabled
- set_dark_mode(enabled) - Toggle dark mode
- get_volume() / set_volume(level) - Audio volume control
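One way get_dark_mode_status() might be implemented is by reading the per-user AppsUseLightTheme registry value with the standard winreg module. This is a sketch under assumptions: the result shape is taken from the fields named in this issue, and only the app theme (not the separate system theme value) is checked.

```python
import sys

# Per-user personalization settings; AppsUseLightTheme is 0 when apps use dark mode.
_PERSONALIZE_KEY = r"Software\Microsoft\Windows\CurrentVersion\Themes\Personalize"


def _result(success, data=None, error=None):
    # Assumed result shape, based on the fields named in this issue.
    return {"status": "ok" if success else "error", "success": success,
            "error": error, "data": data}


def get_dark_mode_status():
    """Return whether apps are set to dark mode for the current user."""
    if sys.platform != "win32":
        return _result(False, error="get_dark_mode_status requires Windows")

    import winreg  # only importable on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, _PERSONALIZE_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "AppsUseLightTheme")
    except OSError as exc:
        # The key or value can be missing on older Windows builds.
        return _result(False, error=str(exc))
    return _result(True, data={"dark_mode": value == 0})
```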
Notifications
- show_notification(title, message) - Display a Windows toast notification
Acceptance Criteria
- WindowsApiToolsMixin class created following the mixin pattern in CLIToolsMixin
- All tools registered via @tool decorator into _TOOL_REGISTRY
- Mixin is composable with the base Agent class: class MyAgent(Agent, WindowsApiToolsMixin)
- Platform guard: tools gracefully degrade or raise clear errors on non-Windows platforms
- No dependency on an external MCP server process for any of the tools listed above
- Python dependencies are Windows-only extras (e.g., pywin32, pyautogui, pystray) declared in pyproject.toml under a [windows] extra
- Unit tests with mocked Windows APIs (runnable on any platform)
- Integration tests gated behind a @pytest.mark.windows marker
- CUA integration: the Computer Use Agent (docs/plans/cua.mdx) can use WindowsApiToolsMixin as a native backend instead of (or alongside) an MCP server
- Documentation page added to docs/guides/ or docs/sdk/
- docs/docs.json updated with the new documentation page
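The platform-guard and mocked-unit-test criteria could be met together along the following lines. Everything named here is hypothetical: the windows_only decorator, the _Win32 wrapper class, and the test function are illustrations, not existing GAIA code. The idea is that all real Windows calls sit behind one thin object that tests can patch with unittest.mock.

```python
import functools
import sys
from unittest import mock


def windows_only(func):
    """Return a clear error result instead of failing on non-Windows platforms."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if sys.platform != "win32":
            return {"status": "error", "success": False, "data": None,
                    "error": f"{func.__name__} is only available on Windows"}
        return func(*args, **kwargs)
    return wrapper


class _Win32:
    """Hypothetical thin wrapper around the real Windows calls, easy to mock."""

    @staticmethod
    def read_clipboard_text():
        raise NotImplementedError("a real version would call the Win32 clipboard API")


@windows_only
def get_clipboard():
    return {"status": "ok", "success": True, "error": None,
            "data": _Win32.read_clipboard_text()}


def test_get_clipboard_with_mocked_api():
    # Pretend to be on Windows and replace the low-level call, so this runs anywhere.
    with mock.patch.object(sys, "platform", "win32"), \
         mock.patch.object(_Win32, "read_clipboard_text", return_value="hello"):
        result = get_clipboard()
    assert result["success"] is True
    assert result["data"] == "hello"


def test_get_clipboard_off_windows():
    with mock.patch.object(sys, "platform", "linux"):
        result = get_clipboard()
    assert result["success"] is False
```

Checking sys.platform at call time rather than at import time keeps the module importable everywhere, which is what makes the mocked tests runnable on any platform.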
Technical Notes
Pattern to Follow
The CLIToolsMixin in src/gaia/agents/code/tools/cli_tools.py is the closest reference. Key patterns:
- Inherits via super().__init__(*args, **kwargs) for MRO compatibility
- Has a register_*_tools() method that defines @tool-decorated inner functions
- Uses _ensure_*_initialized() for lazy initialization of internal state
- Returns structured dicts with status, success, error, data fields
CUA Relationship
The CUA plan (docs/plans/cua.mdx) currently assumes all desktop control goes through an external MCP server. WindowsApiToolsMixin offers a native alternative that the CUA agent could use:
class ComputerUseAgent(Agent, WindowsApiToolsMixin, MCPClientMixin):
    """Uses native tools when available, falls back to MCP server."""
This lets the CUA agent prefer zero-overhead native calls for common operations while still supporting arbitrary MCP servers for extended capabilities.
Compatibility with MCPClientMixin
Both mixins register tools into the same _TOOL_REGISTRY. Care should be taken to:
- Namespace tools clearly (e.g., win_list_windows vs mcp_windows_list_windows)
- Allow both mixins on the same agent without name collisions
- Document which approach to prefer and when
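A possible shape for the namespacing and preference rules is sketched below. The register_namespaced and resolve helpers, the plain-dict registry, and the ocr_screen tool name are all hypothetical. The sketch assumes the win_ and mcp_windows_ prefixes from the example above and a native-first lookup order.

```python
def register_namespaced(registry, prefix, name, func):
    """Register `func` as '<prefix>_<name>', refusing to overwrite an existing tool."""
    full_name = f"{prefix}_{name}"
    if full_name in registry:
        raise ValueError(f"tool name collision: {full_name}")
    registry[full_name] = func
    return full_name


def resolve(registry, name, prefixes=("win", "mcp_windows")):
    """Return the first registered tool for `name`, trying prefixes in priority order."""
    for prefix in prefixes:
        func = registry.get(f"{prefix}_{name}")
        if func is not None:
            return func
    raise KeyError(name)


registry = {}
# The same logical tool is provided by both backends without colliding.
register_namespaced(registry, "win", "list_windows", lambda: "native")
register_namespaced(registry, "mcp_windows", "list_windows", lambda: "mcp")
# A tool that only the MCP server offers (hypothetical name).
register_namespaced(registry, "mcp_windows", "ocr_screen", lambda: "mcp")
```

With this ordering, resolve(registry, "list_windows") picks the native tool, while resolve(registry, "ocr_screen") falls through to the MCP-backed one.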
Related Files
- src/gaia/agents/base/tools.py - @tool decorator and _TOOL_REGISTRY
- src/gaia/agents/base/agent.py - Base Agent class
- src/gaia/mcp/mixin.py - MCPClientMixin (current MCP-based approach)
- src/gaia/agents/code/tools/cli_tools.py - CLIToolsMixin (pattern to follow)
- docs/plans/cua.mdx - Computer Use Agent roadmap (primary consumer)
Open Questions
- Should the mixin auto-register all tools on __init__, or require an explicit register_windows_tools() call (like CLIToolsMixin.register_cli_tools())?
- Which Python libraries to standardize on? Candidates: pywin32 (low-level Win32 API), pyautogui (GUI automation), ctypes (no extra deps).
- Should screenshot output be a file path, base64 string, or both?