Add AI-driven UI automation to any UWP or WinUI 3 app. zRover exposes your app's screen and input as Model Context Protocol (MCP) tools, letting AI agents, test harnesses, and other MCP clients capture screenshots, inject touch/mouse/keyboard/pen/gamepad input, and validate coordinates — all over a standard HTTP endpoint.
- Prerequisites
- Choose Your Package
- UWP Integration
- WinUI 3 Integration
- Configuration
- Connecting MCP Clients
- Available Tools
- Coordinate Spaces
- Preview & Dry Run
- Requirements & Limitations
- Troubleshooting
- Windows 10 (version 1904 / build 19041) or later
- .NET 8 runtime installed on the target machine
- Developer Mode enabled: Settings → Privacy & Security → For Developers
- UWP apps: a UWP project targeting UAP 10.0.19041 or higher
- WinUI 3 apps: a WinUI 3 (Windows App SDK) project targeting .NET 8+ with
net8.0-windows10.0.19041.0or higher
zRover ships two NuGet packages — pick the one that matches your UI framework:
| Package | Framework | Architecture |
|---|---|---|
zRover.Uwp |
UWP (Universal Windows Platform) | Launches a FullTrust companion process for the MCP server. Requires manifest extensions and 3 touch-points in App.xaml.cs. |
zRover.WinUI |
WinUI 3 (Windows App SDK) | Runs the MCP server in-process — no companion process needed. Only 2 touch-points in App.xaml.cs. |
Debug builds only. zRover is a development and testing tool — it exposes your app's screen and input over a local HTTP endpoint. Never include zRover in release or production builds. Exclude it from your release configuration (see the package reference sections below).
By default the endpoint requires no authentication. You can enable Bearer token auth by passing
requireAuthToken: trueand anauthTokenvalue toDebugHost.Start()(or setting the equivalent properties onDebugHostOptions), and then configuring the same token in your MCP client.
dotnet add package zRover.Uwp --prerelease
Or add to your .csproj:
<PackageReference Include="zRover.Uwp" />Omit the Version attribute to always restore the latest available pre-release. If you need to pin a version, pass --version to the CLI or add a Version attribute.
The package includes:
- zRover.Uwp.dll — the UWP library you call from your app
- FullTrust companion server — a .NET 8 MCP HTTP server (included for both x64 and ARM64)
- MSBuild targets — automatically adds the FullTrust binaries to your AppX package and references the Desktop Extensions SDK
Add the zRover.Uwp NuGet package to your UWP app project for debug configurations only. The simplest way is to wrap the reference in a condition so it is excluded from release builds:
<PackageReference Include="zRover.Uwp" Condition="'$(Configuration)' == 'Debug'" />This ensures zRover's FullTrust binaries, manifest extensions, and capabilities are never shipped in a release package.
The package's MSBuild auto-import handles:
- Referencing the Windows Desktop Extensions SDK (needed for
FullTrustProcessLauncher) - Including the FullTrust companion server binaries in your AppX under
FullTrust\
Open your Package.appxmanifest and add the required namespaces, extensions, and capabilities.
Add XML namespaces to the root <Package> element:
<Package
xmlns="http://schemas.microsoft.com/appx/manifest/foundation/windows10"
xmlns:uap="http://schemas.microsoft.com/appx/manifest/uap/windows10"
xmlns:rescap="http://schemas.microsoft.com/appx/manifest/foundation/windows10/restrictedcapabilities"
xmlns:desktop="http://schemas.microsoft.com/appx/manifest/desktop/windows10"
IgnorableNamespaces="uap rescap desktop">Add extensions inside your <Application> element:
<Extensions>
<!-- In-process AppService for internal communication -->
<uap:Extension Category="windows.appService">
<uap:AppService Name="com.zrover.toolinvocation" />
</uap:Extension>
<!-- FullTrust MCP server packaged alongside your app -->
<desktop:Extension Category="windows.fullTrustProcess"
Executable="FullTrust\zRover.FullTrust.McpServer.exe">
<desktop:FullTrustProcess>
<desktop:ParameterGroup GroupId="McpServer" Parameters="" />
</desktop:FullTrustProcess>
</desktop:Extension>
</Extensions>Add capabilities inside <Capabilities>:
<Capabilities>
<Capability Name="internetClient" />
<Capability Name="internetClientServer" />
<Capability Name="privateNetworkClientServer" />
<rescap:Capability Name="runFullTrust" />
<rescap:Capability Name="inputInjectionBrokered" />
</Capabilities>Note: The
inputInjectionBrokeredrestricted capability enables input injection. Without it, input tools will fall back to XAML automation (which supports a subset of interactions). TherunFullTrustcapability is required to launch the companion MCP server.
zRover requires exactly three touch-points in your App.xaml.cs:
using Windows.ApplicationModel;
using Windows.ApplicationModel.Activation;
using Windows.ApplicationModel.Background;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
sealed partial class App : Application
{
public App()
{
this.InitializeComponent();
this.Suspending += OnSuspending;
}
protected override async void OnLaunched(LaunchActivatedEventArgs e)
{
Frame rootFrame = Window.Current.Content as Frame;
if (rootFrame == null)
{
rootFrame = new Frame();
Window.Current.Content = rootFrame;
}
if (e.PrelaunchActivated == false)
{
if (rootFrame.Content == null)
rootFrame.Navigate(typeof(MainPage), e.Arguments);
Window.Current.Activate();
}
// --- zRover: Start the MCP server ---
await zRover.Uwp.RoverMcp.StartAsync("MyApp", port: 5100,
() => FullTrustProcessLauncher
.LaunchFullTrustProcessForCurrentAppAsync("McpServer")
.AsTask());
}
// --- zRover: Route AppService requests ---
protected override void OnBackgroundActivated(BackgroundActivatedEventArgs args)
{
if (zRover.Uwp.RoverMcp.HandleBackgroundActivation(args)) return;
base.OnBackgroundActivated(args);
}
private void OnSuspending(object sender, SuspendingEventArgs e)
{
var deferral = e.SuspendingOperation.GetDeferral();
// --- zRover: Clean shutdown ---
zRover.Uwp.RoverMcp.Stop();
deferral.Complete();
}
}That's it. Build and run your app — the MCP server will start listening on http://localhost:5100/mcp.
WinUI 3 (Windows App SDK) apps run as full-trust desktop processes, so zRover runs the MCP server in-process — no FullTrust companion or special manifest extensions are needed.
dotnet add package zRover.WinUI --prerelease
Or add to your .csproj:
<PackageReference Include="zRover.WinUI" />The package includes:
- zRover.WinUI.dll — the WinUI 3 library with in-process MCP server
- zRover.Core.dll and zRover.Mcp.dll — bundled automatically
Wrap the reference in a condition so it is excluded from release builds:
<PackageReference Include="zRover.WinUI" Condition="'$(Configuration)' == 'Debug'" />No manifest changes or additional capabilities are required — WinUI 3 apps already have full-trust access.
zRover requires only two touch-points in your App.xaml.cs:
using Microsoft.UI.Xaml;
sealed partial class App : Application
{
private Window? m_window;
public App()
{
this.InitializeComponent();
}
protected override async void OnLaunched(LaunchActivatedEventArgs args)
{
m_window = new MainWindow();
m_window.Activate();
#if DEBUG
// --- zRover: Start the in-process MCP server ---
await zRover.WinUI.RoverMcp.StartAsync(m_window, "MyApp");
// --- zRover: Clean shutdown on window close ---
m_window.Closed += async (s, e) => await zRover.WinUI.RoverMcp.StopAsync();
#endif
}
}That's it — no OnBackgroundActivated override, no Package.appxmanifest changes. Build and run your app — the MCP server will start listening on http://localhost:5100/mcp.
zRover.Uwp.RoverMcp.StartAsync accepts the following parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
appName |
string |
(required) | Display name shown to MCP clients |
port |
int |
5100 |
TCP port the MCP server listens on. Pass 0 to let the OS pick a free port. |
launchFullTrust |
Func<Task>? |
null |
Callback to launch the FullTrust companion. Pass the FullTrustProcessLauncher call as shown above. If null, no companion server starts (tools register but aren't reachable externally). |
actionableApp |
IActionableApp? |
null |
Optional implementation of zRover.Core.IActionableApp to expose app-defined actions via the list_actions and dispatch_action MCP tools. See App Action Tools. |
managerUrl |
string? |
null |
URL of the zRover Retriever for session registration (e.g. http://localhost:5200). |
zRover.WinUI.RoverMcp.StartAsync accepts the following parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
window |
Window |
(required) | The main application window. Required for UI thread access and screenshots. |
appName |
string |
(required) | Display name shown to MCP clients |
port |
int |
5100 |
TCP port the MCP server listens on. Pass 0 to let the OS pick a free port. |
actionableApp |
IActionableApp? |
null |
Optional implementation of zRover.Core.IActionableApp to expose app-defined actions via the list_actions and dispatch_action MCP tools. See App Action Tools. |
managerUrl |
string? |
null |
URL of the zRover Retriever for session registration (e.g. http://localhost:5200). |
Once your app is running, any MCP client can connect to http://localhost:<port>/mcp. The server speaks Streamable HTTP MCP.
Add to .vscode/mcp.json in your workspace:
{
"servers": {
"rover": {
"type": "http",
"url": "http://localhost:5100/mcp"
}
}
}Or via CLI:
code --add-mcp '{"name":"zrover","url":"http://localhost:5100/mcp"}'
Add to claude_desktop_config.json:
{
"mcpServers": {
"rover": {
"url": "http://localhost:5100/mcp"
}
}
}claude mcp add rover --transport http --url http://localhost:5100/mcp
Go to Cursor Settings → MCP → New MCP Server, then add:
{
"mcpServers": {
"rover": {
"url": "http://localhost:5100/mcp"
}
}
}Add to your MCP config:
{
"mcpServers": {
"rover": {
"serverUrl": "http://localhost:5100/mcp"
}
}
}using ModelContextProtocol.Client;
using ModelContextProtocol.Protocol;
var transport = new HttpClientTransport(new HttpClientTransportOptions
{
Endpoint = new Uri("http://localhost:5100/mcp")
});
var client = await McpClient.CreateAsync(transport);
var tools = await client.ListToolsAsync();zRover registers 22 tools organized across screenshot capture, touch/mouse, keyboard, pen, gamepad input, app-defined action dispatch, logging, UI tree inspection, window management, and condition polling. The Retriever additionally exposes 10 package management tools (see Retriever: Package Management Tools).
Captures the entire app window as a PNG screenshot.
| Parameter | Type | Default | Description |
|---|---|---|---|
format |
string | "png" |
Image format |
maxWidth |
integer | 1280 |
Maximum image width in pixels (scaled proportionally) |
maxHeight |
integer | 1280 |
Maximum image height in pixels |
Returns: { success, filePath, bitmapWidth, bitmapHeight, windowWidth, windowHeight } — bitmapWidth/bitmapHeight are the PNG dimensions; windowWidth/windowHeight are the render-pixel size of the window (use for the pixels coordinate space). Convert to normalized: normalizedX = bitmapX / bitmapWidth.
Captures the app window with a crosshair overlay drawn at the specified coordinates. Use this to visually confirm a target position before injecting input.
| Parameter | Type | Default | Description |
|---|---|---|---|
x |
number | (required) | X in normalized space (0.0–1.0) |
y |
number | (required) | Y in normalized space (0.0–1.0) |
maxWidth |
integer | 1280 |
Maximum screenshot width |
maxHeight |
integer | 1280 |
Maximum screenshot height |
Captures a cropped rectangular region of the app window.
| Parameter | Type | Default | Description |
|---|---|---|---|
x |
number | (required) | Left edge (normalized 0.0–1.0) |
y |
number | (required) | Top edge (normalized 0.0–1.0) |
width |
number | (required) | Region width (normalized) |
height |
number | (required) | Region height (normalized) |
maxWidth |
integer | 1280 |
Maximum output width |
maxHeight |
integer | 1280 |
Maximum output height |
Taps at the specified coordinates. Returns an annotated screenshot showing the tap location before injection for verification.
| Parameter | Type | Default | Description |
|---|---|---|---|
x |
number | (required) | X coordinate |
y |
number | (required) | Y coordinate |
coordinateSpace |
string | "normalized" |
"normalized" (0.0–1.0 relative to window) or "pixels" (render pixels matching windowWidth/windowHeight from capture_current_view) |
device |
string | "touch" |
"touch" or "mouse" |
dryRun |
boolean | false |
If true, returns the preview screenshot without injecting |
Drags along a path of points. Returns an annotated screenshot with the drag path visualized (green start → cyan path → red end).
| Parameter | Type | Default | Description |
|---|---|---|---|
points |
array | (required) | Array of { x, y } objects (minimum 2) |
durationMs |
integer | 300 |
Total drag duration in milliseconds |
coordinateSpace |
string | "normalized" |
Coordinate space |
device |
string | "touch" |
"touch" or "mouse" |
dryRun |
boolean | false |
Preview only |
Moves the mouse cursor to the specified coordinates without clicking.
| Parameter | Type | Default | Description |
|---|---|---|---|
x |
number | (required) | Target X |
y |
number | (required) | Target Y |
coordinateSpace |
string | "normalized" |
Coordinate space |
dryRun |
boolean | false |
Preview only |
Scrolls the mouse wheel at the specified position. One notch = 120 units.
| Parameter | Type | Default | Description |
|---|---|---|---|
x |
number | (required) | Scroll position X |
y |
number | (required) | Scroll position Y |
coordinateSpace |
string | "normalized" |
Coordinate space |
deltaY |
integer | -120 |
Vertical scroll (negative = down, positive = up) |
deltaX |
integer | 0 |
Horizontal scroll (negative = left, positive = right) |
dryRun |
boolean | false |
Preview only |
Injects a multi-touch gesture with multiple simultaneous contact points.
| Parameter | Type | Default | Description |
|---|---|---|---|
pointers |
array | (required) | Array of pointer objects (see below) |
durationMs |
integer | 400 |
Total gesture duration |
coordinateSpace |
string | "normalized" |
Coordinate space |
dryRun |
boolean | false |
Preview only |
Each pointer object:
| Field | Type | Description |
|---|---|---|
id |
integer | Unique pointer ID (1-based) |
path |
array | Array of { x, y } waypoints |
pressure |
number | Contact pressure 0.0–1.0 (default 1.0) |
orientation |
integer | Contact angle 0–359 degrees (default 0) |
contactWidth |
integer | Contact patch width (default 4) |
contactHeight |
integer | Contact patch height (default 4) |
Injects a two-finger pinch gesture. endDistance < startDistance = pinch in (zoom out); endDistance > startDistance = pinch out (zoom in).
| Parameter | Type | Default | Description |
|---|---|---|---|
centerX |
number | (required) | Center X of pinch |
centerY |
number | (required) | Center Y of pinch |
startDistance |
number | 0.3 |
Starting finger distance (normalized) |
endDistance |
number | 0.1 |
Ending finger distance |
angle |
number | 0 |
Pinch axis angle in degrees |
durationMs |
integer | 400 |
Duration |
coordinateSpace |
string | "normalized" |
Coordinate space |
dryRun |
boolean | false |
Preview only |
Injects a two-finger rotation gesture. Positive endAngle = clockwise.
| Parameter | Type | Default | Description |
|---|---|---|---|
centerX |
number | (required) | Center X |
centerY |
number | (required) | Center Y |
distance |
number | 0.2 |
Finger distance from center (normalized) |
startAngle |
number | 0 |
Starting angle in degrees |
endAngle |
number | 90 |
Ending angle in degrees |
durationMs |
integer | 400 |
Duration |
coordinateSpace |
string | "normalized" |
Coordinate space |
dryRun |
boolean | false |
Preview only |
Presses a keyboard key with optional modifiers.
| Parameter | Type | Default | Description |
|---|---|---|---|
key |
string | (required) | Virtual key name (e.g. "Enter", "Tab", "A", "Escape", "Left", "F5", "Back") |
modifiers |
array | [] |
Modifier keys: "Control", "Shift", "Menu" (Alt), "Windows" |
holdDurationMs |
integer | 0 |
How long to hold the key |
Types a string by injecting individual key presses per character.
| Parameter | Type | Default | Description |
|---|---|---|---|
text |
string | (required) | Text to type |
delayBetweenKeysMs |
integer | 30 |
Delay between keystrokes |
Taps with a pen stylus. Supports pressure, tilt, rotation, barrel button, and eraser mode.
| Parameter | Type | Default | Description |
|---|---|---|---|
x |
number | (required) | X coordinate |
y |
number | (required) | Y coordinate |
coordinateSpace |
string | "normalized" |
Coordinate space |
pressure |
number | 0.5 |
Pen pressure 0.0–1.0 |
tiltX |
integer | 0 |
X tilt (-90 to 90 degrees) |
tiltY |
integer | 0 |
Y tilt (-90 to 90 degrees) |
rotation |
number | 0.0 |
Pen rotation 0.0–359.0 degrees |
barrel |
boolean | false |
Barrel button pressed |
eraser |
boolean | false |
Eraser mode |
hover |
boolean | false |
Hover without contact |
dryRun |
boolean | false |
Preview only |
Draws a pen stroke along a series of points. Each point can override pressure, tilt, and rotation.
| Parameter | Type | Default | Description |
|---|---|---|---|
points |
array | (required) | Array of point objects (minimum 2). Each: { x, y, pressure?, tiltX?, tiltY?, rotation? } |
coordinateSpace |
string | "normalized" |
Coordinate space |
pressure |
number | 0.5 |
Default pressure for all points |
tiltX |
integer | 0 |
Default X tilt |
tiltY |
integer | 0 |
Default Y tilt |
rotation |
number | 0.0 |
Default rotation |
barrel |
boolean | false |
Barrel button |
eraser |
boolean | false |
Eraser mode |
durationMs |
integer | 400 |
Total stroke duration |
dryRun |
boolean | false |
Preview only |
Injects a single gamepad input state. The state is held for the specified duration, then released.
| Parameter | Type | Default | Description |
|---|---|---|---|
buttons |
array | [] |
Button names: "A", "B", "X", "Y", "LeftThumbstick", "RightThumbstick", "LeftShoulder", "RightShoulder", "View", "Menu", "DPadUp", "DPadDown", "DPadLeft", "DPadRight", "Paddle1"–"Paddle4" |
leftStickX |
number | 0.0 |
Left thumbstick X (-1.0 to 1.0) |
leftStickY |
number | 0.0 |
Left thumbstick Y (-1.0 to 1.0) |
rightStickX |
number | 0.0 |
Right thumbstick X |
rightStickY |
number | 0.0 |
Right thumbstick Y |
leftTrigger |
number | 0.0 |
Left trigger (0.0–1.0) |
rightTrigger |
number | 0.0 |
Right trigger (0.0–1.0) |
holdDurationMs |
integer | 100 |
How long to hold the state |
Injects a series of gamepad frames in order. After the last frame, all inputs are automatically released.
| Parameter | Type | Default | Description |
|---|---|---|---|
frames |
array | (required) | Array of frame objects. Each frame accepts the same fields as inject_gamepad_input plus durationMs (default 100) per frame. |
App Action tools expose the host application's own discrete operations as MCP tools. Your app opts in by implementing zRover.Core.IActionableApp and passing the instance to RoverMcp.StartAsync. The two tools are always registered as a pair; if no IActionableApp is provided, neither tool appears in tools/list.
Returns the set of actions the app currently supports. The list is dynamic — the app may return different actions depending on state (e.g. which objects are selected).
Parameters: none
Returns:
{
"actions": [
{
"name": "SetPresetColor",
"description": "Sets the RGB sliders to a named color preset.",
"parameterSchema": {
"type": "object",
"properties": {
"color": { "type": "string", "enum": ["Red", "Green", "Blue", "Yellow", "White"] }
},
"required": ["color"]
}
}
]
}Dispatches a named action and returns whether it succeeded.
| Parameter | Type | Default | Description |
|---|---|---|---|
action |
string | (required) | Name of the action to dispatch (must match a name from list_actions) |
params |
object | {} |
Parameters validated against the action's parameterSchema |
Returns on success:
{ "success": true, "consequences": ["UpdateColorPreview"] }Returns on failure:
{ "success": false, "error": { "code": "unknown_action", "message": "No action named 'Foo'" } }Error codes: unknown_action, validation_error, not_available, execution_error.
Add the interface to any class accessible from App.xaml.cs, typically your main page:
using System.Collections.Generic;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;
using zRover.Core;
public sealed partial class MainPage : Page, IActionableApp
{
private static readonly IReadOnlyList<ActionDescriptor> _actions = new[]
{
new ActionDescriptor(
name: "SetPresetColor",
description: "Sets the RGB sliders to a named color preset.",
parameterSchema: @"{
""type"": ""object"",
""properties"": {
""color"": { ""type"": ""string"", ""enum"": [""Red"", ""Green"", ""Blue""] }
},
""required"": [""color""]
}"),
};
public IReadOnlyList<ActionDescriptor> GetAvailableActions() => _actions;
public async Task<ActionResult> DispatchAsync(string actionName, string parametersJson)
{
switch (actionName)
{
case "SetPresetColor":
return await DispatchSetPresetColorAsync(parametersJson);
default:
return ActionResult.Fail("unknown_action", $"No action named '{actionName}'");
}
}
private async Task<ActionResult> DispatchSetPresetColorAsync(string parametersJson)
{
JObject p;
try { p = JObject.Parse(parametersJson); }
catch { return ActionResult.Fail("validation_error", "params is not valid JSON"); }
var color = p["color"]?.Value<string>();
if (string.IsNullOrEmpty(color))
return ActionResult.Fail("validation_error", "Missing required parameter: color");
// Marshal UI work to the dispatcher thread.
await Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
{
switch (color)
{
case "Red": RedSlider.Value = 255; GreenSlider.Value = 0; BlueSlider.Value = 0; break;
case "Green": RedSlider.Value = 0; GreenSlider.Value = 255; BlueSlider.Value = 0; break;
case "Blue": RedSlider.Value = 0; GreenSlider.Value = 0; BlueSlider.Value = 255; break;
}
});
return ActionResult.Ok(new[] { "UpdateColorPreview" });
}
}Pass the instance to RoverMcp.StartAsync in App.xaml.cs:
UWP:
protected override async void OnLaunched(LaunchActivatedEventArgs e)
{
// ... navigation setup, Window.Current.Activate() ...
var actionableApp = rootFrame.Content as zRover.Core.IActionableApp;
await zRover.Uwp.RoverMcp.StartAsync("MyApp", port: 5100,
launchFullTrust: () => FullTrustProcessLauncher
.LaunchFullTrustProcessForCurrentAppAsync("McpServer").AsTask(),
actionableApp: actionableApp);
}WinUI 3:
protected override async void OnLaunched(LaunchActivatedEventArgs args)
{
m_window = new MainWindow();
m_window.Activate();
#if DEBUG
var mainPage = (m_window as MainWindow)?.MainPage;
var actionableApp = mainPage as zRover.Core.IActionableApp;
await zRover.WinUI.RoverMcp.StartAsync(m_window, "MyApp",
actionableApp: actionableApp);
m_window.Closed += async (s, e) => await zRover.WinUI.RoverMcp.StopAsync();
#endif
}Thread safety:
GetAvailableActions()is called from a background thread.DispatchAsyncalso receives the call on a background thread — marshal any UI work to the dispatcher. In UWP useDispatcher.RunAsync; in WinUI 3 useDispatcherQueue.TryEnqueue.
Returns recent entries from the zRover in-memory log store. Useful for understanding app state, diagnosing failures, and reading lifecycle and exception events logged by the host app.
| Parameter | Type | Default | Description |
|---|---|---|---|
minimumLevel |
string | "info" |
Minimum level: "trace", "debug", "info", "warning", "error", "fatal" |
maxEntries |
integer | 100 |
Maximum number of entries to return (newest first) |
category |
string | (none) | Filter to entries whose category starts with this string |
clear |
boolean | false |
If true, clears the buffer after returning the snapshot |
Walks the XAML visual tree and returns a structured representation of every UI element. Useful for discovering element names, automation IDs, text content, and positions without taking a screenshot.
| Parameter | Type | Default | Description |
|---|---|---|---|
maxDepth |
integer | 32 |
Maximum depth to traverse into the visual tree |
visibleOnly |
boolean | false |
If true, excludes elements whose Visibility is Collapsed |
Returns: A tree of nodes. Each node has:
type— XAML class name (e.g."Button","TextBlock")name—x:Namevalue, if setautomationName—AutomationProperties.Namevalue, if settext— text content extracted fromTextBlock,TextBox, orContentControlbounds—{ x, y, width, height }in normalized coordinates (0.0–1.0)isVisible— whether the element is in the visible stateisEnabled— whether the element is enabledchildren— nested child nodes
Resizes the app window to the specified dimensions.
| Parameter | Type | Default | Description |
|---|---|---|---|
width |
number | (required) | Desired window width in DIPs |
height |
number | (required) | Desired window height in DIPs |
Returns: { success, actualWidth, actualHeight } — the actual bounds after resize (the OS may constrain the requested size).
Blocks until a condition is met or the timeout expires. Use this to synchronize with animations, loading states, or async events before proceeding with other tools.
| Parameter | Type | Default | Description |
|---|---|---|---|
condition |
string | (required) | "visual_stable" or "log_match" |
timeoutMs |
integer | 5000 |
Maximum time to wait in milliseconds |
stabilityMs |
integer | 400 |
(visual_stable only) How long the UI must remain unchanged to count as stable |
intervalMs |
integer | 150 |
Polling interval in milliseconds |
pattern |
string | (none) | (log_match only) Regex pattern to match against log messages |
Conditions:
visual_stable— polls screenshot hashes until the image is unchanged forstabilityMs. Good for waiting after animations, page transitions, or loading indicators.log_match— polls the in-memory log store until an entry matchingpatternappears. Good for waiting for a specific event to be logged by the host app.
Returns: { success, condition, elapsedMs, reason } — reason explains why the wait ended (e.g. "stable", "matched", "timeout").
The Retriever exposes 10 additional MCP tools for MSIX package management. These tools are available on the Retriever's MCP endpoint (port 5200 by default) and support multi-device routing via the deviceId parameter.
For full documentation of the upload flow, hop-by-hop forwarding, and security model, see the MSIX Package Management section in the Multi-Machine Federation guide.
Lists all devices reachable from this Retriever instance (local device plus all federated remote managers).
Parameters: none
Returns: Array of { deviceId, displayName, hops, isLocal }.
Lists MSIX packages installed on a device.
| Parameter | Type | Default | Description |
|---|---|---|---|
deviceId |
string | "local" |
Target device. |
nameFilter |
string | (none) | Filter by package family name or display name substring. |
includeFrameworks |
boolean | false |
Include framework packages. |
includeSystemPackages |
boolean | false |
Include resource/system packages. |
Returns: Array of PackageInfo objects with family name, full name, display name, version, publisher, install date, architecture, signing status, running state, and app entries.
Returns full metadata for a single package including dependencies, capabilities, and health status flags.
| Parameter | Type | Default | Description |
|---|---|---|---|
deviceId |
string | "local" |
Target device. |
packageFamilyName |
string | (required) | Package family name to look up. |
Installs an MSIX package from a URI or a staged upload.
| Parameter | Type | Default | Description |
|---|---|---|---|
deviceId |
string | "local" |
Target device. |
packageUri |
string | (required) | URI of the package to install. Schemes: https://, file://, ms-appinstaller://, staged://<stagingId>. |
dependencyUris |
array | [] |
Dependency package URIs to install alongside. |
forceAppShutdown |
boolean | false |
Force-close the app if it is currently running. |
allowUnsigned |
boolean | false |
Allow packages without a trusted signature (Developer Mode required). |
installForAllUsers |
boolean | false |
Install as a full (non-stub) package for all users. |
deferRegistration |
boolean | false |
Defer registration until the package is no longer in use. |
Returns: { success, packageFamilyName, packageFullName, version, wasUpdate, isDeferred } on success.
Removes an installed MSIX package.
| Parameter | Type | Default | Description |
|---|---|---|---|
deviceId |
string | "local" |
Target device. |
packageFamilyName |
string | (required) | Package to remove. |
removeForAllUsers |
boolean | false |
Remove for all users (requires elevated privilege). |
preserveAppData |
boolean | false |
Keep the app's local data after removal. |
Launches an app entry within an installed package.
| Parameter | Type | Default | Description |
|---|---|---|---|
deviceId |
string | "local" |
Target device. |
packageFamilyName |
string | (required) | Package containing the app. |
appId |
string | (none) | Specific app entry ID. If omitted, the first entry is launched. |
arguments |
string | "" |
Command-line arguments to pass to the app. |
Returns: { success, pid, aumid } on success.
Stops all running instances of a package.
| Parameter | Type | Default | Description |
|---|---|---|---|
deviceId |
string | "local" |
Target device. |
packageFamilyName |
string | (required) | Package to stop. |
force |
boolean | false |
Kill immediately without sending WM_CLOSE first. |
Returns: { success, stoppedCount }.
Negotiates an upload slot for sending an MSIX file to a device. In multi-hop federation, the upload flows through each manager in the chain.
| Parameter | Type | Default | Description |
|---|---|---|---|
deviceId |
string | "local" |
Target device. |
fileName |
string | (required) | Original file name (e.g. MyApp_1.0.0.0_x64.msix). |
fileSizeBytes |
integer | (required) | Exact file size in bytes. |
sha256 |
string | (required) | Hex-encoded SHA-256 hash of the file (verified at every hop). |
Returns: { stagingId, uploadUrl, expiresAt }. POST the raw binary of the MSIX to uploadUrl with Content-Type: application/octet-stream.
Checks the status of a pending or completed staged upload.
| Parameter | Type | Default | Description |
|---|---|---|---|
stagingId |
string | (required) | Staging ID returned by request_package_upload. |
Returns: { stagingId, status, fileName, fileSizeBytes, createdAt }. Status values: Pending, Ready, Forwarding, Expired.
Cancels a staged upload and deletes any uploaded data.
| Parameter | Type | Default | Description |
|---|---|---|---|
stagingId |
string | (required) | Staging ID to discard. |
Most input tools accept a coordinateSpace parameter:
| Value | Description |
|---|---|
"normalized" |
Default. Values from 0.0 (top-left) to 1.0 (bottom-right), relative to the app window. Recommended for resolution-independent automation. |
"pixels" |
Render-pixel coordinates matching windowWidth/windowHeight from capture_current_view. Use when you have pixel positions from a screenshot or UI tree bounds. |
Converting screenshot pixels to normalized coordinates:
normalizedX = pixelX / screenshotWidth
normalizedY = pixelY / screenshotHeight
The capture_current_view response includes width and height for this conversion.
The inject_tap and inject_drag_path tools return an annotated preview screenshot showing exactly where the input will land before injection occurs. This lets an AI agent visually verify its target.
Set dryRun: true on any input tool to get the preview without actually injecting input — useful for coordinate validation workflows.
- Developer Mode must be enabled on the machine for input injection to work.
- Input injection requires the
inputInjectionBrokeredrestricted capability (UWP) or Developer Mode (WinUI 3). Without it, touch, pen, gamepad, and multi-touch injection will not work. - Architecture must match the host machine. The
InputInjectorCOM component is native-only — it does not work under emulation. On ARM64 machines, you must set your Solution Platform to ARM64 in Configuration Manager. Both AnyCPU and x64 produce packages that Windows runs under x64 emulation on ARM64, causingInputInjector.TryCreate()to fail withHRESULT 0x800700C1("is not a valid Win32 application"). When InputInjector is unavailable, tap and drag tools fall back to Win32SendInput(mouse-only, no touch/pen/gamepad). See Troubleshooting for details. - UWP: The MCP server runs as a FullTrust companion process alongside your UWP app. Both processes must be running for tools to work.
- WinUI 3: The MCP server runs in-process — no companion process is needed.
- The server listens on localhost only — remote connections require your own tunneling solution.
- .NET 8 runtime must be installed on the target machine (the UWP companion is published as framework-dependent; WinUI 3 apps bundle it based on your publish settings).
- Screenshot tools use Windows.Graphics.Capture as the primary capture path, which faithfully captures everything composited into the app window — including XAML,
SwapChainPanel, Win2DCanvasControl,MediaPlayerElement, composition swap-chain brushes, andWebView2. WGC requires Windows 10 1903+ (build 18362). If WGC is unavailable, screenshots fall back toRenderTargetBitmap, which can only capture content inside the XAML visual tree (so DirectX swap-chains, Win2D content, video, and WebView2 will appear black). - UWP: Declare the
graphicsCapturecapability inPackage.appxmanifestunder theuap6namespace so the WGC path is allowed:Without it, the WGC path will throw at runtime and screenshots will silently fall back to the XAML-only<Package xmlns:uap6="http://schemas.microsoft.com/appx/manifest/uap/windows10/6" IgnorableNamespaces="uap uap6 …"> <Capabilities> <uap6:Capability Name="graphicsCapture" /> </Capabilities> </Package>
RenderTargetBitmap. - WinUI 3: Packaged WinUI 3 apps should also declare
<uap6:Capability Name="graphicsCapture" />. Unpackaged WinUI 3 apps are not capability-gated and need no manifest change.
MCP client can't connect
- Verify the app is running and the port is not in use by another process:
Test-NetConnection -ComputerName localhost -Port 5100 - Check that Developer Mode is enabled.
- The FullTrust companion needs a moment to start after launch. Wait a few seconds after the app window appears.
Tools list is empty
- UWP: Ensure
RoverMcp.StartAsyncis called afterWindow.Current.Activate()inOnLaunched. VerifyOnBackgroundActivatedcallsRoverMcp.HandleBackgroundActivation(args). - WinUI 3: Ensure
RoverMcp.StartAsyncis called afterm_window.Activate()inOnLaunched.
Input injection has no effect
- UWP: Confirm
inputInjectionBrokeredis declared in your manifest. - Ensure Developer Mode is enabled (required by
InputInjector). - The app window must be in the foreground — the server automatically brings it forward before injection.
InputInjector fails on ARM64
- On ARM64 machines, the
InputInjectorCOM interface is native ARM64 only. If your app is built as x64 or AnyCPU, Windows runs the UWP process under x64 emulation, andInputInjector.TryCreate()fails. - Fix: In Visual Studio Configuration Manager, set your Solution Platform to ARM64 on ARM64 hardware.
- Check the diagnostic log at startup: the MCP server logs
InputInjectionCapability: InputInjector not availablewith the specific error when this occurs.
App crashes on launch
- Check the crash log at
%LOCALAPPDATA%\Packages\<PackageFamilyName>\LocalState\crash.log. - Ensure you've added all required manifest capabilities and extensions.
Port conflict
- Pass a different port:
RoverMcp.StartAsync("MyApp", port: 5200, ...)— then update your MCP client URL accordingly.