Turn GUI demonstrations into reusable and increasingly portable agent skills.
AppAgent-Claw is a macOS GUI skill framework for turning a demonstrated foreground desktop workflow into a reusable, inspectable, and progressively portable automation artifact.
It is intentionally narrow. AppAgent-Claw is not an open-domain computer-use agent. It is a practical bridge between agent platforms and GUI-only software: record a workflow once, annotate it with lightweight semantics, and replay it with layered target resolution, retries, validation, and structured diagnostics.
- Record a foreground GUI workflow from demonstration.
- Annotate the saved flow with lightweight semantics and parameterized text slots.
- Replay it with layered matching, retries, validation, and structured diagnostics.
Modern agent platforms such as OpenClaw are powerful because they can call lightweight, reusable skills. But a large share of real software functionality is still exposed primarily through GUI interactions rather than stable APIs or CLIs. That leaves a gap: the agent can reason, but it still cannot reliably operate the software interface where the task actually lives.
AppAgent-Claw is designed to close that gap with a pragmatic engineering choice. Instead of aiming for unrestricted computer use, it focuses on fixed foreground workflows that can be taught once and reused many times. Inspired by AppAgentX and grounded in classic RPA ideas, it keeps the determinism and inspectability of workflow recording while adding enough semantic structure for agent-driven reuse.
| AppAgent-Claw is | AppAgent-Claw is not |
|---|---|
| A workflow reuse layer for familiar working environments | A cross-machine portability solution |
| Foreground desktop automation with explicit replay artifacts | Background window automation |
| Annotation-assisted replay | OCR-heavy desktop understanding |
| Agent-callable workflow packaging | Self-healing script synthesis |
The current version starts from stable replay in familiar working setups and uses that foundation to move toward richer semantic understanding and more portable workflow reuse.
Demo videos are embedded inline on GitHub for direct playback:
- Netease Music · Recording (`netease-recording.mp4`)
- Netease Music · Replay (`netease-replay.mp4`)
This demo shows the two core loops of the project: teaching a fixed GUI procedure by demonstration, and replaying it later with window preparation, matching, fallback, and run diagnostics.
- Netease Music shows stable record-and-replay for a fixed media interaction flow under a similar UI state.
- Record: capture a user demonstration from the frontmost macOS app and save replay-critical assets such as `anchor.png`, `context.png`, `search_region`, action metadata, and window context.
- Annotate: enrich the saved `flow.json` with `flow.description`, per-step descriptions, and safe replay-time text slots for selected `type_text` actions.
- Replay: restore the app window when possible, then resolve click-like targets through a three-stage strategy:
  - local anchor matching near the recorded region
  - broader context matching on the recorded monitor
  - relative-coordinate fallback when visual matching fails
- Validate and diagnose: apply retries, post-action checks, and structured `run.json` output so failures are inspectable instead of opaque.
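The three-stage strategy above can be pictured as an ordered fallback chain. The following is a minimal sketch, not the project's `resolver.py`: the names `resolve_target`, `Resolution`, `match_anchor`, and `match_context` are hypothetical, and the two matchers are passed in as callables so the image-matching details stay out of scope.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Point = tuple[int, int]
Region = tuple[int, int, int, int]  # left, top, width, height


@dataclass
class Resolution:
    point: Point
    stage: str  # "anchor", "context", or "relative"


def resolve_target(
    match_anchor: Callable[[], Optional[Point]],
    match_context: Callable[[], Optional[Point]],
    relative_xy: tuple[float, float],
    window: Region,
) -> Resolution:
    """Try each stage in order and report which one produced the point."""
    # Stages 1 and 2: visual matching (hypothetical callables returning None on a miss).
    for stage, matcher in (("anchor", match_anchor), ("context", match_context)):
        point = matcher()
        if point is not None:
            return Resolution(point, stage)
    # Stage 3: map the recorded window-relative position onto the current window.
    left, top, width, height = window
    rx, ry = relative_xy
    return Resolution((left + round(rx * width), top + round(ry * height)), "relative")
```

Recording which stage succeeded is what makes a run inspectable afterwards: a step that keeps landing on `"relative"` is a hint that the UI region is too dynamic for visual matching.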
- More practical than open-world perception alone for repeated workflows on the same machine.
- More semantic than a raw macro recorder because the saved flow includes descriptions and parameterized text metadata.
- Easy to integrate with agent systems because the final artifact behaves like a lightweight reusable skill rather than a one-off recording.
- macOS
- Python 3.13+
- `swift` on `PATH`
- macOS Accessibility permission for the terminal or Python interpreter you use
- macOS Screen Recording permission for the same process
```bash
uv sync
source .venv/bin/activate
```
If `swift` is missing, install Xcode Command Line Tools or full Xcode first.
```bash
python scripts/record.py start --name demo-flow
```
The recorder opens a small overlay. Start from the overlay, perform the workflow in the target app, then stop with Esc.
```bash
python scripts/replay.py run "demo-flow" --debug
```
Replay targets can be:
- a recording directory
- a direct `flow.json` path
- a saved flow name
When replaying by name, the latest exact normalized match is used.
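To make the three target forms concrete, here is a hedged sketch of how such a lookup could work. It is not the project's implementation and rests on several assumptions: each recording directory holds a `flow.json` with a top-level `name` key, normalization means lowercasing and collapsing non-alphanumeric runs into `-`, and "latest" is approximated by file modification time.

```python
import json
import re
from pathlib import Path


def normalize(name: str) -> str:
    """Hypothetical normalization: lowercase, non-alphanumeric runs become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def resolve_flow(target: str, recordings_root: Path) -> Path:
    """Return the flow.json path for a directory, a flow.json path, or a flow name."""
    path = Path(target)
    if path.is_file():  # direct flow.json path
        return path
    if path.is_dir():  # recording directory
        return path / "flow.json"
    # Otherwise treat the target as a saved flow name.
    wanted = normalize(target)
    matches = []
    for flow_path in recordings_root.glob("*/flow.json"):
        flow = json.loads(flow_path.read_text(encoding="utf-8"))
        if normalize(flow.get("name", "")) == wanted:
            matches.append(flow_path)
    if not matches:
        raise FileNotFoundError(f"no recording named {target!r} under {recordings_root}")
    # Assumption: the most recently modified match counts as the latest one.
    return max(matches, key=lambda p: p.stat().st_mtime)
```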
A new recording is saved under data/recordings/, and each replay run writes logs and debug artifacts under data/runs/.
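The retries, post-action checks, and structured `run.json` output mentioned earlier can be sketched roughly as follows. This is only an illustration under stated assumptions: `run_step` and `write_run` are hypothetical helpers, and the record fields (`step_id`, `attempts`, `ok`, `error`) are invented for the example rather than taken from the real `run.json` schema.

```python
import json
import time
from pathlib import Path
from typing import Callable


def run_step(
    step_id: str,
    act: Callable[[], None],
    validate: Callable[[], bool],
    max_attempts: int = 3,
    delay_s: float = 0.5,
) -> dict:
    """Run one step with retries and return a record of every attempt."""
    attempts = []
    for number in range(1, max_attempts + 1):
        entry = {"attempt": number, "ok": False, "error": None}
        try:
            act()
            entry["ok"] = bool(validate())  # post-action check
            if not entry["ok"]:
                entry["error"] = "post-action check failed"
        except Exception as exc:  # record the failure instead of aborting the run
            entry["error"] = f"{type(exc).__name__}: {exc}"
        attempts.append(entry)
        if entry["ok"]:
            break
        if number < max_attempts:
            time.sleep(delay_s)
    return {"step_id": step_id, "ok": attempts[-1]["ok"], "attempts": attempts}


def write_run(run_dir: Path, steps: list[dict]) -> Path:
    """Persist the step records as run.json inside the run directory."""
    run_dir.mkdir(parents=True, exist_ok=True)
    out = run_dir / "run.json"
    record = {"ok": all(s["ok"] for s in steps), "steps": steps}
    out.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out
```

Keeping every attempt in the record, including failed ones, is what turns an opaque replay failure into something that can be read back step by step.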
```bash
python -m py_compile scripts/*.py tests/*.py
pytest -q
```
Re-run annotation for an existing recording:
```bash
python scripts/record.py annotate "demo-flow"
```
Replay a flow with parameterized text input:
```bash
python scripts/replay.py run "demo-flow" --inputs-json '{"input_message_body_01":"Tonight this track is so good"}' --debug
```
Only `type_text` steps marked with `text_policy.mode = "parameterized"` can be overridden at replay time.
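The override rule can be illustrated with a short sketch. Only `type_text` and `text_policy.mode` come from the description above; the flow layout used here (`steps`, `action`, `text`, and a `slot` key inside `text_policy`) is assumed for the example, and `apply_inputs` is a hypothetical helper rather than the project's code.

```python
import copy


def apply_inputs(flow: dict, inputs: dict[str, str]) -> dict:
    """Return a copy of the flow with parameterized type_text steps overridden."""
    result = copy.deepcopy(flow)  # leave the saved flow untouched
    slots = {}
    for step in result.get("steps", []):
        policy = step.get("text_policy") or {}
        if step.get("action") == "type_text" and policy.get("mode") == "parameterized":
            slots[policy.get("slot")] = step
    # Reject keys that do not name a parameterized slot instead of silently ignoring them.
    unknown = sorted(set(inputs) - set(slots))
    if unknown:
        raise KeyError(f"not a parameterized slot: {unknown}")
    for slot, text in inputs.items():
        slots[slot]["text"] = text
    return result
```

In this sketch, the value passed to `--inputs-json` would be parsed with `json.loads` and handed to `apply_inputs` before execution; failing loudly on unknown keys helps catch typos in slot names.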
AppAgent-Claw is designed to be callable as a lightweight GUI skill from existing agent runtimes.
Prebuilt platform bundles are distributed through GitHub Releases rather than kept as committed runtime directories in the main repository.
Release page: https://github.com/Westlake-AGI-Lab/AppAgent-Claw/releases
| Platform | Release asset | Notes |
|---|---|---|
| OpenClaw | `appagent-claw-openclaw.zip` | Packaged OpenClaw skill bundle for recording, annotation, replay, and reusable workflow packaging. |
| Codex | `appagent-claw-codex.zip` | Self-contained Codex skill bundle with the bundled runtime and workflow layout for AppAgent-Claw tasks. |
| Claude Code | `appagent-claw-claude-code.zip` | Self-contained Claude Code skill bundle with the same packaged runtime and workflow model. |
If you mainly want to use AppAgent-Claw from an agent runtime, start from the matching release bundle instead of cloning the repository first.
For repository development, use the root project with uv sync. For agent distribution, download the matching release asset for your runtime.
| Path | Purpose |
|---|---|
| `data/recordings/` | Local recording sessions created during development and manual testing |
| `data/runs/` | Replay logs, debug assets, and failure diagnostics |
| `examples/recordings/` | Curated, commit-safe sample flows kept in the repository |
Current curated repository examples:
Core runtime modules live under scripts/:
- `record.py`: recording entrypoint
- `annotation.py`: post-recording descriptions and text-slot analysis
- `replay.py`: replay entrypoint
- `recorder.py`: event aggregation and step generation
- `capture.py`: screenshot asset generation
- `resolver.py`: replay-time target resolution
- `executor.py`: action execution
- `storage.py`: flow and run persistence
- `schema.py`: flow protocol
- `window_context.py`: frontmost window metadata and restoration helpers
- CJK and IME-driven text recording is still unreliable in some apps.
- Highly dynamic UI regions may still force fallback to relative coordinates.
- Replay assumes a similar window layout, monitor arrangement, and app state to the original recording.
- The post-recording annotator is heuristic and conservative; review `flow.json` before relying on dynamic text reuse.
- The target app must be available in the active foreground desktop session.
- Introduce GUI-agent reasoning into each recorded and replayed step to infer what the UI action is actually doing, not just where it clicks.
- Turn low-level interaction traces into higher-level workflow semantics so recorded procedures become easier to inspect, adapt, and reuse.
- Use step-level semantic grounding as the basis for moving from machine-specific replay toward more portable workflow execution.
- Increase GUI-agent participation during recording, annotation, and replay instead of limiting it to post-hoc metadata enrichment.
- Let the agent help describe intent, identify editable parameters, validate intermediate UI state, and decide when a replay step still matches the original task.
- Explore hybrid execution where deterministic replay remains the backbone, while the GUI agent provides interpretation, adjustment, and recovery when the environment changes.
- Use semantic step understanding to reduce dependence on a single machine's exact layout, coordinates, and visual state.
- Build toward workflow artifacts that preserve both recorded evidence and agent-readable intent, so a demonstrated skill can be re-instantiated in a different environment with less manual re-recording.
- Provide a stronger bridge between reusable skill workflows today and more autonomous demonstration-driven GUI agents in the future.
If you find this work helpful, please consider citing AppAgent-Claw. Our technical report will be released soon; for now, you can cite the GitHub repository:
```bibtex
@software{westlake_agi_lab_appagent_claw_2026,
  author = {{Westlake-AGI-Lab}},
  title = {AppAgent-Claw},
  url = {https://github.com/Westlake-AGI-Lab/AppAgent-Claw},
  year = {2026}
}
```
This project was built under a very tight timeline, and it likely still contains many rough edges, missing cases, and implementation issues.
If you notice a bug, an unclear behavior, or a place where the workflow can be improved, please feel free to open an issue or submit a pull request. We'd love to make AppAgent-Claw better together with the community.