Add verify-ui-change-in-cloud and test-warp-ui bundled skills (dogfood-gated)

cephalonaut · oz-agent · cephalonaut · commit 21b071063482 · 2026-05-05T21:10:23.000-04:00
Add two new bundled skills gated to dogfood (Local/Dev) builds:

- verify-ui-change-in-cloud: spawns a cloud agent with computer use to
  verify user-facing client changes after they are made
- test-warp-ui: guides a cloud agent through launching Warp and testing
  UI behavior using the computer_use tool

Co-Authored-By: Oz <oz-agent@warp.dev>
diff --git a/resources/channel-gated-skills/dogfood/test-warp-ui/SKILL.md b/resources/channel-gated-skills/dogfood/test-warp-ui/SKILL.md
@@ -0,0 +1,57 @@
+---
+name: test-warp-ui
+description: >
+  Guides testing Warp UI features and changes using the computer use tool.
+  Use this skill only when the computer_use tool is available to the agent.
+  Covers launching Warp and verifying UI behavior.
+user-invocable: false
+---
+
+# Computer Use for Warp UI Testing
+
+Use the `computer_use` tool to visually test that Warp looks and behaves as intended after UI changes.
+
+## Running Warp
+
+Launch Warp from the repository root with:
+
+```bash
+cargo run -- --api-key "$STAGING_USER_WARP_API_KEY"
+```
+
+The `--api-key` flag authenticates using the API key from the `STAGING_USER_WARP_API_KEY` environment variable, so the app starts directly without interactive login prompts.
+
+Initial builds may take several minutes; subsequent incremental builds are faster.
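+
+A minimal sketch of a launch helper that fails fast when the key is missing, so the problem shows up as a clear error before a possibly slow build starts. The function names `require_api_key` and `launch_warp` are hypothetical, and the sketch assumes it runs from the repository root:
+
+```bash
+# Hypothetical helper: stop early if the staging API key is missing.
+require_api_key() {
+  if [ -z "${STAGING_USER_WARP_API_KEY:-}" ]; then
+    echo "STAGING_USER_WARP_API_KEY is not set; set it before launching Warp." >&2
+    return 1
+  fi
+}
+
+# Hypothetical helper: build and launch Warp with the staging API key.
+launch_warp() {
+  require_api_key || return 1
+  cargo run -- --api-key "$STAGING_USER_WARP_API_KEY"
+}
+```
+
+Call `launch_warp` from the repository root after defining these functions in the shell.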
+
+## Testing Workflow
+
+### 1. Hardcode or Mock Data (When Needed)
+
+If you only need to verify that a specific UI renders correctly, hardcode or mock data so the UI state is reachable immediately, without navigating a full flow. This step is optional: skip it when testing end-to-end flows that should work without any mocking.
+
+Examples of when to hardcode:
+
+- **Conditional UI**: The feature only appears under certain conditions (e.g., a specific setting, a non-empty data set, an active subscription) — hardcode the condition so the UI always appears.
+- **Feature flags**: The feature is behind a flag that isn't enabled yet — enable it directly.
+- **Error states**: You want to test error handling UI — hardcode error responses or failure conditions.
+
+Keep mocked changes minimal and focused — only change what's necessary to reach the UI state under test.
+
+### 2. Invoke Computer Use
+
+Call the `computer_use` tool with a task description that:
+
+- Includes the command to build and launch Warp (typically `cargo run -- --api-key "$STAGING_USER_WARP_API_KEY"` from the repo root)
+- Gives step-by-step instructions for navigating to the UI being tested
+- Lists the **specific observations to report**: exactly which elements, text, colors, layout, or states the tool should observe and describe back
+- Does **not** include expected values: the tool should report what it sees, not judge correctness
+
+### 3. Verify Results
+
+Compare the observations returned by `computer_use` against your expectations. If the UI doesn't match, investigate and adjust the code or mocks accordingly.
+
+## Tips
+
+- **Be specific in task descriptions**: Instead of "check if the dialog looks right," say "open Settings, click the General tab, and describe the text and layout of the first section."
+- **Test one thing at a time**: Focused tests are easier to debug when observations don't match expectations.
+- **Build before invoking**: Always confirm the build succeeds before calling `computer_use`. The tool cannot fix build errors.
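+
+A sketch of the build-first check from the last tip, assuming a plain `cargo build` from the repository root compiles the binary that `cargo run` launches (in a multi-crate workspace it may build more than that). `build_or_stop` is a hypothetical name:
+
+```bash
+# Hypothetical helper: confirm the build succeeds before calling computer_use.
+build_or_stop() {
+  if ! cargo build; then
+    echo "cargo build failed; fix the build errors before calling computer_use." >&2
+    return 1
+  fi
+}
+```
+
+Run `build_or_stop` first and call `computer_use` only if it returns 0.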
diff --git a/resources/channel-gated-skills/dogfood/verify-ui-change-in-cloud/SKILL.md b/resources/channel-gated-skills/dogfood/verify-ui-change-in-cloud/SKILL.md
@@ -0,0 +1,82 @@
+---
+name: verify-ui-change-in-cloud
+description: >
+  Invoke this automatically after completing any user-facing client change, but ONLY in local or otherwise non-sandboxed environments.
+  Verifies the change by spawning a cloud agent with computer use,
+  which runs the test-warp-ui skill to test Warp.
+user-invocable: true
+---
+
+# Verify UI Change in Cloud
+
+After making a user-facing client change, use this workflow to verify it by spawning a cloud agent with computer use capabilities. This applies to any change that affects what the user sees or experiences in the running app — not just visual/UI changes, but also startup behavior, config handling, migration flows, and other client-side logic.
+
+## Workflow
+
+### 1. Push Your Changes
+
+The cloud agent runs in a fresh environment that clones the repo. Your changes must be pushed to a branch so the cloud agent can access them.
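+
+One way to do this is the sketch below. It refuses to push from a detached HEAD, from `main` or `master` (assumed default-branch names; adjust them for the repository), or with uncommitted changes, and prints the pushed branch name. `push_for_cloud_agent` is a hypothetical helper, not part of Warp:
+
+```bash
+# Hypothetical helper: push the current branch so a fresh clone can see it.
+push_for_cloud_agent() {
+  local branch
+  branch="$(git rev-parse --abbrev-ref HEAD)" || return 1
+  case "$branch" in
+    HEAD)
+      echo "Detached HEAD; check out a branch before pushing." >&2
+      return 1 ;;
+    main|master)
+      # Assumed default-branch names; never push work in progress there.
+      echo "Refusing to push directly to $branch; create a feature branch first." >&2
+      return 1 ;;
+  esac
+  # Uncommitted work never reaches the remote, so require a clean tree.
+  if [ -n "$(git status --porcelain)" ]; then
+    echo "Uncommitted changes found; commit them before pushing." >&2
+    return 1
+  fi
+  # Send git's own output to stderr so stdout carries only the branch name.
+  git push -u origin "$branch" >&2 || return 1
+  echo "$branch"
+}
+```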
+
+### 2. Detect the Repository
+
+Before spawning the cloud agent, determine which repository you are running in by checking the Git remote URL:
+
+```bash
+git remote get-url origin
+```
+
+Use the list below to select the correct environment ID:
+
+- **warp** (remote URL contains `warpdotdev/warp`):
+  - Environment: `SVhg783GBFQHk1OfdPfFU9` (the warp Dev Environment)
+
+If the remote URL does not match, warn the user that this skill only supports the warp repository and stop.
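+
+The check above can be sketched as a small function that maps the remote URL to an environment ID, using the same "contains" rule as the list. `select_environment_id` is a hypothetical name:
+
+```bash
+# Hypothetical helper: map a Git remote URL to a cloud environment ID.
+select_environment_id() {
+  local remote_url="$1"
+  case "$remote_url" in
+    *warpdotdev/warp*)
+      # The warp Dev Environment.
+      echo "SVhg783GBFQHk1OfdPfFU9" ;;
+    *)
+      echo "This skill only supports the warpdotdev/warp repository (origin: $remote_url)." >&2
+      return 1 ;;
+  esac
+}
+```
+
+For example, `env_id="$(select_environment_id "$(git remote get-url origin)")"` stores the ID for the next step; a non-zero exit status means the repository is unsupported and you should stop. Like the rule above, the pattern is a plain substring match, so it also accepts any other repository whose path contains `warpdotdev/warp`.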
+
+### 3. Spawn the Cloud Agent
+
+Use the `run_agents` tool to spawn a remote cloud agent. A single-child batch (one entry in `agent_run_configs`) is valid. Set the parameters as follows:
+
+- `summary`: a brief declarative explanation, e.g. `"Spawning a cloud agent with computer use to verify the UI change."`
+- `base_prompt`: include an instruction to read and follow the `test-warp-ui` skill, followed by the verification task (see the next section)
+- `remote.environment_id`: the environment ID selected in step 2
+- `remote.computer_use_enabled`: `true`
+- `agent_run_configs`: a single entry with `name` set to a short display name such as `"verify-ui-change"`. The per-agent `prompt` can be empty since `base_prompt` covers the task.
+
+The `test-warp-ui` skill is bundled, so the cloud agent has it automatically. Tell the agent to invoke it by name in the `base_prompt` (e.g. "Read and follow the test-warp-ui skill.").
+
+### 4. Write an Effective Prompt
+
+The prompt should tell the cloud agent:
+- Which element, flow, or behavior to test
+- What hardcoding or mocking is needed (see the hardcoding guidance below and the mocking guidance in the test-warp-ui skill)
+- What filesystem or app state to pre-seed before launching (e.g., creating directories, writing config files)
+- What specific observations to report back
+
+**Example prompts:**
+
+```
+I changed the settings dialog header to use a larger font and blue color.
+Hardcode the settings dialog to open on launch, then describe the header text,
+font size relative to other text, and color.
+```
+
+```
+I added a migration that symlinks config from ~/.warp into ~/.warp-preview on first launch.
+The migration is gated on Channel::Preview. Before building, hardcode the migration to run
+regardless of channel by removing the channel check. Also create a fake ~/.warp directory
+with test files. After launching Warp, verify the symlinks were created in ~/.warp-preview.
+```
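+
+For the second example, the pre-seeding and verification steps the cloud agent performs might look like the sketch below. The function names and test file names are hypothetical, and because the seeding writes into `~/.warp` by default, it is meant only for the cloud agent's fresh environment, never a developer machine:
+
+```bash
+# Hypothetical helper: create a fake ~/.warp with a couple of test files.
+# Only run this in a throwaway cloud environment; it writes into $HOME.
+seed_fake_warp_dir() {
+  local src="${1:-$HOME/.warp}"
+  mkdir -p "$src" || return 1
+  printf 'test\n' > "$src/test-file-1.txt" || return 1
+  printf 'test\n' > "$src/test-file-2.txt" || return 1
+}
+
+# Hypothetical helper: list each top-level symlink in ~/.warp-preview and its target.
+list_preview_symlinks() {
+  local dest="${1:-$HOME/.warp-preview}"
+  find "$dest" -maxdepth 1 -type l | while read -r link; do
+    printf '%s -> %s\n' "$link" "$(readlink "$link")"
+  done
+}
+```
+
+Run `seed_fake_warp_dir` before launching Warp and `list_preview_symlinks` after it starts; an empty listing means no symlinks exist at the top level of `~/.warp-preview`.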
+
+#### Hardcoding to Reach the Code Path Under Test
+
+The cloud agent builds Warp with `cargo run`, which may not match the exact runtime conditions of your change (e.g., different channel, missing feature flags, absent preconditions). When this happens, instruct the cloud agent to temporarily hardcode the code so the build exercises the path you need to test. Common examples:
+
+- **Gated code paths**: If the change is behind a channel check, feature flag, or experiment, tell the agent to remove or bypass the gate before building.
+- **Pre-existing state**: If the change depends on filesystem state that wouldn't exist in a clean environment (e.g., a config directory from a prior install), tell the agent to create it before launching.
+- **Startup behavior**: If the change affects something that only happens on first launch or migration, make sure the agent sets up the preconditions that trigger it.
+
+Be explicit in the prompt about what to hardcode and why — the cloud agent won't infer this on its own.
+
+### 5. Surface the Cloud Agent Link
+
+No extra surfacing step is needed — the Warp client displays the cloud agent run automatically.