E2E Implement Fixes

Apply fixes for E2E test failures and verify them with scoped E2E tests.

Before implementing any fixes, enter plan mode by invoking /plan. Analyze the findings (Steps 1-2 below), produce a complete fix plan with specific file paths and code changes, and get user approval before executing.

IMPORTANT: Running real E2E tests is a HARD REQUIREMENT of this procedure. Every fix MUST be verified with real E2E tests before the summary step. Canary tests use the Vogon fake agent and cannot catch agent-specific issues. Do NOT skip E2E verification unless the user explicitly declines due to cost.

Inputs

This procedure accepts findings from one of:

- /e2e:triage-ci output -- findings report already in conversation context
- /e2e:debug output -- root cause analysis already in conversation context
- Standalone description -- user describes known failure and desired fix

Step 1: Identify Fixes

From the findings in context, identify actionable fixes:

For `flaky` failures: describe the proposed fix

For agent-behavior flaky issues, fixes typically modify test prompts. For test-bug flaky issues, fixes target e2e/ infrastructure code (harness setup, helpers, env propagation).

**Proposed fix:** <description>
  - File: <path to test file or e2e infrastructure file>
  - Change: <what will be modified -- e.g., append "Do not ask for confirmation" to prompt, or fix env propagation in NewTmuxSession>

Common flaky fixes:

- Agent asked for confirmation -> append "Do not ask for confirmation" to prompt
- Agent wrote to wrong path -> be more explicit about paths in prompt
- Agent committed when shouldn't -> add "Do not commit" to prompt
- Checkpoint wait timeout -> increase timeout argument
- Agent timeout (signal: killed) -> increase per-test timeout, simplify prompt
- Auth/env not propagated -> fix test harness env setup in e2e/ code
- Test helper bug (wrong assertion, bad glob) -> fix test helper in e2e/
- tmux session setup issue -> fix NewTmuxSession or session config in e2e/
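
The symptom-to-fix mapping above can be sketched as a small lookup. This is only an illustration: the function name `suggest_flaky_fix` and the symptom keys are hypothetical labels invented for this sketch, and the `prompt:` / `test:` / `infra:` prefixes are a made-up way of marking where each fix lands. Only the fix descriptions come from the list.

```shell
# Hypothetical helper: map a short symptom key to the fix from the list above.
# The keys (asked-confirmation, wrong-path, ...) are invented for this sketch.
suggest_flaky_fix() {
  case "$1" in
    asked-confirmation) echo 'prompt: append "Do not ask for confirmation"' ;;
    wrong-path)         echo 'prompt: be more explicit about paths' ;;
    unwanted-commit)    echo 'prompt: add "Do not commit"' ;;
    checkpoint-timeout) echo 'test: increase the checkpoint wait timeout argument' ;;
    agent-timeout)      echo 'test: increase the per-test timeout, simplify the prompt' ;;
    env-not-propagated) echo 'infra: fix test harness env setup in e2e/' ;;
    helper-bug)         echo 'infra: fix the test helper in e2e/' ;;
    tmux-setup)         echo 'infra: fix NewTmuxSession or session config in e2e/' ;;
    *)
      # Anything not in the list needs a manual look.
      echo 'unknown: investigate manually'
      return 1
      ;;
  esac
}
```

A symptom that is not in the list returns a non-zero status, so a caller can tell "no known fix" apart from a real suggestion.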

For `real-bug` failures: describe root cause analysis

**Root cause analysis:**
  - Component: <hooks | session | checkpoint | strategy | agent>
  - Suspected location: <file:function>
  - Description: <what's wrong and why>
  - Proposed fix: <what code change would address it>

Step 2: Ask the User

Prompt the user:

Should I fix these?

[list of tests with classifications and proposed fixes]

You can select all, specific tests, or skip.

Wait for user response before proceeding.

Step 3: Apply Fixes

For flaky fixes the user approved:

Apply fixes directly in the working tree (no branch creation)

Run static checks and canary tests:

mise run fmt && mise run lint
mise run test:e2e:canary   # Must pass

Run real E2E tests to verify the fix. Scope depends on what was changed:
- Agent-specific fix (e.g., e2e/agents/cursor_cli.go, one agent's config/trust/env): run the full suite for that agent only:
```
mise run test:e2e --agent <agent>
```
- Shared test infra fix (e.g., e2e/agents/agent.go, e2e/testutil/, TmuxSession, test helpers): run the full suite for all agents that failed, since the fix could affect any of them:
```
mise run test:e2e --agent <agent1>
mise run test:e2e --agent <agent2>
# ... for each agent that had failures
```
- Test prompt fix (e.g., changed wording in a specific test): run that test across all agents that failed it:
```
mise run test:e2e --agent <agent> <TestName>
```
This step is MANDATORY -- canary tests use the Vogon fake agent and cannot verify agent-specific behavior (trust dialogs, env propagation, config directories, etc.).
If any step fails, investigate and adjust. Report what happened to the user.
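
The scoping rules above can be sketched as a dry-run helper that prints the commands to run rather than executing them. The function name `e2e_verify_commands` and the change-kind labels `agent`, `infra`, and `prompt` are hypothetical; the printed commands follow the `mise run test:e2e --agent ...` forms shown above.

```shell
# Hypothetical dry-run helper: prints (does not run) the E2E commands for a fix.
# Usage: e2e_verify_commands <agent|infra|prompt> <TestName or ""> <agent>...
e2e_verify_commands() {
  if [ "$#" -lt 3 ]; then
    echo "usage: e2e_verify_commands <kind> <TestName or \"\"> <agent>..." >&2
    return 2
  fi
  local kind test_name agent
  kind="$1"
  test_name="$2"
  shift 2   # remaining arguments are the affected agents

  case "$kind" in
    agent)
      # Agent-specific fix: full suite for that one agent (first one given).
      echo "mise run test:e2e --agent $1"
      ;;
    infra)
      # Shared infra fix: full suite for every agent that had failures.
      for agent in "$@"; do
        echo "mise run test:e2e --agent $agent"
      done
      ;;
    prompt)
      # Test prompt fix: only that test, once per agent that failed it.
      if [ -z "$test_name" ]; then
        echo "error: a test name is required for prompt fixes" >&2
        return 2
      fi
      for agent in "$@"; do
        echo "mise run test:e2e --agent $agent $test_name"
      done
      ;;
    *)
      echo "error: unknown change kind: $kind" >&2
      return 2
      ;;
  esac
}
```

For example, `e2e_verify_commands prompt TestFoo claude-code opencode` prints one `mise run test:e2e` line per agent, each scoped to `TestFoo`. Printing instead of executing keeps the cost of the real runs visible before anything starts.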

For real-bug fixes the user approved:

Apply the fix directly in the working tree (no branch creation)

Run static checks, unit tests, and canary tests:

mise run fmt && mise run lint
mise run test        # Unit tests
mise run test:e2e:canary  # Canary tests

Run real E2E tests to verify the fix (MANDATORY). Same scoping rules as flaky fixes above:
- Agent-specific change -> full suite for that agent
- Shared CLI/infra change -> full suite for all agents that failed
- Narrow change (single test affected) -> just that test across affected agents
Report results to the user.

GATE: Do NOT proceed to the summary until real E2E tests have been run and results reported for every fix applied above. If E2E tests were not run, go back and run them now.
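
One way to make the gate concrete is a check that lists every applied fix still lacking a reported real E2E run. This is a minimal sketch under stated assumptions: the function name `gate_ready`, the `name:status` input format, and the status words `passed` and `failed` are all hypothetical, not part of the procedure.

```shell
# Hypothetical gate check. Each argument is "name:status", where status is
# "passed" or "failed" once real E2E tests were run and reported for that fix.
# Any other status (or a missing one) counts as "not run yet".
gate_ready() {
  local pending entry name status
  pending=0
  for entry in "$@"; do
    name="${entry%%:*}"     # text before the first colon
    status="${entry#*:}"    # text after the first colon
    case "$status" in
      passed|failed) ;;     # a real E2E result exists for this fix
      *)
        echo "E2E not run for: $name"
        pending=1
        ;;
    esac
  done
  return "$pending"         # 0 only when every fix has a reported result
}
```

A `failed` status still satisfies the gate in this sketch, because the gate only requires that tests were run and results reported; a failing run is handled by investigating and adjusting the fix.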

Step 4: Summary

Print a summary table:

| Test | Agent(s) | Classification | Action Taken |
|------|----------|----------------|--------------|
| TestFoo | claude-code | flaky | Fixed in working tree |
| TestBar | all agents | real-bug | Fix applied, tests passing |
| TestBaz | opencode | flaky | Skipped (user declined) |
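
The table above could be produced from collected results with a helper along these lines. The function name `print_summary` and the semicolon-separated row format are assumptions made for this sketch; the column headers match the table above.

```shell
# Hypothetical helper: print the summary table from rows given as
# "Test;Agent(s);Classification;Action Taken" (one argument per row).
print_summary() {
  echo "| Test | Agent(s) | Classification | Action Taken |"
  echo "|------|----------|----------------|--------------|"
  local row
  for row in "$@"; do
    # Split the row on ';' into the four columns and print one table line.
    printf '%s\n' "$row" | while IFS=';' read -r name agents class action; do
      printf '| %s | %s | %s | %s |\n' "$name" "$agents" "$class" "$action"
    done
  done
}
```

For example, `print_summary 'TestFoo;claude-code;flaky;Fixed in working tree'` prints the two header lines followed by a single row for `TestFoo`.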
