Apply fixes for E2E test failures and verify them with scoped E2E tests.
Before implementing any fixes, enter plan mode by invoking `/plan`. Analyze the findings (Steps 1-2 below), produce a complete fix plan with specific file paths and code changes, and get user approval before executing.
IMPORTANT: Running real E2E tests is a HARD REQUIREMENT of this procedure. Every fix MUST be verified with real E2E tests before the summary step. Canary tests use the Vogon fake agent and cannot catch agent-specific issues. Do NOT skip E2E verification unless the user explicitly declines due to cost.
This procedure accepts findings from one of the following:
- `/e2e:triage-ci` output -- findings report already in conversation context
- `/e2e:debug` output -- root cause analysis already in conversation context
- Standalone description -- user describes a known failure and the desired fix
From the findings in context, identify actionable fixes:
For agent-behavior flaky issues, fixes typically modify test prompts. For test-bug flaky issues, fixes target `e2e/` infrastructure code (harness setup, helpers, env propagation).
**Proposed fix:** <description>
- File: <path to test file or e2e infrastructure file>
- Change: <what will be modified -- e.g., append "Do not ask for confirmation" to prompt, or fix env propagation in NewTmuxSession>
Common flaky fixes:
- Agent asked for confirmation -> append "Do not ask for confirmation" to prompt
- Agent wrote to wrong path -> be more explicit about paths in prompt
- Agent committed when it shouldn't have -> add "Do not commit" to prompt
- Checkpoint wait timeout -> increase timeout argument
- Agent timeout (signal: killed) -> increase per-test timeout, simplify prompt
- Auth/env not propagated -> fix test harness env setup in `e2e/` code
- Test helper bug (wrong assertion, bad glob) -> fix test helper in `e2e/`
- tmux session setup issue -> fix `NewTmuxSession` or session config in `e2e/`
For real-bug failures, include a root cause analysis:
**Root cause analysis:**
- Component: <hooks | session | checkpoint | strategy | agent>
- Suspected location: <file:function>
- Description: <what's wrong and why>
- Proposed fix: <what code change would address it>
Prompt the user:
Should I fix these?
- [list of tests with classifications and proposed fixes]
- You can select all, specific tests, or skip.
Wait for user response before proceeding.
For flaky fixes the user approved:
- Apply fixes directly in the working tree (no branch creation)
- Run static checks and canary tests: `mise run fmt && mise run lint`, then `mise run test:e2e:canary` (must pass)
- Run real E2E tests to verify the fix. The scope depends on what was changed:
  - Agent-specific fix (e.g., `e2e/agents/cursor_cli.go`, or one agent's config/trust/env): run the full suite for that agent only: `mise run test:e2e --agent <agent>`
  - Shared test infra fix (e.g., `e2e/agents/agent.go`, `e2e/testutil/`, `TmuxSession`, test helpers): run the full suite for every agent that failed, since the fix could affect any of them: `mise run test:e2e --agent <agent1>`, `mise run test:e2e --agent <agent2>`, and so on for each agent that had failures
  - Test prompt fix (e.g., changed wording in a specific test): run that test across all agents that failed it: `mise run test:e2e --agent <agent> <TestName>`
- If any step fails, investigate and adjust. Report what happened to the user.
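When several agents failed, the per-agent runs can be scripted. The block below is a minimal sketch, not part of the repository: `run_e2e_for_agents` and the `E2E_CMD`/`TEST_NAME` overrides are hypothetical names introduced for illustration. It runs `mise run test:e2e --agent <agent>` once per agent, optionally narrowed to a single test. It keeps going after a failure so that every agent gets a result, and it returns non-zero if any agent failed.

```sh
#!/bin/sh
# Hypothetical helper (not repo tooling): run the real E2E suite once per
# failed agent and print a per-agent result.
#   E2E_CMD   - assumed override for the runner; defaults to "mise run test:e2e"
#   TEST_NAME - optional; narrows each run to one test (prompt fixes)
run_e2e_for_agents() {
  local cmd failed agent
  if [ "$#" -eq 0 ]; then
    echo "usage: run_e2e_for_agents <agent>..." >&2
    return 2
  fi
  cmd="${E2E_CMD:-mise run test:e2e}"
  failed=""
  for agent in "$@"; do
    echo "==> $cmd --agent $agent${TEST_NAME:+ $TEST_NAME}"
    # $cmd is deliberately unquoted so "mise run test:e2e" splits into words;
    # the test name is passed only when TEST_NAME is non-empty.
    if $cmd --agent "$agent" ${TEST_NAME:+"$TEST_NAME"}; then
      echo "PASS: $agent"
    else
      echo "FAIL: $agent"
      failed="$failed $agent"
    fi
  done
  # Return non-zero when any agent failed so the result cannot be missed.
  if [ -n "$failed" ]; then
    echo "Failing agents:$failed"
    return 1
  fi
  echo "All agents passed"
}
```

For example, `run_e2e_for_agents claude-code opencode` covers a shared-infra fix, and `TEST_NAME=TestFoo run_e2e_for_agents claude-code opencode` covers a prompt fix to `TestFoo`. The helper continues past the first failure on purpose, because the report to the user should list every agent's outcome, not only the first agent that broke.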
For real-bug fixes the user approved:
- Apply the fix directly in the working tree (no branch creation)
- Run static checks, unit tests, and canary tests: `mise run fmt && mise run lint`, then `mise run test` (unit tests), then `mise run test:e2e:canary` (canary tests)
- Run real E2E tests to verify the fix (MANDATORY). Same scoping rules as flaky fixes above:
- Agent-specific change -> full suite for that agent
- Shared CLI/infra change -> full suite for all agents that failed
- Narrow change (single test affected) -> just that test across affected agents
- Report results to the user.
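The choice among these three scopes can be partly automated from the diff. The block below is a heuristic sketch, not repository tooling. `classify_e2e_scope` is a hypothetical function, and its path patterns only mirror the example paths named in this procedure (`e2e/agents/agent.go`, `e2e/testutil/`, `e2e/agents/cursor_cli.go`), so treat them as assumptions about the layout. Any path it does not recognize, including CLI code, falls back to the widest scope.

```sh
#!/bin/sh
# Heuristic sketch (assumption, not repo tooling): map one changed file path
# to an E2E verification scope. Patterns mirror the example paths in this
# procedure; adjust them to the real repository layout.
classify_e2e_scope() {
  case "$1" in
    e2e/agents/agent.go | e2e/testutil/*)
      # Shared test infra: full suite for every agent that failed.
      echo "shared" ;;
    e2e/agents/*.go)
      # One agent's config/trust/env: full suite for that agent only.
      echo "agent-specific" ;;
    *_test.go)
      # Assumed to be a prompt or assertion change in one test: that test only.
      echo "single-test" ;;
    *)
      # Anything else (e.g. CLI code) may affect every agent: widest scope.
      echo "shared" ;;
  esac
}
```

One way to apply it to the working tree is `git diff --name-only | while read -r f; do printf '%s\t%s\n' "$(classify_e2e_scope "$f")" "$f"; done`. If the changed files map to different scopes, use the widest one. Treat a `single-test` result as a hint to confirm by reading the diff, because a change to a shared helper inside a `_test.go` file still needs the full suite.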
GATE: Do NOT proceed to the summary until real E2E tests have been run and results reported for every fix applied above. If E2E tests were not run, go back and run them now.
Print a summary table:
| Test | Agent(s) | Classification | Action Taken |
|------|----------|----------------|--------------|
| TestFoo | claude-code | flaky | Fixed in working tree |
| TestBar | all agents | real-bug | Fix applied, tests passing |
| TestBaz | opencode | flaky | Skipped (user declined) |