Add optional skills-mode improve and verify stages by lee-to · Pull Request #144 · lee-to/aif-handoff

lee-to · 2026-06-29T14:26:32Z

Summary

Add optional skills-mode runPlanImprove and runPostVerify task flags, defaulted off and exposed only when subagents are disabled.
Add Improve and Verify coordinator stages backed by /aif-improve and /aif-verify.
Run Verify after implementation and before Review; when Review is skipped, Verify moves directly to Done.
Update API, data, schema, UI, documentation, and tests for the expanded workflow.

Validation

npm test --workspace=@aif/agent -- coordinator.test.ts
npm run ai:validate

Related issue: #131

ichinya · 2026-06-29T18:12:47Z

Summary

This PR adds two optional skills-mode pipeline stages (improve and verify) gated by per-task boolean flags. The design is clean: purely additive, zero impact on existing subagent-mode tasks, and the coordinator routing logic is well-separated. The code follows existing patterns throughout.

Architecture & Design ✅

The approach is solid:

Additive, not in-place. New statuses, new columns (default 0), new pipeline entries. No existing behavior changes. This is the right way to extend the state machine.
Defense-in-depth on the guard. The API forces runPlanImprove/runPostVerify to false when useSubagents=true, AND the coordinator checks !task.useSubagents in shouldRunSkillsModeImprove/Verify. Belt and suspenders — good.
Dynamic routing via getStageSuccessStatus. Clean abstraction. The pipeline entries still declare static onSuccess, but the coordinator overrides at runtime based on task flags. This avoids a combinatorial explosion of pipeline entries.
Non-destructive rollback in improver. If /aif-improve produces garbage output, the original plan is preserved. The looksLikeFullPlanUpdate() gate is the right safety net.
Verify gate pattern. Parsing aif-gate-result code fences is a reasonable convention. The gate blocks only on fail/blocking=true, silently passes on warn/pass. This matches the existing review gate semantics.

Concerns

1. Verify retry loop on blocking gate

When the verify gate throws (status: "fail" or blocking: true), the coordinator leaves the task in verify status. On the next poll cycle, findCoordinatorTaskCandidates("verify") picks it up again and re-runs runVerifier. This creates an implicit retry loop with no backoff or retry limit.

Is this intentional? The existing review stage has maxReviewIterations to cap retries. The verify stage has no equivalent. If the implementation is genuinely broken, the verify stage will keep retrying every 30 seconds indefinitely until the stale-stage watchdog kicks in.

Suggestion: either add a verifyIterationCount field (mirroring reviewIterationCount) or have the verify gate throw to blocked_external instead of staying in verify.

2. Budget borrowing

runImprover borrows planCheckerMaxBudgetUsd — makes sense, it's a plan-scoped operation.
runVerifier borrows reviewSidecarMaxBudgetUsd — also reasonable, it's a review-scoped operation.

But the naming is misleading. If someone configures planCheckerMaxBudgetUsd expecting it to only affect the plan-checker, they'll be surprised when the improver uses it too. Consider either:

Renaming to planStageMaxBudgetUsd (covers both plan-checker and improver)
Or adding explicit improverMaxBudgetUsd/verifierMaxBudgetUsd fields (more granular)

This is a follow-up concern, not a blocker.

3. improve status in the state machine

The improve status has no human actions in HUMAN_ACTIONS_BY_STATUS. This is correct — it's agent-controlled. But there's no way for a human to cancel a task stuck in improve without manually setting it to blocked_external or similar. The existing planning status has the same limitation, so this is consistent, but worth noting.

Specific Code Comments

packages/agent/src/coordinator.ts — getStageSuccessStatus

function getStageSuccessStatus(task: TaskRow, stage: StatusTransition): TaskStatus {
  if (stage.label === "planner" && shouldRunSkillsModeImprove(task)) {
    return "improve";
  }
  if (stage.label === "implementer" && shouldRunSkillsModeVerify(task)) {
    return "verify";
  }
  if (stage.label === "verifier" && task.skipReview) {
    return "done";
  }
  return stage.onSuccess;
}

Clean. The three conditions are mutually exclusive (different stage labels), so no ordering ambiguity. The fallback to stage.onSuccess preserves existing behavior.

packages/agent/src/subagents/improver.ts — looksLikeFullPlanUpdate guard

if (!looksLikeFullPlanUpdate(currentPlan, improvedPlan)) {
  log.warn(/* ... */);
  persistTaskPlanForTask({ /* ... planText: currentPlan ... */ });
  return;
}

Good defensive pattern. The improver won't accidentally nuke a plan with partial output. The warning log provides enough context for debugging.

packages/agent/src/subagents/verifier.ts — extractVerifyGateResult

function extractVerifyGateResult(resultText: string): VerifyGateResult | null {
  const fence = resultText.match(/```aif-gate-result\s*([\s\S]*?)```/);
  if (!fence) return null;
  // ... JSON.parse with type narrowing
}

The parsing is defensive (returns null on malformed input, validates field types). One edge case: if the LLM produces multiple aif-gate-result fences, only the first is parsed. This is probably fine for now but worth documenting.

packages/data/src/index.ts — findCoordinatorTaskCandidates

The conditional chain for stage matching is getting long:

stage === "implementer"
  ? or(
      eq(tasks.status, "implementing"),
      and(eq(tasks.status, "plan_ready"), eq(tasks.autoMode, true)),
    )
  : stage === "improver"
    ? inArray(tasks.status, ["improve"])
    : stage === "plan-checker"
      ? and(eq(tasks.status, "plan_ready"), eq(tasks.autoMode, true))
      : stage === "planner"
        ? inArray(tasks.status, ["planning"])
        : stage === "verifier"
          ? inArray(tasks.status, ["verify"])
          : inArray(tasks.status, ["review"]);

This is a nested ternary chain with 6 branches. It works but is hard to read. Consider refactoring to a switch statement or a Record<CoordinatorStage, SQL> map in a follow-up. Not a blocker for this PR.

Minor Issues

Migration version 24 — Confirm no collision with other open PRs. The Migration Version Rule requires append-only versioning.
Test mock for improver.ts — The mock in coordinator.test.ts uses vi.fn().mockResolvedValue(undefined). This doesn't test the improver's rollback path. Consider adding a test where the improver throws to verify the coordinator handles it correctly.
Column.tsx owner badges — improve and verify both get the amber "AI controlled" badge. Consistent with planning and review. Good.
CSS variables — Light theme: --color-status-improve: #d97706 (amber-600), --color-status-verify: #0ea5e9 (sky-500). Dark theme: --color-status-improve: #b45309 (amber-700), --color-status-verify: #0284c7 (sky-600). These have sufficient contrast against their respective backgrounds.

Questions for the Author

Verify retry behavior: When the verify gate blocks, should the task stay in verify (implicit retry) or move to blocked_external (explicit human intervention)? The current behavior is the former, but there's no iteration cap.
Future extensibility: The aif-gate-result pattern in the verifier — is this intended to be reused by other stages (e.g., plan-checker could emit a gate result too)? If so, should the parsing be extracted to a shared utility?
Budget granularity: Any plans to add dedicated improverMaxBudgetUsd / verifierMaxBudgetUsd project settings, or is the current borrowing from plan-checker/review-sidecar budgets intentional long-term?

Files I'd Want to See Tests For

improver.ts — ✅ has tests, covers happy path and non-plan output detection
verifier.ts — ✅ has tests, covers pass and blocking gate
coordinator.ts — ✅ has tests for improve/verify routing, skipReview+verify combo
Missing: coordinator test where improver/verifier throws (error path)

Overall: Clean, well-structured PR. The additive design, defense-in-depth guards, and defensive parsing make this safe to merge. The verify retry loop is the only behavioral concern worth discussing before merge.

lee-to · 2026-06-30T08:28:40Z

Thanks for the review. I addressed the blocking verify retry concern in 1e1277a.

Changes made:

Added a structured StageManualBlockError for intentional stage gate blocks.
The verifier now throws that structured error for failing/blocking aif-gate-result output after saving the verification report.
The coordinator maps that error to blocked_external with retryAfter=null, so a blocking verify result requires manual intervention instead of being retried every poll.
Added regression coverage for the stage error classifier and for a second coordinator poll confirming the verifier is not re-run.

Validation:

npx vitest run packages/agent/src/tests/stageErrorHandler.test.ts packages/agent/src/tests/verifier.test.ts packages/agent/src/tests/coordinator.test.ts
npm run ai:validate
npm run format:check after the commit hook formatting pass

I left the budget naming and data query refactor notes as follow-up/non-blocking, since they do not affect the retry-loop behavior.

ichinya · 2026-07-01T04:44:18Z

PR Review

Summary

PR extends the workflow engine by adding optional Improve and Verify stages for skills-mode (useSubagents=false), as well as new flags, FSM states, and changes to the coordinator logic.

This is not a point change, but an extension of the execution model with conditional stages.

Strengths

Correct opt-in design:
- features are disabled by default
- ignored when useSubagents=true
Fixed an important bug with a retry-loop in verify:
- introduced StageManualBlockError
- added a structural transition in blocked_external
Logically improved pipeline:
- Improve → Plan refinement
- Verify → Post-execution validation
Documentation is synchronized with the code

Risks / Concerns

1. Growth of state machine complexity

Conditional stages (improve, verify) are added, which increases the number of hidden transitions and complicates the workflow mental model.

2. Flag interaction complexity

Behavior depends on the combination of:

useSubagents
skipReview
runPlanImprove
runPostVerify

There is no explicit truth table, which complicates the system's predictability.

3. Verify semantics

verify is actually a validation gate, but it is treated as a full-fledged stage with watchdog/retry logic, which may result in an architectural mix of abstraction levels.

4. Watchdog behavior

Extending the stale-stage watchdog to verify is logical, but it may be redundant for a short-lived validation step.

Suggestions

Add an explicit truth table for flag combinations
Clearly define the semantic distinction:
- Improve = plan refinement
- Verify = execution validation gate
Consider separating verify as a "validation layer" rather than a full stage
Add tests for all combinations of flags + edge cases retry-loop fix

Conclusion

PR qualitatively expands the pipeline and fixes a critical retry-loop bug in verify.

The main trade-off is an increase in FSM complexity and conditional branching, which requires a more strict specification of behavior.

lee-to · 2026-07-01T09:25:11Z

Addressed the latest review feedback in 7017e09. Added an explicit skills-mode flag truth table to docs/architecture.md, clarified Improve as plan refinement vs Verify as the execution validation gate, and documented why verify remains a coordinator stage for lifecycle/claim/timeout/profile/activity-log reuse. Added coordinator coverage for all flag combinations, including subagent-mode ignore behavior and skipReview interactions. Validation: npx vitest run packages/agent/src/tests/coordinator.test.ts, npm run format:check, git diff --check, npm run ai:validate.

…lls-mode-improve-verify

ichinya · 2026-07-02T04:31:27Z

Review: Request changes

Thanks for the update. The structured StageManualBlockError path addresses the previous blocking-verify retry loop, and the coordinator/API/UI changes are generally consistent. CI is green on the current head.

I found one blocking behavior issue: in the normal runPostVerify=true + skipReview=false path, runVerifier writes the verification output to reviewComments, then the following reviewer stage overwrites reviewComments with its own combined review/security output. That means pass/warn verification output is not retained after the standard pipeline, so any non-blocking verification findings disappear from the task once review completes.

Please either persist verification output in a dedicated field/log, or make the reviewer preserve/merge the existing Verification section when writing its final reviewComments.

Non-blocking note: issue #131 describes Review -> Verify -> Done, while this PR implements Implementing -> Verify -> Review/Done. The PR body and docs are internally consistent with the new order, but it would be worth confirming/updating the issue acceptance criteria so the final behavior is explicit.

Inline comment

File: packages/agent/src/subagents/verifier.ts
Line: 107

runVerifier persists the verification report into reviewComments, but the normal non-skipReview path immediately runs runReviewer, which writes setTaskFields({ reviewComments: combinedReview }) and overwrites this field. As a result pass/warn verification output is lost after review. Please either persist verification separately or have the reviewer merge/preserve the existing Verification section before overwriting reviewComments.

feat: add optional skills-mode improve and verify stages

fa7fc8f

lee-to requested a review from ichinya June 29, 2026 14:32

fix: block tasks on failing verify gate

1e1277a

docs: specify skills-mode flag matrix

7017e09

lee-to added 2 commits July 1, 2026 14:08

Merge remote-tracking branch 'origin/main' into feature/issue-131-ski…

1b6eff0

…lls-mode-improve-verify

test: update project overview status fixture

186ccf9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add optional skills-mode improve and verify stages#144

Add optional skills-mode improve and verify stages#144
lee-to wants to merge 5 commits into
mainfrom
feature/issue-131-skills-mode-improve-verify

lee-to commented Jun 29, 2026

Uh oh!

ichinya commented Jun 29, 2026

Uh oh!

lee-to commented Jun 30, 2026

Uh oh!

ichinya commented Jul 1, 2026

Uh oh!

lee-to commented Jul 1, 2026

Uh oh!

ichinya commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lee-to commented Jun 29, 2026

Summary

Validation

Uh oh!

ichinya commented Jun 29, 2026

Summary

Architecture & Design ✅

Concerns

Specific Code Comments

Minor Issues

Questions for the Author

Files I'd Want to See Tests For

Uh oh!

lee-to commented Jun 30, 2026

Uh oh!

ichinya commented Jul 1, 2026

PR Review

Summary

Strengths

Risks / Concerns

1. Growth of state machine complexity

2. Flag interaction complexity

3. Verify semantics

4. Watchdog behavior

Suggestions

Conclusion

Uh oh!

lee-to commented Jul 1, 2026

Uh oh!

ichinya commented Jul 2, 2026

Review: Request changes

Inline comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants