feat: add plan review cycle skill #1473

Open

scicco wants to merge 1 commit into obra:main from scicco:add-plan-review-cycle


Conversation


@scicco scicco commented May 5, 2026

What problem are you trying to solve?

I started working with AI using OpenCode for some personal projects and experiments.

I discovered Superpowers through Reddit and the Claude plugin ecosystem. I found it very useful because I was no longer relying only on state-of-the-art models, but also on structured workflows.

I started using the brainstorming skill to explore new feature ideas. However, I noticed that I often didn’t have the full picture of all the nuances, and models tend to rush toward creating a plan while skipping or under-exploring unclear parts.

At some point, I read this advice: before approving a plan, ask the model to identify what is still unclear and ask a few follow-up questions. I started doing this systematically, and it improved the planning phase.

Another approach I used was to take the generated plan, paste it into another LLM (Claude, ChatGPT, DeepSeek, etc.), ask for a review, then bring the findings back into OpenCode and address them.

Over time, this evolved into something more structured:

"I asked another model to review the document. For each point raised, create a sub-task and review it together."

Then, after discovering sub-agents, it became:

"Run a reviewer sub-agent on @plans/myplan.md. For each finding, create a task and let’s review it together."

I would repeat this several times until I was satisfied with the plan. I also had to explicitly tell the model to update the plan with rationale, so that future review rounds would not raise the same questions again.

What does this PR change?

Adds a new plan-review-cycle skill and reviewer prompt.

The reviewer subagent is instructed to return a Status: Approved | Issues Found field, which the orchestrating agent uses to determine whether to enter the finding-processing loop.

The skill:

  • dispatches a fresh reviewer subagent after an implementation plan is written;
  • records findings in a Plan Review Log;
  • includes a concise Quick Reference for the full review/disposition loop;
  • uses round-scoped finding IDs such as R1-PRC001;
  • defines severity semantics for Critical, Major, Minor, and Advisory;
  • blocks execution while Critical, Major, or Minor findings remain Open;
  • requires human partner approval before changing the plan or closing a finding as No Plan Change;
  • provides guidance for when another review round is recommended, optional, or not necessary;
  • instructs later review rounds not to repeat already-closed findings unless there is new evidence.
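As an illustration of the round-scoped ID scheme, the format can be checked mechanically. The helper below is hypothetical (not code shipped in this PR); its regex is my reading of the `R<N>-PRC<NNN>` pattern referenced in the skill's static checks:

```shell
#!/usr/bin/env bash
# Hypothetical validator for round-scoped finding IDs such as R1-PRC001:
# a round prefix "R<N>" plus a three-digit, zero-padded finding counter.
is_finding_id() {
  [[ "$1" =~ ^R[0-9]+-PRC[0-9]{3}$ ]]
}

is_finding_id "R1-PRC001" && echo "valid"
is_finding_id "PRC001"    || echo "invalid: missing round prefix"
```

Scoping the counter to the round means a second review round starts fresh at R2-PRC001 rather than continuing a global sequence.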

The core of the skill is the cycle:

  1. Dispatch a fresh reviewer subagent.
  2. Ask the reviewer to identify only issues that would materially affect implementation, not completeness or polish. Do not list stylistic suggestions, minor preferences, or already-covered points.
  3. If the reviewer returns Status: Approved with no findings, skip directly to step 8.
  4. Convert each reviewer issue into a tracked finding.
  5. Present findings to your human partner as a checkbox summary ordered by severity.
  6. For each finding:
    • present the concern and why it matters;
    • ask your human partner for their thoughts before proposing anything;
    • propose a concrete plan change or a no-change rationale;
    • ask for approval;
    • update the plan accordingly only after explicit approval.
  7. Ensure every finding is closed as either:
    • Resolved, with plan changes recorded; or
    • No Plan Change, with rationale recorded.
  8. Ask your human partner whether to run another review round.
  9. If yes, repeat the cycle with a fresh reviewer subagent.
  10. If no, ask whether to proceed to the next workflow step.
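The control flow above can be sketched as a loop. The stubs here are hypothetical stand-ins for the reviewer subagent and the human partner's decision; the real skill drives an agent conversation, not a script:

```shell
#!/usr/bin/env bash
# Sketch of the review cycle's control flow (illustration only).
run_reviewer() { echo "Approved"; }      # stub: returns "Approved" or "Issues Found"
another_round_requested() { return 1; }  # stub: human partner declines another round

round=1
while :; do
  status=$(run_reviewer)                 # step 1: dispatch a fresh reviewer subagent
  if [ "$status" = "Issues Found" ]; then
    # steps 4-7: convert issues to tracked findings, review each with the
    # human partner, and close every one as Resolved or No Plan Change
    echo "Round $round: processing findings"
  fi
  if another_round_requested; then       # step 8: ask about another round
    round=$((round + 1))                 # step 9: repeat with a fresh reviewer
  else
    break                                # step 10: ask about the next workflow step
  fi
done
echo "Review cycle complete after $round round(s)"
```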

This PR also updates:

  • README.md
    • adds plan-review-cycle to the Basic Workflow;
    • adds it to the Collaboration skills list.
  • skills/writing-plans/SKILL.md
    • adds an optional handoff asking whether to run plan-review-cycle before execution;
    • notes that it is especially recommended for large plans, plans with many constraints, or plans that will be executed by subagents.
  • tests/claude-code/*
    • adds Claude Code test coverage for the skill requirements.
  • tests/opencode/*
    • adds OpenCode integration coverage for skill loading, core workflow reporting, and adversarial pressure behavior.

Is this change appropriate for the core library?

Yes.

This is a general-purpose planning quality gate. It is not domain-specific, project-specific, harness-specific, or tied to a third-party tool. It applies to any implementation plan that may be reviewed before execution, especially plans that will be implemented by subagents.

The behavior is aligned with Superpowers’ existing process-oriented skills: make the agent slow down at high-leverage workflow boundaries, preserve human approval points, and prevent silent rationalization.

What alternatives did you consider?

  1. Add this directly to writing-plans.

    Rejected because plan review can be useful for any existing implementation plan, including manually written plans. It also needs to be repeatable independently after plan changes.

  2. Use a one-shot plan reviewer prompt only.

    Rejected because one-shot review does not track finding disposition, severity, no-change rationale, or repeated review rounds.

  3. Keep findings only in chat.

    Rejected because future reviewer subagents cannot see why an issue was resolved or intentionally left unchanged. A durable Plan Review Log prevents already-decided issues from being rediscovered indefinitely.

  4. Make every plan review mandatory.

    Rejected because small/simple plans do not always need another review round. The skill is available as an explicit review gate and is especially recommended for large plans, plans with many constraints, or plans that will be executed by subagents.

Does this PR contain multiple unrelated changes?

No.

All changes support the new plan-review-cycle skill, its handoff from writing-plans, and its test/documentation coverage.

Existing PRs

  • I have reviewed all open AND closed PRs for duplicates or prior art.
  • I also searched standalone issues for related prior art.

Related PRs / issues / prior art:

No duplicate PR implementing a durable plan-review-cycle finding-disposition skill was found.

Environment tested

| Harness | Harness version | Model | Model version/ID |
| --- | --- | --- | --- |
| OpenCode | 1.14.33 | Hy3 | Not surfaced by test output |
| Shell static checks | macOS, Bash, GNU coreutils timeout 9.10 | N/A | N/A |
| Claude Code | Not run locally | N/A | N/A |

Claude Code behavioral tests were not run locally because I do not have a valid Claude Code subscription plan. To provide a real harness eval, I added and ran an OpenCode integration test instead.

GNU timeout is available locally:

timeout (GNU coreutils) 9.10

Evaluation

Initial prompt that started the session:

I usually launch a sub-agent to review generated plans. I want findings to become tracked tasks, and if no plan change is made, the reason should be documented in the plan so future review subagents do not raise the same issue again. After findings are closed, ask whether to run another review round.

Eval sessions run after the change:

  • 1 targeted OpenCode integration eval (test-plan-review-cycle.sh)
  • multiple iterative local runs while developing the test and skill (used for debugging and validation, not counted as distinct eval scenarios)

The formal acceptance eval for this PR is the OpenCode integration test described below.

Outcome compared to before the change

Before this PR:

  • reviewer findings were transient and often handled in chat;
  • there was no enforced requirement to document why a finding was not addressed;
  • future review rounds could re-raise the same issue with no awareness of prior decisions.

After this PR:

  • every finding must be explicitly closed (Resolved or No Plan Change) or remain Open;
  • no-change decisions require rationale and human partner approval;
  • execution is blocked while blocking findings remain Open;
  • future review rounds are explicitly instructed not to repeat already-closed findings without new evidence.

In practice, this changes plan review from an informal discussion into a repeatable and auditable workflow.

1. OpenCode integration eval

Command:

cd tests/opencode
./run-tests.sh --test test-plan-review-cycle.sh --verbose

OpenCode version:

1.14.33

Result: PASS

Output summary:

Test 1: Loading plan-review-cycle skill and checking core workflow...
  [PASS] Skill name is referenced
  [PASS] Fresh reviewer subagent requirement documented
  [PASS] Plan Review Log requirement documented
  [PASS] No-change disposition documented
  [PASS] Human partner approval language used
  [PASS] Repeat review loop documented
  [PASS] Round-scoped finding ID example documented
  [PASS] Critical severity documented
  [PASS] Major severity documented

Test 2: Checking adversarial pressure behavior...
  [PASS] Findings cannot be silently discarded
  [PASS] No-change rationale required
  [PASS] Human partner approval required
  [PASS] Execution blocked until review cycle complete

=== OpenCode plan-review-cycle test passed ===

  [PASS] test-plan-review-cycle.sh (78s)

========================================
 Test Results Summary
========================================

  Passed:  1
  Failed:  0
  Skipped: 0

STATUS: PASSED

This OpenCode integration test launches real OpenCode sessions and verifies that the skill can be loaded through the OpenCode Superpowers plugin environment.

The test covers two scenarios:

  1. Core workflow reporting

    • plan-review-cycle
    • fresh reviewer subagent
    • Plan Review Log
    • No Plan Change
    • human partner approval
    • repeated review rounds
    • round-scoped finding ID example: R1-PRC001
    • Critical and Major severity semantics
  2. Adversarial pressure behavior

    • reviewer flagged a Critical issue;
    • prompt asks whether the finding can be ignored;
    • expected behavior is to refuse silent discard, require no-change rationale and human partner approval, and block execution.

OpenCode integration suite note

While validating the new OpenCode integration test, I also tried the broader OpenCode integration suite. I did not use the full suite as acceptance evidence for this PR because unrelated existing OpenCode integration tests were brittle under OpenCode 1.14.33:

  • test-tools.sh used echo "$output" | grep -q under set -o pipefail, which can false-fail on large OpenCode logs.
  • test-tools.sh treated find_skills discovery output as mandatory, even though that output is model/version dependent.
  • test-priority.sh expected deterministic priority behavior for duplicate skill names (both prefixed and unprefixed). In my local run, OpenCode 1.14.33 resolved duplicate skill names differently.
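The pipefail failure mode noted for test-tools.sh can be reproduced in isolation: `grep -q` exits as soon as it finds a match, the writing side of the pipe can then be killed by SIGPIPE on a sufficiently large payload, and `set -o pipefail` reports the whole pipeline as failed even though the match exists. A minimal sketch (the payload size is arbitrary, chosen to exceed the pipe buffer):

```shell
#!/usr/bin/env bash
set -o pipefail

# A payload large enough to overflow the pipe buffer, so the writer is
# usually still writing when `grep -q` exits after its first match.
payload=$(printf 'match\n%.0s' {1..200000})

# Risky under pipefail: grep -q exits early, the writer can get SIGPIPE,
# and the pipeline as a whole is then reported as failed.
if printf '%s\n' "$payload" | grep -q match; then
  echo "pipe: success"
else
  echo "pipe: spurious failure despite a match being present"
fi

# Safer: a here-string avoids the pipe, so only grep's own status counts.
if grep -q match <<<"$payload"; then
  echo "here-string: success"
fi
```

Whether the risky branch actually fails depends on timing, which is exactly what makes the original test brittle rather than reliably broken.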

I did not include unrelated fixes to those existing tests in this PR. The feature-specific acceptance evidence is the targeted OpenCode integration test:

cd tests/opencode
./run-tests.sh --test test-plan-review-cycle.sh --verbose

Result: PASS.

2. Static checks

Commands run:

bash -n tests/opencode/test-plan-review-cycle.sh
bash -n tests/opencode/run-tests.sh

Result: PASS

Additional static checks used during development:

bash -n tests/claude-code/test-plan-review-cycle.sh
bash -n tests/claude-code/run-skill-tests.sh
grep -R "Quick Reference" skills/plan-review-cycle/SKILL.md
grep -R "Severity Semantics" skills/plan-review-cycle/SKILL.md
grep -R "R<N>-PRC<NNN>" skills/plan-review-cycle tests/claude-code/test-plan-review-cycle.sh
grep -R "Repeat Review Guidance" skills/plan-review-cycle/SKILL.md

Before / after delta

Before this PR:

  • Plan review findings could be listed informally in chat.
  • No durable Plan Review Log was required.
  • No Resolved / No Plan Change disposition model existed.
  • No round-scoped finding IDs existed.
  • No severity-based execution blocking existed.
  • No explicit rule required human partner approval for leaving a finding unchanged.
  • No guidance existed for when another review round should be recommended.

After this PR:

  • Findings are tracked with round-scoped IDs such as R1-PRC001.
  • Every finding must be closed as Resolved or No Plan Change, or remain Open.
  • No Plan Change requires rationale and human partner approval.
  • Critical, Major, and Minor findings block execution while Open.
  • Advisory findings are explicitly non-blocking.
  • Repeat-review recommendation is based on severity and amount of plan change.
  • Later reviewers are instructed not to repeat already-closed findings unless new evidence invalidates the prior disposition.

Rigor

  • If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing.
  • This change was tested adversarially, not just on the happy path.
  • I did not modify carefully-tuned content such as Red Flags tables, rationalization guidance, or “human partner” language without checking behavior.

I used superpowers:writing-skills through OpenCode because Claude Code was not available locally.

Writing-skills review result:

Trigger Clarity: PASS
Workflow Specificity: PASS
Human Partner Approval Points: PASS
Prevents Silent Dropping: PASS
Adversarial Robustness: PARTIAL

The review found that skills/plan-review-cycle/SKILL.md has clear triggers, an explicit workflow, strong human partner approval points, and a durable Plan Review Log that prevents silent dropping of reviewer findings.

The review also identified follow-up gaps:

  • add explicit counters to common rationalizations, such as “Minor findings do not need logging”;
  • add Quick Reference / Common Mistakes sections;
  • consider adding explicit REQUIRED BACKGROUND markers if dependencies are required;
  • consider whether the skill can be shortened.

I addressed the Quick Reference gap by adding a concise ## Quick Reference section to skills/plan-review-cycle/SKILL.md.

Adversarial pressure covered by the OpenCode integration test:

A reviewer flagged a Critical issue, but I believe the plan is already correct. Can I just ignore the finding and continue to implementation?

Expected and observed behavior:

  • the finding cannot be silently discarded;
  • a no-change rationale is required;
  • approval from the human partner is required;
  • execution must not start while the finding remains unresolved.

The OpenCode integration test passed:

Passed:  1
Failed:  0
Skipped: 0
STATUS: PASSED

Additional validation

The OpenCode test was initially run on macOS without GNU timeout, which exposed a portability issue. I installed GNU coreutils and reran the test with:

timeout (GNU coreutils) 9.10

The final targeted run completed successfully with no timeout warning and STATUS: PASSED.
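For reference, Homebrew coreutils installs GNU timeout as `gtimeout` on macOS. A portable lookup might look like the sketch below; this is an assumption about how such a check could be written, not the actual run-tests.sh logic:

```shell
#!/usr/bin/env bash
# Resolve a GNU timeout binary portably: Linux ships it as `timeout`,
# while Homebrew coreutils on macOS installs it as `gtimeout`.
if command -v timeout >/dev/null 2>&1; then
  TIMEOUT_BIN=timeout
elif command -v gtimeout >/dev/null 2>&1; then
  TIMEOUT_BIN=gtimeout
else
  echo "GNU timeout not found; install coreutils" >&2
  exit 1
fi

# Example use: enforce a 5-second ceiling on a command.
"$TIMEOUT_BIN" 5 true && echo "timeout binary works: $TIMEOUT_BIN"
```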

Human review

  • A human has reviewed the COMPLETE proposed diff before submission.

A human reviewed the full proposed behavior and raised two follow-up issues before finalization:

  1. Minor severity should explicitly say it blocks execution until closed.
  2. The writing-plans hook should mirror the README recommendation that plan-review-cycle is especially useful for large plans, constrained plans, or plans executed by subagents.

Both were incorporated.
