feat: add plan review cycle skill #1473
Open · scicco wants to merge 1 commit
What problem are you trying to solve?
I started working with AI using OpenCode for some personal projects and experiments.
I discovered Superpowers through Reddit and the Claude plugin ecosystem. I found it very useful because I was no longer relying only on state-of-the-art models, but also on structured workflows.
I started using the `brainstorming` skill to explore new feature ideas. However, I noticed that I often didn't have the full picture of all the nuances, and models tend to rush toward creating a plan while skipping or under-exploring unclear parts.

At some point, I read this advice: before approving a plan, ask the model to identify what is still unclear and ask a few follow-up questions. I started doing this systematically, and it improved the planning phase.
Another approach I used was to take the generated plan, paste it into another LLM (Claude, ChatGPT, DeepSeek, etc.), ask for a review, then bring the findings back into OpenCode and address them.
Over time, this evolved into something more structured:
"I asked another model to review the document. For each point raised, create a sub-task and review it together."
Then, after discovering sub-agents, it became:
"Run a reviewer sub-agent on `@plans/myplan.md`. For each finding, create a task and let's review it together."

I would repeat this several times until I was satisfied with the plan. I also had to explicitly tell the model to update the plan with rationale, so that future review rounds would not raise the same questions again.
What does this PR change?
Adds a new `plan-review-cycle` skill and reviewer prompt. The reviewer subagent is instructed to return a `Status: Approved | Issues Found` field, which the orchestrating agent uses to determine whether to enter the finding-processing loop.

The skill:

- maintains a durable `Plan Review Log`;
- assigns round-scoped finding IDs such as `R1-PRC001`;
- classifies findings as `Critical`, `Major`, `Minor`, and `Advisory`;
- blocks execution while `Critical`, `Major`, or `Minor` findings remain `Open`;
- allows closing a finding as `No Plan Change`.

The core of the skill is the cycle:

- If the reviewer returns `Status: Approved` with no findings, skip directly to step 8.
- Otherwise, each finding is closed as `Resolved`, with plan changes recorded, or as `No Plan Change`, with rationale recorded.

This PR also updates:
- `README.md`: adds `plan-review-cycle` to the Basic Workflow;
- `skills/writing-plans/SKILL.md`: recommends `plan-review-cycle` before execution;
- `tests/claude-code/*` and `tests/opencode/*`: test coverage for the new skill.

Is this change appropriate for the core library?
Yes.
This is a general-purpose planning quality gate. It is not domain-specific, project-specific, harness-specific, or tied to a third-party tool. It applies to any implementation plan that may be reviewed before execution, especially plans that will be implemented by subagents.
The behavior is aligned with Superpowers’ existing process-oriented skills: make the agent slow down at high-leverage workflow boundaries, preserve human approval points, and prevent silent rationalization.
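As an illustration, a reviewer report consistent with the `Status: Approved | Issues Found` contract described earlier might look like the following. This is a hypothetical sketch (the finding text is invented); the reviewer prompt in this PR defines the actual format:

```
Status: Issues Found

Findings:
- [R1-PRC001] Major: Task 3 removes the old migration before the new one is verified.
- [R1-PRC002] Advisory: Consider documenting the rollback procedure in the plan header.
```

The orchestrating agent would see `Status: Issues Found` and enter the finding-processing loop; `Status: Approved` with no findings would skip it.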
What alternatives did you consider?
Add this directly to `writing-plans`.

Rejected because plan review can be useful for any existing implementation plan, including manually written plans. It also needs to be repeatable independently after plan changes.
Use a one-shot plan reviewer prompt only.
Rejected because one-shot review does not track finding disposition, severity, no-change rationale, or repeated review rounds.
Keep findings only in chat.
Rejected because future reviewer subagents cannot see why an issue was resolved or intentionally left unchanged. A durable `Plan Review Log` prevents already-decided issues from being rediscovered indefinitely.

Make every plan review mandatory.
Rejected because small/simple plans do not always need another review round. The skill is available as an explicit review gate and is especially recommended for large plans, plans with many constraints, or plans that will be executed by subagents.
Does this PR contain multiple unrelated changes?
No.
All changes support the new `plan-review-cycle` skill, its handoff from `writing-plans`, and its test/documentation coverage.

Existing PRs
Related PRs / issues / prior art:
- writing-plans: Enhanced review capabilities #1010 proposed enhanced `writing-plans` review capabilities via an inline Plan Audit -> Revise -> Consistency Check workflow. This PR is related but different: it adds a separate `plan-review-cycle` skill using a fresh reviewer subagent, a durable `Plan Review Log`, explicit `Open`/`Resolved`/`No Plan Change` dispositions, severity-based blocking, human partner approval, and repeat-review guidance.
- Proposal: adversarial plan review step between writing-plans and executing-plans #1130 proposed an adversarial plan review step between `writing-plans` and `executing-plans`. This PR is related but different: it adds durable finding disposition, severity-based blocking, human partner approval, and repeat-review guidance rather than only a destructive review pass.
- feat: document review system and workflow enforcement #334 added document/spec/plan review loops. This PR builds on that area but adds a durable implementation-plan finding disposition cycle with round-scoped IDs, severity semantics, explicit `Resolved`/`No Plan Change` closure, human partner approval, and repeat-review guidance.
- Add architecture guidance and capability-aware escalation #441 added architecture and file-size checks to spec, plan, and code-quality review loops. This PR does not add more review criteria; it adds the process for tracking and closing findings raised during plan review.
- Subagent-driven development misses product-level gaps (3 actionable improvements) #766 suggested plan review improvements for product-level gaps and intent-vs-implementation drift. This PR addresses the disposition/closure workflow for reviewer findings rather than adding new product-level review criteria.
- writing-plans: plans over-specify implementation, leaving no room for executor judgment #895; writing-plans: require verbatim quotes of spec UX contracts above code snippets #1233; and writing-plans: require 'Step 0: quote current symbols' for tasks modifying existing code #1234 are related planning-quality issues. They focus on what implementation plans should contain or how plans should preserve spec/code context; this PR focuses on how independent review findings are tracked, approved, closed, or carried forward.
- Scale process-oriented skills to task complexity #522 adjusted process-oriented skills to scale to task complexity. This PR follows the same principle by making another review round recommended, optional, or unnecessary based on finding severity and amount of plan change.
- [codex] Fix OpenCode integration tests #1285 is prior art for OpenCode integration tests. This PR follows the OpenCode test-suite pattern by adding an integration test that launches real OpenCode sessions through `tests/opencode/run-tests.sh`.
- Lift superpowers:code-reviewer agent into the requesting-code-review skill #1299 is prior art for adding behavioral tests around review dispatch behavior, but it targets code review, not implementation-plan review disposition.

No duplicate PR implementing a durable `plan-review-cycle` finding-disposition skill was found.

Environment tested
- GNU `timeout` 9.10

Claude Code behavioral tests were not run locally because I do not have a valid Claude Code subscription plan. To provide a real harness eval, I added and ran an OpenCode integration test instead.

GNU `timeout` is available locally.

Evaluation
Initial prompt that started the session:
Eval sessions run after the change:
- OpenCode integration eval (`test-plan-review-cycle.sh`)

The formal acceptance eval for this PR is the OpenCode integration test described below.
Outcome compared to before the change
Before this PR:
After this PR:
- findings are closed (`Resolved` or `No Plan Change`) or remain `Open`;
- execution is blocked while `Critical`, `Major`, or `Minor` findings remain `Open`.

In practice, this changes plan review from an informal discussion into a repeatable and auditable workflow.
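Concretely, a log maintained under this workflow might look like the following. These entries are hypothetical illustrations; the skill's SKILL.md defines the exact layout:

```
## Plan Review Log

### Round 1
- R1-PRC001 (Major): Rollback step missing for the schema migration task.
  Disposition: Resolved (rollback substep added to Task 4).
- R1-PRC002 (Minor): Test names do not follow the repo convention.
  Disposition: No Plan Change (existing names match the suite; rationale approved by partner).
```

Because dispositions and rationale are recorded in the plan itself, a fresh reviewer subagent in round 2 can see why R1-PRC002 was intentionally left unchanged and will not re-raise it.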
1. OpenCode integration eval
Command:
```
cd tests/opencode
./run-tests.sh --test test-plan-review-cycle.sh --verbose
```

OpenCode version: 1.14.33
Result: PASS
Output summary:
This OpenCode integration test launches real OpenCode sessions and verifies that the skill can be loaded through the OpenCode Superpowers plugin environment.
The test covers two scenarios:
Core workflow reporting
- `plan-review-cycle`
- `Plan Review Log`
- `No Plan Change`
- `R1-PRC001`-style finding IDs
- `Critical` and `Major` severity semantics

Adversarial pressure behavior
- a `Critical` issue

OpenCode integration suite note
While validating the new OpenCode integration test, I also tried the broader OpenCode integration suite. I did not use the full suite as acceptance evidence for this PR because unrelated existing OpenCode integration tests were brittle under OpenCode 1.14.33:
- `test-tools.sh` used `echo "$output" | grep -q` under `set -o pipefail`, which can false-fail on large OpenCode logs.
- `test-tools.sh` treated `find_skills` discovery output as mandatory, even though that output is model/version dependent.
- `test-priority.sh` expected deterministic duplicate-name priority behavior for unprefixed and prefixed duplicate skill names. Current OpenCode 1.14.33 exposed duplicate skill-name behavior differently in my local run.

I did not include unrelated fixes to those existing tests in this PR. The feature-specific acceptance evidence is the targeted OpenCode integration test:
```
cd tests/opencode
./run-tests.sh --test test-plan-review-cycle.sh --verbose
```

Result: PASS.
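The `grep -q` brittleness noted above is a general shell gotcha rather than anything OpenCode-specific: under `set -o pipefail`, `grep -q` exits as soon as it finds a match, so a writer still emitting a large log can receive SIGPIPE, and the pipeline then reports failure even though the pattern was found. A minimal sketch (variable contents are illustrative stand-ins for a large log):

```shell
#!/usr/bin/env bash
set -o pipefail

# Stand-in for a large OpenCode log, well over the 64 KiB pipe buffer.
output=$(seq 1 200000)

# Fragile: grep -q exits after matching the first line; printf is still
# writing, gets SIGPIPE, and pipefail surfaces its non-zero (141) status.
printf '%s\n' "$output" | grep -q '^1$'
echo "pipe status: $?"        # often 141 on large inputs

# Safer: feed grep without a pipe so no writer can be killed mid-stream.
grep -q '^1$' <<<"$output"
echo "herestring status: $?"  # 0
```

A here-string (or reading from a temp file) avoids the race entirely, which is why the targeted test for this PR does not rely on the `echo | grep -q` pattern.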
2. Static checks
Commands run:
Result: PASS
Additional static checks used during development:
Before / after delta
Before this PR:
- No durable `Plan Review Log` was required.
- No `Resolved`/`No Plan Change` disposition model existed.

After this PR:
- Findings get round-scoped IDs such as `R1-PRC001`.
- Findings close as `Resolved` or `No Plan Change`, or remain `Open`.
- `No Plan Change` requires rationale and human partner approval.
- `Critical`, `Major`, and `Minor` findings block execution while `Open`.
- `Advisory` findings are explicitly non-blocking.

Rigor
I followed `superpowers:writing-skills` and completed adversarial pressure testing.

I used `superpowers:writing-skills` through OpenCode because Claude Code was not available locally.

Writing-skills review result:
The review found that `skills/plan-review-cycle/SKILL.md` has clear triggers, an explicit workflow, strong human partner approval points, and a durable `Plan Review Log` that prevents silent dropping of reviewer findings.

The review also identified follow-up gaps:
- `REQUIRED BACKGROUND` markers if dependencies are required.

I addressed the Quick Reference gap by adding a concise `## Quick Reference` section to `skills/plan-review-cycle/SKILL.md`.

Adversarial pressure covered by the OpenCode integration test:
Expected and observed behavior:
The OpenCode integration test passed:
Additional validation
The OpenCode test was initially run on macOS without GNU `timeout`, which exposed a portability issue. I installed GNU coreutils and reran the test. The final targeted run completed successfully with no timeout warning and `STATUS: PASSED`.

Human review
A human reviewed the full proposed behavior and raised two follow-up issues before finalization:
- `Minor` severity should explicitly say it blocks execution until closed.
- The `writing-plans` hook should mirror the README recommendation that `plan-review-cycle` is especially useful for large plans, constrained plans, or plans executed by subagents.

Both were incorporated.