fix(brainstorming): ground recommendations in named comparison dimensions #1512
Draft
j7an wants to merge 1 commit into
Commit: Replace the 3-bullet "Exploring approaches" block with a stability-test framing plus four explicit dimensions (what it assumes, where it breaks down, what would rule it out, what evidence supports it). Update checklist item obra#4 and the "Explore alternatives" key principle to match. Refs obra#1266
What problem are you trying to solve?
Issue #1266 describes a reproducible failure mode in the brainstorming skill: the first-pass recommendation is often surface-level rather than grounded in real analysis of the options, and the recommendation flips when the user asks "can you analyze these options in more depth?" The instability is the diagnostic: if the answer changes under deeper questioning, the original wasn't based on an in-depth comparison.

Concrete user experience the issue describes: initial recommendation X → user asks for detailed analysis → recommendation changes to Y, which is clearly better reasoned. Users who don't think to push back act on a recommendation the agent itself would revise under scrutiny.
This is distinct from already-tracked concerns: …
What does this PR change?
Three coordinated single-file edits to skills/brainstorming/SKILL.md:

- Replace the 3-bullet "Exploring approaches" block with a stability-test framing plus four explicit comparison dimensions (what it assumes, where it breaks down, what would rule it out, what evidence supports it).
- Update checklist item obra#4 to match.
- Update the "Explore alternatives" key principle to match.

Diff stats: 1 file changed, 14 insertions, 5 deletions.
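The literal wording lives in the diff; purely as an illustration of the shape of the change, a stability-test block naming the four dimensions might read something like this (hypothetical sketch, not the actual SKILL.md content):

```markdown
## Exploring approaches

Before recommending, compare every candidate on four dimensions:

- **What it assumes**: the preconditions that must hold for it to work.
- **Where it breaks down**: its failure modes when those assumptions are wrong.
- **What would rule it out**: constraints that would disqualify it outright.
- **What evidence supports it**: concrete observations backing the comparison.

Stability test: if "can you analyze these options in more depth?" would change
your recommendation, the comparison above was not deep enough. Redo it first.
```

The section names, emphasis, and exact phrasing here are assumptions; only the four dimensions and the stability-test framing come from the PR description itself.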
Is this change appropriate for the core library?
Yes. The brainstorming skill ships in core and is one of the project's most heavily-used behavior-shaping skills. The change is harness-neutral Markdown: no new tool calls, no third-party dependencies, no domain-specific or project-specific content. The four dimensions (assumptions, failure modes, disqualifying constraints, evidence) are the ones spelled out in issue #1266's "Proposed mechanism" and apply to brainstorming any kind of decision, not a specific stack or tool.

What alternatives did you consider?
From issue #1266 itself plus design exploration:

- Invoking the critique skill ([Feature Request] Add critique skill for nine-dimension self-review of AI responses, #1013) after every brainstorm: post-hoc, since the recommendation has already been presented and anchored. Doesn't address the first-pass instability.
- …the superpowers:writing-skills discipline.

The chosen approach (minimal coordinated wording changes that name the failure mode and the four dimensions) is what survived all four alternatives.
Does this PR contain multiple unrelated changes?
No. One file, three edits, all in service of a single goal: making first-pass brainstorming recommendations stable under "can you analyze in more depth?" pushback. The checklist item and Key Principles bullet exist solely to keep voice and terminology aligned with the new "Exploring approaches" block — splitting them across separate PRs would leave the skill internally contradictory.
Existing PRs
Searched obra/superpowers open and closed PRs for "brainstorming" (50 results). None duplicate this change. The substantively-near PRs: …

Other open "fix(brainstorming):" PRs in the search results address distinct problems and are not duplicates: #1170 (section overview), #1169 (assumptions/unknowns at handoff), #1037 (visual companion gating), #1097 (worktree handoff), #829 (worktree creation step), #759 (auto-open visual companion), #632 (worktree-before-spec ordering).

One post-2026-05-09 PR matched the "brainstorming" search but is unrelated: #1507 ("Migrate superpowers to Pi platform with bilingual support and tests"), a wholesale platform-migration PR with no overlap with the depth-of-comparison concern.
Related: #1266 (the issue this PR fixes). No prior PR addresses #1266.
Environment tested
New harness support (required if this PR adds a new harness)
N/A. This PR does not add support for a new harness. It modifies a skill that ships in core and is loaded by all currently-supported harnesses.
Clean-session transcript for "Let's make a react todo list"
Not applicable to this change (no new harness).
Evaluation
Planned: 6 eval sessions (3 against dev, 3 against this branch). Those sessions have not been completed at the time of this draft; this PR is intentionally a DRAFT until those evals are run and pasted in.

Expected: a recommendation flip on dev for at least the Complex scenario, and stability on this branch in all three. Trivial and Dominant scenarios may legitimately not flip on dev (small or one-sided decisions can be stable without depth); those will be documented honestly rather than treated as failures.

The locked scenario prompts that will produce the eval evidence: …
Rigor
[ ] …superpowers:writing-skills and completed adversarial pressure testing (paste results below)

Both adversarial-test boxes are intentionally unchecked. The first two require completed eval sessions, which have not yet been run; marking those boxes without the data would be fabrication. The third box is honest: this PR does not touch any Red Flags table, rationalization list, or "human partner" wording. Those carefully-tuned regions are left intact.
The superpowers:writing-skills skill was invoked during implementation (its discipline is what produced the minimal, scope-limited three-edit change rather than a wider rewrite). What is missing is the eval-evidence requirement that completes the rigor checklist.

This PR is a draft specifically so the maintainers do not have to triage an under-rigored skill change. Promotion to ready-for-review is gated on completing the 6 eval sessions and pasting before/after results into this body.
Human review
The complete diff is one file (skills/brainstorming/SKILL.md), 14 insertions / 5 deletions, three localized hunks. My human partner is reviewing the diff in the GitHub UI as part of opening this draft.

Fixes #1266.