
Cross-platform skill compatibility: agent-neutral prose, source-verified per-runtime tool refs#1486

Open
obra wants to merge 5 commits into dev from f/cross-platform

Conversation

@obra (Owner) commented May 6, 2026

What problem are you trying to solve?

Superpowers ships to Claude Code, Codex (CLI + App), Copilot CLI, Cursor, Factory Droid, Gemini CLI, and OpenCode, but the skill content was written first for Claude Code and bakes in Claude-Code-specific names, paths, and tool vocabulary. The motivating concrete failure was openai/plugins#217 — OpenAI's vendored fork of Superpowers attempted a wholesale Claude→Codex find-and-replace and got most of it wrong (rewrote historical attribution paths, replaced model names, broke install instructions). Their PR landed because nobody upstream had separated which references were actually Claude-Code-specific from which were generic prose that just happened to say "Claude".

Symptoms users hit on non–Claude-Code runtimes:

  • Skills like subagent-driven-development ship prompt templates with a Task tool (general-purpose): header that doesn't match what Codex / Copilot CLI / Gemini CLI expose. Users either ignore the template or attempt to call a tool that doesn't exist on their runtime.
  • using-superpowers/SKILL.md told users to "see references/copilot-tools.md (Copilot CLI), references/codex-tools.md (Codex)" but the priority list of instruction files was Claude-Code-first (CLAUDE.md, GEMINI.md, AGENTS.md), and the per-runtime instruction files weren't documented for each platform.
  • Several refs contained factually wrong tool names that nobody had verified against source. For example, gemini-tools.md mapped subagent dispatch to @agent-name. The actual Gemini CLI tool is invoke_agent (per AGENT_TOOL_NAME = 'invoke_agent' in tool-names.ts); @generalist is chat sugar that triggers invoke_agent. Same class of error in copilot-tools.md (claimed create/edit tools that don't exist; Copilot uses apply_patch and rg, not create/edit and grep) and in OpenCode INSTALL.md (claimed Task → @mention syntax; OpenCode's real subagent dispatch tool is task with a subagent_type parameter, no @mention syntax).

The work happened in five phases driven by clarifying questions through superpowers:brainstorming. Each phase scope was approved before implementation, then validated against authoritative source before the next phase.

What does this PR change?

Five logically-distinct changes, one commit each:

  • Phase A: agent-neutral prose ("Claude" → "agents" / "your agent") in skills and the README intro, plus rename of "Claude Search Optimization (CSO)" to "Skill Discovery Optimization (SDO)".
  • Phase B: replace CLAUDE.md-specific guidance with "your instructions file"; add per-platform refs at skills/using-superpowers/references/{claude-code,codex,copilot,gemini}-tools.md.
  • Phase C: alphabetize README's quickstart link list and per-harness install sections.
  • Phase D: cross-runtime tweaks (rename the CSS id #claude-content → #frame-content in the visual-companion frame template and fix the test that asserted the old name; add Copilot CLI launch instructions; consolidate the Claude Code launch sections; resolve relative paths between sibling skills; expand the subagent-capable runtimes list).
  • Phase E: replace Claude-Code-specific tool names (Task, TodoWrite, Skill, Read, Write, Bash, etc.) with action-language descriptions in skill prose, prompt templates (Subagent (general-purpose): instead of Task tool (general-purpose):), and OpenCode docs. Per-runtime tool refs unified around the same vocabulary, with right-column tool names verified against each runtime's source.
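For concreteness, the Phase E header substitution looks like this (the header strings are from the diff; the surrounding task text is a hypothetical example):

```text
# Before (names Claude Code's dispatch tool)
Task tool (general-purpose): Implement step 3 of the plan in a fresh worktree.

# After (preserves the type information, names no runtime's tool)
Subagent (general-purpose): Implement step 3 of the plan in a fresh worktree.
```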

Is this change appropriate for the core library?

Yes. Cross-platform support is core to Superpowers — the project explicitly targets seven harnesses, and the README's first install section is no longer Claude Code by accident; it's first by alphabet. Every change here is a generalization that reduces a Claude-Code-specific assumption. None of it adds project-specific behavior, third-party integrations, or new dependencies. The per-runtime instruction-file conventions, personal-skills-directory paths, and tool mappings are factual reference material that benefits any agent reading a Superpowers skill on any supported runtime.

What alternatives did you consider?

  1. Wholesale find-and-replace in the style of openai/plugins#217 ("Update references in Plugins"). Rejected for the reasons in that PR's review history: it conflates substitutions that need to happen with ones that mustn't (model names, historical attribution paths, vendored Anthropic doc filenames).
  2. Single big commit. Considered. Rejected because the work spans five distinct concerns with different review surfaces (prose neutralization vs. config refs vs. README ordering vs. cross-runtime tweaks vs. tool vocabulary) and they accreted real corrections across review rounds. Five phase commits keep git blame useful.
  3. Three-commit grouping (A+B+C combined / D / E). Considered. Rejected because the README ordering is purely aesthetic and shouldn't share a commit with prose changes; same for cross-runtime tweaks vs. tool vocabulary.
  4. Leave platform-tools refs as Claude-Code-vocabulary tables and let agents translate at runtime. Rejected after Phase E reviewer feedback established that "Skills use Claude Code tool names" is itself a platform-leakage signal — refs should describe actions, not name one runtime's tools as the canonical vocabulary.

For each multi-phase file, I considered file-level granularity (commit each file in its primary phase with full final content) vs. hunk-level (split changes across phase commits). Chose file-level for most, with README.md split (Phase A: prose change; Phase C: alphabetization) because that file's two concerns are genuinely independent and the Phase A line was the only single-line touch.

Does this PR contain multiple unrelated changes?

No — every change is "make a Superpowers skill / doc / reference work correctly across the supported runtimes". Multiple files and multiple phases, but one coherent direction. The five phase commits exist because the changes are reviewable in isolation, not because they're independent — applying any subset would be incomplete.

Existing PRs

  • I have reviewed all open AND closed PRs for duplicates or prior art
  • Related PRs: openai/plugins#217 (the vendored-fork find-and-replace that motivated this work; not in this repo, but the source of the cautionary tale)

No upstream PRs in obra/superpowers cover this scope. The closest neighbours are individual harness-support PRs (e.g., codex/openclaw-native-plugin, claude/review-opencode-*); each adds support for one runtime. This PR doesn't add a runtime — it normalizes content that already mentions all of them.

Environment tested

| Harness | Version | Model | Model ID / notes |
| --- | --- | --- | --- |
| Claude Code | local install | Opus | claude-opus-4-7 (1M context) |
| Codex CLI | 0.123.0 | gpt-5.5 | (not exercised; cloned source verified) |
| Copilot CLI | 1.0.11 | gpt-5.4 | (live tool inventory queried) |
| Gemini CLI | 0.41.1 | gemini-3-flash-preview | (live tmux dispatch verified) |
| OpenCode | 1.14.34 | claude (configured) | (live tool inventory queried) |

Plus drill (the in-repo skill compliance harness) at ../drill, which orchestrates real tmux sessions and runs an LLM verifier against transcripts.

Evaluation

Initial prompt (paraphrased; the actual session was iterative):

A number of things in Superpowers are Anthropic, Claude, or Claude-Code centric when they don't need to be. This includes references to CLAUDE.md, 'claude', or 'claude code' — I'd like us to put together a plan to rewrite to be more platform neutral by default. We found this out because OpenAI did bulk rewrites, many of which were actively wrong, but some of which were sensible.

Validation runs: the change went through six rounds of adversarial code review (two parallel reviewers per round, each with a "5 points to whoever finds the most legitimate issues" framing). Each round surfaced findings that were verified against source before fixing — three rounds caught false positives that earlier rounds had merged on insufficient evidence (e.g., "TaskCreate is fabricated" — actually real, verified via ToolSearch in the live session; ".claude/rules/ is a Cursor feature" — actually a real Claude Code feature per code.claude.com/docs/en/memory; "Codex has a read_file tool" — actually false, verified by grepping the cloned codex-rs/tools/src/).

Drill runs: ran 17 scenarios from ../drill covering skill triggering, SDD with the new prompt-template format, code review, spec review, Codex tool-mapping comprehension, Gemini tool-mapping comprehension, and worktree creation/detection. Final tally: 15/17 PASS. Two explained failures:

  • claim-without-verification-naive (4 pass / 1 fail) — pre-existing PRI-1258 verification-reflex flakiness; the test's hard assertion on verification-before-completion skill loading before git commit did not fire. This scenario also failed at the same skill assertion before my changes; not a regression introduced here.
  • gemini-subagent-tool-mapping-comprehension (7 pass / 1 fail) — improved from 3 pass / 5 fail in the prior baseline. The Gemini agent answered task_dispatch: "invoke_agent" (the source-verified correct answer); all hard regex assertions passed; one LLM judge criterion still marked fail despite the criterion text accepting invoke_agent as a valid form. Looks like judge variance, not an agent or doc problem. Drill scenario fix included as a separate commit in the drill repo (013fcb8 gemini-subagent scenario: accept invoke_agent / generalist as correct) — that's the one that took the 8-criterion fail count from 5 to 1.

Direct empirical verification of the most-contested claim: the "Gemini's subagent dispatch tool is invoke_agent, not @agent-name or @generalist" assertion was verified in three independent ways:

  1. Source: AGENT_TOOL_NAME = 'invoke_agent' in packages/core/src/tools/tool-names.ts.
  2. Docs: docs/core/subagents.md "Forcing a subagent (@ syntax)" — @generalist injects a system note that triggers invoke_agent.
  3. Live tmux: gemini --yolo then @generalist Calculate 7 times 8 — status line: ≡ Agent Completed ✓ generalist · Completed successfully, output 56. The tool exposed in the dispatch flow is invoke_agent; @generalist is the chat shortcut that calls it.
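The source check in (1) is an ordinary grep. The sketch below substitutes a stand-in file for a real clone of google-gemini/gemini-cli so the commands are runnable as written; the constant line mirrors what the real tool-names.ts exports:

```shell
# Stand-in for packages/core/src/tools/tool-names.ts from a gemini-cli clone.
mkdir -p gemini-cli/packages/core/src/tools
cat > gemini-cli/packages/core/src/tools/tool-names.ts <<'EOF'
export const AGENT_TOOL_NAME = 'invoke_agent';
EOF

# The actual verification step: grep for the dispatch-tool constant.
grep -rn "AGENT_TOOL_NAME" gemini-cli/packages/core/src/tools/
```

Against a real clone, drop the stand-in setup and run the grep over the cloned tree.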

Source verifications performed:

  • Cloned openai/codex and grepped codex-rs/tools/src/ for the canonical Codex tool list (no read_file, view, or web_fetch exist; web_search does and is enabled by default).
  • Cloned google-gemini/gemini-cli and read packages/core/src/tools/tool-names.ts, packages/core/src/agents/agent-tool.ts, and docs/core/subagents.md.
  • Cloned sst/opencode and inspected packages/opencode/src/tool/ (confirmed tool ids: bash, read, write, edit, grep, glob, task, todowrite, webfetch, websearch, skill, apply_patch, lsp, plan).
  • Inspected the installed @github/copilot package at /opt/homebrew/lib/node_modules/@github/copilot/ (no stop_agent exists; only read_agent, list_agents, write_agent. web_search exists. bash async mode is mode: "async" with detach: true, not async: true parameter).
  • Used ToolSearch in the active Claude Code session to load schemas for TaskCreate / TaskUpdate / TaskList / TaskGet / TaskOutput / TaskStop and confirmed which are todo-tracking and which are background-process lifecycle.
  • Live *-prompt queries against installed Codex / Copilot CLI / Gemini CLI / OpenCode for tool inventories. Treated as confirming evidence only — the lesson from this PR is that live agent answers can hallucinate or omit, so source is authoritative when they conflict.
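The Codex check in the first bullet is a paired presence/absence grep. Sketched here against a stand-in source tree (the constant name in the stand-in file is invented for illustration; a real run greps a clone of openai/codex's codex-rs/tools/src/):

```shell
# Stand-in tree; the real check runs over a clone of openai/codex.
mkdir -p codex-rs/tools/src
printf '%s\n' 'pub const WEB_SEARCH_TOOL: &str = "web_search";' > codex-rs/tools/src/lib.rs

# Presence check: web_search should match.
grep -rn "web_search" codex-rs/tools/src/

# Absence check: grep exits non-zero when read_file never appears.
if grep -rqn "read_file" codex-rs/tools/src/; then
  echo "unexpected: read_file found"
else
  echo "confirmed: no read_file tool"
fi
```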

Rigor

  • If this is a skills change: I used superpowers:writing-skills (read but not invoked end-to-end; the changes are content normalization, not new skill authoring) and completed adversarial pressure testing (six rounds of parallel reviewers, results in the session transcript).
  • This change was tested adversarially, not just on the happy path.
  • I did not modify carefully-tuned content (Red Flags table, rationalizations, "human partner" language) without extensive evals showing the change is an improvement. The Red Flags table in using-superpowers/SKILL.md is unchanged. The "human partner" terminology is unchanged everywhere it appears. The only behavior-shaping change is the action-language tool vocabulary (Phase E), which is validated by the SDD drill scenarios continuing to pass with the new Subagent (general-purpose): prompt-template format.

Carve-outs that were preserved deliberately (these are documented in the design specs at docs/superpowers/specs/2026-05-05-platform-neutral-*-design.md):

  • The "In Claude Code:" / "In Codex:" / "In Copilot CLI:" / "In Gemini CLI:" per-platform skill-dispatch list in using-superpowers/SKILL.md — that section's whole point is to enumerate runtime-specific tool names; generalizing it would defeat its purpose.
  • visual-companion.md's run_in_background Bash flag note — that's a real Claude-Code-specific Bash tool parameter, not generic prose.
  • Historical attribution paths (/Users/jesse/.claude/CLAUDE.md in CREATION-LOG.md); model names (Claude Haiku/Sonnet/Opus); URLs (claude.com, claude.ai); the vendored Anthropic doc's filename (anthropic-best-practices.md); the Claude.AI Emphatic Style example label in CLAUDE_MD_TESTING.md. These are facts or product names, not platform-leak.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

@arittr — your review please. Particular attention requested on:

  1. Phase B's per-platform tool refs — the right-column tool names are sourced from each runtime's actual source/docs, but I'd like a second eye on the action-language framing in the left column. Is it clear enough that an agent on Codex can find "Run a shell command → shell" without ambiguity?
  2. Phase E's prompt-template header change (Task tool (general-purpose): → Subagent (general-purpose):) is the highest-leverage behavior change. Drill's sdd-rejects-extra-features (7/7 pass) and code-review-catches-planted-bugs (5/5 pass) confirm Claude Code still dispatches correctly with the new format, but if you spot a runtime where it could regress, flag it.
  3. The OpenCode docs vocabulary update (docs/README.opencode.md, .opencode/INSTALL.md) — the old mapping table claimed Task → @mention syntax which is just wrong; the new mapping is verified against the installed OpenCode CLI's tool inventory. Worth a sanity-check that the new examples are installable as written.
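To make (1) concrete, here is a sample of the left-column framing: the first row is the grounded example from the review request; the second row's mapping is my illustrative assumption from the verified Codex tool list, not a quote from the ref's actual table.

```text
| Action (skill prose) | Codex tool |
|----------------------|------------|
| Run a shell command  | shell      |
| Search the web       | web_search |
```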

The companion drill-scenario fix is in the drill repo at 013fcb8 (already pushed to drill's main).

Jesse Vincent added 5 commits May 5, 2026 18:25
Replace generic third-person "Claude" with "agents" / "your agent"
forms across active skill prose, the README intro, and the vendored
anthropic-best-practices.md reference. Carve-outs preserved:
historical attribution paths, the "Variant C: Claude.AI Emphatic
Style" example label, model identifiers (Haiku/Sonnet/Opus), and the
"In Claude Code:" per-platform skill-dispatch list.

Coined-term rename: "Claude Search Optimization (CSO)" → "Skill
Discovery Optimization (SDO)" in writing-skills/SKILL.md.

Files in this commit also pick up later-phase changes that
accumulated on the same files (dispatching-parallel-agents code-
example transformation, writing-skills numbering and path fixes).
The bundled spec at docs/superpowers/specs/ records the original
scope and the carve-outs.

README.md gets only its prose change here; the alphabetization
lands in Phase C's commit.

Two structural changes:

1. Generalize CLAUDE.md-specific guidance:
   - "Project-specific conventions (put in CLAUDE.md)" → "(put in
     your instructions file)" in writing-skills/SKILL.md
   - "(explicit CLAUDE.md violation)" → "(explicit instruction-file
     violation)" in receiving-code-review/SKILL.md
   - The instruction-priority list in using-superpowers/SKILL.md
     stays inclusive (CLAUDE.md, GEMINI.md, AGENTS.md) — that's
     load-bearing, not a substitution opportunity.

2. Per-platform tool reference files at skills/using-superpowers/
   references/{claude-code,codex,copilot,gemini}-tools.md. Each ref
   documents:
   - The runtime's preferred instructions file (CLAUDE.md, AGENTS.md,
     GEMINI.md, etc.) and how it loads
   - The runtime's personal-skills directory + cross-runtime
     ~/.agents/skills/ path where applicable
   - Action-language → tool-name mapping table

Tool names and table content reflect the source-verified state from
direct inspection of openai/codex, google-gemini/gemini-cli,
sst/opencode, and the installed @github/copilot package. Filenames
and behaviors are sourced from each runtime's official docs.

Files in this commit also pick up later-phase changes that
accumulated on the same files (using-superpowers/SKILL.md "How to
Access Skills" overhaul, action-language flowchart, refs' final
table content). The bundled spec records original scope.

Quickstart link list and the per-harness install sub-sections both
reorder to strict alphabetical:

  Claude Code, Codex App, Codex CLI, Cursor, Factory Droid,
  Gemini CLI, GitHub Copilot CLI, OpenCode

Three blocks moved (Codex App swaps with Codex CLI; Cursor moves up
two slots; GitHub Copilot CLI moves up one). Claude Code stays first
by alphabetical chance.

Each install sub-section's content is byte-identical pre/post —
only the positions change. Quickstart anchors verified against the
new heading order.

Misc platform/runtime statements and adjacencies that don't fit the
prose, config-ref, README-ordering, or tool-vocabulary buckets:

- visual-companion frame template: rename CSS/HTML id #claude-content
  → #frame-content. The id is purely styling — nothing external
  references it. The brainstorm-server test that asserted the old
  string is updated in lockstep.

- visual-companion launch instructions: add a Copilot CLI section
  alongside Claude Code, Codex, and Gemini CLI; combine the Claude
  Code (macOS / Linux) and (Windows) sections so heading style
  matches the other (non-OS-qualified) platforms.

- visual-companion: "Use Write tool" → "Use your file-creation tool"
  for the cat/heredoc warning. The prohibition is what's load-
  bearing, not the tool name.

- executing-plans/SKILL.md: list all subagent-capable runtimes
  (Claude Code, Codex CLI, Codex App, Copilot CLI, Gemini CLI) and
  point at the per-platform tool refs as the source of truth.

- executing-plans/SKILL.md: relative path "using-superpowers/
  references/" → "../using-superpowers/references/" to resolve
  correctly from the executing-plans/ directory.

No bundled spec doc here — Phase D was scope-extension work that
took place across rounds, with no standalone spec authored.

Replace Claude-Code-specific tool names in skill prose, prompt
templates, and OpenCode-facing docs with action-language descriptions
that resolve to each runtime's native tool via the per-platform refs.

Changes by category:

- Prose mentions ("Use TodoWrite to track...", "Use Task tool with
  general-purpose type") → action language ("Track each item as a
  todo", "Dispatch a general-purpose subagent")

- Prompt template headers (6 files): "Task tool (general-purpose):"
  → "Subagent (general-purpose):" — preserves the type information
  without naming Claude Code's specific dispatch tool

- DOT flowchart node labels: "Invoke Skill tool" → "Invoke the
  skill"; "Create TodoWrite todo per item" → "Create a todo per
  item"

- OpenCode INSTALL.md and docs/README.opencode.md: replace the old
  "TodoWrite → todowrite, Task → @mention" mapping (which both
  taught a vocabulary skills no longer use AND was wrong about
  @mention being a real OpenCode syntax) with an action-language
  mapping verified against the installed OpenCode CLI's tool
  inventory.

The platform-tools refs landed in Phase B already document each
runtime's resolution; skills now speak in the actions those refs
map. Tool names that genuinely belong only in the per-platform
dispatch section ("In Claude Code: Use the `Skill` tool") and the
Claude-Code-specific Bash run_in_background flag note in
visual-companion remain — those are intentional carve-outs.