feat: add pi extension and eval backend #1499

Open: obra wants to merge 4 commits into dev from pi-extension-evals

@obra (Owner) commented May 8, 2026

What problem are you trying to solve?

Pi can discover Superpowers skills, but Superpowers did not have a proper Pi package integration that loads the using-superpowers bootstrap at the right lifecycle points. In practice that means the skills can be present on disk without the startup instructions that make the agent check and load them before acting. The desired Pi behavior is a package users can install with Pi that loads the bootstrap at session startup and again after compaction, because compaction can remove the earlier bootstrap context.

This came from a Pi session where the human partner asked for a proper Superpowers extension for Pi: installable as a Pi package, with using-superpowers loaded as a user message at session startup and after compact, plus Pi-specific tool mapping and Pi coverage in the eval harness.

What does this PR change?

Adds a Pi package manifest and Pi extension under .pi/extensions/superpowers.ts that exposes the bundled skills and injects the using-superpowers bootstrap into Pi context on session start and after session_compact. Adds a Pi tool-mapping reference, Pi install docs, and a Drill pi backend with Pi session-log normalization and tests.
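To make the lifecycle behavior concrete, here is a minimal sketch of the re-injection idea. It is not the code in .pi/extensions/superpowers.ts and does not use Pi's real extension API: `MiniHost`, the `session_start` event name, `BOOTSTRAP_MARKER`, `hasBootstrap`, and `registerBootstrap` are all hypothetical names invented for illustration. Only `session_compact` and the "inject as a user message" behavior come from this description.

```typescript
// Illustrative model only. MiniHost is a hypothetical stand-in, NOT Pi's extension API.
type LifecycleEvent = "session_start" | "session_compact";

interface ContextMessage {
  role: "user";
  content: string;
}

class MiniHost {
  context: ContextMessage[] = [];
  private handlers: Record<LifecycleEvent, Array<() => void>> = {
    session_start: [],
    session_compact: [],
  };

  on(event: LifecycleEvent, handler: () => void): void {
    this.handlers[event].push(handler);
  }

  emit(event: LifecycleEvent): void {
    for (const handler of this.handlers[event]) {
      handler();
    }
  }

  // Models compaction: earlier context collapses into a summary, then the event fires.
  compact(summary: string): void {
    this.context = [{ role: "user", content: summary }];
    this.emit("session_compact");
  }
}

// Hypothetical marker used here to detect an existing bootstrap message.
const BOOTSTRAP_MARKER = "[using-superpowers bootstrap]";

function hasBootstrap(host: MiniHost): boolean {
  return host.context.some((m) => m.content.indexOf(BOOTSTRAP_MARKER) === 0);
}

// Registers one idempotent injector for both lifecycle points.
function registerBootstrap(host: MiniHost, bootstrapText: string): void {
  const inject = (): void => {
    if (!hasBootstrap(host)) {
      host.context.push({ role: "user", content: BOOTSTRAP_MARKER + "\n" + bootstrapText });
    }
  };
  host.on("session_start", inject);
  host.on("session_compact", inject);
}
```

In this model the same handler runs at both points, and the marker check keeps a repeated start event from stacking duplicate bootstrap messages. Whether the real extension deduplicates this way is an assumption of the sketch.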

Is this change appropriate for the core library?

Yes. This is new harness support for Pi, which is general-purpose infrastructure for using the existing Superpowers workflow skills in another coding-agent harness. It does not add domain-specific skills or project-specific configuration.

What alternatives did you consider?

  • Relying only on Pi's native skill discovery: rejected because skill discovery alone does not inject the using-superpowers bootstrap that causes automatic skill checks.
  • Adding a compatibility Skill tool: rejected because Pi already has native skill support; the integration only needs Pi-specific instructions/tool mapping.
  • Hard-depending on subagent/task-list packages: rejected because Pi does not have one canonical subagent or task-list package. The extension documents optional mappings and keeps those companion packages optional.
  • Putting the extension in top-level extensions/: changed to .pi/extensions/ to match this repo's existing harness-specific layout.

Does this PR contain multiple unrelated changes?

No. The package extension, Pi tool mapping, docs, and Drill backend are all part of adding first-class Pi harness support and validating it.

Existing PRs

#500 (docs(pi): add experimental pi support (Phase 1)) is open and adds documentation/smoke-test oriented Pi support against main. This PR targets dev and is materially different: it adds runtime bootstrap injection at session start and after compaction, plus a Drill backend for Pi session logs. It should still be reconciled with #500 before merge.

#1440 (Add pi-review and pi-refine skills using pi-subagents) was closed and added Pi/subagent-specific skills. This PR does not add new workflow skills and does not require pi-subagents; subagents remain optional companion packages.

Environment tested

| Harness (e.g. Claude Code, Cursor) | Harness version | Model | Model version/ID |
| --- | --- | --- | --- |
| Pi coding agent API harness | 0.74.0 | OpenAI Codex | gpt-5.5 |
| Pi CLI smoke test | 0.74.0 | configured Pi default | configured Pi default |
| Node.js test runner | v25.2.1 | n/a | n/a |
| Drill unit tests via uv/Python | uv 0.11.8, Python 3.14.3 | n/a | n/a |

Implementation session ID: 019e0356-3e52-751f-af4a-d05ba8d44a75

New harness support (required if this PR adds a new harness)

Clean-session transcript for "Let's make a react todo list"
$ PI_SKIP_VERSION_CHECK=1 pi -e . --no-session -p "Let's make a react todo list"
Some of what we're working on might be easier to explain if I can show it to you in a web browser. I can put together mockups, diagrams, comparisons, and other visuals as we go. This feature is still new and can be token-intensive. Want to try it? (Requires opening a local URL)

This is the first response from the brainstorming skill, before code was written.

Evaluation

  • Initial prompt from the human partner: "We're working on building a proper superpowers extension for pi. that means easily installable and with the using-superpowers bootstrap getting loaded as a user message at session startup time AND after compact. it also probably means a pi-tools.md i think we also need dependencies on pi extensions that provide subagnets and task lists etc."
  • Eval sessions after making the change: 1 Pi smoke acceptance run with the required "Let's make a react todo list" prompt. No full multi-run Drill sweep has been run yet.
  • Before: Pi could discover skills but had no package-local bootstrap extension that re-injected using-superpowers after compaction.
  • After: the Pi smoke test auto-triggered brainstorming, and targeted tests verify package manifest, startup injection, post-compact injection, Pi tool mapping, and Pi Drill log normalization.

Needs real-world user testing: this PR adds the runtime integration and test coverage, but should be exercised in real interactive Pi sessions across startup, /compact, auto-compaction, and resumed sessions before considering the integration fully proven.

Verification commands run:

node --experimental-strip-types --check .pi/extensions/superpowers.ts
node --experimental-strip-types --test tests/pi/test-pi-extension.mjs
uv --project evals run ruff check evals/drill evals/tests
uv --project evals run ty check evals/drill evals/tests
uv --project evals run pytest evals/tests/test_backend.py evals/tests/test_setup.py evals/tests/test_engine.py evals/tests/test_normalizer.py -q

Results: Node tests 6/6 pass; Ruff all checks passed; Ty all checks passed; targeted pytest suite 49 passed.

Rigor

  • If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing (paste results below)
  • This change was tested adversarially, not just on the happy path
  • I did not modify carefully-tuned content (Red Flags table, rationalizations, "human partner" language) without extensive evals showing the change is an improvement

This PR adds a Pi-specific reference document but does not rewrite existing behavior-shaping skill content. A reviewer subagent reviewed the implementation twice; one issue about Bash source classification was fixed, and one issue about persistent-vs-context injection was evaluated against the requirement and test behavior.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

Complete diff for review: dev...pi-extension-evals

@gadgj commented May 8, 2026

Thank you for all the work that went into this PR — this feature is exactly what I’ve been needing for my Pi + Superpowers workflow. Having the bootstrap automatically injected on session startup and after compaction makes a huge difference in real-world usage, and the design here feels clean and well thought out. Really appreciate the thorough testing and attention to detail.

Also looping in the author of #500, since your earlier work laid much of the groundwork for Pi support. It would be great to have your eyes on this PR as well and help push the Pi integration forward together.

Thanks again to everyone contributing to this effort — it’s exciting to see Pi support becoming more complete.

@Jefferson-Butler1

Huge thanks for the work in this branch! Jazzed about more Pi support, everything seems to work well!
