Skip to content

Commit 06e2eb0

Browse files
authored
fix: harden simplification safety and add review deletion signal (#969)
1 parent c759a26 commit 06e2eb0

4 files changed

Lines changed: 11 additions & 5 deletions

File tree

docs/skills/ce-simplify-code.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ The compound-engineering ideation chain is `/ce-ideate → /ce-brainstorm → /c
1616
|----------|--------|
1717
| What does it do? | Spawns three parallel reviewer agents on the recently-changed code, applies their findings, and verifies behavior is preserved |
1818
| When to use it | Before opening a PR; after writing a feature; after AI generated code that works but feels heavy |
19-
| What it produces | Updated code (in place) + a summary of what was changed, what was good as-is, and which checks ran |
19+
| What it produces | Updated code (in place) + a summary of what was changed, what was good as-is, which checks ran, and a quantified impact by dimension (fixes applied per reuse/quality/efficiency, skipped count, verification result) |
2020
| What's next | Open the PR via `/ce-commit-push-pr` |
2121

2222
---
@@ -52,7 +52,7 @@ The orchestrator aggregates their findings, applies fixes, and runs typecheck +
5252

5353
A single "review and improve" prompt collapses into the agent's most-trained directions. Three reviewers each focused on one dimension cover meaningfully more ground:
5454

55-
- **Reuse** — searches for existing utilities and helpers; flags new functions that duplicate existing ones; flags inline logic that could use an existing utility
55+
- **Reuse** — searches for existing utilities and helpers; flags new functions that duplicate existing ones; flags inline logic that could use an existing utility; flags diff code that reimplements a language standard-library or runtime primitive (gated on behavior-equivalence, excluding UX-changing swaps)
5656
- **Quality** — redundant state, parameter sprawl, copy-paste with variation, leaky abstractions, stringly-typed code, unnecessary wrappers (in component-tree UI frameworks), deeply nested conditionals, unnecessary comments, dead code / unused imports / unused exports
5757
- **Efficiency** — unnecessary work (redundant computations, repeat reads), missed concurrency, hot-path bloat, recurring no-op updates, TOCTOU pre-checks, memory issues, overly broad operations
5858

@@ -62,7 +62,7 @@ The skill resolves the simplification scope in priority order: explicit user-nam
6262

6363
### 3. Behavior preservation verification
6464

65-
After applying fixes, the skill runs typecheck and lint over the project and runs tests scoped to the changed paths (broadening when the change has wide reach — e.g., a heavily-imported utility was rewritten). Failures are surfaced clearly with the failing check name and relevant output. **The skill refuses to relax assertions, weaken type signatures, or skip tests to make checks pass** — either fix the underlying break or revert the specific simplification that caused it.
65+
After applying fixes, the skill runs typecheck and lint over the project and runs tests scoped to the changed paths (broadening when the change has wide reach — e.g., a heavily-imported utility was rewritten). Failures are surfaced clearly with the failing check name and relevant output. **The skill refuses to relax assertions, weaken type signatures, or skip tests to make checks pass** — either fix the underlying break or revert the specific simplification that caused it. It also **never simplifies away a safety check** — input validation at trust boundaries, data-loss-preventing error handling, security checks, and accessibility affordances are preserved even when a finding frames them as removable boilerplate.
6666

6767
### 4. Mid-tier model selection — cost-aware
6868

plugins/compound-engineering/skills/ce-code-review/SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -550,7 +550,7 @@ Per-severity tables are **5 columns** — `Route` is not shown here (it appears
550550
7. **Learnings & Past Solutions.** Surface ce-learnings-researcher results: if past solutions are relevant, flag them as "Known Pattern" with links to docs/solutions/ files.
551551
8. **Agent-Native Gaps.** Surface ce-agent-native-reviewer results. Omit section if no gaps found.
552552
9. **Deployment Notes.** If ce-deployment-verification-agent ran, surface the key Go/No-Go items: blocking pre-deploy checks, the most important verification queries, rollback caveats, and monitoring focus areas. Keep the checklist actionable rather than dropping it into Coverage. Schema drift appears in the findings tables as `data-migration` P1 rows — do not add a separate Schema Drift section.
553-
10. **Coverage.** Applied count (when Stage 5c ran), suppressed count by anchor (e.g., "N findings suppressed at anchor 50, M at anchor 25"), mode-aware demotion count, validator drop count and reasons (when Stage 5b ran), any P0/P1 with degraded validation (kept on validator infra failure), validator over-budget drops (when the 15-cap fired), residual risks, testing gaps, failed/timed-out reviewers, and inferred-intent uncertainty when applicable.
553+
10. **Coverage.** Applied count (when Stage 5c ran), suppressed count by anchor (e.g., "N findings suppressed at anchor 50, M at anchor 25"), mode-aware demotion count, validator drop count and reasons (when Stage 5b ran), any P0/P1 with degraded validation (kept on validator infra failure), validator over-budget drops (when the 15-cap fired), residual risks, testing gaps, failed/timed-out reviewers, and inferred-intent uncertainty when applicable. **Removable surface (only when deletion-oriented maintainability findings exist):** one line giving the approximate net lines/files those findings would remove if applied (e.g., "Removable surface: ~120 lines / 2 files across findings #4, #7"). This is a dead-weight signal, **not** a reduction target — never lower the bar for a finding or invent deletions to grow the number, and omit the line entirely when no finding proposes a deletion.
554554
11. **Verdict.** Ready to merge / Ready with fixes / Not ready. Fix order if applicable. When an `explicit` plan has unaddressed requirements or implementation units, the verdict must reflect it — a PR that's code-clean but missing planned requirements is "Not ready" unless the omission is intentional. When an `inferred` plan has unaddressed requirements or implementation units, note it in the verdict reasoning but do not block on it alone.
555555

556556
Do not include time estimates.

plugins/compound-engineering/skills/ce-code-review/references/review-output-template.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -95,6 +95,7 @@ Committed: `fix(review): cover empty-format branch + tighten export perms` (work
9595
### Coverage
9696

9797
- Suppressed: 2 findings below anchor 75 (1 at anchor 50, 1 at anchor 25)
98+
- Removable surface: ~40 lines / 1 file across findings #5 (signal only, not a target)
9899
- Residual risks: No rate limiting on export endpoint
99100
- Testing gaps: No test for concurrent export requests
100101

@@ -148,7 +149,7 @@ This fails because: no pipe-delimited tables, no severity-grouped `###` headers,
148149
- **Learnings & Past Solutions section** -- results from ce-learnings-researcher, with links to docs/solutions/ files
149150
- **Agent-Native Gaps section** -- results from ce-agent-native-reviewer. Omit if no gaps found.
150151
- **Deployment Notes section** -- key checklist items from ce-deployment-verification-agent. Omit if the agent did not run. Schema drift surfaces as `data-migration` findings — no separate section.
151-
- **Coverage section** -- suppressed count, residual risks, testing gaps, failed reviewers
152+
- **Coverage section** -- suppressed count, removable surface (only when deletion-oriented maintainability findings exist; approximate net lines/files removable if applied -- a dead-weight signal, never a reduction target, omit otherwise), residual risks, testing gaps, failed reviewers
152153
- **Summary uses blockquotes** for verdict, reasoning, and fix order
153154
- **Horizontal rule** (`---`) separates findings from verdict
154155
- **`###` headers** for each section -- never plain text headers

plugins/compound-engineering/skills/ce-simplify-code/SKILL.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@ For each change:
3333
1. **Search for existing utilities and helpers** that could replace newly written code. Look for similar patterns elsewhere in the codebase — common locations are utility directories, shared modules, and files adjacent to the changed ones.
3434
2. **Flag any new function that duplicates existing functionality.** Suggest the existing function to use instead.
3535
3. **Flag any inline logic that could use an existing utility** — hand-rolled string manipulation, manual path handling, custom environment checks, ad-hoc type guards, and similar patterns are common candidates.
36+
4. **Flag diff code that reimplements a language standard-library or runtime primitive** — a hand-written routine the built-in stdlib/runtime API already provides (e.g., a manual array-dedup loop where the language ships a set-based idiom, a hand-rolled deep-clone/deep-merge where the runtime has one). Suggest the built-in **only when it is behavior-equivalent** for the inputs actually in play. Do not propose swaps that change behavior or UX: native UI controls (e.g., a custom date picker to `<input type=date>`), locale/`Intl`-dependent formatting, sort-stability assumptions, and serialization edge cases differ from their hand-rolled versions and are out of scope for a behavior-preserving pass.
3637

3738
### Agent 2: Code Quality Reviewer
3839

@@ -68,6 +69,8 @@ Wait for all three agents to complete. Aggregate their findings and fix each iss
6869

6970
Before applying each fix, confirm it preserves behavior: same output for every input, same error behavior, and same side effects and ordering. If a fix can't clear that test, skip it — automated checks in Step 4 don't cover every behavior.
7071

72+
**Never simplify away a safety check.** Input validation at trust boundaries, error handling that prevents data loss, security checks (authorization, escaping, sanitization), and accessibility affordances are not removable boilerplate — preserve them even when a finding frames them as redundant or inline-able. Code that drops one of these is not simpler, it is unfinished. If a proposed simplification would thin or remove one, skip it.
73+
7174
## Step 4: Verify behavior is preserved
7275

7376
The premise of this skill is that simplification preserves exact functionality. After applying fixes:
@@ -86,3 +89,5 @@ If no test suite, lint, or typecheck is configured, state that explicitly in the
8689
## Step 5: Summarize
8790

8891
Briefly summarize what was good vs improved and fixed, including which checks were run and their results. If there were no findings to act on, confirm the code didn't require any changes.
92+
93+
**Quantify the impact by dimension.** Report what was actually applied, not a line count: fixes applied per reviewer dimension (reuse, quality, efficiency), how many findings were skipped as false-positive or not worth addressing, and the behavior-preservation result (checks run and outcome). For example: "Applied 6 — reuse 2, quality 3, efficiency 1; skipped 2 false positives; typecheck + lint clean, 11 scoped tests pass." Do not headline a net-lines-removed figure or frame fewer lines as the win — many clarity, safety, and efficiency fixes preserve or add lines. The measure is what improved and that behavior held, not how much code shrank.

0 commit comments

Comments
 (0)