Skip to content

Commit e856756

Browse files
authored
fix(ce-code-review): keep finding numbers stable (#754)
1 parent a84cb75 commit e856756

4 files changed

Lines changed: 74 additions & 4 deletions

File tree

plugins/compound-engineering/skills/ce-code-review/SKILL.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -67,7 +67,7 @@ Sequence:
6767
- **Skip all user questions.** Never pause for approval or clarification once scope has been established.
6868
- **Apply only `safe_auto -> review-fixer` findings.** Leave `gated_auto`, `manual`, `human`, and `release` work unresolved.
6969
- **Write a run artifact** under `/tmp/compound-engineering/ce-code-review/<run-id>/` summarizing findings, applied fixes, residual actionable work, and advisory outputs. Orchestrators read this artifact to route residual `downstream-resolver` findings; the skill itself does not file tickets or prompt the user in autofix.
70-
- **Emit a compact Residual Actionable Work summary in the autofix return** listing each residual `downstream-resolver` finding with severity, file:line, title, and autofix_class. Include the run-artifact path. Callers read this summary directly without parsing the artifact. When no residuals exist, state `Residual actionable work: none.` explicitly.
70+
- **Emit a compact Residual Actionable Work summary in the autofix return** listing each residual `downstream-resolver` finding with its stable `#`, severity, file:line, title, and autofix_class. Structure the summary as two separate contiguous sections: applied `safe_auto` fixes first, then residual non-auto findings. Within the residual section, reuse each finding's stable `#` from Stage 5 -- never renumber. Include the run-artifact path. Callers read this summary directly without parsing the artifact. When no residuals exist, state `Residual actionable work: none.` explicitly.
7171
- **Never commit, push, or create a PR** from autofix mode. Parent workflows own those decisions.
7272

7373
### Report-only mode rules
@@ -550,7 +550,7 @@ Demotion is intentionally narrow. The conservative scope (testing/maintainabilit
550550
- in-skill fixer queue: only `safe_auto -> review-fixer`
551551
- residual actionable queue: unresolved `gated_auto` or `manual` findings whose owner is `downstream-resolver`
552552
- report-only queue: `advisory` findings plus anything owned by `human` or `release`
553-
9. **Sort.** Order by severity (P0 first) -> anchor (descending) -> file path -> line number.
553+
9. **Sort and number.** Order by severity (P0 first) -> anchor (descending) -> file path -> line number, then assign monotonically increasing `#` values across the full primary finding set in that sorted order. Do not restart numbering inside each severity table or autofix/routing bucket. If later sections repeat a finding (for example Residual Actionable Work after `safe_auto` fixes are applied), reuse the same stable `#` so users -- and downstream skills like `ce-resolve-pr-feedback` -- can reference findings by `#` after the autofix loop rewrites the report. Renumbering after autofix invalidates any prior reference: copied snippets, follow-up prompts citing `#3`, or tickets filed against an earlier render.
554554
10. **Collect coverage data.** Union residual_risks and testing_gaps across reviewers.
555555
11. **Preserve CE agent artifacts.** Keep the learnings, agent-native, schema-drift, and deployment-verification outputs alongside the merged finding set. Do not drop unstructured agent output just because it does not match the persona JSON schema.
556556

@@ -600,7 +600,7 @@ When Stage 5b does not run, the merged finding set from Stage 5 flows through to
600600
Assemble the final report using **pipe-delimited markdown tables for findings** from the review output template included below. The table format is mandatory for finding rows in interactive mode — do not render findings as freeform text blocks or horizontal-rule-separated prose. Other report sections (Applied Fixes, Learnings, Coverage, etc.) use bullet lists and the `---` separator before the verdict, as shown in the template.
601601

602602
1. **Header.** Scope, intent, mode, reviewer team with per-conditional justifications.
603-
2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists.
603+
2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists. Finding numbers come from the stable assignment in Stage 5 -- never re-derive them per severity table.
604604
3. **Requirements Completeness.** Include only when a plan was found in Stage 2b. For each requirement (R1, R2, etc.) and implementation unit in the plan, report whether corresponding work appears in the diff. Use a simple checklist: met / not addressed / partially addressed. Routing depends on `plan_source`:
605605
- **`explicit`** (caller-provided or PR body): Flag unaddressed requirements as P1 findings with `autofix_class: manual`, `owner: downstream-resolver`. These enter the residual actionable queue.
606606
- **`inferred`** (auto-discovered): Flag unaddressed requirements as P3 findings with `autofix_class: advisory`, `owner: human`. These stay in the report only — no autonomous follow-up. An inferred plan match is a hint, not a contract.

plugins/compound-engineering/skills/ce-code-review/references/review-output-template.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -51,7 +51,7 @@ Use this **exact format** when presenting synthesized review findings. Findings
5151
| # | File | Issue | Route | Next Step |
5252
|---|------|-------|-------|-----------|
5353
| 1 | `orders_controller.rb:42` | Ownership check missing on export lookup | `gated_auto -> downstream-resolver` | Defer via tracker (requires explicit approval before behavior change) |
54-
| 2 | `export_service.rb:91` | Pagination contract needs a broader API decision | `manual -> downstream-resolver` | Defer via tracker with contract and client impact details |
54+
| 3 | `export_service.rb:91` | Pagination contract needs a broader API decision | `manual -> downstream-resolver` | Defer via tracker with contract and client impact details |
5555

5656
### Pre-existing Issues
5757

@@ -117,6 +117,7 @@ This fails because: no pipe-delimited tables, no severity-grouped `###` headers,
117117

118118
- **Pipe-delimited markdown tables** for findings -- never ASCII box-drawing characters or per-finding horizontal-rule separators between entries (the report-level `---` before the verdict is still required)
119119
- **Severity-grouped sections** -- `### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`. Omit empty severity levels.
120+
- **Stable sequential finding numbers** -- assign finding numbers once after sorting, continue them across severity sections, and reuse those same numbers when findings are repeated in Residual Actionable Work. Do not restart at `1` for each severity or route bucket.
120121
- **Always include file:line location** for code review issues
121122
- **Reviewer column** shows which persona(s) flagged the issue. Multiple reviewers = cross-reviewer agreement.
122123
- **Confidence column** shows the finding's anchor as an integer (`50`, `75`, or `100`). Never render as a float.
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
## Code Review Results
2+
3+
**Scope:** merge-base with main -> working tree
4+
**Intent:** Demonstrate stable finding numbering
5+
**Mode:** autofix
6+
7+
**Reviewers:** correctness, testing, maintainability
8+
9+
### P1 -- High
10+
11+
| # | File | Issue | Reviewer | Confidence | Route |
12+
|---|------|-------|----------|------------|-------|
13+
| 1 | `export_service.rb:87` | Loads all orders into memory | performance | 100 | `safe_auto -> review-fixer` |
14+
| 2 | `export_service.rb:91` | Missing pagination contract | api-contract | 75 | `manual -> downstream-resolver` |
15+
16+
### P2 -- Moderate
17+
18+
| # | File | Issue | Reviewer | Confidence | Route |
19+
|---|------|-------|----------|------------|-------|
20+
| 3 | `export_service.rb:45` | Missing error handling | correctness | 75 | `gated_auto -> downstream-resolver` |
21+
22+
### Applied Fixes
23+
24+
- `safe_auto`: Applied bounded export loading fix for #1.
25+
26+
### Residual Actionable Work
27+
28+
| # | File | Issue | Route | Next Step |
29+
|---|------|-------|-------|-----------|
30+
| 2 | `export_service.rb:91` | Missing pagination contract | `manual -> downstream-resolver` | Defer via tracker with API contract context |
31+
| 3 | `export_service.rb:45` | Missing error handling | `gated_auto -> downstream-resolver` | Defer via tracker pending behavior approval |

tests/review-skill-contract.test.ts

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -731,8 +731,46 @@ describe("ce-code-review contract", () => {
731731
test("ce-code-review autofix emits a residual-work summary in-chat, not only in the artifact", async () => {
732732
const content = await readRepoFile("plugins/compound-engineering/skills/ce-code-review/SKILL.md")
733733
expect(content).toMatch(/Emit a compact Residual Actionable Work summary/)
734+
expect(content).toContain("with its stable `#`, severity, file:line, title, and autofix_class")
735+
expect(content).toContain("Structure the summary as two separate contiguous sections")
736+
expect(content).toContain("applied `safe_auto` fixes first, then residual non-auto findings")
737+
expect(content).toContain("reuse each finding's stable `#` from Stage 5 -- never renumber")
734738
expect(content).toContain("Residual actionable work: none.")
735739
})
740+
741+
test("ce-code-review uses stable sequential finding numbers across grouped output", async () => {
742+
const content = await readRepoFile("plugins/compound-engineering/skills/ce-code-review/SKILL.md")
743+
const template = await readRepoFile(
744+
"plugins/compound-engineering/skills/ce-code-review/references/review-output-template.md",
745+
)
746+
const fixture = await readRepoFile("tests/fixtures/ce-code-review-stable-numbering.md")
747+
748+
const stage5 = content.split("### Stage 5b:")[0].split("### Stage 5:")[1]
749+
expect(stage5).toMatch(/Sort and number/)
750+
expect(stage5).toMatch(/Do not restart numbering inside each severity table or autofix\/routing bucket/)
751+
expect(stage5).toMatch(/reuse the same stable `#`/)
752+
expect(stage5).toMatch(/ce-resolve-pr-feedback/)
753+
754+
const stage6 = content.split("### Headless output format")[0].split("### Stage 6: Synthesize and present")[1]
755+
expect(stage6).toContain("Finding numbers come from the stable assignment in Stage 5")
756+
expect(stage6).toContain("never re-derive them per severity table")
757+
expect(template).toContain("Stable sequential finding numbers")
758+
expect(template).toContain("reuse those same numbers when findings are repeated in Residual Actionable Work")
759+
760+
const primaryFindingIds = Array.from(
761+
fixture.matchAll(/^\| (\d+) \| `[^`]+` \| .* \| .* \| \d+ \| `.*` \|$/gm),
762+
([, id]) => Number(id),
763+
)
764+
expect(primaryFindingIds).toEqual([1, 2, 3])
765+
766+
const residualSection = fixture.split("### Residual Actionable Work")[1]
767+
const residualIds = Array.from(
768+
residualSection.matchAll(/^\| (\d+) \| `[^`]+` \| .* \| `.*` \| .* \|$/gm),
769+
([, id]) => Number(id),
770+
)
771+
expect(residualIds).toEqual([2, 3])
772+
expect(residualIds.every((id) => primaryFindingIds.includes(id))).toBe(true)
773+
})
736774
})
737775

738776
describe("testing-reviewer contract", () => {

0 commit comments

Comments
 (0)