From task-3170a13a retrospective (order reversal, PR ocannl-staging#27): two
captures — (1) a relabel's completeness check is a repo-wide negative grep
minus an allowlist (case-insensitive, plural-aware, exclude historical
records), not a positive sweep of planned files; (2) a label-printing golden
test staying all-PASS is the direct decision-set-unchanged gate. Via
/ludics-process-suggestions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
docs/swe-textbook.md (16 additions & 0 deletions)
@@ -681,3 +681,19 @@ Description: When an acceptance criterion names two places a value must appear
Precipitating retro: `task-35e74651` (PR #557, 2026-06-05). AC4 named both the briefing-lag stale-sentinel note and the health-check `outbound-staging-ff-stale:<project>` finding as carriers of the outbound-push cause+remedy annotation. The briefing-lag note had a clean `src/` seam (`src/briefing-lag.ts` reading `latestOutboundCauseRemedy`); the health-check finding is computed inline in `skills/ludics-health-check.md` bash with no `src/` read-boundary, and the AC forbade adding skill prose. The coder implemented the briefing-lag arm, left the skill markdown byte-unchanged, cited the proposal's own flagged-ambiguity lines (~209–223) authorizing the narrower scope, and the deferred surface was filed as follow-up `task-c4aedd6b`.
Filter decision: Under the competent-SWE filter this is "obvious-to-experienced-engineer" scope discipline — knowing not to fabricate plumbing or violate a stated constraint to satisfy an AC's letter is general engineering judgment. Captured here rather than promoted to always-loaded prompts because the failure mode is shape-specific (a multi-surface AC where the surfaces have asymmetric implementability) and the always-loaded `feedback_self_contradicting_ac_revise_not_fabricate.md` already covers the broader "don't fabricate to satisfy a contradictory AC" family; the textbook entry preserves the specific "implement the seam, cite the proposal's authorization for the gap, file the rest as follow-up" resolution as a precedent for coders facing partially-implementable ACs.
### A relabel's completeness check is a repo-wide negative grep minus an allowlist, not a positive sweep of the planned files
Description: When a change relabels a vocabulary across a codebase (a rename, a terminology flip, an order reversal), the question "did I get all of it?" is not answered by grepping the files you planned to touch and finding them clean. "Clean over the swept files" is not "no stale vocabulary remains": the stale term hides in files outside the plan (a `.mli` interface, an `AGENTS.md`, a paper `.tex`, a constructor name in a module you didn't list). The reliable completeness check inverts the method: run the term-list grep **repo-wide**, then **subtract an explicit allowlist** of known-OK hits, and require the remainder to be empty. Three refinements the naive grep misses: (a) make it **case-insensitive**, because the term appears capitalized in prose and in type constructors (`Cur`/`Subr`, `Least Upper Bounds`), not just lowercased in code; (b) make it **plural- and morphology-aware** (`LUBs?`, not `LUB`), because the plural form slips past a `\bLUB\b` anchor; (c) build the allowlist from not just false-positive substrings (`cur_sh`, `occur`) but also **intentionally-correct-post-change strings**: under a structural transform like an order reversal, a surface phrase can read like stale vocab yet be genuinely correct in its new referent (after reversing the lattice, "any two dims have a least upper bound" is a true statement about the join-semilattice's joins, distinct from the maintained solver bound that is now the GLB). Distinguishing "stale" from "intentional dual" is part of building the allowlist, and is a judgment call the bare grep cannot make. A companion scope rule: **exclude historical-record files** from the sweep. Changelogs (`CHANGES.md`), decision logs, and superseded proposals legitimately preserve the old vocabulary because they describe what was true at a past version; rewriting them to the new terms falsifies the record. Current guidance and exposition (`AGENTS.md`, `docs/*.md`, `*.tex`) get reoriented; historical records do not.
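The check described above can be sketched as a small script. This is a minimal illustration, not the tooling used in the retro: the function name `remaining_hits`, the term list, the allowlist entries, and the historical-file globs are all assumptions modelled on the LUB example. It allowlists at line granularity, which is coarser than a real sweep might want.

```python
import fnmatch
import re
from pathlib import Path

# Hypothetical stale-term list: case-insensitive and plural-aware.
STALE = re.compile(r"\bLUBs?\b|least upper bounds?|\bsubr\b|\bcur\b", re.IGNORECASE)

# Hypothetical allowlist of known-OK lines, e.g. an intentional dual that is
# correct after the change even though it reads like stale vocabulary.
ALLOW = [re.compile(r"any two dims have a least upper bound")]

# Historical records keep the old vocabulary on purpose, so they are skipped.
HISTORICAL = ["CHANGES.md", "docs/proposals/*"]


def remaining_hits(root, stale=STALE, allow=ALLOW, historical=HISTORICAL):
    """Return (relative path, line number, line) for each stale hit that is
    neither allowlisted nor inside a historical-record file."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or ".git" in path.parts:
            continue
        rel = path.relative_to(root).as_posix()
        if any(fnmatch.fnmatch(rel, glob) for glob in historical):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            # A line counts only if it matches a stale term and no allowlist entry.
            if stale.search(line) and not any(a.search(line) for a in allow):
                hits.append((rel, number, line.strip()))
    return hits
```

The gate is then "`remaining_hits(repo_root)` is empty". Because the sweep starts from the whole tree rather than from the planned file list, a file nobody thought to list still gets scanned.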
Precipitating retro: `task-3170a13a` (PR ocannl-staging#27, 2026-06-05) — the broadcast-order reversal (LUB→GLB, top↔bottom). The coder's round-1 checklist (`rg '\bLUB\b|least upper|\bcur\b|\bsubr\b'`, case-sensitive, over the planned files) reported clean but silently missed `Least Upper Bounds (LUBs)` in `shape.mli`, the capitalized `Cur`/`Subr` constructors in `row.ml`, and `LUB` in `AGENTS.md`. The round-2 fix re-ran `rg -in 'broadcast.{0,4}bottom|\bLUBs?\b|\blub\b|least upper bound|\b[Ss]ubr\b|\b[Cc]ur\b|…'` repo-wide minus an allowlist (`cur_sh`/`current`/`occur*`, the intentional join-semilattice duals, traversal `bottom-up`), excluding `docs/proposals/*` and `CHANGES.md` as historical records, and required zero remaining hits.
Filter decision: Under the competent-SWE filter this is "obvious-to-experienced-engineer" sweep thoroughness — a competent engineer knows a grep can be too narrow. Captured here rather than promoted to always-loaded prompts because the failure mode is recognition-shaped and method-specific (the inversion to negative-grep-minus-allowlist, the case/plural sensitivity, the intentional-dual allowlist, the historical-record exclusion) and pairs with the existing "Config-key removal sweep covers all of `templates/`" and "a guard added to a shared validator reaches every transitive caller" entries — all three are "the sweep is wider than first enumerated", this one specialised for vocabulary relabels under a structural transform.
### For a behavior-preserving relabel, a label-printing golden test that stays all-PASS is the direct decision-set-unchanged gate
Description: When a change is supposed to preserve behavior while renaming things (a pure relabel, an order-reversal, a terminology flip), the cheapest and most direct evidence that no decision actually moved is an existing **golden/`%expect`/cram test that prints the system's own accept/reject (or classify, or dispatch) labels** for a battery of cases. If that test's expected output stays byte-identical except for the deliberately-changed wording (every case still reads `PASS`, or the same branch label), then the accept/reject *set* is shown unchanged over the cases the test enumerates: a flipped comparison or reversed branch would move at least one boundary case across the line and show up as a `PASS→FAIL` (or relabeled-decision) diff. This is stronger and cheaper than re-deriving invariance from the diff of the production code: you don't have to argue that no `<=` became `>=`; the golden test would have caught it. The discipline at relabel time: identify the test that enumerates the decision surface in printed form, freeze it, and treat *any* non-wording diff in it as the stop-and-flag signal that the relabel changed behavior. If no such test exists, writing one that prints the labels for the boundary cases is the highest-leverage safety net before starting the rename.
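A rough sketch of the "any non-wording diff is a stop signal" rule follows. It assumes, hypothetically, that the golden file prints exactly one `PASS` or `FAIL` token per case and that case order is preserved across the relabel. The helper names `decision_labels` and `moved_decisions` and the sample lines are illustrative only and do not reproduce the real `.expected` format.

```python
import re

# Assumed label vocabulary; a real golden file may print other decision labels.
LABEL = re.compile(r"\b(PASS|FAIL)\b")


def decision_labels(expected_text):
    """Return the ordered list of PASS/FAIL labels printed in a golden file."""
    return LABEL.findall(expected_text)


def moved_decisions(before, after):
    """Return (case index, old label, new label) for every case whose decision
    changed. Wording changes around the labels are ignored on purpose."""
    old, new = decision_labels(before), decision_labels(after)
    if len(old) != len(new):
        # A case appearing or vanishing is itself a behavior change to flag.
        raise ValueError(f"case count changed: {len(old)} -> {len(new)}")
    return [(i, o, n) for i, (o, n) in enumerate(zip(old, new)) if o != n]


# Illustrative golden outputs: only the wording differs between the two.
before = "PASS: a below b accepts\nPASS: a below c rejects\n"
after = "PASS: b above a accepts\nPASS: c above a rejects\n"
relabel_is_safe = moved_decisions(before, after) == []
```

An empty result means the relabel only touched wording; any returned tuple names the case that crossed the accept/reject boundary and should halt the rename until explained.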
Precipitating retro: `task-3170a13a` (PR ocannl-staging#27, 2026-06-05). The order reversal flipped the `⊑` operands and renamed `meet_dim→join_dim`. `test/einsum/test_basis_total_order.expected` prints `PASS`/`FAIL` for a battery of `d1 ⊑ d2` accept/reject cases (`5_rgb ⊑ 1_(bcast_if_1) accepts`, `5_rgb ⊑ 5_(bcast_if_1) rejects`, …); it stayed all-`PASS` after the relabel, directly proving the accept/reject set didn't move, while a `solve_dim_ineq` comparison reversal would have flipped at least one case. The only `.expected` diffs were wording/glyph-order promotes.
Filter decision: Under the competent-SWE filter this is "obvious-to-experienced-engineer" test-strategy literacy — using a golden test as a behavior-invariance gate is general regression-testing sense. Captured here rather than promoted to always-loaded prompts because it is recognition-shaped (the cue is "I'm doing a behavior-preserving relabel — which existing test prints the decisions I must not move?") and complements the always-loaded post-edit-grep discipline with a test-side gate; pairs with the vacuous-assertion entries (a label-printing golden test is the *non-vacuous* counterpart — it fails under exactly the mutation a relabel risks).