feat(mag): docs/swe-textbook.md as Mag-side write memory for filter-rejected retros #499

Merged
lukstafi merged 3 commits into main from ludics/task-c4e0e80a-s2/root
May 5, 2026
@lukstafi lukstafi commented May 5, 2026

Summary

Implements task-c4e0e80a. Introduces docs/swe-textbook.md as a write-only journal for competent-SWE-filter-rejected retro learnings, plus a capture-textbook disposition in /ludics-process-suggestions and the /ludics-feedback-digest worker. The textbook is consulted only by Mag and the feedback-digest worker; coder/reviewer agent prompts gain no pull-side pointer (AC7 negative control).

  • New doc: docs/swe-textbook.md — preamble enumerates five directionality statements, four labelled entry-shape fields, single canonical ## Capture Idempotency section with the bash snippet, seed entry citing gh-ocannl-270.
  • Three skill files gain capture-textbook: ludics-process-suggestions.md (three-way classify, new <!-- section:capture-textbook --> step, sibling textbookCaptures array, third Judgment Criteria bucket); ludics-feedback-digest-worker.md (new ### 3a step, 7th Response Contract field); ludics-feedback-digest.md (Status routing + Result fields + output template).
  • worker-conventions.md: one new row in ## Field Contract Reference (feedback-digest | textbookCaptures). The row contains no swe-textbook substring → AC7 stays clean.

AC verification

  • AC1, AC2, AC6: docs/swe-textbook.md preamble has the five directionality statements + four labelled fields; seed entry instantiates fields a–d citing gh-ocannl-270. Pinned by docs/swe-textbook.shape.test.ts (AC1/AC2/AC6 tests).
  • AC3: <!-- section:classify --> slice has the three-way split with capture-textbook; <!-- section:write-result --> JSON example has textbookCaptures: [{suggestion, entryHeadline, precipitatingRetro}]; ## Judgment Criteria has a third bucket. Pinned by AC3a/AC3b/AC3c shape-test slices.
  • AC4: Worker ### 3a step + 7th Response Contract field; orchestrator ## Status routing row + ## Result fields JSON + output template extension; worker-conventions.md field-contract row. Pinned by AC4a–AC4d shape-test assertions; cross-checked by scripts/lint-contracts.ts (worker ### Response Contract ↔ orchestrator ## Status routing field-pair coherence).
  • AC5: Idempotency guard lives only in docs/swe-textbook.md#capture-idempotency. Both skills cite the anchor and describe inputs/outputs in prose; neither duplicates the bash snippet. Enforced by AC5-positive (cardinality + canonical-snippet presence + corrected ENTRY_HEADLINE input contract per the #499 review) AND AC5-negative (no skill body contains grep -Fq "### ${ENTRY_HEADLINE}", grep -Fq "${PRECIPITATING_RETRO}", echo "skip-duplicate", or echo "append") shape-test assertions.
  • AC7: Recursive walker in docs/swe-textbook.shape.test.ts asserts no non-allowlisted skills/**.md file contains swe-textbook; mutation-tested by appending the string to worker-conventions.md (assertion fired) and reverting. The proposal's literal grep has a known mechanical defect (its -v only strips diff header lines, not in-scope-file content lines); the corrected per-file grep returns zero hits, and the shape-test walker is the canonical enforcement.
  • AC8: git diff --name-only main...HEAD | grep -E '^src/(coder|reviewer|orchestration)/' returns zero lines.
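
One way to picture the corrected per-file AC7 check is the shell sketch below. It is not the PR's enforcement (that is the TypeScript walker in docs/swe-textbook.shape.test.ts) and not the proposal's grep; the temporary fixture tree, the single-entry allowlist, and the check_ac7 helper are all hypothetical.

```shell
# Sketch: list every skills/**.md file outside an allowlist that mentions "swe-textbook".
# The fixture files and the allowlist entry are hypothetical stand-ins for the real repo.
set -u
root=$(mktemp -d)
mkdir -p "$root/skills/nested"
printf 'See docs/swe-textbook.md#capture-idempotency\n' > "$root/skills/ludics-process-suggestions.md"
printf '| feedback-digest | textbookCaptures |\n' > "$root/skills/worker-conventions.md"
printf 'no mention here\n' > "$root/skills/nested/other.md"

allowlist="skills/ludics-process-suggestions.md"

check_ac7() {
  # Prints offending paths relative to $root, one per line; prints nothing when clean.
  (cd "$root" && find skills -type f -name '*.md' | sort | while IFS= read -r f; do
    case " $allowlist " in *" $f "*) continue ;; esac
    grep -Fq 'swe-textbook' "$f" && printf '%s\n' "$f"
  done)
  return 0
}

clean=$(check_ac7)
echo "offenders before mutation: [${clean}]"

# Mutation: append the forbidden string to a non-allowlisted file, then re-check.
printf 'swe-textbook\n' >> "$root/skills/worker-conventions.md"
dirty=$(check_ac7)
echo "offenders after mutation: [${dirty}]"
rm -rf "$root"
```

Checking file contents one path at a time avoids filtering diff output, which is where the description above locates the defect in the proposal's grep.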

Test plan

  • bun test → 2228 pass / 0 fail (was 2205 pre-edit; +23 new tests).
  • bun test docs/swe-textbook.shape.test.ts → 23/23 pass / 68 expect() calls.
  • bun test scripts/lint-contracts.test.ts → 31/31 pass.
  • bun run lint:contracts → clean (worker/orchestrator field contracts in sync).
  • bun run lint:skill-cli-refs → clean (all 92 refs across 51 files resolve).
  • bun run typecheck → clean.
  • AC7 grep (corrected per-file form) → zero hits across the four in-scope skills/ files.
  • AC8 grep on src/(coder|reviewer|orchestration)/ → zero hits.

Scope expansions (declared)

  1. docs/swe-textbook.shape.test.ts (new) — regression-test infrastructure for the new doc and skill markdown. Mirrors docs/task-frontmatter-reference.shape.test.ts shape.
  2. skills/worker-conventions.md row addition — canonical field-contract reference must include textbookCaptures to avoid silent drift the runtime lint cannot catch (scripts/lint-contracts.ts REVERSE_EXCLUDE skips this file). The new row contains no swe-textbook substring, keeping AC7 clean.

Commits

  • 37c8893: feature implementation (six files, +222/−11).
  • 9f911db: fix(test): remove a literal NUL byte from the new shape test that turned it into a git-binary blob; refactored to a sliceToEnd(body, opener) helper.
  • 1f1815e: fix: clarify that the ENTRY_HEADLINE input contract excludes the leading ### prefix (codex review on PR #499: the guard would have false-appended on already-captured headlines under the original contract phrasing); regression test added (mutation-confirmed).

🤖 Generated with Claude Code

lukstafi and others added 2 commits May 5, 2026 11:04
…er-rejected retro learnings

Implements task-c4e0e80a. Introduces docs/swe-textbook.md as a
write-only journal of competent-SWE-filter-rejected retro learnings,
plus a `capture-textbook` disposition in /ludics-process-suggestions
and the /ludics-feedback-digest worker. The textbook is consulted only
by Mag and the feedback-digest worker; coder/reviewer agent prompts
gain no pull-side pointer, preserving always-loaded prompt leanness
(AC7 negative control).

AC coverage:
- AC1/AC2/AC6: docs/swe-textbook.md preamble enumerates five
  directionality statements + four labelled entry-shape fields; seed
  entry instantiates the gh-ocannl-270 lesson.
- AC3: ludics-process-suggestions.md classification step rewritten
  three-way; new <!-- section:capture-textbook --> step routes to the
  canonical idempotency check; result-JSON example gains a sibling
  textbookCaptures array; Judgment Criteria gains a third bucket.
- AC4: feedback-digest worker gains ### 3a filter step and a 7th
  Response Contract field (textbookCaptures); orchestrator preserves
  the field via Status routing + Result fields + output template.
  worker-conventions.md ## Field Contract Reference gains a row.
- AC5: idempotency guard lives in exactly one location
  (docs/swe-textbook.md#capture-idempotency); both skills cite the
  anchor and describe inputs/outputs in prose only — neither
  duplicates the bash snippet, enforced by AC5-negative shape-test
  assertions on each skill body.
- AC7/AC8: zero src/coder|reviewer|orchestration paths in diff;
  zero non-allowlisted skills/* files mention swe-textbook.

Regression coverage: docs/swe-textbook.shape.test.ts (22 tests, 66
expect calls). Existing scripts/lint-contracts.ts cross-checks
worker/orchestrator field-pair coherence on textbookCaptures.

scope-expansion: docs/swe-textbook.shape.test.ts — regression-test
infrastructure for the new doc and skill markdown.
scope-expansion: skills/worker-conventions.md row — canonical
field-contract reference must include textbookCaptures (the
lint-contracts.ts REVERSE_EXCLUDE skips this file, so silent drift
is otherwise unchecked); the new row contains no `swe-textbook`
substring, keeping AC7 clean.

Test plan:
- bun test → 2227 tests / 0 fail (was 2205 pre-edit; +22 new tests).
- bun test docs/swe-textbook.shape.test.ts → 22/22 pass.
- bun test scripts/lint-contracts.test.ts → 31/31 pass.
- bun run lint:contracts → clean.
- bun run lint:skill-cli-refs → clean.
- bun run typecheck → clean.
- AC7 grep (verbatim from proposal) → zero lines.
- AC8 grep on src/coder|reviewer|orchestration → zero lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit's seed-entry slice used `/^\0NEVER_MATCH/` as a
never-match closer regex, which embedded a literal NUL byte in the
test source. Git treated the file as binary (`git diff --numstat
main...HEAD` reported `- -`), making the regression test opaque in
normal review diffs.

Replace the NUL-bearing impossible-regex pattern with a small
`sliceToEnd(body, opener)` helper that walks from the opener line to
end of file. Same semantics for the seed-entry slice, no embedded
control bytes — git now sees a normal text blob.

Test plan:
- bun test docs/swe-textbook.shape.test.ts → 22/22 pass.
- python3 NUL-count probe on the file → 0.
- file(1) reports the file as UTF-8 text, no longer binary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f911dbc3e

Comment thread docs/swe-textbook.md Outdated

Inputs from the calling skill:

- `ENTRY_HEADLINE` — the proposed `### <headline>` text.

P2: Normalize ENTRY_HEADLINE before idempotency grep

The input contract says ENTRY_HEADLINE is the full ### <headline> text, but the canonical guard searches with grep -Fq "### ${ENTRY_HEADLINE}"; if a caller follows the documented contract literally, the pattern becomes ### ### ... and existing entries are not detected. This causes the duplicate guard to return append for already-captured headlines, so repeated retros can create duplicate textbook entries instead of updating Second occurrence:.
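
The failure described in this comment can be reproduced with a small sketch. The guard function below is a headline-only reconstruction from the fragments quoted in this PR (grep -Fq "### ${ENTRY_HEADLINE}", "skip-duplicate", "append"); how the canonical snippet combines them, and its PRECIPITATING_RETRO branch, are not shown in this thread, so the control flow and the fixture file here are assumptions.

```shell
# Sketch of a headline-only duplicate guard (assumed control flow, hypothetical fixture).
# The canonical snippet also consults PRECIPITATING_RETRO; that branch is omitted here.
textbook=$(mktemp)
printf '# SWE textbook\n\n### My pattern\n\nbody text\n' > "$textbook"

guard() {
  # $1 = ENTRY_HEADLINE exactly as the caller supplies it
  ENTRY_HEADLINE=$1
  if grep -Fq "### ${ENTRY_HEADLINE}" "$textbook"; then
    echo "skip-duplicate"
  else
    echo "append"
  fi
}

bare=$(guard 'My pattern')          # bare phrase: pattern is "### My pattern"
prefixed=$(guard '### My pattern')  # contract read literally: pattern is "### ### My pattern"
fresh=$(guard 'Another pattern')    # headline that is not in the file yet

echo "bare=$bare prefixed=$prefixed fresh=$fresh"
rm -f "$textbook"
```

With the prefixed input, the already-captured headline is reported as "append", which is the false append the review flags.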

… `### ` prefix

PR #499 review (codex): the canonical idempotency guard's input
contract said "ENTRY_HEADLINE — the proposed `### <headline>` text"
but the bash snippet does `grep -Fq "### ${ENTRY_HEADLINE}"`. A
caller following the contract literally produces `### ### <headline>`
and the guard never matches existing entries — false-`append` on
already-captured headlines, so repeated retros create duplicate
textbook entries instead of amending `Second occurrence:`. AC5's
falsifier is reachable through this contract bug.

Fix: clarify the input contract — `ENTRY_HEADLINE` is the bare
headline phrase WITHOUT the leading `### ` markdown prefix; the
guard prepends `### ` itself. Add an explicit example. The bash
snippet is unchanged (it was already correct); only the contract
description needed to match it.

Both skills' prose ("derive `ENTRY_HEADLINE` (a short pattern-naming
phrase)") was already consistent with the corrected contract — only
the doc needed tightening.

Add a regression test in `docs/swe-textbook.shape.test.ts`
(AC5-positive — input contract …) that pins the corrected phrasing
and the "guard prepends `### `" claim. Mutation-tested by reverting
to the buggy phrasing → assertion fails (1/23); restored → 23/23.

Test plan:
- bun test docs/swe-textbook.shape.test.ts → 23/23 pass.
- bun run typecheck → clean.
- bun run lint:contracts → clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
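
For comparison, the review title suggests normalizing the input rather than tightening the contract wording. This PR did not take that route (the snippet is unchanged), so the following is only a sketch of the alternative; normalize_headline is a hypothetical helper name.

```shell
# Sketch of the alternative NOT adopted in this PR: strip one leading "### "
# from the caller's input so that bare and prefixed headlines behave the same.
normalize_headline() {
  h=$1
  prefix='### '
  # ${h#"$prefix"} removes the prefix once if present; otherwise h is unchanged.
  printf '%s\n' "${h#"$prefix"}"
}

from_prefixed=$(normalize_headline '### My pattern')
from_bare=$(normalize_headline 'My pattern')
echo "from_prefixed=[$from_prefixed] from_bare=[$from_bare]"
```

Normalizing makes the guard tolerant of both input forms, at the cost of changing the canonical snippet; clarifying the contract keeps the snippet stable and puts the burden on callers.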

lukstafi commented May 5, 2026

note: branch state has drifted since this body was written (baseline: 2 commits at 2026-05-05T09:15:28Z, current: 3 commits). consider gh pr edit https://github.com/lukstafi/ludics/pull/499 to refresh.


lukstafi commented May 5, 2026

note: branch state has drifted since this body was written (baseline: 2 commits at 2026-05-05T09:15:32Z, current: 3 commits). consider gh pr edit https://github.com/lukstafi/ludics/pull/499 to refresh.

@lukstafi lukstafi merged commit a478612 into main May 5, 2026
1 check passed
@lukstafi lukstafi deleted the ludics/task-c4e0e80a-s2/root branch May 5, 2026 09:23

lukstafi commented May 5, 2026

What I'd do differently next time — task-c4e0e80a (PR #499)

Five durable lessons from this round, ranked by cost paid:

1. Dry-run bash snippets against an example before publishing the contract

Codex caught the ENTRY_HEADLINE contract bug (PR #499 inline review).
The doc described ENTRY_HEADLINE as "the proposed ### <headline> text", but
the snippet does grep -Fq "### ${ENTRY_HEADLINE}". A 30-second
dry-run with a concrete example (ENTRY_HEADLINE="### My pattern" →
grep -Fq "### ### My pattern" → never matches) would have caught
this in round 1. Cost paid: an extra commit (1f1815e) + a review
round on a contract bug whose falsifier matched AC5's literal text.

Refactor: when a doc-contract describes a snippet's inputs, paste
the inputs into the snippet mentally (or in a scratch buffer) and
trace what the resulting command actually runs. The "publication
seed" framing of swe-textbook.md made this worse — entries are
written for unfamiliar future readers, so the contract has to be
unambiguous to someone with zero context. Self-contradicting prose +
runnable bash is a high-value bug class to scan for.
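
A scratch-buffer dry-run of this kind can be as small as printing the expanded command instead of running it. The sketch below assumes the example input from above and prints what grep would receive; the docs/swe-textbook.md argument is shown for readability only and nothing is read from disk.

```shell
# Dry-run: expand the snippet's pattern with a concrete input and print it
# instead of executing it. The input is written the way the original contract
# described it (with the leading "### ").
ENTRY_HEADLINE='### My pattern'
expanded=$(printf 'grep -Fq "%s" docs/swe-textbook.md' "### ${ENTRY_HEADLINE}")
echo "$expanded"
# prints: grep -Fq "### ### My pattern" docs/swe-textbook.md
```

Seeing the doubled prefix in the printed command is enough to spot that the prose and the snippet disagree.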

2. Pick ASCII-printable sentinels for "impossible regex" closers

Used /^\0NEVER_MATCH/ as a never-match closer for a slice that
should run to EOF. The literal NUL byte flipped git's text/binary
detection and turned the test file into a binary blob — opaque in PR
review (git diff --numstat showed - -). Cost: an extra review
round + commit 9f911db.

Refactor: sentinels stay ASCII-printable
(/^__NEVER_MATCH_SENTINEL__$/), OR refactor to a closer-less helper
(sliceToEnd(body, opener)). One-liner pre-commit self-checks:

  • file <test-file> should report a text type (e.g. "UTF-8 text"), not binary data.
  • python3 -c "import sys; print(open(sys.argv[1],'rb').read().count(b'\x00'))" <test-file> should print 0.
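
The NUL-count probe can also be written without Python. The sketch below is an assumed equivalent using tr and wc; count_nul and the two fixture files are hypothetical, with the fixture contents modelled on the two sentinels mentioned above.

```shell
# Sketch: count NUL bytes in a file using only tr and wc.
count_nul() {
  # Delete every byte except NUL, then count what is left.
  tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

tmp=$(mktemp -d)
# printf turns \0 into a literal NUL byte, mimicking the problematic regex source.
printf 'const closer = /^\0NEVER_MATCH/;\n' > "$tmp/with_nul.ts"
printf 'const closer = /^__NEVER_MATCH_SENTINEL__$/;\n' > "$tmp/clean.ts"

with_nul=$(count_nul "$tmp/with_nul.ts")
clean=$(count_nul "$tmp/clean.ts")
echo "with_nul=$with_nul clean=$clean"
rm -rf "$tmp"
```

A non-zero count is the signal to rewrite the sentinel before committing.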

3. bun test | tail -40 buffers; pipe to a file instead

Round-1 plan was written under the false belief that bun test had
hung — the harness's output file stayed empty for ~15 minutes
because tail -40 only emits at pipeline end. The merge plan had to
record a "baseline pending" placeholder that the reviewer (correctly)
rejected, forcing a second merge round.

Refactor: when the suite is long, redirect to a file
(bun test > /tmp/x.out 2>&1; the order matters, since 2>&1 > file
leaves stderr on the terminal) or use line-buffered grep
(bun test 2>&1 | grep --line-buffered -E "fail|Ran"). Don't proceed
with a "narrow baseline" placeholder when ~50s of patience produces
the real one.
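
Redirection order is easy to get wrong when capturing a runner's output, because the shell applies redirections left to right. The sketch below uses a hypothetical report function as a stand-in for a test runner that prints its summary on stderr, and compares the two orders.

```shell
# Sketch: "2>&1 > file" versus "> file 2>&1".
report() {
  # Stand-in for a test runner: progress on stdout, summary on stderr.
  echo "progress line"
  echo "3 pass, 0 fail" >&2
}

tmp=$(mktemp -d)

# Order A: stderr is pointed at the current stdout first, then stdout moves to
# the file. The summary escapes the file; here the command substitution catches it.
leaked=$(report 2>&1 > "$tmp/a.out")

# Order B: stdout moves to the file first, then stderr follows it into the file.
report > "$tmp/b.out" 2>&1

a=$(cat "$tmp/a.out")
b=$(cat "$tmp/b.out")
echo "A file: [$a] leaked: [$leaked]"
echo "B file: [$b]"
rm -rf "$tmp"
```

Only order B leaves the summary line in the file, which is the line a baseline count would be read from.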

4. Run mutations BEFORE writing the verification line

Wrote "mutation-tested by appending the string to
worker-conventions.md (assertion fired)" in the AC7 verification
before actually running the mutation. The claim happened to be
true, but the ordering was sloppy and would have been embarrassing
if the assertion hadn't actually fired.

Refactor: per feedback_ac_self_check_invariant_plus_harness.md,
the right order is: (a) write the assertion, (b) instrument the
mutation, (c) observe the failure, (d) revert, (e) write the
verification line citing the observed failure. No claim before the
observation.

5. Grep wider than the plan's occurrence list before declaring scope

The reviewer caught worker-conventions.md ## Field Contract Reference
as an in-scope file I had originally classified SKIP. The grep I ran
in round 1 was for swe-textbook|capture-textbook|textbookCaptures, but
the file's role as "canonical cross-skill reference" was visible in its
own preamble at line 186, and the existing feedback-digest rows at
224–229 made it obvious that a new field needed a row. The runtime lint
(scripts/lint-contracts.ts) puts the file in REVERSE_EXCLUDE, so silent
drift was the failure mode if I shipped without it.

Refactor: when adding a new structured field, grep not just for
the field name but for the type of artifact the field belongs to —
"Field Contract Reference", "schema", "row", "table" — to surface
canonical-reference docs that the test/lint layer doesn't gate.
Borrow the orchestration pattern's "data-shape consumer sweep"
discipline: every consumer gets a per-row disposition, including the
ones humans read.
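
As a rough illustration of the wider sweep, the sketch below runs both greps over a hypothetical two-file fixture: one search by field name, one by artifact type. The file contents and the "summary" field are invented for the example; only the search terms come from the text above.

```shell
# Sketch: a name-based grep misses a reference table that has no row for the
# new field yet; an artifact-type grep surfaces it.
tmp=$(mktemp -d)
mkdir -p "$tmp/skills"
printf '## Field Contract Reference\n\n| skill | field |\n| feedback-digest | summary |\n' \
  > "$tmp/skills/worker-conventions.md"
printf 'Worker emits textbookCaptures in its response.\n' \
  > "$tmp/skills/ludics-feedback-digest-worker.md"

by_name=$(cd "$tmp" && grep -rlE 'swe-textbook|capture-textbook|textbookCaptures' skills | sort)
by_kind=$(cd "$tmp" && grep -rlE 'Field Contract Reference|schema' skills | sort)

echo "by name: $by_name"
echo "by kind: $by_kind"
rm -rf "$tmp"
```

The second list is the set of files that need a human-decided disposition even though no lint gates them.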


These five tracked the round's actual cost. The first two cost a
review round each; the others cost reviewer-pushback rounds at the
plan-merge phase. None of the five are about Ludics-internal
mechanics; all five cleanly migrate to the broader "competent-SWE
write-memory" the merged textbook now houses, once entries accrue
beyond the seed.
