docs: replace testing philosophy with structured testing rules

apankov1 · claude · apankov1 · commit 7477f087d95e · 2026-04-11T11:10:55.000-04:00
Evolve the 35-line testing philosophy into a full testing contract with
pre-test gate, QA technique matching, MUST/SHOULD quality standards,
and an explicit anti-patterns table. Stripped of project-specific
references to serve as a reusable ruleset.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/README.md b/README.md
@@ -61,43 +61,17 @@ Before building a QA/testing skill, try the same prompt without any skill. If th
 
 If the skill just teaches techniques the model already knows, it adds context tokens without adding value.
 
-## Author's testing.md
+## Author's testing rules
 
-After retiring the skills, the testing guidance that remains is a 35-line philosophy file used as a Claude Code rule (`.claude/rules/testing.md`). It encodes the principles that survived the evaluation — the stuff that actually matters for test quality:
+After retiring the skills, the testing guidance that remains is a structured ruleset used as a Claude Code rule. It encodes the pre-test gate, quality standards, and anti-patterns that survived the evaluation, and has grown from a 35-line philosophy into a full testing contract.
 
 See [testing-rules/testing.md](testing-rules/testing.md) for the full file.
 
-```markdown
-# Testing
-
-## Test real systems, not simulations
-
-No mocks. Integration tests with real bindings (Miniflare for D1/KV/R2).
-vi.fn() only for platform APIs unavailable in test (WebSocket, ExecutionContext).
-
-Why: mocked tests pass while prod breaks. The mock diverges from the real
-system silently. If you can test against the real thing, do it.
-
-## Test the boundary, not the internals
-
-Call the exported function. If deleting the call site doesn't break the test,
-you're testing the wrong layer. Never write inline "simulators" that copy
-production logic — import and call the actual code.
-
-## Bugs get tests first
-
-Write the failing test. Verify it fails for the right reason. Then fix.
-This order is non-negotiable — it proves the test actually catches the bug.
-
-## What to test
-
-- **Defect-first**: find fault-prone patterns, target those
-- **State machines**: all N×N transitions, not just happy path
-- **Combinatorial inputs**: pairwise coverage for multi-factor scenarios
-- **Boundaries**: Zod parse at every trust boundary
-```
-
-This is the behavioral instruction that 14 skills couldn't improve on.
+Key elements:
+- **Pre-test gate** — a 6-step mandatory process before writing any test (read the requirement, read the implementation, select QA technique, enumerate cases, write AAA tests, self-verify)
+- **QA technique matching** — equivalence partitioning, boundary value analysis, decision tables, state transition testing matched per function
+- **Quality standards** with MUST/SHOULD priority markers
+- **Anti-patterns table** — 8 explicitly forbidden patterns (tautological assertions, mock-the-SUT, truthiness-only, etc.)
 
 ## License
 
diff --git a/testing-rules/testing.md b/testing-rules/testing.md
@@ -1,35 +1,99 @@
-# Testing
+# Testing Rules
 
-> Author's `.claude/rules/testing.md` — the testing philosophy that survived evaluating 14 QA skills.
-> Used as a Claude Code rule file (path-scoped to `**/*.spec.ts`).
+Quality standards and pre-test gate for all test code.
 
-## Test real systems, not simulations
+Priority markers: **MUST** = correctness risk if violated. **SHOULD** = quality risk. **MAY** = advisory.
 
-No mocks. Integration tests with real bindings (Miniflare for D1/KV/R2). `vi.fn()` only for platform APIs unavailable in test (WebSocket, ExecutionContext, DOLogger stubs).
+## Pre-Test Gate
 
-Why: mocked tests pass while prod breaks. The mock diverges from the real system silently. If you can test against the real thing, do it.
+**MUST** complete before writing ANY test file. This is non-negotiable.
 
-## Test the boundary, not the internals
+### Step 1: Identify the requirement
 
-Call the exported function. If deleting the call site doesn't break the test, you're testing the wrong layer. Never write inline "simulators" that copy production logic — import and call the actual code.
+Which issue or acceptance criteria does this test address? Read them.
 
-If the function is private, extract the pure logic into its own module. Test that module. Production code delegates to it.
+### Step 2: Read the implementation
 
-## Bugs get tests first
+Read the source file(s) being tested. Identify:
+- Exported functions and their signatures
+- Decision branches (if/else, switch, early returns)
+- Error paths (throws, catch blocks, error returns)
+- External boundaries (API calls, DB queries, external service bindings)
+- Edge cases visible in the code (null checks, empty arrays, boundary comparisons)
 
-Write the failing test. Verify it fails for the right reason. Then fix. Then full suite. This order is non-negotiable — it proves the test actually catches the bug.
+### Step 3: Select QA technique per function
 
-## What to test
+Match each function to the appropriate technique:
+- Multi-branch logic → Equivalence partitioning (one test per class)
+- Threshold / limit → Boundary value analysis (at, below, above)
+- Multiple conditions to outcome → Decision table (one test per row)
+- Entity lifecycle → State transition testing (valid + invalid transitions)
+- Data transformation → Equivalence partitioning + boundaries
+- Error handling → Equivalence partitioning (per error category)
 
-- **Defect-first**: look at the production code, find the fault-prone patterns, write tests that target those — not tests that exercise the API shape
-- **State machines**: test all N×N transitions, not just happy path. Invalid transitions must throw.
-- **Combinatorial inputs**: pairwise coverage for multi-factor scenarios. Cover all factor pairs in near-minimal cases.
-- **Boundaries**: Zod parse at every trust boundary. Valid input, invalid input, edge values.
+### Step 4: Enumerate test cases
 
-## Naming
+For each function under test, list cases derived from the technique:
+- Name the specific partition, boundary, state transition, or decision row
+- Describe the expected output
+- Name the production defect it would catch
 
-`module.spec.ts` (unit), `module.workers.spec.ts` (Miniflare), `module.contract.spec.ts` (schema), `module.pairwise.spec.ts` (combinatorial).
+### Step 5: Write test code (AAA structure)
 
-Test names describe behavior: `'returns X when Y is Z'`. Never: `'works correctly'`, `'should work'`.
+```typescript
+it("returns 'house' for 'House' origin value", () => {
+  // Arrange
+  const input = "House";
 
-Assertions use specific values: `expect(result.code).toBe('game_not_found')` not `expect(result).toBeDefined()`.
+  // Act
+  const result = normalizeOrigin(input);
+
+  // Assert
+  expect(result).toBe("house");
+});
+```
+
+### Step 6: Self-verify
+
+Could this test fail if the production code had a real bug? If the test would still pass with a broken implementation, delete it.
+
+## Quality Standards
+
+### MUST
+
+- Complete the pre-test gate before writing any test file
+- Select a QA technique from Step 3 of the pre-test gate for each function under test
+- Enumerate test cases before writing test code
+- Use AAA structure (Arrange, Act, Assert) in every test
+- Assert on computed values with specific expected values — never truthiness-only
+- Import and call at least one production function per test file
+- No tautological assertions (`expect(true).toBe(true)`)
+- No self-referential assertions (`expect(x).toBe(x)`)
+- Never mock the system under test — mock only at external boundaries (fetch, timers, external services)
+- Include negative test cases (error paths, invalid inputs, throws)
+- Bugs get tests first: write the failing test, verify it fails for the right reason, then fix
+
+### SHOULD
+
+- Use test data builders instead of inline object literals
+- Name test files after the function or behavior, not the source file
+- One focused concern per test file
+- Test the boundary, not the internals — if deleting the call site doesn't break the test, you're testing the wrong layer
+
+## Test Tiers
+
+- **Unit** (`unit/`): Pure logic, no I/O, no DB. Use fake timers for clock-dependent helpers.
+- **Integration** (`integration/`): Real database bindings. Real time only (DB time functions are not controlled by fake timers). Prefer seeded timestamps over elapsed-time waits.
+
+## Anti-Patterns (Explicitly Forbidden)
+
+| Anti-pattern | Example | Why |
+|---|---|---|
+| Tautological assertion | `expect(true).toBe(true)` | Cannot fail |
+| Self-referential | `expect(x).toBe(x)` | Always passes |
+| Literal roundtrip | Build `{name: "foo"}`, assert `obj.name === "foo"` | Tests construction |
+| Truthiness-only | `expect(result).toBeTruthy()` | Passes for any truthy value, including wrong ones |
+| Mock the SUT | Mock `doThing` to test `doThing` | Tests the mock |
+| Empty test body | `it("works", () => {})` | Proves nothing |
+| No production call | `it("adds", () => expect(1+1).toBe(2))` | Tests JavaScript |
+| Schema-success-only | `expect(result.success).toBe(true)` | Doesn't verify parsed data |
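
As an illustration of Steps 3, 4, and 6 of the pre-test gate, the following is a minimal sketch of boundary value analysis on a threshold function, followed by a self-verification pass against a deliberately broken variant. The function `discountRate`, its threshold of 100, and the helper `failingCases` are hypothetical names invented for this example; they are not part of the ruleset or of any project. The sketch avoids a test runner so it stays self-contained. In a real suite each case would become one AAA-structured `it(...)` block.

```typescript
// Hypothetical production function (an assumption for illustration only):
// orders of 100 or more units get a 10% bulk discount.
function discountRate(quantity: number): number {
  if (!Number.isInteger(quantity) || quantity < 1) {
    throw new RangeError("quantity must be a positive integer");
  }
  return quantity >= 100 ? 0.1 : 0;
}

// Step 3 + Step 4: boundary value analysis yields one enumerated case
// just below, exactly at, and just above the threshold.
const boundaryCases: Array<{ name: string; quantity: number; expected: number }> = [
  { name: "below threshold", quantity: 99, expected: 0 },
  { name: "at threshold", quantity: 100, expected: 0.1 },
  { name: "above threshold", quantity: 101, expected: 0.1 },
];

// Runs every enumerated case against an implementation and
// returns the names of the cases whose specific expected value did not match.
function failingCases(fn: (quantity: number) => number): string[] {
  return boundaryCases
    .filter((c) => fn(c.quantity) !== c.expected)
    .map((c) => c.name);
}

// Step 6: a broken variant with an off-by-one bug (> instead of >=).
// If no case fails against it, the cases are not catching the defect.
const offByOne = (quantity: number): number => (quantity > 100 ? 0.1 : 0);

console.log(failingCases(discountRate)); // correct implementation: no failing cases
console.log(failingCases(offByOne)); // broken variant: the "at threshold" case fails
```

The "at threshold" case is the one that names the production defect it would catch (an off-by-one comparison), which is what Step 4 asks for. A truthiness-only assertion would not separate the two implementations at that boundary, because both return a number there.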