plugins/dso/docs/ACCEPTANCE-CRITERIA-LIBRARY.md
Lines changed: 2 additions & 0 deletions
@@ -119,6 +119,8 @@ Use when a task removes, moves, or replaces a command, skill, script, or other w
 Verify: grep -q 'def {test_name}' {test_path}
 - [ ] Running the test returns non-zero exit pre-implementation
 Verify: python -m pytest {test_path}::{test_name} 2>&1; test $? -ne 0
+- [ ] Test is behavioral: executes the code under test (calls a function, runs a script, or exercises a code path with inputs and asserts on outputs/side effects) — not a grep/sed scan of the source file for implementation strings. Structural tests (negative constraints, metadata validation, syntax checks) are exempt.
+Verify: manual review — test approach in task description describes what is executed and what output is asserted
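
To illustrate the distinction the new criterion draws, here is a minimal sketch in Python. It is not part of the change. The `slugify` function, its deliberate bug, and the helper names `presence_check` and `behavioral_check` are all hypothetical. The sketch shows how a presence-style check can pass against code that does not work, while a behavioral check catches the defect.

```python
import re

# Hypothetical source for the code under test, with a deliberate bug:
# it never lowercases the title.
BUGGY_SOURCE = '''
def slugify(title):
    return title.strip().replace(" ", "-")
'''


def presence_check(source: str) -> bool:
    """Change-detector style: only confirms that 'def slugify' exists."""
    return re.search(r"^def slugify\b", source, flags=re.MULTILINE) is not None


def behavioral_check(source: str) -> bool:
    """Behavioral style: load the code, call it, and check the output."""
    namespace: dict = {}
    exec(source, namespace)  # stands in for importing the module under test
    return namespace["slugify"]("  Hello World ") == "hello-world"


presence_result = presence_check(BUGGY_SOURCE)      # text is present
behavioral_result = behavioral_check(BUGGY_SOURCE)  # output is wrong
print(presence_result, behavioral_result)
```

The presence check reports success even though `slugify` returns `Hello-World`; only the check that calls the function notices.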
plugins/dso/skills/implementation-plan/SKILL.md
Lines changed: 24 additions & 0 deletions
@@ -282,6 +282,30 @@ A RED test task:
 - Must fail (RED) before the implementation task runs
 - Is a standalone task in the plan, not embedded in the implementation task description
 - Uses `TEST_CMD` (resolved from `commands.test` in workflow-config) as the verify command
+- **Must be a behavioral test** — see Behavioral Test Requirement below
+
+#### Behavioral Test Requirement
+
+RED tests must verify **behavior** (what the code does), not **presence** (that specific code text exists in a source file). A test that greps a source file for a function name, string pattern, or implementation detail is a **change-detector test** — it passes when the code is written and fails when it's deleted, regardless of whether the code actually works.
+
+**A valid RED test must do at least one of:**
+- Execute the code under test and assert on its output, exit code, or side effects
+- Create test fixtures (files, repos, mock services) and verify the code handles them correctly
+- Import a module/function and call it with inputs, asserting the return value
+
+**Structural tests are acceptable ONLY for these categories:**
+- **Negative constraints** ("must NOT contain X") — e.g., no hardcoded paths after a migration, no relative paths in hook libs. These protect against regression to a known-bad state.
+- **Metadata/schema validation** — e.g., skill frontmatter has required fields, config file has required keys. These verify structure that has no executable behavior.
+- **Syntax checks** — `bash -n`, `python -m py_compile`, JSON schema validation. These verify the code is parseable.
+- Asserting that a function name appears in a source file (use: call the function)
+- Asserting that a string appears near another string via `sed -n` range extraction (use: create the scenario and verify the behavior)
+- Counting `grep -c` matches as a proxy for "feature is implemented" (use: exercise the feature)
+- Verifying a script handles edge cases by grepping for the edge case code (use: create the edge case input and verify the output)
+
+When the TDD task description specifies the RED test, it must include a **test approach** sentence explaining what the test executes and what output/behavior it asserts. If the test approach describes grepping a source file, the task must be revised to describe a behavioral assertion instead.
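
As a rough sketch of what the requirement could look like in a pytest-style test file (not taken from the change itself), the example below builds fixtures, runs the code, and asserts on its output. It ends with one structural test of the permitted negative-constraint kind. The function `count_todos`, the fixture file names, and the `/home/` pattern are hypothetical and chosen only for illustration.

```python
import tempfile
from pathlib import Path


def count_todos(path: Path) -> int:
    """Hypothetical code under test: count lines that contain 'TODO'."""
    return sum(1 for line in path.read_text().splitlines() if "TODO" in line)


def test_count_todos_handles_empty_file() -> None:
    # Behavioral: create the edge-case input, run the code, assert on output.
    with tempfile.TemporaryDirectory() as tmp:
        fixture = Path(tmp) / "empty.txt"
        fixture.write_text("")
        assert count_todos(fixture) == 0


def test_count_todos_counts_matching_lines() -> None:
    # Behavioral: call the function with known input and check the result.
    with tempfile.TemporaryDirectory() as tmp:
        fixture = Path(tmp) / "notes.txt"
        fixture.write_text("TODO: one\ndone\nTODO: two\n")
        assert count_todos(fixture) == 2


def test_no_hardcoded_home_path() -> None:
    # Structural, but an allowed negative constraint ("must NOT contain X").
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "hook.sh"
        script.write_text('#!/bin/sh\ncd "$REPO_ROOT"\n')
        assert "/home/" not in script.read_text()


# Run directly so the sketch works without pytest installed.
test_count_todos_handles_empty_file()
test_count_todos_counts_matching_lines()
test_no_hardcoded_home_path()
```

A matching test approach sentence for the second test might read: "Writes a three-line fixture file, calls `count_todos` on it, and asserts the return value is 2."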