|
| 1 | +# AWOS test suite |
| 2 | + |
| 3 | +A three-layer safety net that catches structural regressions in AWOS prompts and installer behavior at PR time. Built on Node's `node:test` built-in — **zero npm dependencies**. Runs identically under `node --test` (CI primary) and `bun test` (local cross-runtime sanity). |
| 4 | + |
| 5 | +## Why this exists |
| 6 | + |
| 7 | +AWOS distributes markdown prompts that users install into their projects. Prompts and the installer share several silent contracts — task markers, file paths, frontmatter fields, dimension DAGs, copy semantics — and a typo or rename in one prompt can break a downstream user's `/awos:implement` run a week later, with no PR-time signal. This suite asserts those contracts so future prompt edits get caught before they ship. |
| 8 | + |
| 9 | +What it does **not** do: validate prompt _behavior_ (does the LLM actually do the right thing when run?). That requires an LLM in the loop and an API budget; we rely on the scratch-project smoke test described in the root `CLAUDE.md` for that. |
| 10 | + |
| 11 | +## Running the suite |
| 12 | + |
| 13 | +```sh |
| 14 | +# Primary path (Node 22+, what CI runs) |
| 15 | +npm test |
| 16 | + |
| 17 | +# Per-layer |
| 18 | +npm run test:lint # Layer 1 |
| 19 | +npm run test:installer # Layer 2 |
| 20 | +npm run test:fixtures # Layer 3 |
| 21 | + |
| 22 | +# Local cross-runtime sanity check |
| 23 | +bun test tests/ |
| 24 | +``` |
| 25 | + |
| 26 | +CI runs `npm test` under Node 22 in `.github/workflows/quality-check.yml` (non-blocking initially; flip to required after two consecutive green PR runs). |
| 27 | + |
| 28 | +## Layout |
| 29 | + |
| 30 | +``` |
| 31 | +tests/ |
| 32 | +├── README.md # this file |
| 33 | +├── lint-prompts.test.js # Layer 1: static prompt linter |
| 34 | +├── config/ |
| 35 | +│ └── wrapper-schema.json # which wrapper frontmatter fields are required |
| 36 | +├── installer/ # Layer 2: installer unit tests |
| 37 | +│ ├── file-copier.test.js |
| 38 | +│ ├── migration-runner.test.js |
| 39 | +│ └── setup-orchestrator.test.js |
| 40 | +├── fixtures.test.js # Layer 3: harness for example projects |
| 41 | +├── fixtures/ # Layer 3: example projects |
| 42 | +│ ├── fresh-project/ |
| 43 | +│ ├── existing-awos-v0/ |
| 44 | +│ ├── customized-wrapper/ |
| 45 | +│ ├── mid-workflow/ |
| 46 | +│ └── pre-migration-v1/ |
| 47 | +└── helpers/ |
| 48 | + ├── frontmatter.js # minimal YAML-frontmatter parser, no deps |
| 49 | + ├── manifest.js # load + assert fixture manifests |
| 50 | + └── temp-project.js # mkdtemp / copyTree / silenced helpers |
| 51 | +``` |
| 52 | + |
| 53 | +Behavioral end-to-end tests (real `claude` sessions, session-log parsing) live in the separate **`awos-qa`** repository. |
| 54 | + |
| 55 | +## Layer 1 — Static prompt linter |
| 56 | + |
| 57 | +`tests/lint-prompts.test.js`. Reads markdown across `commands/`, `claude/commands/`, `templates/`, and `plugins/awos/skills/ai-readiness-audit/dimensions/` and asserts: |
| 58 | + |
| 59 | +- **Wrapper symmetry.** Every `claude/commands/<name>.md` has a matching `commands/<name>.md`. |
| 60 | +- **Wrapper include line.** Each wrapper contains either `@.awos/commands/<name>.md` (preferred) or the legacy `Refer to the instructions located in this file: .awos/commands/<name>.md`. Output logs the count of each form so the migration to `@`-import is visible. |
| 61 | +- **Wrapper frontmatter schema.** Required keys defined in `tests/config/wrapper-schema.json`. Tighten this file (don't edit the test) when new wrapper-frontmatter contracts are added. |
| 62 | +- **Wrapper description matches root.** Drift between a wrapper's `description` and the corresponding root command's `description` fails the suite — the slash-command palette shows the wrapper's text, so it has to stay in sync with the canonical one. |
| 63 | +- **Agent marker preservation.** `commands/tasks.md` (writer) and `commands/implement.md` (reader) both contain the literal `**[Agent: ` marker token — this is how the orchestrator extracts each task's specialist assignment. |
| 64 | +- **XML scope, investigate, and skills snippets.** `commands/implement.md` contains `<scope_discipline>` (don't over-engineer), `<investigate_before_answering>` (don't hallucinate), and `<use_available_skills>` (apply matching project/user/plugin skills) — all passed through to the delegated subagent prompt. |
| 65 | +- **`Agent()` invocation example.** `commands/implement.md` and `commands/tech.md` both show an explicit `Agent(subagent_type=..., ...)` call so the delegation step is concrete, not just described. |
| 66 | +- **`INTERACTION` section in every core command.** Every `commands/*.md` declares its own `# INTERACTION` section that names `AskUserQuestion`. Wrappers must _not_ duplicate that rule — AWOS targets Claude Code only, so the tool is a framework default, not host-specific customization. |
| 67 | +- **Subagent discovery (filesystem + plugins).** `commands/tasks.md`, `commands/tech.md`, and `commands/hire.md` reference both `.claude/agents/` (project-local, parsed via frontmatter) _and_ the `Agent` tool's description block (plugin-provided agents, recognized by the `plugin-name:` prefix on `subagent_type`). |
| 68 | +- **`agent-template.md` cues skills application.** The body of `templates/agent-template.md` instructs spawned agents to apply the skills listed in their frontmatter — without this, `/awos:hire`'s skill-attachment work is inert at run time. |
| 69 | +- **`context/product/hired-agents.md` rename pinned.** The `/awos:hire`-owned coverage report is referenced at its post-rename path; no prompt still references the legacy `context/product/agents.md`. |
| 70 | +- **Slash-command cross-references.** Every `/awos:<word>` mentioned in any prompt resolves to a real `commands/<word>.md` (or the plugin path for `/awos:ai-readiness-audit`). |
| 71 | +- **Dimension DAG.** Every dimension under `plugins/awos/skills/ai-readiness-audit/dimensions/*.md` has required frontmatter, `name` matches its filename, severity is in the allowed set, `depends-on` entries resolve to real dimension names, and the graph topologically sorts (no cycles). |
| 72 | +- **`context/...` path consistency.** Cross-prompt path references are mutually reachable — if two prompts read the same path, at least one writer of it must exist. |
| 73 | +- **`setup-config.js` ↔ source-tree consistency.** Every `copyOperation.source` directory exists; every top-level source directory matching `^(commands|templates|scripts|claude)/` is referenced by exactly one `copyOperation`. |
| 74 | + |
| 75 | +Cost: ~30 ms. Catches roughly 80 % of structural regressions on its own. |
| 76 | + |
| 77 | +## Layer 2 — Installer unit tests |
| 78 | + |
| 79 | +`tests/installer/*.test.js`. Exercises `src/services/file-copier.js`, `src/migrations/runner.js`, and `src/core/setup-orchestrator.js` against `fs.mkdtemp()` temp directories. Only Node built-ins, only public exports of the installer modules — no monkey-patching. |
| 80 | + |
| 81 | +- **`file-copier.test.js`** |
| 82 | + - Fresh install lands every source file at its declared destination. |
| 83 | + - Synthetic `commands/synth-test.md` is auto-discovered (validates "no `setup-config.js` edit needed when adding files inside an existing tree"). |
| 84 | + - Wrapper overwrite behavior pinned to current code (`.claude/commands/awos/*.md` _is_ overwritten on update). Comments in the test point at the open §11 docs-vs-code question; flip the assertion when that's resolved intentionally. |
| 85 | + - Dry-run honesty: `dryRun: true` produces zero filesystem changes. |
| 86 | +- **`migration-runner.test.js`** |
| 87 | + - Migration 001 is idempotent (run twice, second run is a no-op). |
| 88 | + - `skip_if_any` triggers on already-migrated state and reports `already_applied`. |
| 89 | + - Migration version meta-test: every JSON under `src/migrations/` has a unique version, no gaps, no duplicates. |
| 90 | + - Dry-run does not touch disk. |
| 91 | +- **`setup-orchestrator.test.js`** |
| 92 | + - End-to-end `runSetup({ workingDir, packageRoot })` against a temp dir completes without throwing. |
| 93 | + - Re-running on an existing install is idempotent on the on-disk side. |
| 94 | + |
| 95 | +Cost: ~50 ms. |
| 96 | + |
| 97 | +## Layer 3 — Example fixture projects |
| 98 | + |
| 99 | +`tests/fixtures.test.js` is a harness that runs once per directory under `tests/fixtures/`. For each fixture: |
| 100 | + |
| 101 | +1. Make a fresh `fs.mkdtemp()` temp dir. |
| 102 | +2. If the fixture has a `before/` subtree, copy it into the temp dir. |
| 103 | +3. Run the real installer (`runSetup({ workingDir, packageRoot: repoRoot })`). |
| 104 | +4. Load `expected-after.json` and assert the resulting tree matches the manifest. |
| 105 | + |
| 106 | +Each `expected-after.json` lists files with one or more of: `{ exists, sha256, contains, unchanged }`. Files not listed are not asserted — fixtures are deliberately selective. |
| 107 | + |
| 108 | +Currently shipped fixtures: |
| 109 | + |
| 110 | +| Fixture | Scenario | What it pins down | |
| 111 | +| --------------------- | ----------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- | |
| 112 | +| `fresh-project/` | Empty project | Full install layout: `.awos/commands/`, `.claude/commands/awos/`, `context/`, `.awos/.migration-version` | |
| 113 | +| `existing-awos-v0/` | Stale `.awos/commands/architecture.md` from a prior install | Framework internals always get the latest content (overwritten) | |
| 114 | +| `customized-wrapper/` | User-customized `.claude/commands/awos/architecture.md` | Pins the current always-overwrite behavior; see the §11 open question in the plan | |
| 115 | +| `mid-workflow/` | Populated `context/spec/001-test-feature/*.md` | Installer never touches user spec work | |
| 116 | +| `pre-migration-v1/` | `.claude/agents/python-expert.md` at the pre-v1 path | Migrations 001 + 002 land cleanly and the version file reads `2` | |
| 117 | + |
| 118 | +Adding a new fixture: create `tests/fixtures/<name>/`, optionally with a `before/` subtree, plus an `expected-after.json` manifest. The harness picks it up automatically. |
| 119 | + |
| 120 | +Cost: ~65 ms for all five. |
| 121 | + |
| 122 | +## Behavioral end-to-end tests live in the `awos-qa` repo |
| 123 | + |
| 124 | +Static lint catches "prompt mentions X"; only running the real LLM catches "Claude actually did X". That second class of test lives in the separate **`awos-qa`** repository, sibling to this one. It drives a Claude Code session against a seeded scratch project and parses the resulting session log to assert on the tool-call trace. |
| 125 | + |
| 126 | +It's intentionally a separate repo so prompt-author iteration here doesn't pull in the behavioral-test surface area, and so awos-qa can grow other test types (perf, evals, integration) without coupling them to AWOS's release cycle. |
| 127 | + |
| 128 | +## Adding tests for new contracts |
| 129 | + |
| 130 | +The rule (also in the root `CLAUDE.md`): **any PR that introduces a new structural contract must ship its test in the same PR.** |
| 131 | + |
| 132 | +- New wrapper frontmatter key → add it to `tests/config/wrapper-schema.json`. |
| 133 | +- New required marker in a prompt → add a `test('marker preserved', …)` to `tests/lint-prompts.test.js`. |
| 134 | +- New migration in `src/migrations/` → add an idempotency + skip-semantics test to `tests/installer/migration-runner.test.js`. If user wrappers or agents are rewritten, add a fixture under `tests/fixtures/` that exercises a representative pre-migration tree. |
| 135 | +- New copy operation in `src/config/setup-config.js` → the consistency check in Layer 1 will fail unless the matching source directory exists; the fixture suite picks up the new destination automatically once any fixture asserts a file under it. |
| 136 | +- New audit dimension → Layer 1's DAG check picks it up automatically; just make sure the frontmatter is complete. |
| 137 | + |
| 138 | +## Constraints (don't break these) |
| 139 | + |
| 140 | +- **No npm dependencies.** AWOS's installer is dep-free for cross-runtime portability. Tests inherit that constraint. |
| 141 | +- **Cross-runtime compatible.** Same files must run under both `node --test` and `bun test`. Avoid Node-only APIs Bun lacks. |
| 142 | +- **Tests assert today's code as truth.** If a test fails after a code change you didn't intend to make, fix the code, not the test. If you intentionally changed a contract, update the test in the same commit and explain why in the message. |
0 commit comments