**What**: Labeled quality checks in `packages/mcp/tests/quality/` that answer “is the MCP useful to agents?” rather than only “does it execute?”

**How**:

```bash
pnpm mcp:evaluate
```
Add a retrieval case whenever a real agent query should reliably find a rule. Add a review fixture whenever `review_code` gains a new heuristic or previously noisy behavior is fixed.
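The case format used in `packages/mcp/tests/quality/` is not shown here, so the following is only an illustrative sketch of what a labeled retrieval case might check. The `RetrievalCase` shape, `passesRetrievalCase`, and the rule IDs are all hypothetical names, not the repository's API.

```typescript
// Hypothetical sketch: these names are assumptions, not taken from the repository.

/** A labeled expectation: this agent query should surface this rule near the top. */
interface RetrievalCase {
  query: string;          // what a real agent might ask
  expectedRuleId: string; // the rule that should be found
  topK: number;           // how high in the ranking it must appear
}

/** True when the expected rule appears within the first `topK` ranked results. */
function passesRetrievalCase(testCase: RetrievalCase, rankedRuleIds: string[]): boolean {
  return rankedRuleIds.slice(0, testCase.topK).indexOf(testCase.expectedRuleId) !== -1;
}

// Example with made-up rule IDs.
const exampleCase: RetrievalCase = {
  query: "how should I handle errors in async handlers?",
  expectedRuleId: "example-async-error-rule",
  topK: 3,
};
const exampleRanking = ["example-other-rule", "example-async-error-rule", "example-third-rule"];
console.log(passesRetrievalCase(exampleCase, exampleRanking));
```

A case like this stays stable as ranking internals change, because it asserts only that the rule is found within the top results, not its exact position.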
### 6. Tool performance benchmarks
**What**: In-process latency benchmarks for the main tools; asserts p95 stays within budget and that `review_code` scales sub-linearly as the rule set grows.
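The two assertions described above can be sketched as follows. This is a minimal illustration under assumptions, not the repository's benchmark code: `percentile`, `withinBudget`, `scalesSubLinearly`, and the sample numbers are hypothetical, and "sub-linear" is interpreted here as p95 latency growing by a smaller factor than the rule count.

```typescript
// Hypothetical sketch: helper names and numbers are assumptions for illustration.

/** Nearest-rank percentile of a non-empty latency sample, with p in (0, 100]. */
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

/** True when the p95 of the sample does not exceed the budget. */
function withinBudget(samplesMs: number[], budgetMs: number): boolean {
  return percentile(samplesMs, 95) <= budgetMs;
}

interface BenchmarkPoint {
  rules: number; // size of the rule set during the run
  p95Ms: number; // observed p95 latency for that run
}

/** True when latency grew by a smaller factor than the rule set did. */
function scalesSubLinearly(small: BenchmarkPoint, large: BenchmarkPoint): boolean {
  const sizeRatio = large.rules / small.rules;
  const latencyRatio = large.p95Ms / small.p95Ms;
  return latencyRatio < sizeRatio;
}

// Made-up measurements: 10x more rules, 4x higher p95.
const smallRun: BenchmarkPoint = { rules: 100, p95Ms: 10 };
const largeRun: BenchmarkPoint = { rules: 1000, p95Ms: 40 };
console.log(scalesSubLinearly(smallRun, largeRun));
```

In a real benchmark the latency samples would come from timing repeated in-process tool calls, for example with `performance.now()` around each call, before passing them to checks like these.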
"ci:check": "pnpm run lint && pnpm run typecheck && pnpm run validate:rule-structure && pnpm run validate:guide-structure && pnpm run validate:guides && pnpm run validate:evidence && pnpm run test:ci",