Skip to content

Commit 2c96200

Browse files
apankov1claude
andauthored
feat(slop-test-detector): benchmark eval fixtures (#21)
## Summary - 13 eval fixtures covering all 18 slop detection rules, validated against `analyzeTestFile()` ground truth - `evals.json` with per-eval expectations, rule coverage matrix, and holdout split for benchmark runs - SKILL.md updated with pushier description (better triggering) and explicit guidance on reporting findings with impact - Validation scripts (`validate-fixtures.ts`, `run-quick-eval.ts`) for ground truth regression testing ## Key design decisions - **Fixture headers stripped of ground truth comments** — baseline testing revealed models read JSDoc hints and reproduce expected rule names, invalidating with/without comparisons - **13 fixtures, not 12** — added `self_referential_assertion` fixture to close the last rule coverage gap - **3 baseline runs (Sonnet, no skill)** confirmed the skill's value is precision (0 FPs on clean code vs 4 baseline FPs) and consistency (stable rule names vs ad hoc labels), not just detection ## Rule coverage All 18 rules exercised across fixtures. Distribution: 3 clearly bad, 3 mixed, 3 clean/FP-resistant, 4 rule-specific. ## Test plan - [x] All 13 evals pass ground truth validation (172 checks, 0 failures) - [x] All 157 existing slop-detector tests pass - [x] Baseline comparison run without answer leak confirms skill adds value - [ ] Full benchmark (5 runs × 2 configs × 13 evals) — deferred pending cost/value assessment 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 37212f1 commit 2c96200

18 files changed

Lines changed: 1254 additions & 5 deletions

biome.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,6 @@
2525
},
2626
"files": {
2727
"include": ["skills/**/*.ts"],
28-
"ignore": ["node_modules"]
28+
"ignore": ["node_modules", "skills/*/evals/fixtures"]
2929
}
3030
}

skills/slop-test-detector/SKILL.md

Lines changed: 15 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
name: slop-test-detector
3-
description: "Static analyzer that detects 18 slop patterns in test code — tests that compile and pass but catch zero bugs."
3+
description: "Static analyzer that detects 18 slop patterns in test code — tests that compile and pass but catch zero bugs. Use when reviewing test files, auditing test quality, checking for weak tests, validating generated tests, or when the user mentions test slop, empty tests, missing assertions, or tautological checks. TRIGGER on any test review or test quality task."
44
---
55

66
# Slop Test Detector
@@ -134,7 +134,18 @@ const report = analyzeTestFile(source, 'my-module.spec.ts');
134134
console.log(formatReport(report));
135135
```
136136

137-
### Step 2: Validate During Generation
137+
### Step 2: Report Findings with Impact
138+
139+
When presenting findings, always include:
140+
- **Rule name** — the specific slop pattern (e.g., `tautological_assertion`)
141+
- **Test name and line number** — so the user can navigate to it
142+
- **Severity**`must-fail` (test is structurally broken) vs `should-fail` (quality concern)
143+
- **Why it matters** — what production bug could slip through because of this pattern
144+
- **What clean tests should NOT be flagged** — avoid false positives on well-written tests
145+
146+
If a test file has assertion-equivalent helpers (functions named `assert*()` or `test*()` that internally call `assert.*`), configure `assertionEquivalents` to prevent false `empty_test_body` findings.
147+
148+
### Step 3: Validate During Generation
138149

139150
Before writing a generated test to disk, check for slop:
140151

@@ -151,15 +162,15 @@ if (findings.some(f => f.severity === 'must-fail')) {
151162
}
152163
```
153164

154-
### Step 3: Machine-Readable Output (CI)
165+
### Step 4: Machine-Readable Output (CI)
155166

156167
```typescript
157168
const report = analyzeTestFile(source, filePath, getPreset('balanced'));
158169
const json = formatReportJSON(report);
159170
// Outputs structured JSON with filePath, score, summary, findings[]
160171
```
161172

162-
### Step 4: Interpret the Score
173+
### Step 5: Interpret the Score
163174

164175
The score formula weights must-fail findings (1.0) higher than should-fail (0.3):
165176

skills/slop-test-detector/evals/evals.json

Lines changed: 321 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 78 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,78 @@
1+
/**
2+
* Notification service contract tests
3+
*/
4+
5+
import assert from "node:assert/strict";
6+
import { describe, it } from "node:test";
7+
import { NotificationService } from "../src/notification-service.js";
8+
9+
describe("NotificationService", () => {
10+
it("should deliver email notification with correct subject", () => {
11+
const service = new NotificationService({ provider: "ses" });
12+
const result = service.send({
13+
channel: "email",
14+
to: "user@example.com",
15+
subject: "Order Confirmed",
16+
body: "Your order #1234 has been confirmed.",
17+
});
18+
assert.equal(result.status, "delivered");
19+
assert.equal(result.channel, "email");
20+
assert.ok(result.messageId.startsWith("msg_"));
21+
});
22+
23+
it("should batch multiple recipients in single API call", () => {
24+
const service = new NotificationService({ provider: "ses", batchSize: 50 });
25+
const recipients = Array.from({ length: 120 }, (_, i) => `user${i}@example.com`);
26+
const batches = service.prepareBatches(recipients);
27+
assert.equal(batches.length, 3);
28+
assert.equal(batches[0].length, 50);
29+
assert.equal(batches[1].length, 50);
30+
assert.equal(batches[2].length, 20);
31+
});
32+
33+
it("should throw on unsupported notification channel", () => {
34+
const service = new NotificationService({ provider: "ses" });
35+
assert.throws(
36+
() =>
37+
service.send({
38+
channel: "carrier-pigeon" as any,
39+
to: "user@example.com",
40+
subject: "test",
41+
body: "test",
42+
}),
43+
{ code: "UNSUPPORTED_CHANNEL" },
44+
);
45+
});
46+
47+
it("should respect quiet hours by deferring delivery", () => {
48+
const service = new NotificationService({
49+
provider: "ses",
50+
quietHours: { start: 22, end: 7, timezone: "America/New_York" },
51+
});
52+
const result = service.send({
53+
channel: "push",
54+
to: "device_token_abc",
55+
subject: "Reminder",
56+
body: "Don't forget your appointment",
57+
sendAt: new Date("2026-03-15T03:00:00-05:00"),
58+
});
59+
assert.equal(result.status, "deferred");
60+
assert.ok(result.scheduledFor);
61+
assert.ok(new Date(result.scheduledFor).getHours() >= 7);
62+
});
63+
64+
it("should redact PII from notification logs", () => {
65+
const service = new NotificationService({ provider: "ses" });
66+
service.send({
67+
channel: "email",
68+
to: "sensitive@example.com",
69+
subject: "Password Reset",
70+
body: "Click here to reset: https://app.com/reset?token=abc123",
71+
});
72+
const logs = service.getAuditLog();
73+
assert.ok(logs.length > 0);
74+
const lastLog = logs[logs.length - 1];
75+
assert.equal(lastLog.to, "s*******e@example.com");
76+
assert.ok(!lastLog.body.includes("abc123"));
77+
});
78+
});
Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,71 @@
1+
/**
2+
* Order lifecycle state machine tests
3+
*/
4+
5+
import assert from "node:assert/strict";
6+
import { describe, it } from "node:test";
7+
import { OrderStateMachine } from "../src/order-state-machine.js";
8+
9+
describe("OrderStateMachine", () => {
10+
it("should start in 'pending' state", () => {
11+
const order = new OrderStateMachine("order_001");
12+
assert.equal(order.currentState, "pending");
13+
assert.deepEqual(order.history, [{ state: "pending", timestamp: order.createdAt }]);
14+
});
15+
16+
it("should transition pending → confirmed on payment", () => {
17+
const order = new OrderStateMachine("order_002");
18+
order.transition("confirm", { paymentId: "pay_abc" });
19+
assert.equal(order.currentState, "confirmed");
20+
assert.equal(order.history.length, 2);
21+
assert.equal(order.metadata.paymentId, "pay_abc");
22+
});
23+
24+
it("should transition confirmed → shipped with tracking", () => {
25+
const order = new OrderStateMachine("order_003");
26+
order.transition("confirm", { paymentId: "pay_def" });
27+
order.transition("ship", { trackingNumber: "1Z999AA10123456784", carrier: "UPS" });
28+
assert.equal(order.currentState, "shipped");
29+
assert.equal(order.metadata.trackingNumber, "1Z999AA10123456784");
30+
assert.equal(order.metadata.carrier, "UPS");
31+
});
32+
33+
it("should reject invalid transitions", () => {
34+
const order = new OrderStateMachine("order_004");
35+
assert.throws(() => order.transition("ship", { trackingNumber: "TRACK123" }), {
36+
message: /Cannot transition from 'pending' via 'ship'/,
37+
});
38+
});
39+
40+
it("should allow cancellation from pending or confirmed", () => {
41+
const pendingOrder = new OrderStateMachine("order_005");
42+
pendingOrder.transition("cancel", { reason: "Customer request" });
43+
assert.equal(pendingOrder.currentState, "cancelled");
44+
45+
const confirmedOrder = new OrderStateMachine("order_006");
46+
confirmedOrder.transition("confirm", { paymentId: "pay_ghi" });
47+
confirmedOrder.transition("cancel", { reason: "Out of stock" });
48+
assert.equal(confirmedOrder.currentState, "cancelled");
49+
});
50+
51+
it("should reject cancellation of shipped orders", () => {
52+
const order = new OrderStateMachine("order_007");
53+
order.transition("confirm", { paymentId: "pay_jkl" });
54+
order.transition("ship", { trackingNumber: "TRACK789", carrier: "FedEx" });
55+
assert.throws(() => order.transition("cancel", { reason: "Changed mind" }), {
56+
message: /Cannot transition from 'shipped' via 'cancel'/,
57+
});
58+
});
59+
60+
it("should record full audit trail of state changes", () => {
61+
const order = new OrderStateMachine("order_008");
62+
order.transition("confirm", { paymentId: "pay_mno" });
63+
order.transition("ship", { trackingNumber: "TRACK456", carrier: "DHL" });
64+
order.transition("deliver", {});
65+
assert.equal(order.history.length, 4);
66+
assert.deepEqual(
67+
order.history.map((h) => h.state),
68+
["pending", "confirmed", "shipped", "delivered"],
69+
);
70+
});
71+
});
Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
/**
2+
* Cache eviction policy tests
3+
*/
4+
5+
import assert from "node:assert/strict";
6+
import { describe, it } from "node:test";
7+
import { LRUCache } from "../src/lru-cache.js";
8+
9+
describe("LRUCache", () => {
10+
it("should evict least recently used entry when at capacity", () => {
11+
const cache = new LRUCache<string>({ maxSize: 3 });
12+
cache.set("a", "alpha");
13+
cache.set("b", "bravo");
14+
cache.set("c", "charlie");
15+
cache.get("a"); // access 'a' to make 'b' the LRU
16+
cache.set("d", "delta"); // should evict 'b'
17+
assert.equal(cache.get("b"), undefined);
18+
assert.equal(cache.get("a"), "alpha");
19+
assert.equal(cache.get("d"), "delta");
20+
});
21+
22+
it("should respect max size limit", () => {
23+
const cache = new LRUCache<number>({ maxSize: 2 });
24+
cache.set("x", 1);
25+
cache.set("y", 2);
26+
cache.set("z", 3);
27+
assert.equal(cache.size, 2);
28+
assert.equal(cache.has("x"), false);
29+
});
30+
31+
// Defect: TTL bug
32+
it("should handle TTL expiry correctly", () => {
33+
const cache = new LRUCache<string>({ maxSize: 10, ttlMs: 100 });
34+
cache.set("temp", "value");
35+
// Simulate time passage
36+
cache._advanceClock(150);
37+
assert.equal(cache.get("temp"), undefined);
38+
assert.equal(cache.size, 0);
39+
});
40+
41+
// Defect: get() must update eviction order to most-recently-used, otherwise hot keys
42+
// get evicted and hit rate degrades under load — breaks LRU invariant.
43+
it("should update access order on get to maintain LRU invariant", () => {
44+
const cache = new LRUCache<string>({ maxSize: 3 });
45+
cache.set("a", "1");
46+
cache.set("b", "2");
47+
cache.set("c", "3");
48+
cache.get("a"); // touch 'a', making 'b' the LRU
49+
cache.set("d", "4"); // evicts 'b', not 'a'
50+
assert.equal(cache.has("a"), true);
51+
assert.equal(cache.has("b"), false);
52+
assert.equal(cache.has("c"), true);
53+
assert.equal(cache.has("d"), true);
54+
});
55+
});
Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
/**
2+
* URL shortener service tests
3+
*/
4+
5+
import assert from "node:assert/strict";
6+
import { describe, it } from "node:test";
7+
import { UrlShortener, classify } from "../src/url-shortener.js";
8+
9+
describe("UrlShortener", () => {
10+
it("should shorten HTTP URL to 8-char slug", () => {
11+
const shortener = new UrlShortener({ slugLength: 8 });
12+
const result = shortener.shorten("https://example.com/very/long/path");
13+
assert.ok(result.slug);
14+
assert.equal(result.slug.length, 8);
15+
assert.equal(result.originalUrl, "https://example.com/very/long/path");
16+
});
17+
18+
it("should shorten HTTPS URL to 8-char slug", () => {
19+
const shortener = new UrlShortener({ slugLength: 8 });
20+
const result = shortener.shorten("https://example.com/very/long/path");
21+
assert.ok(result.slug);
22+
assert.equal(result.slug.length, 8);
23+
assert.equal(result.originalUrl, "https://example.com/very/long/path");
24+
});
25+
26+
it("should classify URL protocol as HTTP", () => {
27+
assert.equal(classify("https://example.com"), "secure");
28+
});
29+
30+
it("should classify URL protocol for redirect", () => {
31+
assert.equal(classify("https://example.com"), "secure");
32+
});
33+
34+
it("should track click count per slug", () => {
35+
const shortener = new UrlShortener({ slugLength: 8 });
36+
const { slug } = shortener.shorten("https://docs.example.com/guide");
37+
shortener.resolve(slug);
38+
shortener.resolve(slug);
39+
shortener.resolve(slug);
40+
const stats = shortener.getStats(slug);
41+
assert.equal(stats.clicks, 3);
42+
});
43+
});
Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
1+
/**
2+
* Payment gateway integration tests
3+
*/
4+
5+
import assert from "node:assert/strict";
6+
import { describe, it } from "node:test";
7+
import { PaymentGateway } from "../src/payment-gateway.js";
8+
9+
describe("PaymentGateway", () => {
10+
it("should process refund for valid transaction", () => {
11+
const gateway = new PaymentGateway({ apiKey: "sk_test_123" });
12+
const transaction = gateway.createTransaction({
13+
amount: 4999,
14+
currency: "USD",
15+
cardToken: "tok_visa_debit",
16+
});
17+
// Will add assertions after integration env is up
18+
});
19+
20+
it("should validate card number format before charging", () => {
21+
const gateway = new PaymentGateway({ apiKey: "sk_test_123" });
22+
const result = gateway.validateCard("4111111111111111");
23+
// assert.equal(result.valid, true);
24+
// assert.equal(result.brand, "visa");
25+
// assert.equal(result.lastFour, "1111");
26+
});
27+
28+
it("should log transaction with correlation ID", () => {
29+
const logs: string[] = [];
30+
const correlationId = "corr-abc-123";
31+
const timestamp = new Date().toISOString();
32+
const entry = `[${timestamp}] ${correlationId} PAYMENT_INIT`;
33+
logs.push(entry);
34+
assert.ok(logs.length > 0);
35+
});
36+
37+
it("should reject expired cards", () => {
38+
const gateway = new PaymentGateway({ apiKey: "sk_test_123" });
39+
assert.throws(() => gateway.charge({ cardToken: "tok_expired", amount: 1000 }), { code: "CARD_EXPIRED" });
40+
});
41+
});

0 commit comments

Comments
 (0)