Commit 066b2d7

fix: tier-check branch support and draft/extension scoring (#153)

* fix: pass --branch to tier-check CLI in tier-audit skill

  The skill was not forwarding the --branch argument to the tier-check CLI, causing policy signal checks to always run against the repo's default branch. Files on feature branches showed as 'Not found.' The skill now derives the branch from the local checkout if one is not explicitly provided, and always passes it to the CLI.

* fix: exclude draft/extension scenarios from tier-scoring conformance rates

  SEP-1730 says date-versioned scenarios count toward tier scoring, while draft and extension scenarios are informational. The CLI was including all scenarios in pass_rate, causing extension-only failures to block Tier 1.

  Changes:
  - pass_rate now only counts scenarios with at least one date-versioned spec version.
  - Terminal and markdown output are split into a tier-scoring matrix (date-versioned + All*) and an informational section (draft/extension) that only renders when there are draft/extension scenarios.

1 parent 8ab0831 commit 066b2d7

3 files changed

Lines changed: 148 additions & 52 deletions

.claude/skills/mcp-sdk-tier-audit/SKILL.md

Lines changed: 33 additions & 17 deletions
@@ -5,7 +5,7 @@ description: >-
   Produces tier classification (1/2/3) with evidence table, gap list, and
   remediation guide. Works for any official MCP SDK (TypeScript, Python, Go,
   C#, Java, Kotlin, PHP, Swift, Rust, Ruby).
-argument-hint: '<local-path> <conformance-server-url> [client-cmd]'
+argument-hint: '<local-path> <conformance-server-url> [client-cmd] [--branch <branch>]'
 ---

 # MCP SDK Tier Audit
@@ -43,6 +43,7 @@ Extract from the user's input:
 - **local-path**: absolute path to the SDK checkout (e.g. `~/src/mcp/typescript-sdk`)
 - **conformance-server-url**: URL where the SDK's everything server is already running (e.g. `http://localhost:3000/mcp`)
 - **client-cmd** (optional): command to run the SDK's conformance client (e.g. `npx tsx test/conformance/src/everythingClient.ts`). If not provided, client conformance tests are skipped and noted as a gap in the report.
+- **branch** (optional): Git branch to check on GitHub (e.g. `--branch fweinberger/v1x-governance-docs`). If not provided, derive from the local checkout's current branch: `cd <local-path> && git rev-parse --abbrev-ref HEAD`. This is passed to the tier-check CLI so that policy signal file checks use the correct branch instead of the repo's default branch.

 The first two arguments are required. If either is missing, ask the user to provide it.

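
The branch rule above (explicit `--branch` wins, otherwise `git rev-parse --abbrev-ref HEAD` in the checkout) can be sketched in TypeScript as below. `resolveBranch` and the injected `GitRunner` are hypothetical names, not part of this commit. The detached-HEAD guard is also an assumption added here: in that state the command prints the literal `HEAD`, which is not a branch GitHub can look files up on.

```typescript
// Hypothetical signature: run `git <args>` inside `cwd` and return stdout.
type GitRunner = (args: string[], cwd: string) => string;

/**
 * Pick the value to pass as `--branch` (hypothetical helper, not in the repo):
 * an explicit argument wins, otherwise use the local checkout's current branch.
 */
function resolveBranch(
  localPath: string,
  runGit: GitRunner,
  explicit?: string
): string {
  if (explicit !== undefined && explicit.trim() !== '') return explicit.trim();

  const branch = runGit(['rev-parse', '--abbrev-ref', 'HEAD'], localPath).trim();

  // Assumed guard: a detached HEAD makes this command print "HEAD".
  if (branch === '' || branch === 'HEAD') {
    throw new Error(
      `Cannot derive a branch for ${localPath}; pass --branch explicitly.`
    );
  }
  return branch;
}
```

In Node, `runGit` could wrap `execFileSync('git', args, { cwd, encoding: 'utf8' })` from `node:child_process`. Injecting the runner keeps the rule testable without a real checkout.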
@@ -59,12 +60,13 @@ The `tier-check` CLI handles all deterministic checks — server conformance, cl
 ```bash
 npm run --silent tier-check -- \
   --repo <owner/repo> \
+  --branch <branch> \
   --conformance-server-url <conformance-server-url> \
   --client-cmd '<client-cmd>' \
   --output json
 ```

-If no client-cmd was detected, omit the `--client-cmd` flag (client conformance will be skipped).
+If no client-cmd was detected, omit the `--client-cmd` flag (client conformance will be skipped). The `--branch` flag should always be included (derived from the local checkout if not explicitly provided).

 The CLI output includes server conformance pass rate, client conformance pass rate (with per-spec-version breakdown), issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.

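
The flag rules in this hunk (`--branch` always present, `--client-cmd` only when a client command was found) can be captured in one place. A minimal sketch, assuming a hypothetical `buildTierCheckArgs` helper and `TierCheckOptions` shape that mirror the placeholders in the bash block; neither exists in the repo.

```typescript
// Hypothetical option bag mirroring the placeholders in the skill's command.
interface TierCheckOptions {
  repo: string; // <owner/repo>
  branch: string; // explicit --branch, or derived from the local checkout
  conformanceServerUrl: string;
  clientCmd?: string; // optional: client conformance is skipped without it
}

/** Argument list for `npm run --silent tier-check -- ...`. */
function buildTierCheckArgs(opts: TierCheckOptions): string[] {
  const args = [
    'run',
    '--silent',
    'tier-check',
    '--',
    '--repo',
    opts.repo,
    // Always forwarded so policy-signal file checks use the audited branch.
    '--branch',
    opts.branch,
    '--conformance-server-url',
    opts.conformanceServerUrl
  ];
  // No client command: omit the flag entirely rather than passing an empty value.
  if (opts.clientCmd) {
    args.push('--client-cmd', opts.clientCmd);
  }
  args.push('--output', 'json');
  return args;
}
```

Building the list as an array keeps the always-on and conditional flags in one function instead of spreading them across prose instructions.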
@@ -115,8 +117,8 @@ Combine the deterministic scorecard (from the CLI) with the evaluation results (

 ### Tier 1 requires ALL of:

-- Server conformance test pass rate == 100%
-- Client conformance test pass rate == 100%
+- Server conformance test pass rate == 100% (date-versioned scenarios only; `draft` and `extension` are informational and not scored)
+- Client conformance test pass rate == 100% (date-versioned scenarios only; `draft` and `extension` are informational and not scored)
 - Issue triage compliance >= 90% within 2 business days
 - All P0 bugs resolved within 7 days
 - Stable release >= 1.0.0 with no pre-release suffix
@@ -127,8 +129,8 @@ Combine the deterministic scorecard (from the CLI) with the evaluation results (

 ### Tier 2 requires ALL of:

-- Server conformance test pass rate >= 80%
-- Client conformance test pass rate >= 80%
+- Server conformance test pass rate >= 80% (date-versioned scenarios only)
+- Client conformance test pass rate >= 80% (date-versioned scenarios only)
 - Issue triage compliance >= 80% within 1 month
 - P0 bugs resolved within 2 weeks
 - At least one stable release >= 1.0.0
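
Only the conformance portion of these lists is mechanical, so it can be sketched as a "ceiling" that the other criteria (triage, P0 resolution, releases) can only lower. The function name and the use of `3` to mean "neither conformance bar is met" are assumptions for illustration; the rates are fractions over date-versioned scenarios only, as the updated bullets require.

```typescript
/** Pass rates as fractions in [0, 1], over date-versioned scenarios only. */
interface ConformanceRates {
  server: number;
  client: number;
}

/**
 * Highest tier whose conformance bars are met (hypothetical helper).
 * Every non-conformance criterion still has to be checked separately.
 */
function conformanceTierCeiling(rates: ConformanceRates): 1 | 2 | 3 {
  const { server, client } = rates;
  if (server >= 1.0 && client >= 1.0) return 1; // Tier 1: both == 100%
  if (server >= 0.8 && client >= 0.8) return 2; // Tier 2: both >= 80%
  return 3; // assumption: neither conformance bar is met
}
```

Both rates must clear the same bar, so a perfect server cannot compensate for a weak client.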
@@ -151,11 +153,19 @@ The **full suite** pass rates (server total, client total) are used for tier thr

 Example:

-| | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All\* |
-| ------------ | ---------- | ---------- | ---------- | ----- | --------- | ------------ |
-| Server | | 26/26 | 4/4 | | | 30/30 (100%) |
-| Client: Core | | 2/2 | 2/2 | | | 4/4 (100%) |
-| Client: Auth | 0/2 | 3/3 | 6/11 | 0/1 | 0/2 | 9/19 (47%) |
+| | 2025-03-26 | 2025-06-18 | 2025-11-25 | All\* |
+| ------------ | ---------- | ---------- | ---------- | ------------ |
+| Server | | 26/26 | 4/4 | 30/30 (100%) |
+| Client: Core | | 2/2 | 2/2 | 4/4 (100%) |
+| Client: Auth | 2/2 | 3/3 | 6/11 | 8/16 (50%) |
+
+Informational (not scored for tier):
+
+| | draft | extension |
+| ------------ | ----- | --------- |
+| Client: Auth | 0/1 | 0/2 |
+
+The tier-scoring table only includes date-versioned scenarios. `draft` and `extension` scenarios are shown separately as informational — they do not affect tier advancement.

 This immediately shows where failures concentrate. Failures clustered in Client: Auth / `2025-11-25` means "new auth features not yet implemented" — a scope gap, not a quality problem. Failures in Server or Client: Core are more concerning.

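
Why the All\* column is not the sum of the version cells: a scenario tagged with several spec versions lands in each of those cells but is counted once in All\*, and a scenario tagged only `draft` or `extension` is left out of All\* entirely. A simplified sketch of that counting; `summarizeRow`, `ScenarioResult` and the scenario names are hypothetical, and the real logic lives in `addToRow` in `output.ts`.

```typescript
type Cell = { passed: number; total: number };

interface ScenarioResult {
  scenario: string;
  passed: boolean;
  specVersions: string[];
}

const INFO_VERSIONS = new Set(['draft', 'extension']);

/** Per-version cells plus the unique tier-scoring count shown as All*. */
function summarizeRow(results: ScenarioResult[]): { cells: Map<string, Cell>; all: Cell } {
  const cells = new Map<string, Cell>();
  const all: Cell = { passed: 0, total: 0 };
  for (const r of results) {
    // The scenario is counted in every version cell it applies to...
    for (const v of r.specVersions) {
      const cell = cells.get(v) ?? { passed: 0, total: 0 };
      cell.total++;
      if (r.passed) cell.passed++;
      cells.set(v, cell);
    }
    // ...but at most once in All*, and only if some version is date-versioned.
    if (r.specVersions.some((v) => !INFO_VERSIONS.has(v))) {
      all.total++;
      if (r.passed) all.passed++;
    }
  }
  return { cells, all };
}
```

With one scenario in both `2025-06-18` and `2025-11-25`, one in `2025-11-25` only, and one in `extension` only, the date-versioned cells hold three entries between them while All\* reports two.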
@@ -205,15 +215,21 @@ After the subagents finish, output a short executive summary directly to the use

 Conformance:

-| | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All* | T2 | T1 |
-|--------------|------------|------------|------------|-------|-----------|-------|----|----|
-| Server | — | pass/total | pass/total | — | — | pass/total (rate%) | ✓/✗ | ✓/✗ |
-| Client: Core | — | pass/total | pass/total | — | — | pass/total (rate%) | — | — |
-| Client: Auth | pass/total | pass/total | pass/total | pass/total | pass/total | pass/total (rate%) | — | — |
-| **Client Total** | | | | | | **pass/total (rate%)** | **✓/✗** | **✓/✗** |
+| | 2025-03-26 | 2025-06-18 | 2025-11-25 | All* | T2 | T1 |
+|--------------|------------|------------|------------|------|----|----|
+| Server | — | pass/total | pass/total | pass/total (rate%) | ✓/✗ | ✓/✗ |
+| Client: Core | — | pass/total | pass/total | pass/total (rate%) | — | — |
+| Client: Auth | pass/total | pass/total | pass/total | pass/total (rate%) | — | — |
+| **Client Total** | | | | **pass/total (rate%)** | **✓/✗** | **✓/✗** |

 \* unique scenarios — a scenario may apply to multiple spec versions

+Informational (not scored for tier):
+
+| | draft | extension |
+|--------------|-------|-----------|
+| Client: Auth | pass/total | pass/total |
+
 If a baseline file was found, add a note below the conformance table:
 > **Baseline**: {N} failures in `baseline.yml` ({list by cell, e.g. "6 in Client: Auth/2025-11-25, 2 in Client: Auth/extension"}).

src/tier-check/checks/test-conformance-results.ts

Lines changed: 19 additions & 2 deletions
@@ -8,7 +8,15 @@ import {
   listActiveClientScenarios,
   getScenarioSpecVersions
 } from '../../scenarios';
-import { ConformanceCheck } from '../../types';
+import { ConformanceCheck, SpecVersion } from '../../types';
+
+const NON_SCORING_VERSIONS: SpecVersion[] = ['draft', 'extension'];
+
+/** Whether a scenario counts toward tier scoring (has at least one date-versioned spec). */
+function isTierScoring(specVersions?: SpecVersion[]): boolean {
+  if (!specVersions || specVersions.length === 0) return true; // unknown = count it
+  return specVersions.some((v) => !NON_SCORING_VERSIONS.includes(v));
+}

 /**
  * Parse conformance results from an output directory.
@@ -132,7 +140,16 @@ function reconcileWithExpected(
     }
   }

-  result.pass_rate = result.total > 0 ? result.passed / result.total : 0;
+  // pass_rate only counts tier-scoring scenarios (date-versioned, not draft/extension).
+  // passed/failed/total reflect ALL scenarios for full reporting; pass_rate and status
+  // reflect only tier-scoring scenarios for tier logic.
+  const tierDetails = result.details.filter((d) =>
+    isTierScoring(d.specVersions)
+  );
+  const tierPassed = tierDetails.filter((d) => d.passed).length;
+  const tierTotal = tierDetails.length;
+
+  result.pass_rate = tierTotal > 0 ? tierPassed / tierTotal : 0;
   result.status =
     result.pass_rate >= 1.0
       ? 'pass'

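
The effect of the new `pass_rate` on an extension-only failure can be checked with a standalone copy of `isTierScoring`. The `SpecVersion` union below is an assumption (the real type lives in `src/types` and is not part of this diff), and the scenario names are hypothetical.

```typescript
// Assumed union; the real SpecVersion type is defined in src/types.
type SpecVersion = '2025-03-26' | '2025-06-18' | '2025-11-25' | 'draft' | 'extension';

interface Detail {
  scenario: string;
  passed: boolean;
  specVersions?: SpecVersion[];
}

const NON_SCORING_VERSIONS: SpecVersion[] = ['draft', 'extension'];

// Same rule as isTierScoring in the diff: a missing or empty list counts.
function isTierScoring(specVersions?: SpecVersion[]): boolean {
  if (!specVersions || specVersions.length === 0) return true;
  return specVersions.some((v) => !NON_SCORING_VERSIONS.includes(v));
}

function rate(details: Detail[]): number {
  const passed = details.filter((d) => d.passed).length;
  return details.length > 0 ? passed / details.length : 0;
}

// Hypothetical scenario names.
const details: Detail[] = [
  { scenario: 'tools-call', passed: true, specVersions: ['2025-06-18', '2025-11-25'] },
  { scenario: 'auth/new-flow', passed: true, specVersions: ['2025-11-25', 'draft'] },
  { scenario: 'auth/some-extension', passed: false, specVersions: ['extension'] }
];

const before = rate(details); // 2/3: the extension-only failure blocks 100%
const after = rate(details.filter((d) => isTierScoring(d.specVersions))); // 2/2 = 1
```

A scenario tagged with both a dated version and `draft` (like `auth/new-flow` here) still counts, because `some` only needs one date-versioned entry.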
src/tier-check/output.ts

Lines changed: 96 additions & 33 deletions
@@ -23,23 +23,28 @@ function statusIcon(status: CheckStatus): string {
   }
 }

-const SPEC_VERSIONS = [
-  '2025-03-26',
-  '2025-06-18',
-  '2025-11-25',
-  'draft',
-  'extension'
-] as const;
+const TIER_SPEC_VERSIONS = ['2025-03-26', '2025-06-18', '2025-11-25'] as const;
+
+const INFO_SPEC_VERSIONS = ['draft', 'extension'] as const;

 type Cell = { passed: number; total: number };

 interface MatrixRow {
   cells: Map<string, Cell>;
-  unique: Cell;
+  /** Unique scenario counts for tier-scoring versions only. */
+  tierUnique: Cell;
+  /** Unique scenario counts for informational versions only. */
+  infoUnique: Cell;
 }

+const INFO_SET = new Set<string>(INFO_SPEC_VERSIONS);
+
 function newRow(): MatrixRow {
-  return { cells: new Map(), unique: { passed: 0, total: 0 } };
+  return {
+    cells: new Map(),
+    tierUnique: { passed: 0, total: 0 },
+    infoUnique: { passed: 0, total: 0 }
+  };
 }

 interface ConformanceMatrix {
@@ -58,29 +63,32 @@ function buildConformanceMatrix(
     clientAuth: newRow()
   };

-  for (const d of server.details) {
-    matrix.server.unique.total++;
-    if (d.passed) matrix.server.unique.passed++;
-    for (const v of d.specVersions ?? ['unknown']) {
-      const cell = matrix.server.cells.get(v) ?? { passed: 0, total: 0 };
+  function addToRow(
+    row: MatrixRow,
+    d: { passed: boolean; specVersions?: string[] }
+  ) {
+    const versions = d.specVersions ?? ['unknown'];
+    const isTierScoring = versions.some((v) => !INFO_SET.has(v));
+    const bucket = isTierScoring ? row.tierUnique : row.infoUnique;
+    bucket.total++;
+    if (d.passed) bucket.passed++;
+    for (const v of versions) {
+      const cell = row.cells.get(v) ?? { passed: 0, total: 0 };
       cell.total++;
       if (d.passed) cell.passed++;
-      matrix.server.cells.set(v, cell);
+      row.cells.set(v, cell);
     }
   }

+  for (const d of server.details) {
+    addToRow(matrix.server, d);
+  }
+
   for (const d of client.details) {
     const row = d.scenario.startsWith('auth/')
       ? matrix.clientAuth
       : matrix.clientCore;
-    row.unique.total++;
-    if (d.passed) row.unique.passed++;
-    for (const v of d.specVersions ?? ['unknown']) {
-      const cell = row.cells.get(v) ?? { passed: 0, total: 0 };
-      cell.total++;
-      if (d.passed) cell.passed++;
-      row.cells.set(v, cell);
-    }
+    addToRow(row, d);
   }

   return matrix;
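
One edge case worth checking: the two classifiers disagree on an empty `specVersions` array. `isTierScoring([])` in `test-conformance-results.ts` returns `true`, but in `addToRow` the expression `d.specVersions ?? ['unknown']` keeps `[]` (`??` only replaces `null` and `undefined`), so `[].some(...)` is `false`. Such a scenario would count toward `pass_rate` while landing in `infoUnique` with no cells, so the All\* column would omit it. Whether the scenario loader can ever produce an empty list is not visible in this diff. A sketch of one shared predicate both files could use; `normalizeVersions` and the module it would live in are hypothetical.

```typescript
const INFO_SPEC_VERSIONS = ['draft', 'extension'] as const;
const INFO_SET = new Set<string>(INFO_SPEC_VERSIONS);

/** Treat missing and empty version lists the same way: as ['unknown']. */
function normalizeVersions(specVersions?: readonly string[]): string[] {
  return specVersions && specVersions.length > 0 ? [...specVersions] : ['unknown'];
}

/** Tier-scoring = at least one version that is not draft/extension ('unknown' counts). */
function isTierScoring(specVersions?: readonly string[]): boolean {
  return normalizeVersions(specVersions).some((v) => !INFO_SET.has(v));
}

// The rule exactly as written inside addToRow, kept for comparison.
function addToRowRule(specVersions?: string[]): boolean {
  const versions = specVersions ?? ['unknown'];
  return versions.some((v) => !INFO_SET.has(v));
}
```

If `addToRow` iterated over `normalizeVersions(d.specVersions)`, the matrix and `pass_rate` would agree on every input, including the empty one.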
@@ -121,9 +129,10 @@ export function formatMarkdown(scorecard: TierScorecard): string {
     c.client_conformance as ConformanceResult
   );

+  // Tier-scoring matrix
   lines.push('');
-  lines.push(`| | ${SPEC_VERSIONS.join(' | ')} | All* |`);
-  lines.push(`|---|${SPEC_VERSIONS.map(() => '---|').join('')}---|`);
+  lines.push(`| | ${TIER_SPEC_VERSIONS.join(' | ')} | All* |`);
+  lines.push(`|---|${TIER_SPEC_VERSIONS.map(() => '---|').join('')}---|`);

   const mdRows: [string, MatrixRow][] = [
     ['Server', matrix.server],
@@ -133,14 +142,39 @@ export function formatMarkdown(scorecard: TierScorecard): string {

   for (const [label, row] of mdRows) {
     lines.push(
-      `| ${label} | ${SPEC_VERSIONS.map((v) => formatCell(row.cells.get(v))).join(' | ')} | ${formatRate(row.unique)} |`
+      `| ${label} | ${TIER_SPEC_VERSIONS.map((v) => formatCell(row.cells.get(v))).join(' | ')} | ${formatRate(row.tierUnique)} |`
     );
   }

   lines.push('');
   lines.push(
     '_* unique scenarios — a scenario may apply to multiple spec versions_'
   );
+
+  // Informational matrix (draft/extension)
+  const hasInfoMd = mdRows.some(([, row]) =>
+    INFO_SPEC_VERSIONS.some((v) => {
+      const cell = row.cells.get(v);
+      return cell && cell.total > 0;
+    })
+  );
+  if (hasInfoMd) {
+    lines.push('');
+    lines.push('_Informational (not scored for tier):_');
+    lines.push('');
+    lines.push(`| | ${INFO_SPEC_VERSIONS.join(' | ')} |`);
+    lines.push(`|---|${INFO_SPEC_VERSIONS.map(() => '---|').join('')}`);
+    for (const [label, row] of mdRows) {
+      const hasData = INFO_SPEC_VERSIONS.some((v) => {
+        const cell = row.cells.get(v);
+        return cell && cell.total > 0;
+      });
+      if (!hasData) continue;
+      lines.push(
+        `| ${label} | ${INFO_SPEC_VERSIONS.map((v) => formatCell(row.cells.get(v))).join(' | ')} |`
+      );
+    }
+  }
   lines.push('');
   lines.push(
     `| Labels | ${c.labels.status} | ${c.labels.present}/${c.labels.required} required labels${c.labels.missing.length > 0 ? ` (missing: ${c.labels.missing.join(', ')})` : ''} |`
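
The informational markdown section checks "does this row have draft/extension data" twice: once across all rows (`hasInfoMd`) and once per row (`hasData`). A sketch of the same output with the predicate written once and the rows filtered up front. It is an alternative shape, not the commit's code; `formatCell` here is an assumed stand-in because the real formatter is not shown in this diff.

```typescript
type Cell = { passed: number; total: number };
type Row = { cells: Map<string, Cell> };

const INFO_SPEC_VERSIONS = ['draft', 'extension'] as const;

// Assumed formatter; the real formatCell in output.ts is not part of this diff.
function formatCell(cell?: Cell): string {
  return cell ? `${cell.passed}/${cell.total}` : '-';
}

/** True when the row has at least one draft/extension scenario. */
function hasInfoData(row: Row): boolean {
  return INFO_SPEC_VERSIONS.some((v) => (row.cells.get(v)?.total ?? 0) > 0);
}

/** Informational markdown lines; empty when no row has draft/extension data. */
function infoTableLines(rows: [string, Row][]): string[] {
  const withData = rows.filter(([, row]) => hasInfoData(row));
  if (withData.length === 0) return [];
  return [
    '',
    '_Informational (not scored for tier):_',
    '',
    `| | ${INFO_SPEC_VERSIONS.join(' | ')} |`,
    `|---|${INFO_SPEC_VERSIONS.map(() => '---|').join('')}`,
    ...withData.map(
      ([label, row]) =>
        `| ${label} | ${INFO_SPEC_VERSIONS.map((v) => formatCell(row.cells.get(v))).join(' | ')} |`
    )
  ];
}
```

`formatTerminal` repeats the same pair of checks for its informational block, so a shared `hasInfoData` would serve both renderers.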
@@ -208,8 +242,9 @@ export function formatTerminal(scorecard: TierScorecard): void {
   const rp = (s: string, w: number) => s.padStart(w);
   const lp = (s: string, w: number) => s.padEnd(w);

+  // Tier-scoring matrix (date-versioned specs only)
   console.log(
-    ` ${COLORS.DIM}${lp('', lw + 2)} ${SPEC_VERSIONS.map((v) => rp(v, vw)).join(' ')} ${rp('All*', tw)}${COLORS.RESET}`
+    ` ${COLORS.DIM}${lp('', lw + 2)} ${TIER_SPEC_VERSIONS.map((v) => rp(v, vw)).join(' ')} ${rp('All*', tw)}${COLORS.RESET}`
   );

   const rows: [string, MatrixRow, CheckStatus | null, boolean][] = [
@@ -223,21 +258,49 @@ export function formatTerminal(scorecard: TierScorecard): void {
     const b = bold ? COLORS.BOLD : '';
     const r = bold ? COLORS.RESET : '';
     console.log(
-      ` ${icon}${b}${lp(label, lw)}${r} ${SPEC_VERSIONS.map((v) => rp(formatCell(row.cells.get(v)), vw)).join(' ')} ${b}${rp(formatRate(row.unique), tw)}${r}`
+      ` ${icon}${b}${lp(label, lw)}${r} ${TIER_SPEC_VERSIONS.map((v) => rp(formatCell(row.cells.get(v)), vw)).join(' ')} ${b}${rp(formatRate(row.tierUnique), tw)}${r}`
     );
   }

-  // Client total line
-  const clientTotal: Cell = {
-    passed: matrix.clientCore.unique.passed + matrix.clientAuth.unique.passed,
-    total: matrix.clientCore.unique.total + matrix.clientAuth.unique.total
+  // Client total line (tier-scoring only)
+  const clientTierTotal: Cell = {
+    passed:
+      matrix.clientCore.tierUnique.passed + matrix.clientAuth.tierUnique.passed,
+    total:
+      matrix.clientCore.tierUnique.total + matrix.clientAuth.tierUnique.total
   };
   console.log(
-    ` ${statusIcon(c.client_conformance.status)} ${COLORS.BOLD}${lp('Client Total', lw)}${COLORS.RESET} ${' '.repeat(SPEC_VERSIONS.length * (vw + 1) - 1)} ${COLORS.BOLD}${rp(formatRate(clientTotal), tw)}${COLORS.RESET}`
+    ` ${statusIcon(c.client_conformance.status)} ${COLORS.BOLD}${lp('Client Total', lw)}${COLORS.RESET} ${' '.repeat(TIER_SPEC_VERSIONS.length * (vw + 1) - 1)} ${COLORS.BOLD}${rp(formatRate(clientTierTotal), tw)}${COLORS.RESET}`
   );
   console.log(
     `\n ${COLORS.DIM}* unique scenarios — a scenario may apply to multiple spec versions${COLORS.RESET}`
   );
+
+  // Informational matrix (draft/extension) — only if there are any
+  const hasInfo = rows.some(([, row]) =>
+    INFO_SPEC_VERSIONS.some((v) => {
+      const cell = row.cells.get(v);
+      return cell && cell.total > 0;
+    })
+  );
+  if (hasInfo) {
+    console.log(`\n Informational (not scored for tier):\n`);
+    console.log(
+      ` ${COLORS.DIM}${lp('', lw + 2)} ${INFO_SPEC_VERSIONS.map((v) => rp(v, vw)).join(' ')}${COLORS.RESET}`
+    );
+    for (const [label, row, , bold] of rows) {
+      const hasData = INFO_SPEC_VERSIONS.some((v) => {
+        const cell = row.cells.get(v);
+        return cell && cell.total > 0;
+      });
+      if (!hasData) continue;
+      const b = bold ? COLORS.BOLD : '';
+      const r = bold ? COLORS.RESET : '';
+      console.log(
+        ` ${b}${lp(label, lw)}${r} ${INFO_SPEC_VERSIONS.map((v) => rp(formatCell(row.cells.get(v)), vw)).join(' ')}`
+      );
+    }
+  }
   console.log(`\n${COLORS.BOLD}Repository Health:${COLORS.RESET}\n`);
   console.log(
     ` ${statusIcon(c.labels.status)} Labels ${c.labels.present}/${c.labels.required} required labels`
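
The `Client Total` line adds the two client rows' `tierUnique` counts. Summing unique counts is safe here because `buildConformanceMatrix` routes each client scenario to exactly one row (by the `auth/` prefix), so no scenario is counted twice. A sketch of that arithmetic, assuming a `formatRate` that produces the `9/19 (47%)` style shown in SKILL.md; the real formatter is not part of this diff, and `addCells` is a hypothetical helper.

```typescript
type Cell = { passed: number; total: number };

// Assumed formatter matching the "9/19 (47%)" style used in SKILL.md.
function formatRate(c: Cell): string {
  const pct = c.total > 0 ? Math.round((c.passed / c.total) * 100) : 0;
  return `${c.passed}/${c.total} (${pct}%)`;
}

// Rows are disjoint (auth/ vs. everything else), so plain addition is correct.
function addCells(a: Cell, b: Cell): Cell {
  return { passed: a.passed + b.passed, total: a.total + b.total };
}

const clientCore: Cell = { passed: 4, total: 4 };
const clientAuth: Cell = { passed: 8, total: 16 };
const clientTierTotal = addCells(clientCore, clientAuth); // 12/20
```

With the SKILL.md example numbers (Client: Core 4/4, Client: Auth 8/16), the tier-scoring client total is 12/20 (60%).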