fix: tier-check branch support and draft/extension scoring (#153)

felixweinberger · web-flow · commit 066b2d70800a · 2026-02-17T13:09:31.000Z
* fix: pass --branch to tier-check CLI in tier-audit skill

The skill was not forwarding the --branch argument to the tier-check
CLI, so policy signal checks always ran against the repo's default
branch. Files that exist only on a feature branch were reported as
'Not found.'

Now derives the branch from the local checkout if not explicitly
provided, and always passes it to the CLI.
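
A minimal sketch of this fallback, assuming a Node environment. The names
`resolveBranch`, `branchArgs`, and the injectable `GitRunner` are hypothetical
and are not part of the skill or the CLI; only the
`git rev-parse --abbrev-ref HEAD` command and the `--branch` flag come from
this change. Treating a detached HEAD as an error is an added assumption.

```typescript
import { execFileSync } from 'node:child_process';

// Hypothetical seam so the lookup can be exercised without a real checkout.
type GitRunner = (args: string[], cwd: string) => string;

const runGit: GitRunner = (args, cwd) =>
  execFileSync('git', args, { cwd, encoding: 'utf8' });

/** Use the explicit branch if given, else the local checkout's current branch. */
function resolveBranch(
  localPath: string,
  explicit?: string,
  git: GitRunner = runGit
): string {
  const trimmed = explicit?.trim();
  if (trimmed) return trimmed;

  const branch = git(['rev-parse', '--abbrev-ref', 'HEAD'], localPath).trim();
  // git prints the literal "HEAD" when the checkout is detached. Failing here
  // is an assumption: it avoids silently checking the wrong ref.
  if (branch === 'HEAD') {
    throw new Error('Detached HEAD: pass --branch explicitly');
  }
  return branch;
}

/** Build the branch portion of the tier-check argument list. */
function branchArgs(
  localPath: string,
  explicit?: string,
  git?: GitRunner
): string[] {
  return ['--branch', resolveBranch(localPath, explicit, git)];
}
```

With this shape the flag is always present in the argument list, whether the
branch was supplied by the user or read from the checkout.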

* fix: exclude draft/extension scenarios from tier-scoring conformance rates

SEP-1730 says date-versioned scenarios count toward tier scoring while
draft and extension scenarios are informational. The CLI was including
all scenarios in pass_rate, causing extension-only failures to block
Tier 1.

Changes:
- pass_rate now only counts scenarios with at least one date-versioned
  spec version
- Terminal and markdown output split into a tier-scoring matrix
  (date-versioned + All*) and an informational section (draft/extension)
  that only renders when there are draft/extension scenarios
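
The scoring rule can be illustrated in isolation. The block below is a
standalone sketch, not the CLI's code: the `Detail` type, the `tierPassRate`
helper, and the scenario names are hypothetical. It assumes, as the diff
below does, that a scenario with no spec-version metadata still counts
toward the tier.

```typescript
type SpecVersion = string; // e.g. '2025-06-18', 'draft', 'extension'

interface Detail {
  scenario: string;
  passed: boolean;
  specVersions?: SpecVersion[];
}

const NON_SCORING = new Set<SpecVersion>(['draft', 'extension']);

/** True if the scenario has at least one date-versioned spec (or none listed). */
function isTierScoring(versions?: SpecVersion[]): boolean {
  if (!versions || versions.length === 0) return true; // unknown = count it
  return versions.some((v) => !NON_SCORING.has(v));
}

/** Pass rate over tier-scoring scenarios only; 0 when there are none. */
function tierPassRate(details: Detail[]): number {
  const scored = details.filter((d) => isTierScoring(d.specVersions));
  if (scored.length === 0) return 0;
  return scored.filter((d) => d.passed).length / scored.length;
}

// Hypothetical results: one extension-only failure among otherwise passing scenarios.
const details: Detail[] = [
  { scenario: 'tools/list', passed: true, specVersions: ['2025-06-18'] },
  { scenario: 'auth/a', passed: true, specVersions: ['2025-11-25', 'draft'] },
  { scenario: 'auth/b', passed: false, specVersions: ['extension'] },
  { scenario: 'legacy', passed: true }
];

console.log(tierPassRate(details)); // 1
```

Here the extension-only failure is excluded, so the rate is 3/3. Counting
every scenario would give 3/4 and fall short of the 100% threshold.
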
diff --git a/.claude/skills/mcp-sdk-tier-audit/SKILL.md b/.claude/skills/mcp-sdk-tier-audit/SKILL.md
@@ -5,7 +5,7 @@ description: >-
   Produces tier classification (1/2/3) with evidence table, gap list, and
   remediation guide. Works for any official MCP SDK (TypeScript, Python, Go,
   C#, Java, Kotlin, PHP, Swift, Rust, Ruby).
-argument-hint: '<local-path> <conformance-server-url> [client-cmd]'
+argument-hint: '<local-path> <conformance-server-url> [client-cmd] [--branch <branch>]'
 ---
 
 # MCP SDK Tier Audit
@@ -43,6 +43,7 @@ Extract from the user's input:
 - **local-path**: absolute path to the SDK checkout (e.g. `~/src/mcp/typescript-sdk`)
 - **conformance-server-url**: URL where the SDK's everything server is already running (e.g. `http://localhost:3000/mcp`)
 - **client-cmd** (optional): command to run the SDK's conformance client (e.g. `npx tsx test/conformance/src/everythingClient.ts`). If not provided, client conformance tests are skipped and noted as a gap in the report.
+- **branch** (optional): Git branch to check on GitHub (e.g. `--branch fweinberger/v1x-governance-docs`). If not provided, derive from the local checkout's current branch: `cd <local-path> && git rev-parse --abbrev-ref HEAD`. This is passed to the tier-check CLI so that policy signal file checks use the correct branch instead of the repo's default branch.
 
 The first two arguments are required. If either is missing, ask the user to provide it.
 
@@ -59,12 +60,13 @@ The `tier-check` CLI handles all deterministic checks — server conformance, cl
 ```bash
 npm run --silent tier-check -- \
   --repo <owner/repo> \
+  --branch <branch> \
   --conformance-server-url <conformance-server-url> \
   --client-cmd '<client-cmd>' \
   --output json
 ```
 
-If no client-cmd was detected, omit the `--client-cmd` flag (client conformance will be skipped).
+If no client-cmd was detected, omit the `--client-cmd` flag (client conformance will be skipped). The `--branch` flag should always be included (derived from the local checkout if not explicitly provided).
 
 The CLI output includes server conformance pass rate, client conformance pass rate (with per-spec-version breakdown), issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.
 
@@ -115,8 +117,8 @@ Combine the deterministic scorecard (from the CLI) with the evaluation results (
 
 ### Tier 1 requires ALL of:
 
-- Server conformance test pass rate == 100%
-- Client conformance test pass rate == 100%
+- Server conformance test pass rate == 100% (date-versioned scenarios only; `draft` and `extension` are informational and not scored)
+- Client conformance test pass rate == 100% (date-versioned scenarios only; `draft` and `extension` are informational and not scored)
 - Issue triage compliance >= 90% within 2 business days
 - All P0 bugs resolved within 7 days
 - Stable release >= 1.0.0 with no pre-release suffix
@@ -127,8 +129,8 @@ Combine the deterministic scorecard (from the CLI) with the evaluation results (
 
 ### Tier 2 requires ALL of:
 
-- Server conformance test pass rate >= 80%
-- Client conformance test pass rate >= 80%
+- Server conformance test pass rate >= 80% (date-versioned scenarios only)
+- Client conformance test pass rate >= 80% (date-versioned scenarios only)
 - Issue triage compliance >= 80% within 1 month
 - P0 bugs resolved within 2 weeks
 - At least one stable release >= 1.0.0
@@ -151,11 +153,19 @@ The **full suite** pass rates (server total, client total) are used for tier thr
 
 Example:
 
-|              | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All\*        |
-| ------------ | ---------- | ---------- | ---------- | ----- | --------- | ------------ |
-| Server       | —          | 26/26      | 4/4        | —     | —         | 30/30 (100%) |
-| Client: Core | —          | 2/2        | 2/2        | —     | —         | 4/4 (100%)   |
-| Client: Auth | 0/2        | 3/3        | 6/11       | 0/1   | 0/2       | 9/19 (47%)   |
+|              | 2025-03-26 | 2025-06-18 | 2025-11-25 | All\*        |
+| ------------ | ---------- | ---------- | ---------- | ------------ |
+| Server       | —          | 26/26      | 4/4        | 30/30 (100%) |
+| Client: Core | —          | 2/2        | 2/2        | 4/4 (100%)   |
+| Client: Auth | 0/2        | 3/3        | 6/11       | 9/16 (56%)   |
+
+Informational (not scored for tier):
+
+|              | draft | extension |
+| ------------ | ----- | --------- |
+| Client: Auth | 0/1   | 0/2       |
+
+The tier-scoring table only includes date-versioned scenarios. `draft` and `extension` scenarios are shown separately as informational — they do not affect tier advancement.
 
 This immediately shows where failures concentrate. Failures clustered in Client: Auth / `2025-11-25` means "new auth features not yet implemented" — a scope gap, not a quality problem. Failures in Server or Client: Core are more concerning.
 
@@ -205,15 +215,21 @@ After the subagents finish, output a short executive summary directly to the use
 
 Conformance:
 
-|              | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All* | T2 | T1 |
-|--------------|------------|------------|------------|-------|-----------|-------|----|----|
-| Server       | —          | pass/total | pass/total | —     | —         | pass/total (rate%) | ✓/✗ | ✓/✗ |
-| Client: Core | —          | pass/total | pass/total | —     | —         | pass/total (rate%) | — | — |
-| Client: Auth | pass/total | pass/total | pass/total | pass/total | pass/total | pass/total (rate%) | — | — |
-| **Client Total** | | | | | | **pass/total (rate%)** | **✓/✗** | **✓/✗** |
+|              | 2025-03-26 | 2025-06-18 | 2025-11-25 | All* | T2 | T1 |
+|--------------|------------|------------|------------|------|----|----|
+| Server       | —          | pass/total | pass/total | pass/total (rate%) | ✓/✗ | ✓/✗ |
+| Client: Core | —          | pass/total | pass/total | pass/total (rate%) | — | — |
+| Client: Auth | pass/total | pass/total | pass/total | pass/total (rate%) | — | — |
+| **Client Total** | | | | **pass/total (rate%)** | **✓/✗** | **✓/✗** |
 
 \* unique scenarios — a scenario may apply to multiple spec versions
 
+Informational (not scored for tier):
+
+|              | draft | extension |
+|--------------|-------|-----------|
+| Client: Auth | pass/total | pass/total |
+
 If a baseline file was found, add a note below the conformance table:
 > **Baseline**: {N} failures in `baseline.yml` ({list by cell, e.g. "6 in Client: Auth/2025-11-25, 2 in Client: Auth/extension"}).
 
diff --git a/src/tier-check/checks/test-conformance-results.ts b/src/tier-check/checks/test-conformance-results.ts
@@ -8,7 +8,15 @@ import {
   listActiveClientScenarios,
   getScenarioSpecVersions
 } from '../../scenarios';
-import { ConformanceCheck } from '../../types';
+import { ConformanceCheck, SpecVersion } from '../../types';
+
+const NON_SCORING_VERSIONS: SpecVersion[] = ['draft', 'extension'];
+
+/** Whether a scenario counts toward tier scoring (has at least one date-versioned spec). */
+function isTierScoring(specVersions?: SpecVersion[]): boolean {
+  if (!specVersions || specVersions.length === 0) return true; // unknown = count it
+  return specVersions.some((v) => !NON_SCORING_VERSIONS.includes(v));
+}
 
 /**
  * Parse conformance results from an output directory.
@@ -132,7 +140,16 @@ function reconcileWithExpected(
     }
   }
 
-  result.pass_rate = result.total > 0 ? result.passed / result.total : 0;
+  // pass_rate only counts tier-scoring scenarios (date-versioned, not draft/extension).
+  // passed/failed/total reflect ALL scenarios for full reporting; pass_rate and status
+  // reflect only tier-scoring scenarios for tier logic.
+  const tierDetails = result.details.filter((d) =>
+    isTierScoring(d.specVersions)
+  );
+  const tierPassed = tierDetails.filter((d) => d.passed).length;
+  const tierTotal = tierDetails.length;
+
+  result.pass_rate = tierTotal > 0 ? tierPassed / tierTotal : 0;
   result.status =
     result.pass_rate >= 1.0
       ? 'pass'
diff --git a/src/tier-check/output.ts b/src/tier-check/output.ts
@@ -23,23 +23,28 @@ function statusIcon(status: CheckStatus): string {
   }
 }
 
-const SPEC_VERSIONS = [
-  '2025-03-26',
-  '2025-06-18',
-  '2025-11-25',
-  'draft',
-  'extension'
-] as const;
+const TIER_SPEC_VERSIONS = ['2025-03-26', '2025-06-18', '2025-11-25'] as const;
+
+const INFO_SPEC_VERSIONS = ['draft', 'extension'] as const;
 
 type Cell = { passed: number; total: number };
 
 interface MatrixRow {
   cells: Map<string, Cell>;
-  unique: Cell;
+  /** Unique scenario counts for tier-scoring versions only. */
+  tierUnique: Cell;
+  /** Unique scenario counts for informational versions only. */
+  infoUnique: Cell;
 }
 
+const INFO_SET = new Set<string>(INFO_SPEC_VERSIONS);
+
 function newRow(): MatrixRow {
-  return { cells: new Map(), unique: { passed: 0, total: 0 } };
+  return {
+    cells: new Map(),
+    tierUnique: { passed: 0, total: 0 },
+    infoUnique: { passed: 0, total: 0 }
+  };
 }
 
 interface ConformanceMatrix {
@@ -58,29 +63,32 @@ function buildConformanceMatrix(
     clientAuth: newRow()
   };
 
-  for (const d of server.details) {
-    matrix.server.unique.total++;
-    if (d.passed) matrix.server.unique.passed++;
-    for (const v of d.specVersions ?? ['unknown']) {
-      const cell = matrix.server.cells.get(v) ?? { passed: 0, total: 0 };
+  function addToRow(
+    row: MatrixRow,
+    d: { passed: boolean; specVersions?: string[] }
+  ) {
+    const versions = d.specVersions?.length ? d.specVersions : ['unknown'];
+    const isTierScoring = versions.some((v) => !INFO_SET.has(v));
+    const bucket = isTierScoring ? row.tierUnique : row.infoUnique;
+    bucket.total++;
+    if (d.passed) bucket.passed++;
+    for (const v of versions) {
+      const cell = row.cells.get(v) ?? { passed: 0, total: 0 };
       cell.total++;
       if (d.passed) cell.passed++;
-      matrix.server.cells.set(v, cell);
+      row.cells.set(v, cell);
     }
   }
 
+  for (const d of server.details) {
+    addToRow(matrix.server, d);
+  }
+
   for (const d of client.details) {
     const row = d.scenario.startsWith('auth/')
       ? matrix.clientAuth
       : matrix.clientCore;
-    row.unique.total++;
-    if (d.passed) row.unique.passed++;
-    for (const v of d.specVersions ?? ['unknown']) {
-      const cell = row.cells.get(v) ?? { passed: 0, total: 0 };
-      cell.total++;
-      if (d.passed) cell.passed++;
-      row.cells.set(v, cell);
-    }
+    addToRow(row, d);
   }
 
   return matrix;
@@ -121,9 +129,10 @@ export function formatMarkdown(scorecard: TierScorecard): string {
     c.client_conformance as ConformanceResult
   );
 
+  // Tier-scoring matrix
   lines.push('');
-  lines.push(`| | ${SPEC_VERSIONS.join(' | ')} | All* |`);
-  lines.push(`|---|${SPEC_VERSIONS.map(() => '---|').join('')}---|`);
+  lines.push(`| | ${TIER_SPEC_VERSIONS.join(' | ')} | All* |`);
+  lines.push(`|---|${TIER_SPEC_VERSIONS.map(() => '---|').join('')}---|`);
 
   const mdRows: [string, MatrixRow][] = [
     ['Server', matrix.server],
@@ -133,14 +142,39 @@ export function formatMarkdown(scorecard: TierScorecard): string {
 
   for (const [label, row] of mdRows) {
     lines.push(
-      `| ${label} | ${SPEC_VERSIONS.map((v) => formatCell(row.cells.get(v))).join(' | ')} | ${formatRate(row.unique)} |`
+      `| ${label} | ${TIER_SPEC_VERSIONS.map((v) => formatCell(row.cells.get(v))).join(' | ')} | ${formatRate(row.tierUnique)} |`
     );
   }
 
   lines.push('');
   lines.push(
     '_* unique scenarios — a scenario may apply to multiple spec versions_'
   );
+
+  // Informational matrix (draft/extension)
+  const hasInfoMd = mdRows.some(([, row]) =>
+    INFO_SPEC_VERSIONS.some((v) => {
+      const cell = row.cells.get(v);
+      return cell && cell.total > 0;
+    })
+  );
+  if (hasInfoMd) {
+    lines.push('');
+    lines.push('_Informational (not scored for tier):_');
+    lines.push('');
+    lines.push(`| | ${INFO_SPEC_VERSIONS.join(' | ')} |`);
+    lines.push(`|---|${INFO_SPEC_VERSIONS.map(() => '---|').join('')}`);
+    for (const [label, row] of mdRows) {
+      const hasData = INFO_SPEC_VERSIONS.some((v) => {
+        const cell = row.cells.get(v);
+        return cell && cell.total > 0;
+      });
+      if (!hasData) continue;
+      lines.push(
+        `| ${label} | ${INFO_SPEC_VERSIONS.map((v) => formatCell(row.cells.get(v))).join(' | ')} |`
+      );
+    }
+  }
   lines.push('');
   lines.push(
     `| Labels | ${c.labels.status} | ${c.labels.present}/${c.labels.required} required labels${c.labels.missing.length > 0 ? ` (missing: ${c.labels.missing.join(', ')})` : ''} |`
@@ -208,8 +242,9 @@ export function formatTerminal(scorecard: TierScorecard): void {
   const rp = (s: string, w: number) => s.padStart(w);
   const lp = (s: string, w: number) => s.padEnd(w);
 
+  // Tier-scoring matrix (date-versioned specs only)
   console.log(
-    `  ${COLORS.DIM}${lp('', lw + 2)} ${SPEC_VERSIONS.map((v) => rp(v, vw)).join(' ')}  ${rp('All*', tw)}${COLORS.RESET}`
+    `  ${COLORS.DIM}${lp('', lw + 2)} ${TIER_SPEC_VERSIONS.map((v) => rp(v, vw)).join(' ')}  ${rp('All*', tw)}${COLORS.RESET}`
   );
 
   const rows: [string, MatrixRow, CheckStatus | null, boolean][] = [
@@ -223,21 +258,49 @@ export function formatTerminal(scorecard: TierScorecard): void {
     const b = bold ? COLORS.BOLD : '';
     const r = bold ? COLORS.RESET : '';
     console.log(
-      `  ${icon}${b}${lp(label, lw)}${r} ${SPEC_VERSIONS.map((v) => rp(formatCell(row.cells.get(v)), vw)).join(' ')}  ${b}${rp(formatRate(row.unique), tw)}${r}`
+      `  ${icon}${b}${lp(label, lw)}${r} ${TIER_SPEC_VERSIONS.map((v) => rp(formatCell(row.cells.get(v)), vw)).join(' ')}  ${b}${rp(formatRate(row.tierUnique), tw)}${r}`
     );
   }
 
-  // Client total line
-  const clientTotal: Cell = {
-    passed: matrix.clientCore.unique.passed + matrix.clientAuth.unique.passed,
-    total: matrix.clientCore.unique.total + matrix.clientAuth.unique.total
+  // Client total line (tier-scoring only)
+  const clientTierTotal: Cell = {
+    passed:
+      matrix.clientCore.tierUnique.passed + matrix.clientAuth.tierUnique.passed,
+    total:
+      matrix.clientCore.tierUnique.total + matrix.clientAuth.tierUnique.total
   };
   console.log(
-    `  ${statusIcon(c.client_conformance.status)} ${COLORS.BOLD}${lp('Client Total', lw)}${COLORS.RESET} ${' '.repeat(SPEC_VERSIONS.length * (vw + 1) - 1)}  ${COLORS.BOLD}${rp(formatRate(clientTotal), tw)}${COLORS.RESET}`
+    `  ${statusIcon(c.client_conformance.status)} ${COLORS.BOLD}${lp('Client Total', lw)}${COLORS.RESET} ${' '.repeat(TIER_SPEC_VERSIONS.length * (vw + 1) - 1)}  ${COLORS.BOLD}${rp(formatRate(clientTierTotal), tw)}${COLORS.RESET}`
   );
   console.log(
     `\n  ${COLORS.DIM}* unique scenarios — a scenario may apply to multiple spec versions${COLORS.RESET}`
   );
+
+  // Informational matrix (draft/extension) — only if there are any
+  const hasInfo = rows.some(([, row]) =>
+    INFO_SPEC_VERSIONS.some((v) => {
+      const cell = row.cells.get(v);
+      return cell && cell.total > 0;
+    })
+  );
+  if (hasInfo) {
+    console.log(`\n  Informational (not scored for tier):\n`);
+    console.log(
+      `  ${COLORS.DIM}${lp('', lw + 2)} ${INFO_SPEC_VERSIONS.map((v) => rp(v, vw)).join(' ')}${COLORS.RESET}`
+    );
+    for (const [label, row, , bold] of rows) {
+      const hasData = INFO_SPEC_VERSIONS.some((v) => {
+        const cell = row.cells.get(v);
+        return cell && cell.total > 0;
+      });
+      if (!hasData) continue;
+      const b = bold ? COLORS.BOLD : '';
+      const r = bold ? COLORS.RESET : '';
+      console.log(
+        `    ${b}${lp(label, lw)}${r} ${INFO_SPEC_VERSIONS.map((v) => rp(formatCell(row.cells.get(v)), vw)).join(' ')}`
+      );
+    }
+  }
   console.log(`\n${COLORS.BOLD}Repository Health:${COLORS.RESET}\n`);
   console.log(
     `  ${statusIcon(c.labels.status)} Labels         ${c.labels.present}/${c.labels.required} required labels`