feat(065): security baseline section in baseline_v1.json (MCP-815)#569

Merged: Dumbris merged 1 commit into main from feat/065-security-baseline on Jun 2, 2026
@Dumbris Dumbris commented Jun 2, 2026

Summary

Fills the reserved security block of the Spec 065 / D2 regression baseline (specs/065-evaluation-foundation/datasets/baseline_v1.json) from real scorer output — the board-approved follow-up to B3/SecurityScorer (MCP-741, merged to mcp-eval 76df3a47).

Measured end-to-end (no estimates):

cmd/scan-eval @ 33bb6e39  over  security_corpus_v1.json (43 entries)
  -> mcp-eval security @ 76df3a47  (--corpus coverage on)
sensitive-data (deterministic, N=1): P=0.667 R=0.100 F1=0.174 FPR=0.043  [PASS]
  tp=2 fp=1 tn=22 fn=18

Per-detector floats are copied verbatim at full precision so a fresh CI run diffs cleanly against the committed anchor (same diffability principle as the retrieval section).

Baked-in gate (approved at Gate 2)

| threshold | value | rationale |
|---|---|---|
| fpr_ceiling | 0.10 | observed 0.0435 → ~2× headroom; fails on the 3rd FP (a real noise regression) |
| recall_floor | 0.05 | total-blindness anchor; mirrors retrieval's 0.05 tolerance |

Recall is intentionally low: sensitive-data detects secret and path leakage, not prompt injection. Roughly 18 of the 20 malicious corpus entries (prompt_injection, shadowing, rug_pull) are structurally out of its scope and will be owned by future detectors. The FPR ceiling is the primary quiet-security gate; recall_floor is only a regression anchor against total blindness.
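
To make the "fails on the 3rd FP" arithmetic concrete, here is a small sketch of the gate applied to the measured counts. It assumes the gate is a plain pair of inequalities (fpr ≤ ceiling and recall ≥ floor, as stated in the Tests section); the `Gate` type and `Passes` method are hypothetical names, not the scorer's API.

```go
package main

import "fmt"

// Gate holds the two thresholds from this PR.
// Hypothetical type for illustration only.
type Gate struct {
	FPRCeiling  float64
	RecallFloor float64
}

// Passes reports whether both conditions hold:
// fpr <= ceiling AND recall >= floor.
func (g Gate) Passes(fpr, recall float64) bool {
	return fpr <= g.FPRCeiling && recall >= g.RecallFloor
}

func main() {
	gate := Gate{FPRCeiling: 0.10, RecallFloor: 0.05}

	const negatives = 23 // benign entries: fp + tn = 1 + 22
	recall := 2.0 / 20.0 // tp / (tp + fn), unchanged across the loop

	// Raise the false-positive count one step at a time.
	for fp := 1; fp <= 3; fp++ {
		fpr := float64(fp) / negatives
		fmt.Printf("fp=%d fpr=%.3f pass=%t\n", fp, fpr, gate.Passes(fpr, recall))
	}
	// prints:
	// fp=1 fpr=0.043 pass=true
	// fp=2 fpr=0.087 pass=true
	// fp=3 fpr=0.130 pass=false
}
```

With 23 benign entries, one extra false positive still fits under the 0.10 ceiling, and the second extra one (the third overall) crosses it.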

Tests

specs/065-evaluation-foundation/datasets/baseline_test.go (new):

  • shape guard: the security block carries the per-detector metrics and gate thresholds that the scorer's SecurityReport shape requires.
  • self-consistency guard: the committed anchor itself passes its own gate (fpr ≤ ceiling AND recall ≥ floor), so a future refresh that bakes in gate-violating numbers fails CI before the scorer even runs.
ok  specs/065-evaluation-foundation/datasets   (4/4 PASS)
gofmt / go vet / golangci-lint: clean
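
The self-consistency guard could look roughly like the sketch below. This is not the contents of baseline_test.go: the JSON layout (`detectors`, `gate`, `recall`, `fpr`) and the `checkSelfConsistent` helper are assumptions made for illustration, with only `fpr_ceiling` and `recall_floor` taken from this PR. The sample document uses the rounded figures quoted above, not the full-precision values committed in baseline_v1.json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// detectorMetrics is an assumed per-detector shape, not the real schema.
type detectorMetrics struct {
	Recall float64 `json:"recall"`
	FPR    float64 `json:"fpr"`
}

// baseline models only the assumed security block of the baseline file.
type baseline struct {
	Security struct {
		Detectors map[string]detectorMetrics `json:"detectors"`
		Gate      struct {
			FPRCeiling  float64 `json:"fpr_ceiling"`
			RecallFloor float64 `json:"recall_floor"`
		} `json:"gate"`
	} `json:"security"`
}

// checkSelfConsistent returns an error if any committed detector metric
// violates the gate thresholds stored alongside it.
func checkSelfConsistent(raw []byte) error {
	var b baseline
	if err := json.Unmarshal(raw, &b); err != nil {
		return fmt.Errorf("parse baseline: %w", err)
	}
	if len(b.Security.Detectors) == 0 {
		return fmt.Errorf("security block has no detectors")
	}
	gate := b.Security.Gate
	for name, m := range b.Security.Detectors {
		if m.FPR > gate.FPRCeiling {
			return fmt.Errorf("%s: fpr %g exceeds ceiling %g", name, m.FPR, gate.FPRCeiling)
		}
		if m.Recall < gate.RecallFloor {
			return fmt.Errorf("%s: recall %g below floor %g", name, m.Recall, gate.RecallFloor)
		}
	}
	return nil
}

func main() {
	// Illustrative document in the assumed layout, using rounded values.
	sample := []byte(`{
	  "security": {
	    "detectors": {"sensitive-data": {"recall": 0.100, "fpr": 0.043}},
	    "gate": {"fpr_ceiling": 0.10, "recall_floor": 0.05}
	  }
	}`)
	if err := checkSelfConsistent(sample); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("anchor passes its own gate")
}
```

In a real test file the same check would sit inside a `testing` function that reads the committed JSON from disk, so a refresh that commits gate-violating numbers fails without running the scorer.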

Feeds C1 / MCP-742 (the eval.yml CI gate, which needs a committed security baseline to gate against). Retrieval section + all other top-level keys untouched (dataset immutability, CN-002/CN-004).

Gate 3: do not self-merge — human merges on GitHub.

🤖 Generated with Claude Code

Fill the reserved `security` block of the Spec 065 D2 regression baseline
from real scorer output. Numbers measured end-to-end (no estimates):

  cmd/scan-eval @ 33bb6e3 over security_corpus_v1.json (43 entries)
  -> mcp-eval security @ 76df3a47 (corpus-id coverage on)

sensitive-data (deterministic, N=1): P=0.667 R=0.100 F1=0.174 FPR=0.043
(tp=2 fp=1 tn=22 fn=18). Per-detector floats copied verbatim (full
precision) so a fresh CI run diffs cleanly against the anchor.

Baked-in gate (approved at Gate 2, board-accepted 2026-06-02):
  fpr_ceiling=0.10  recall_floor=0.05

recall is intentionally low: sensitive-data detects secret/path-leak, not
prompt injection; ~18/20 malicious entries are out of scope and will be
owned by future detectors. The FPR ceiling is the primary quiet-security
gate; recall_floor is a total-blindness anchor.

Adds baseline_test.go: shape guard + a self-consistency check that the
committed anchor itself passes its own gate (fails CI on a bad refresh).
Feeds C1/MCP-742 (eval.yml CI gate). Retrieval section untouched (CN-002).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
@cloudflare-workers-and-pages
Deploying mcpproxy-docs with Cloudflare Pages

Latest commit: 7cbba8d
Status: ✅  Deploy successful!
Preview URL: https://3bdfd985.mcpproxy-docs.pages.dev
Branch Preview URL: https://feat-065-security-baseline.mcpproxy-docs.pages.dev

@codecov-commenter
Codecov Report

✅ All modified and coverable lines are covered by tests.

@github-actions
github-actions Bot commented Jun 2, 2026

📦 Build Artifacts

Workflow Run: View Run
Branch: feat/065-security-baseline

Available Artifacts

  • archive-darwin-amd64 (28 MB)
  • archive-darwin-arm64 (25 MB)
  • archive-linux-amd64 (16 MB)
  • archive-linux-arm64 (14 MB)
  • archive-windows-amd64 (27 MB)
  • archive-windows-arm64 (24 MB)
  • frontend-dist-pr (0 MB)
  • installer-dmg-darwin-amd64 (21 MB)
  • installer-dmg-darwin-arm64 (19 MB)

How to Download

Option 1: GitHub Web UI (easiest)

  1. Go to the workflow run page linked above
  2. Scroll to the bottom "Artifacts" section
  3. Click on the artifact you want to download

Option 2: GitHub CLI

gh run download 26795283880 --repo smart-mcp-proxy/mcpproxy-go

Note: Artifacts expire in 14 days.

@Dumbris Dumbris merged commit c3baf9b into main Jun 2, 2026
30 checks passed