feat(065): security baseline section in baseline_v1.json (MCP-815) #569
Merged
Fill the reserved `security` block of the Spec 065 D2 regression baseline from real scorer output.

Numbers measured end-to-end (no estimates):
cmd/scan-eval @ 33bb6e3 over security_corpus_v1.json (43 entries) -> mcp-eval security @ 76df3a47 (corpus-id coverage on)

sensitive-data (deterministic, N=1): P=0.667 R=0.100 F1=0.174 FPR=0.043 (tp=2 fp=1 tn=22 fn=18)

Per-detector floats copied verbatim (full precision) so a fresh CI run diffs cleanly against the anchor.

Baked-in gate (approved at Gate 2, board-accepted 2026-06-02):
fpr_ceiling=0.10
recall_floor=0.05

recall is intentionally low: sensitive-data detects secret/path-leak, not prompt injection; ~18/20 malicious entries are out of scope and will be owned by future detectors. The FPR ceiling is the primary quiet-security gate; recall_floor is a total-blindness anchor.

Adds baseline_test.go: shape guard + a self-consistency check that the committed anchor itself passes its own gate (fails CI on a bad refresh).

Feeds C1/MCP-742 (eval.yml CI gate). Retrieval section untouched (CN-002).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Deploying mcpproxy-docs
Latest commit: 7cbba8d
Status: ✅ Deploy successful!
Preview URL: https://3bdfd985.mcpproxy-docs.pages.dev
Branch Preview URL: https://feat-065-security-baseline.mcpproxy-docs.pages.dev
Codecov Report: ✅ All modified and coverable lines are covered by tests.
📦 Build Artifacts
How to download (GitHub CLI): gh run download 26795283880 --repo smart-mcp-proxy/mcpproxy-go
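Summary
Fills the reserved `security` block of the Spec 065 / D2 regression baseline (`specs/065-evaluation-foundation/datasets/baseline_v1.json`) from real scorer output. This is the board-approved follow-up to B3/SecurityScorer (MCP-741, merged to mcp-eval `76df3a47`).

Measured end-to-end (no estimates):
- `cmd/scan-eval` @ 33bb6e3 over `security_corpus_v1.json` (43 entries) -> `mcp-eval security` @ 76df3a47 (corpus-id coverage on)
- `sensitive-data` (deterministic, N=1): P=0.667 R=0.100 F1=0.174 FPR=0.043 (tp=2 fp=1 tn=22 fn=18)

Per-detector floats are copied verbatim at full precision so a fresh CI run diffs cleanly against the committed anchor (same diffability principle as the retrieval section).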
Baked-in gate (approved at Gate 2)
- `fpr_ceiling` = 0.10
- `recall_floor` = 0.05

`recall` is intentionally low: `sensitive-data` detects secret/path leakage, not prompt injection. ~18/20 malicious corpus entries (prompt_injection/shadowing/rug_pull) are structurally out of its scope and will be owned by future detectors. The FPR ceiling is the primary quiet-security gate; `recall_floor` is a regression anchor against total blindness.

Tests
`specs/065-evaluation-foundation/datasets/baseline_test.go` (new):
- Shape guard: the committed block has what the `SecurityReport` shape requires.
- Self-consistency check: the committed anchor passes its own gate (fpr ≤ ceiling AND recall ≥ floor), so a future refresh that bakes in gate-violating numbers fails CI before the scorer even runs.

Feeds C1 / MCP-742 (the `eval.yml` CI gate, which needs a committed security baseline to gate against). Retrieval section + all other top-level keys untouched (dataset immutability, CN-002/CN-004).

🤖 Generated with Claude Code
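The self-consistency rule described under Tests (fpr ≤ ceiling AND recall ≥ floor) could be expressed roughly as follows. This is a sketch under stated assumptions, not the contents of `baseline_test.go`: the `Gate` and `Detector` types, their field names, and `CheckGate` are hypothetical. Only the two thresholds (0.10 and 0.05) and the confusion counts come from this PR.

```go
package main

import "fmt"

// Gate mirrors the two baked-in thresholds (hypothetical type and field names).
type Gate struct {
	FPRCeiling  float64
	RecallFloor float64
}

// Detector carries the two rates the gate inspects (hypothetical type).
type Detector struct {
	Recall float64
	FPR    float64
}

// CheckGate returns nil when fpr <= ceiling AND recall >= floor,
// and a descriptive error for the first violated bound otherwise.
func CheckGate(d Detector, g Gate) error {
	if d.FPR > g.FPRCeiling {
		return fmt.Errorf("fpr %.4f exceeds ceiling %.4f", d.FPR, g.FPRCeiling)
	}
	if d.Recall < g.RecallFloor {
		return fmt.Errorf("recall %.4f is below floor %.4f", d.Recall, g.RecallFloor)
	}
	return nil
}

func main() {
	gate := Gate{FPRCeiling: 0.10, RecallFloor: 0.05}

	// Rates derived from the counts in this PR: recall = 2/20, fpr = 1/23.
	anchor := Detector{Recall: 2.0 / 20.0, FPR: 1.0 / 23.0}
	if err := CheckGate(anchor, gate); err != nil {
		fmt.Println("anchor violates its own gate:", err)
		return
	}
	fmt.Println("anchor passes its own gate")

	// A made-up bad refresh, for illustration only.
	bad := Detector{Recall: 0.10, FPR: 0.20}
	fmt.Println("bad refresh:", CheckGate(bad, gate))
}
```

Running the check against the committed anchor, rather than against fresh scorer output, is what lets a bad refresh fail early: the test needs only the JSON file, not the evaluation pipeline.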