feat: ScopeBlind protect-mcp integration — Cedar policy enforcement + verifiable receipts#667

Merged
imran-siddique merged 4 commits into microsoft:main from tomjwxf:feat/scopeblind-protect-mcp-integration
Apr 3, 2026

Conversation

Contributor

tomjwxf commented Apr 1, 2026

ScopeBlind protect-mcp Integration

Adds runtime Cedar policy enforcement and cryptographically verifiable decision receipts to AgentMesh, via protect-mcp (v0.4.6, MIT, 821KB).

What this PR adds

packages/agentmesh-integrations/scopeblind-protect-mcp/ — a new integration package with:

| Component | Purpose |
| --- | --- |
| CedarPolicyBridge | Maps Cedar allow/deny into AGT evaluate(); Cedar deny is authoritative |
| ReceiptVerifier | Validates Ed25519-signed decision receipt structure |
| SpendingGate | Enforces issuer-blind spending authority with trust gating |
| scopeblind_context() | Builds AGT-compatible context from protect-mcp artifacts |

Architecture — complementary to mcp-trust-proxy

AGT's mcp-trust-proxy gates on trust scores (soft signals). protect-mcp gates on Cedar policies (formal, deterministic, auditable). They compose:

| Layer | AGT (existing) | protect-mcp (this PR) |
| --- | --- | --- |
| Analysis | MCP Security Scanner (static) | Cedar WASM (runtime, every call) |
| Identity | DID + trust scores | Ed25519 passports + VOPRF |
| Decisions | PolicyEngine evaluate() | Cedar allow/deny + signed receipts |
| Proof | Audit log | Cryptographic receipts (offline-verifiable) |
| Privacy | Trust scores visible | Issuer-blind (verifier learns nothing about issuer) |

Cedar deny is a hard constraint — it cannot be overridden by a high trust score. Trust scoring is a soft signal layered on top of Cedar allow decisions.
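A minimal sketch of that precedence, not the package's actual code. The names `Decision` and `authorize` and the default `trust_floor` of 500 are hypothetical; only the ordering (Cedar deny first, trust score second) comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Outcome of one authorization check (hypothetical shape)."""
    allowed: bool
    reason: str


def authorize(cedar_effect: str, trust_score: int, trust_floor: int = 500) -> Decision:
    """Combine a Cedar effect with a trust score; Cedar deny always wins."""
    if cedar_effect != "allow":
        # Hard constraint: anything other than an explicit allow is refused,
        # no matter how high the trust score is.
        return Decision(False, "cedar_deny")
    if trust_score < trust_floor:
        # Soft signal: only consulted after Cedar has allowed the call.
        return Decision(False, "trust_below_floor")
    return Decision(True, "cedar_allow")


if __name__ == "__main__":
    print(authorize("deny", trust_score=1000))
    print(authorize("allow", trust_score=200))
    print(authorize("allow", trust_score=800))
```

Checking the Cedar effect before reading the trust score means no trust value can reach the allow branch after a deny.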

Key differentiator: issuer-blind verification

protect-mcp receipts use VOPRF (RFC 9497) so the verifier can confirm a receipt is valid without learning which organization issued it. This prevents supply-chain surveillance — you can prove compliance without revealing your org structure.

AGT policy rule example

- name: require-cedar-allow
  type: capability
  conditions:
    scopeblind.cedar.effect: 'allow'
    scopeblind.receipt.present: true
  allowed_actions:
    - 'tool_call.*'
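To illustrate how a rule like this could be matched, here is a rough sketch. It assumes a flat context dictionary with dotted keys and glob-style action patterns; the real AGT PolicyEngine matching semantics are not shown in this PR, so `rule_permits` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict

# The rule from the example above, written as a Python dict.
RULE: Dict[str, Any] = {
    "name": "require-cedar-allow",
    "type": "capability",
    "conditions": {
        "scopeblind.cedar.effect": "allow",
        "scopeblind.receipt.present": True,
    },
    "allowed_actions": ["tool_call.*"],
}


def rule_permits(rule: Dict[str, Any], context: Dict[str, Any], action: str) -> bool:
    """Return True when every condition matches and the action fits a pattern."""
    conditions_met = all(
        context.get(key) == expected for key, expected in rule["conditions"].items()
    )
    action_matches = any(
        fnmatchcase(action, pattern) for pattern in rule["allowed_actions"]
    )
    return conditions_met and action_matches


if __name__ == "__main__":
    ctx = {"scopeblind.cedar.effect": "allow", "scopeblind.receipt.present": True}
    print(rule_permits(RULE, ctx, "tool_call.read_file"))
```

A missing key fails its condition because `context.get` returns `None`, so a context without a receipt is rejected by default.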

Tests

36 tests covering: Cedar decision parsing, policy bridge authorization (deny authoritative, trust layering, receipt requirements, trust cap), receipt validation (structure, types, spending authority, AGT context), spending gate (amount limits, categories, utilization bands, trust floors, receipt requirements), and context shape compatibility.

============================== 36 passed in 0.04s ==============================

Protocol

Decision receipts follow the Veritas Acta signed receipt format, an IETF Internet-Draft for portable, verifiable decision artifacts.

protect-mcp: npm | GitHub | Docs | Cedar policies

…nd verifiable decision receipts

Adds `packages/agentmesh-integrations/scopeblind-protect-mcp/` with four components:

- CedarPolicyBridge: maps Cedar allow/deny decisions into AGT evaluate() —
  Cedar deny is authoritative and cannot be overridden by trust scores
- ReceiptVerifier: validates Ed25519-signed decision receipt structure,
  converts to AGT-compatible context
- SpendingGate: enforces issuer-blind spending authority with trust-score
  gating and utilization band checks
- scopeblind_context(): builds AGT-compatible context from protect-mcp artifacts

Key architectural difference from mcp-trust-proxy:
  mcp-trust-proxy gates on trust scores (soft signals).
  protect-mcp gates on Cedar policies (formal, deterministic, auditable).
  Decision receipts provide cryptographic proof via IETF Internet-Draft
  draft-farley-acta-signed-receipts.

36 tests covering: Cedar decision parsing, policy bridge authorization,
receipt validation, spending gate limits/categories/utilization bands,
and AGT context shape compatibility.

protect-mcp: https://www.npmjs.com/package/protect-mcp (v0.4.6, MIT)
Docs: https://scopeblind.com/docs/protect-mcp

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Apr 1, 2026

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

@github-actions github-actions bot added the size/XL Extra large PR (500+ lines) label Apr 1, 2026

github-actions bot commented Apr 1, 2026

🤖 AI Agent: contributor-guide — Welcome! 🎉

Hi @first-time-contributor! 👋 Welcome to the microsoft/agent-governance-toolkit community, and thank you for taking the time to contribute. We’re thrilled to have you here and really appreciate your effort in submitting this pull request. Your contribution is a fantastic addition to the project, and we’re excited to review it with you!


What You Did Well 🌟

  1. Thorough Documentation: Your PR description is incredibly detailed and well-structured. The breakdown of components, architecture, and examples in the README.md is excellent and provides a clear understanding of the new functionality.

  2. Test Coverage: Including 36 passing tests is fantastic! It’s clear you’ve put a lot of thought into ensuring the robustness of your implementation.

  3. Security Considerations: You've included a SECURITY.md file and provided details about how cryptographic material is handled, which is crucial for a security-sensitive project like this.

  4. Alignment with Project Goals: The integration of protect-mcp aligns well with the project's mission of enhancing agent governance, and the issuer-blind verification is a great feature that adds significant value.


Suggestions for Improvement 🛠️

While your contribution is excellent, there are a few areas where we can align it better with the project’s conventions and ensure long-term maintainability:

  1. Linting:

    • We use ruff for linting, specifically with the E, F, and W error codes. Please run ruff on your code and address any issues it flags. You can find the configuration in the root of the repository.
  2. Testing Location:

    • While your test coverage is great, the tests should be placed in the packages/agentmesh-integrations/scopeblind-protect-mcp/tests/ directory to align with our project structure. This ensures consistency across the repository.
  3. Commit Message Format:

    • We follow the Conventional Commits specification. Your commit message feat: ScopeBlind protect-mcp integration — Cedar policy enforcement + verifiable receipts is close, but it would be great if you could simplify it to something like:
      feat(scopeblind-protect-mcp): add Cedar policy enforcement and verifiable receipts
      
      This format helps us maintain a clean and consistent commit history.
  4. Security-Sensitive Code:

    • Since this PR involves cryptographic operations (e.g., Ed25519 signature validation and VOPRF), it will require extra scrutiny. While your SECURITY.md file is a great start, we recommend adding comments in the code to highlight security-critical sections. For example:
      • In ReceiptVerifier.validate_structure, clarify how you ensure the integrity of the receipt payload.
      • In SpendingGate.evaluate_spend, explain how you handle high-value transactions and ensure no sensitive data is leaked.
  5. Dependencies:

    • While protect-mcp is a lightweight dependency, it’s always good to document why it’s necessary and how it aligns with the project’s goals. Consider adding a note in the CONTRIBUTING.md file about this new dependency.

Next Steps 🚀

  1. Run ruff on your code and address any linting issues.
  2. Move your test files to packages/agentmesh-integrations/scopeblind-protect-mcp/tests/.
  3. Update your commit message to follow the Conventional Commits format.
  4. Add inline comments to highlight security-sensitive sections of your code.
  5. If you haven’t already, please review our CONTRIBUTING.md and QUICKSTART.md for additional guidance.

Once you’ve made these updates, push your changes to this branch, and we’ll take another look. If you have any questions or need help with anything, don’t hesitate to ask — we’re here to help!

Thank you again for contributing to the project. We’re excited to collaborate with you! 😊

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review Summary

This PR introduces a new integration package, scopeblind-protect-mcp, which integrates the protect-mcp library into the Agent Governance Toolkit (AGT). The integration provides runtime Cedar policy enforcement and cryptographically verifiable decision receipts, enhancing the security and auditability of the AGT framework. The PR includes new components such as CedarPolicyBridge, ReceiptVerifier, SpendingGate, and scopeblind_context().

The implementation is well-documented, and the provided tests indicate a good level of coverage. However, there are several areas that require attention to ensure the robustness, security, and maintainability of the code.


🔴 CRITICAL

  1. Receipt Verification Delegation

    • The ReceiptVerifier class delegates cryptographic verification of Ed25519 signatures to external libraries (@veritasacta/verify or protect-mcp runtime). While this is acceptable, the absence of direct cryptographic verification in this library introduces a potential security risk if the external library has vulnerabilities or is compromised.
    • Action: Add a fallback mechanism to verify Ed25519 signatures directly within this library using a trusted Python cryptography library (e.g., pynacl). This ensures that the library can independently verify receipts if the external library is unavailable or untrusted.
  2. Trust Score Manipulation

    • The CedarPolicyBridge adjusts the trust score based on Cedar decisions (trust_bonus_per_allow and deny_penalty). This could lead to unintended consequences, such as trust scores being manipulated by malicious actors to bypass other security checks.
    • Action: Clearly document the rationale for modifying trust scores and consider adding safeguards to prevent abuse. For example, ensure that trust score adjustments are bounded and cannot be exploited by repeatedly triggering Cedar allow decisions.
  3. Receipt Hashing

    • The receipt hash is calculated using hashlib.sha256 and truncated to 16 characters. This truncation significantly reduces the entropy of the hash, making it susceptible to collisions.
    • Action: Use the full hash value or a cryptographically secure truncation method (e.g., HMAC with a secret key) to prevent potential collision attacks.
  4. Receipt Validation

    • The validate_structure method in ReceiptVerifier only validates the structure of the receipt and does not perform cryptographic verification. This could lead to a false sense of security if developers assume the receipt is fully validated.
    • Action: Rename the method to validate_structure_only or similar to make it clear that cryptographic validation is not performed. Additionally, provide explicit documentation about the need for external cryptographic verification.
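For the receipt hashing point above, a small sketch of the difference between a full digest and a 16-character prefix. A 16-character hex prefix carries 64 bits, while the full SHA-256 hex digest carries 256. The function names are hypothetical, and the `json.dumps` encoding is only a stand-in for real canonicalization (it is not RFC 8785 JCS).

```python
import hashlib
import json
from typing import Any, Dict


def receipt_digest(receipt: Dict[str, Any]) -> str:
    """Full SHA-256 hex digest over a deterministic JSON encoding."""
    # sort_keys plus compact separators give the same bytes for equal dicts.
    # Stand-in only: this is not RFC 8785 (JCS) canonicalization.
    encoded = json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def short_ref(receipt: Dict[str, Any]) -> str:
    """16-hex-character prefix: fine as a tracing label, too short to rely on."""
    return receipt_digest(receipt)[:16]


if __name__ == "__main__":
    receipt = {"tool": "read_file", "effect": "allow"}
    print(len(receipt_digest(receipt)), len(short_ref(receipt)))
```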

🟡 WARNING

  1. Backward Compatibility

    • The introduction of scopeblind-protect-mcp does not appear to modify existing functionality in AGT. However, the integration of Cedar policies as "hard constraints" may impact existing workflows if users are not aware of the implications.
    • Action: Clearly document the impact of Cedar policy enforcement on existing AGT workflows. Provide migration guidance for users who may need to adapt their configurations to accommodate the new integration.
  2. Dependency Management

    • The protect-mcp library is a Node.js package, which introduces a cross-language dependency. This could complicate deployment and increase the risk of dependency-related issues.
    • Action: Ensure that the integration package includes clear instructions for installing and managing the protect-mcp dependency. Consider providing a Docker container or other pre-configured environment to simplify deployment.

💡 SUGGESTIONS

  1. Type Annotations

    • While the code includes some type annotations, they are not comprehensive. For example, the evaluate_spend method in SpendingGate has parameters without type annotations.
    • Action: Add type annotations to all methods and parameters to improve code clarity and enable static type checking.
  2. Thread Safety

    • The CedarPolicyBridge, ReceiptVerifier, and SpendingGate classes maintain internal state (e.g., _history, _verified, _decisions) without any thread-safety mechanisms. This could lead to race conditions in concurrent environments.
    • Action: Use thread-safe data structures (e.g., queue.Queue) or add locks to ensure thread safety.
  3. Error Handling

    • The code does not handle exceptions that may arise from external dependencies (e.g., JSON parsing, receipt validation). This could lead to unhandled exceptions in production.
    • Action: Add error handling for external dependencies and provide meaningful error messages to help diagnose issues.
  4. Test Coverage

    • While the PR mentions 36 tests, it is unclear if they cover edge cases, such as malformed receipts, invalid signatures, or extreme trust score values.
    • Action: Add tests for edge cases, including:
      • Receipts with missing or invalid fields.
      • Trust scores at the boundaries of trust_floor and trust_bonus.
      • Concurrent evaluations to test thread safety.
  5. Documentation

    • The documentation is comprehensive but could benefit from additional examples and diagrams to illustrate the integration between AGT and protect-mcp.
    • Action: Add diagrams or flowcharts to the README to visually explain how scopeblind-protect-mcp integrates with AGT.

Final Assessment

This PR introduces a valuable integration that enhances the security and auditability of AGT. However, the identified critical issues must be addressed to ensure the robustness and security of the implementation. Additionally, addressing the warnings and suggestions will improve the maintainability and usability of the code.

github-actions bot commented Apr 1, 2026

🤖 AI Agent: security-scanner — Security Analysis of the Pull Request

This pull request introduces a new integration for the ScopeBlind protect-mcp library into the AgentMesh framework. The integration enables runtime enforcement of Cedar policies and the use of cryptographically verifiable decision receipts. Below is a detailed security analysis based on the specified criteria:


1. Prompt Injection Defense Bypass

  • Risk: The CedarPolicyBridge class maps Cedar policy decisions into the AGT PolicyEngine. If the CedarDecision object is not properly sanitized, it could allow malicious input to bypass policy guards.
  • Analysis: The CedarDecision class does not appear to sanitize its inputs. For example, the tool_name and policy_ids fields are directly used in the evaluate method of CedarPolicyBridge without validation. This could allow an attacker to inject malicious data into the policy evaluation process.
  • Rating: 🔴 CRITICAL
  • Fix: Add input validation and sanitization for all fields in CedarDecision. Ensure that tool_name and policy_ids conform to expected formats and do not contain malicious payloads.

2. Policy Engine Circumvention

  • Risk: Policies enforced by the CedarPolicyBridge could be bypassed if the require_receipt flag is set to False, as the receipt validation is skipped entirely.
  • Analysis: The require_receipt flag defaults to False, which could allow attackers to bypass receipt validation. This weakens the security guarantees of the system, as the cryptographic proof provided by the receipt is not enforced by default.
  • Rating: 🟠 HIGH
  • Fix: Change the default value of require_receipt to True to ensure that receipt validation is always enforced unless explicitly disabled. Additionally, log a warning if require_receipt is set to False.

3. Trust Chain Weaknesses

  • Risk: The ReceiptVerifier class validates receipt structure but delegates cryptographic verification to external libraries. If these libraries are compromised or misconfigured, the trust chain could be broken.
  • Analysis: The ReceiptVerifier does not perform cryptographic verification itself, relying on external tools like @veritasacta/verify. While this is a reasonable design choice, it introduces a dependency on the security of external libraries.
  • Rating: 🟡 MEDIUM
  • Fix: Perform an audit of the @veritasacta/verify library to ensure its security. Additionally, consider implementing a fallback mechanism for cryptographic verification within the ReceiptVerifier class.

4. Credential Exposure

  • Risk: Sensitive information such as cryptographic keys or debug data could be exposed in logs or error messages.
  • Analysis: The ReceiptVerifier logs receipt validation results, including the receipt type and tool name. While no sensitive information appears to be logged, it is important to ensure that no sensitive data (e.g., private keys, full receipt contents) is ever logged.
  • Rating: 🔵 LOW
  • Fix: Review all logging statements to ensure that no sensitive information is logged. Add a warning in the documentation about logging practices.

5. Sandbox Escape

  • Risk: The integration could allow malicious actors to escape the sandbox and execute arbitrary code on the host system.
  • Analysis: The code does not appear to execute untrusted code directly, and the use of WASM for Cedar policy evaluation provides a strong sandboxing mechanism. However, the use of threading in CedarPolicyBridge and ReceiptVerifier could introduce concurrency issues that might be exploited.
  • Rating: 🔵 LOW
  • Fix: Review the threading implementation to ensure there are no race conditions or vulnerabilities that could be exploited for sandbox escape.

6. Deserialization Attacks

  • Risk: Unsafe deserialization of untrusted data could lead to code execution or data corruption.
  • Analysis: The CedarDecision.from_receipt method deserializes receipt payloads without validating their structure or content. This could allow an attacker to craft a malicious receipt that exploits the deserialization process.
  • Rating: 🔴 CRITICAL
  • Fix: Implement strict validation of receipt payloads before deserialization. Use a schema validation library to ensure that the payload conforms to the expected structure.
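One way the suggested validation could look, using only the standard library instead of a schema package. The field names `tool_name` and `policy_ids` appear in the analysis above; the `effect` key, the `ParsedDecision` type, and `parse_decision` are assumptions for illustration and are not taken from the adapter.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Tuple

ALLOWED_EFFECTS = ("allow", "deny")


@dataclass(frozen=True)
class ParsedDecision:
    """Validated view of a decision payload (hypothetical layout)."""
    effect: str
    tool_name: str
    policy_ids: Tuple[str, ...]


def parse_decision(payload: Any) -> ParsedDecision:
    """Validate an untrusted payload before building a decision object."""
    if not isinstance(payload, Mapping):
        raise ValueError("payload must be a JSON object")

    effect = payload.get("effect")
    if not isinstance(effect, str) or effect not in ALLOWED_EFFECTS:
        raise ValueError("effect must be 'allow' or 'deny'")

    tool_name = payload.get("tool_name")
    if not isinstance(tool_name, str) or not tool_name:
        raise ValueError("tool_name must be a non-empty string")

    policy_ids = payload.get("policy_ids", [])
    if not isinstance(policy_ids, list) or not all(
        isinstance(item, str) for item in policy_ids
    ):
        raise ValueError("policy_ids must be a list of strings")

    # Store an immutable copy so later mutation of the payload has no effect.
    return ParsedDecision(effect, tool_name, tuple(policy_ids))
```

Rejecting anything that is not an explicit "allow" or "deny" string keeps ambiguous payloads from being read as an allow.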

7. Race Conditions

  • Risk: Concurrency issues in the CedarPolicyBridge and ReceiptVerifier classes could lead to inconsistent state or security vulnerabilities.
  • Analysis: Both classes use threading locks to manage shared state (_history and _seen_receipts). While this is a good practice, it is important to ensure that the locks are used consistently and that there are no potential deadlocks or race conditions.
  • Rating: 🟡 MEDIUM
  • Fix: Conduct a thorough review of the threading implementation to ensure that locks are used correctly and that there are no race conditions or deadlocks.

8. Supply Chain Risks

  • Risk: The protect-mcp library and its dependencies could introduce vulnerabilities if they are compromised or malicious.
  • Analysis: The protect-mcp library is a third-party dependency. While it is open-source and has a valid MIT license, its security posture and update history should be reviewed. Additionally, the @veritasacta/verify library used for cryptographic verification should be audited.
  • Rating: 🟠 HIGH
  • Fix: Perform a security audit of the protect-mcp and @veritasacta/verify libraries. Use dependency scanning tools to monitor for vulnerabilities in these libraries and their dependencies.

Summary of Findings and Recommendations

| Finding | Rating | Recommendation |
| --- | --- | --- |
| Prompt injection defense bypass | 🔴 CRITICAL | Validate and sanitize all inputs to CedarDecision and CedarPolicyBridge. |
| Policy engine circumvention | 🟠 HIGH | Set require_receipt to True by default and log warnings if it is disabled. |
| Trust chain weaknesses | 🟡 MEDIUM | Audit @veritasacta/verify and consider implementing fallback cryptographic checks. |
| Credential exposure | 🔵 LOW | Review logging practices to ensure no sensitive information is logged. |
| Sandbox escape | 🔵 LOW | Review threading implementation for potential race conditions or deadlocks. |
| Deserialization attacks | 🔴 CRITICAL | Validate receipt payloads before deserialization using a schema validation library. |
| Race conditions | 🟡 MEDIUM | Conduct a thorough review of threading and locking mechanisms. |
| Supply chain risks | 🟠 HIGH | Audit protect-mcp and @veritasacta/verify libraries for security vulnerabilities. |

General Recommendations

  1. Testing: Add unit tests for edge cases, such as malformed receipts, missing fields, and invalid data types.
  2. Documentation: Clearly document the security implications of configuration options like require_receipt.
  3. Dependency Management: Use tools like npm audit and pip-audit to regularly scan for vulnerabilities in dependencies.

This PR introduces significant security improvements to the AgentMesh framework but also introduces potential risks that need to be addressed before merging.

Contributor Author

tomjwxf commented Apr 2, 2026

@microsoft-github-policy-service agree company="ScopeBlind Pty Ltd"

Contributor Author

tomjwxf commented Apr 2, 2026

Thanks for the thorough review. Addressing each point:

🔴 CRITICAL responses:

  1. Receipt Verification Delegation — Agreed. This adapter intentionally validates structure only (schema conformance). Cryptographic verification is delegated to @veritasacta/verify (Apache-2.0, zero ScopeBlind dependency) which handles JCS canonicalization + Ed25519. Adding direct pynacl verification as a fallback is a good suggestion — will add in a follow-up.

  2. Trust Score Manipulation — The trust bonus/penalty is bounded: min(1000, score + bonus) and max(0, score - penalty). An attacker triggering repeated Cedar allows gets capped at 1000 regardless. Will add explicit documentation of the bounds.

  3. Receipt Hash Truncation — The 16-char truncation is a reference ID for tracing, not a security-critical hash. The full SHA-256 is in the signed receipt itself. Will rename the field to receipt_ref to clarify this is a short reference, not a security hash.

  4. Receipt Validation Naming: Good call. Will rename validate_structure to validate_structure_only, with a docstring clarifying that cryptographic verification requires @veritasacta/verify or an equivalent Ed25519 implementation.
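The bounds described in point 2 can be sketched as follows. The `min(1000, ...)` and `max(0, ...)` clamps are from the reply above; the function name `adjust_trust` and the default bonus and penalty values are hypothetical.

```python
TRUST_MIN = 0
TRUST_MAX = 1000


def adjust_trust(score: int, effect: str, bonus: int = 10, penalty: int = 50) -> int:
    """Apply a bounded bonus on allow and a bounded penalty otherwise."""
    if effect == "allow":
        # Repeated allows can never push the score past the ceiling.
        return min(TRUST_MAX, score + bonus)
    # Any non-allow effect is penalized, but never below the floor.
    return max(TRUST_MIN, score - penalty)


if __name__ == "__main__":
    score = 990
    for _ in range(5):
        score = adjust_trust(score, "allow")
    print(score)
```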

🟡 WARNING responses:

  1. Backward Compatibility — Cedar deny as hard constraint is by design (matches Claude Code's own hook precedence: deny > ask > allow). Will add migration guidance noting that existing AGT workflows using trust-score-only gating are unaffected unless they opt into CedarPolicyBridge.

  2. Cross-Language Dependency: protect-mcp is the Node.js runtime. This Python adapter is standalone: it contains the bridge logic without requiring protect-mcp as a Python dependency. The adapter consumes protect-mcp's output (receipt JSON), not its runtime.

💡 SUGGESTIONS:

Will address type annotations, thread safety (adding threading.Lock for _history/_decisions), error handling, and edge case tests in a follow-up commit. The documentation diagrams suggestion is excellent — will add an architecture flow diagram to the README.

Updated IETF draft (draft-farley-acta-signed-receipts-01) adds two new receipt types relevant to this integration: protectmcp:lifecycle for agent swarm tracking and scopeblind:spending_authority for issuer-blind financial authorization. protect-mcp is now at v0.5.1.

…ty, edge case tests, full hash

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor Author

tomjwxf commented Apr 2, 2026

Follow-up commit addressing all code review items:

  1. validate_structure → validate_structure_only: renamed, with an expanded docstring clarifying that no cryptographic verification is performed
  2. Thread safety: threading.Lock added to _history, _verified, and _decisions, with all operations wrapped in lock context managers
  3. receipt_hash → receipt_ref: now uses the full SHA-256 .hexdigest() (64 chars, no truncation)
  4. 20 edge case tests: empty/malformed receipts, boundary trust scores (0, 1000, exactly at floor), concurrent access (10 threads × 50 ops), malformed payloads

56 tests passing (36 original + 20 edge cases).
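A sketch of the locking pattern and the 10 threads × 50 ops style of check mentioned above. `DecisionLog` and `hammer` are hypothetical names, not the classes in this PR.

```python
import threading
from typing import Any, Dict, List


class DecisionLog:
    """Append-only log whose reads and writes share one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._history: List[Dict[str, Any]] = []

    def record(self, entry: Dict[str, Any]) -> None:
        with self._lock:
            self._history.append(entry)

    def snapshot(self) -> List[Dict[str, Any]]:
        # Copy under the lock so callers never iterate a list being mutated.
        with self._lock:
            return list(self._history)


def hammer(log: DecisionLog, threads: int = 10, ops: int = 50) -> int:
    """Write from several threads at once and return the final entry count."""

    def worker(thread_id: int) -> None:
        for op in range(ops):
            log.record({"thread": thread_id, "op": op})

    workers = [threading.Thread(target=worker, args=(t,)) for t in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return len(log.snapshot())


if __name__ == "__main__":
    print(hammer(DecisionLog()))
```

The lock matters most for compound operations such as the copy in `snapshot`. A single `append` is usually safe on its own in CPython, but relying on that is an implementation detail.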

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Code Review Feedback for feat: ScopeBlind protect-mcp integration


🔴 CRITICAL: Security Issues

  1. Receipt Validation Delegation

    • The ReceiptVerifier class delegates cryptographic signature verification to external tools (@veritasacta/verify or protect-mcp runtime). While this is acceptable for modularity, the absence of in-library cryptographic verification introduces a risk of bypass if the external verifier is misconfigured or compromised.
    • Recommendation: Add an optional fallback mechanism to perform Ed25519 signature verification directly within the library using a Python cryptography library (e.g., pynacl).
  2. Thread Safety in ReceiptVerifier

    • The _verified list in ReceiptVerifier is accessed and modified using a threading lock. However, the lock does not guarantee atomicity for complex operations like appending and reading. This could lead to race conditions in highly concurrent environments.
    • Recommendation: Use thread-safe data structures like queue.Queue or collections.deque with maxlen for managing the _verified log.
  3. SpendingGate Utilization Band Validation

    • The evaluate_spend method in SpendingGate does not validate the utilization_band input against the UTILIZATION_BANDS set. This could allow invalid or malicious inputs to bypass checks.
    • Recommendation: Add explicit validation for utilization_band to ensure it is one of the allowed values (low, medium, high, exceeded).
  4. CedarPolicyBridge Trust Adjustment

    • The CedarPolicyBridge class adjusts trust scores (trust_bonus and deny_penalty) without bounds checking. This could lead to unintended behavior if the trust score exceeds the maximum (1000) or drops below zero.
    • Recommendation: Enforce bounds on adjusted_trust to ensure it remains within [0, 1000].
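The utilization band check recommended in point 3 could be sketched as below. The four band values come from the review text; the function name `check_spend`, the reason strings, and the rule that an "exceeded" band is refused are assumptions for illustration.

```python
from typing import Tuple

UTILIZATION_BANDS = frozenset({"low", "medium", "high", "exceeded"})


def check_spend(amount: float, limit: float, utilization_band: object) -> Tuple[bool, str]:
    """Reject unknown bands first, then apply the spending checks."""
    if not isinstance(utilization_band, str) or utilization_band not in UTILIZATION_BANDS:
        # Unknown or non-string input is refused rather than treated as "low".
        return False, "invalid_utilization_band"
    if utilization_band == "exceeded":
        return False, "utilization_exceeded"
    if amount <= 0 or amount > limit:
        return False, "amount_out_of_range"
    return True, "ok"


if __name__ == "__main__":
    print(check_spend(25.0, 100.0, "low"))
    print(check_spend(25.0, 100.0, "LOW"))
```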

🟡 WARNING: Potential Breaking Changes

  1. Hard Dependency on protect-mcp

    • The integration assumes that protect-mcp is always available and functioning correctly. If protect-mcp introduces breaking changes or is unavailable, this integration will fail.
    • Recommendation: Add version pinning for protect-mcp in the documentation and provide fallback mechanisms or error handling for cases where protect-mcp is unavailable.
  2. Cedar Deny as Authoritative

    • The design enforces that a Cedar deny decision is final and cannot be overridden by trust scores. While this is a security feature, it could break workflows that rely on trust scores to override policy decisions.
    • Recommendation: Clearly document this behavior in the release notes and provide a configuration option to allow overrides if needed.

💡 Suggestions for Improvement

  1. Type Annotations

    • While the code includes type hints, some areas lack full type coverage. For example, the receipt parameter in ReceiptVerifier.validate_structure_only is typed as Dict[str, Any], but its structure is well-defined.
    • Recommendation: Use TypedDict or Pydantic models to define the expected structure of receipts for better type safety and validation.
  2. Pydantic Model Validation

    • The CedarDecision and other data structures could benefit from Pydantic models for validation and serialization. This would ensure stricter type safety and reduce the risk of malformed data.
    • Recommendation: Replace dataclass with Pydantic models where applicable.
  3. Logging and Observability

    • The library currently lacks logging for critical operations like receipt validation, Cedar policy evaluation, and spending gate decisions. This makes debugging and monitoring difficult.
    • Recommendation: Add structured logging (e.g., using structlog) for key operations to improve observability.
  4. Unit Test Coverage

    • While the PR mentions 36 tests, it is unclear if edge cases (e.g., malformed receipts, invalid utilization bands, extreme trust scores) are covered.
    • Recommendation: Add tests for edge cases and failure scenarios to ensure robustness.
  5. Backward Compatibility

    • The integration introduces new components but does not modify existing ones. However, the reliance on protect-mcp and the authoritative nature of Cedar deny decisions could indirectly affect existing workflows.
    • Recommendation: Provide a migration guide for users to adopt this integration without disrupting existing setups.
  6. Documentation

    • The documentation is comprehensive but could benefit from additional examples, such as how to handle errors when protect-mcp fails or how to configure the CedarPolicyBridge for different trust models.
    • Recommendation: Expand the README with advanced usage examples and troubleshooting tips.

Summary of Feedback

| Category | Severity | Action |
| --- | --- | --- |
| Security | 🔴 CRITICAL | Add in-library Ed25519 verification, thread-safe structures, and input validation. |
| Breaking Changes | 🟡 WARNING | Document Cedar deny behavior and dependency on protect-mcp. |
| Type Safety & Validation | 💡 SUGGESTION | Use Pydantic models and improve type annotations. |
| Observability | 💡 SUGGESTION | Add structured logging for critical operations. |
| Testing | 💡 SUGGESTION | Add edge case tests for malformed inputs and extreme scenarios. |
| Documentation | 💡 SUGGESTION | Expand README with advanced examples and troubleshooting. |

By addressing these issues, the integration will be more robust, secure, and user-friendly. Let me know if you need further clarification on any of the points!

Contributor Author

tomjwxf commented Apr 2, 2026

@imran-siddique — all items from both code reviews are addressed in commit 9a3be8a. Quick summary:

| Issue | Status |
| --- | --- |
| validate_structure naming | ✅ Renamed to validate_structure_only with expanded docstring |
| Thread safety | ✅ threading.Lock on all mutable state |
| Receipt hash truncation | ✅ receipt_ref with full SHA-256 (64 chars) |
| Trust score bounds | ✅ Already bounded: min(1000, ...) / max(0, ...), now documented |
| Utilization band validation | ✅ Already validated (line 401: if utilization_band not in self.UTILIZATION_BANDS) |
| Edge case tests | ✅ 20 new tests (malformed receipts, boundary scores, 10-thread concurrency) |

The second code review re-flagged items that were fixed in the follow-up commit. 56 tests passing.

Ready for human review when convenient. Happy to address any additional feedback.

Member

@imran-siddique imran-siddique left a comment

Solid code quality with zero runtime deps — nice work! Two blocking issues:

  1. Broken upstream links — README links to github.com/scopeblind/scopeblind-gateway but that repo doesn't exist (only at github.com/tomjwxf/scopeblind-gateway). Links from a Microsoft repo must resolve.
  2. Empty signature passes as valid:True — ReceiptVerifier.validate_structure_only returns valid:True for receipts with empty signature strings. For a security adapter, this is misleading. Consider setting valid:False when signature is empty, or renaming the field.

Also worth addressing: unbounded list growth (memory leak), external company as sole author in pyproject.toml, and hardcoded stale version '0.4.6'.

… version

- Empty signature/publicKey now returns valid:False (security fix)
- All internal lists bounded to MAX_LOG=10000 (prevents memory leak)
- README links use correct case: ScopeBlind/scopeblind-gateway
- pyproject.toml adds Tom Farley as individual author
- Version reference updated from 0.4.6 to 0.5.2
- 56 tests passing

@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review of PR: feat: ScopeBlind protect-mcp integration — Cedar policy enforcement + verifiable receipts


🔴 CRITICAL: Security Issues

  1. Receipt Verification Delegation:

    • The ReceiptVerifier class delegates cryptographic signature verification to external tools (@veritasacta/verify or protect-mcp runtime). While this is acceptable, the lack of direct verification in the Python code means that the integration is entirely reliant on external tools for security. If these tools are compromised or misconfigured, the system could be vulnerable to forged receipts.
    • Recommendation: Consider implementing a fallback mechanism for verifying Ed25519 signatures directly in Python using a library like pynacl. This would provide an additional layer of security and reduce reliance on external tools.
  2. Lack of Replay Protection for Receipts:

    • The ReceiptVerifier does not implement any mechanism to detect or prevent replay attacks. An attacker could reuse a valid receipt to bypass policy checks.
    • Recommendation: Introduce a mechanism to track and reject previously seen receipts. This could involve maintaining a cache of receipt hashes or timestamps, ensuring that each receipt is only used once.
  3. Potential Denial of Service (DoS) via Unbounded History Growth:

    • Both CedarPolicyBridge and ReceiptVerifier maintain in-memory logs (_history and _verified) with a maximum size of 10,000 entries. While there is a mechanism to truncate the logs, this approach may still be vulnerable to DoS attacks if an attacker floods the system with requests.
    • Recommendation: Add rate-limiting mechanisms to prevent abuse. Additionally, consider using a more robust storage mechanism (e.g., a database or a time-based eviction cache) for logs to handle larger volumes of data without risking memory exhaustion.
  4. Lack of Cryptographic Key Validation:

    • The validate_structure_only method in ReceiptVerifier does not validate the format or authenticity of the publicKey field in the receipt. This could allow an attacker to inject malformed or malicious keys.
    • Recommendation: Add validation for the publicKey field to ensure it adheres to the expected Ed25519 key format.
  5. Potential Timing Attacks:

    • The validate_structure_only method performs string comparisons (e.g., if not sig or not pk) without using constant-time comparison functions. This could make the system vulnerable to timing attacks.
    • Recommendation: Use constant-time comparison functions (e.g., hmac.compare_digest) for sensitive data like signatures and public keys.

🟡 WARNING: Potential Breaking Changes

  1. Cedar Deny as Authoritative:

    • The CedarPolicyBridge enforces Cedar deny decisions as authoritative, overriding any AGT trust scores. This introduces a hard constraint that could break existing workflows relying solely on trust scores.
    • Recommendation: Clearly document this behavior in the release notes and provide a migration guide for users to adapt their policies and workflows.
  2. Receipt Requirement:

    • The require_receipt flag in CedarPolicyBridge enforces the presence of a receipt for every evaluation. If enabled, this could break existing integrations that do not provide receipts.
    • Recommendation: Ensure this flag is disabled by default to maintain backward compatibility. Clearly document the implications of enabling this flag.

💡 Suggestions for Improvement

  1. Thread Safety:

    • The use of threading.Lock in CedarPolicyBridge and ReceiptVerifier ensures thread safety for the in-memory logs. However, consider using collections.deque with a maxlen parameter for automatic eviction of old entries, which would simplify the implementation and reduce the risk of errors.
  2. Type Annotations:

    • While the code includes some type annotations, they are not comprehensive. For example, the evaluate method in CedarPolicyBridge could benefit from more detailed type hints for the receipt parameter (e.g., Optional[Dict[str, Any]]).
    • Recommendation: Add type annotations for all methods and parameters to improve code clarity and leverage static type checking tools like mypy.
  3. Pydantic Models for Validation:

    • The CedarDecision and ReceiptVerifier classes manually validate data structures. Using Pydantic models would simplify validation, ensure type safety, and provide better error messages.
    • Recommendation: Refactor CedarDecision and ReceiptVerifier to use Pydantic models for data validation and parsing.
  4. Test Coverage:

    • While the PR mentions 36 tests, it is unclear if they cover all edge cases, especially for security-critical components like ReceiptVerifier and CedarPolicyBridge.
    • Recommendation: Add tests for edge cases, such as malformed receipts, missing fields, invalid signatures, and replay attacks. Use tools like pytest-cov to measure test coverage and ensure critical paths are thoroughly tested.
  5. Documentation:

    • The documentation is thorough and well-written, but it could benefit from additional examples, especially for advanced use cases like integrating with custom AGT policies or handling complex spending authority scenarios.
    • Recommendation: Expand the documentation with more detailed examples and a troubleshooting section for common issues.
  6. Dependency Management:

    • The pyproject.toml file does not specify any dependencies for protect-mcp or @veritasacta/verify. This could lead to runtime errors if these dependencies are not installed.
    • Recommendation: Explicitly list all required dependencies in the pyproject.toml file, including version constraints to ensure compatibility.
  7. Performance Considerations:

    • The CedarPolicyBridge and ReceiptVerifier classes perform JSON serialization (json.dumps) and SHA-256 hashing for receipt references. While this is generally efficient, it could become a bottleneck under high load.
    • Recommendation: Benchmark the performance of these operations under expected load conditions and optimize if necessary.
  8. Error Handling:

    • The evaluate method in CedarPolicyBridge and the validate_structure_only method in ReceiptVerifier return error messages as strings in the reason field. This approach may not be sufficient for debugging complex issues.
    • Recommendation: Use structured error objects or enums to provide more detailed and actionable error information.

Summary

This PR introduces a robust integration with ScopeBlind's protect-mcp, adding Cedar policy enforcement and verifiable decision receipts to AGT. While the implementation is well-structured and aligns with the project's goals, there are critical security concerns and potential breaking changes that need to be addressed before merging. Additionally, there are opportunities to improve type safety, test coverage, and documentation.

Action Items:

  1. Address the 🔴 CRITICAL security issues, especially around receipt verification and replay protection.
  2. Mitigate 🟡 WARNING breaking changes by providing clear documentation and ensuring backward compatibility.
  3. Implement the 💡 SUGGESTIONS to improve code quality, maintainability, and performance.

Let me know if you need further clarification or assistance!
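
The "Cedar deny is authoritative" and `require_receipt` behaviour discussed in this review can be illustrated with a small sketch. It is a hedged approximation of the composition rule described in the PR, not the actual `CedarPolicyBridge.evaluate()` signature: the `Decision` dataclass, the `evaluate` parameters, and the trust threshold of 500 are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    allowed: bool
    reason: str


def evaluate(
    cedar_effect: Optional[str],
    trust_score: int,
    min_trust: int = 500,           # hypothetical threshold
    require_receipt: bool = False,  # off by default, as the review recommends
) -> Decision:
    """Combine a Cedar effect ("allow", "deny", or None) with a trust score."""
    if cedar_effect is None:
        if require_receipt:
            return Decision(False, "receipt required but missing")
        # No Cedar decision available: fall back to the trust score alone.
        return Decision(trust_score >= min_trust, "trust score only")
    if cedar_effect == "deny":
        # Hard constraint: no trust score can override a Cedar deny.
        return Decision(False, "cedar deny")
    if cedar_effect != "allow":
        # Fail closed on anything unrecognised.
        return Decision(False, "unknown cedar effect")
    if trust_score < min_trust:
        return Decision(False, "cedar allow, trust below threshold")
    return Decision(True, "cedar allow and trust ok")
```

In this sketch a deny short-circuits before the trust score is consulted, so `evaluate("deny", 1000)` is refused even at the maximum score.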

@tomjwxf
Contributor Author

tomjwxf commented Apr 3, 2026

@imran-siddique - thanks for the review. All items addressed in 6e538d4:

  1. Broken upstream links - fixed case: ScopeBlind/scopeblind-gateway (GitHub is case-sensitive in the UI but redirects correctly; I updated to match the canonical casing).

  2. Empty signature passes as valid:True - fixed. validate_structure_only now returns valid: False with reason "Empty signature or publicKey" when either field is an empty string. Test updated to match.

  3. Unbounded list growth - all three classes (CedarPolicyBridge, ReceiptVerifier, SpendingGate) now have MAX_LOG = 10000 with automatic truncation via _record() helper methods.

  4. External company as sole author - added "Tom Farley" as individual author alongside "ScopeBlind Pty Ltd" in pyproject.toml.

  5. Stale version - updated from 0.4.6 to 0.5.2 in scopeblind_context().

56 tests passing.
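
Items 2 and 3 above can be sketched as follows. This is an illustrative stand-in under stated assumptions, not the code from 6e538d4: the `StructureChecker` class name, the flat receipt layout, and the reason strings other than "Empty signature or publicKey" are hypothetical. The `MAX_LOG = 10000` value, the `_record()` helper name, and the `valid: False` result for empty fields come from the comment.

```python
import threading
from typing import Any, Dict, List


class StructureChecker:
    """Hypothetical stand-in for a structure-only receipt check."""

    MAX_LOG = 10000  # bound taken from the comment above

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._log: List[Dict[str, Any]] = []

    def _record(self, entry: Dict[str, Any]) -> None:
        with self._lock:
            self._log.append(entry)
            if len(self._log) > self.MAX_LOG:
                # Drop the oldest entries so the list never exceeds MAX_LOG.
                del self._log[: len(self._log) - self.MAX_LOG]

    def validate_structure_only(self, receipt: Dict[str, Any]) -> Dict[str, Any]:
        # Assumption: signature and publicKey are top-level string fields.
        sig = receipt.get("signature")
        pk = receipt.get("publicKey")
        if sig is None or pk is None:
            result = {"valid": False, "reason": "Missing signature or publicKey"}
        elif sig == "" or pk == "":
            result = {"valid": False, "reason": "Empty signature or publicKey"}
        else:
            # Structure only: the signature itself is NOT verified here.
            result = {"valid": True, "reason": "Structure ok; signature not verified"}
        self._record(result)
        return result
```

A `valid: True` result in this sketch says only that the fields are present and non-empty; cryptographic verification is still expected to happen elsewhere.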

@tomjwxf
Contributor Author

tomjwxf commented Apr 3, 2026

P.S. on the bot reviewer's recurring flag about "receipt verification delegation" -- this is by design, not a gap. The verification is deliberately external because that is what makes the receipts independently verifiable. If I embedded Ed25519 verification inside this adapter, the receipts would only be verifiable through this adapter. By delegating to @veritasacta/verify (Apache-2.0, zero ScopeBlind dependency), anyone can verify receipts offline with any Ed25519 implementation. The trust model is: verify the math, not the vendor. That said, adding an optional pynacl fallback for convenience (not security) is on my roadmap.

- Public key format validation: rejects keys that aren't valid Ed25519
  (64 hex chars or base64url). Catches malformed/injected keys early.
- Replay protection: bounded OrderedDict tracks seen receipt hashes.
  Same receipt submitted twice returns valid:False with replay:True.
  Configurable: replay_protection=True (default), max_seen_receipts=50000.
  Evicts oldest entries when window is full.
- Both features are on by default, configurable per-instance.
- 8 new tests (64 total, all passing): key format validation (too short,
  non-hex, valid hex, valid base64url), replay detection, replay disabled,
  bounded window eviction, different receipts not flagged as replay.

v0.1.1
@tomjwxf
Contributor Author

tomjwxf commented Apr 3, 2026

Two improvements in 42c280a (v0.1.1):

  1. Ed25519 public key format validation -- validate_structure_only now rejects keys that aren't valid Ed25519 format (64 hex chars or base64url). Catches malformed or injected keys before any downstream processing.

  2. Replay protection -- bounded OrderedDict tracks seen receipt hashes. Same receipt submitted twice returns valid: False with replay: True. Window of 50,000 entries with oldest-first eviction. Configurable: ReceiptVerifier(replay_protection=True) (default on).

64 tests passing (8 new).
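
The two v0.1.1 checks can be sketched roughly as below. This is a hedged illustration rather than the code in 42c280a: `looks_like_ed25519_key` and `ReplayWindow` are hypothetical names, the requirement that a base64url key decode to exactly 32 bytes is an assumption (the comment only says "64 hex chars or base64url"), and hashing a key-sorted JSON encoding is likewise assumed.

```python
import base64
import hashlib
import json
import re
import threading
from collections import OrderedDict

_HEX_KEY = re.compile(r"[0-9a-fA-F]{64}")
_B64URL = re.compile(r"[A-Za-z0-9_-]+=*")


def looks_like_ed25519_key(key: str) -> bool:
    """True if key is 64 hex chars, or base64url that decodes to 32 bytes."""
    if _HEX_KEY.fullmatch(key):
        return True
    if not _B64URL.fullmatch(key):
        return False
    try:
        stripped = key.rstrip("=")
        padded = stripped + "=" * (-len(stripped) % 4)
        return len(base64.urlsafe_b64decode(padded)) == 32
    except ValueError:  # binascii.Error is a ValueError subclass
        return False


class ReplayWindow:
    """Bounded record of seen receipt hashes with oldest-first eviction."""

    def __init__(self, max_seen: int = 50000) -> None:
        self._seen = OrderedDict()
        self._max_seen = max_seen
        self._lock = threading.Lock()

    def check_and_remember(self, receipt: dict) -> bool:
        """Return True if this receipt was already seen (a replay)."""
        digest = hashlib.sha256(
            json.dumps(receipt, sort_keys=True).encode("utf-8")
        ).hexdigest()
        with self._lock:
            if digest in self._seen:
                return True
            self._seen[digest] = None
            while len(self._seen) > self._max_seen:
                self._seen.popitem(last=False)  # evict the oldest hash
            return False
```

One consequence of a bounded window is that a receipt evicted from it is no longer recognised as a replay, so the window size sets how far back detection reaches.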


@github-actions github-actions bot left a comment


🤖 AI Agent: code-reviewer

Review Summary

This PR introduces a new integration package, scopeblind-protect-mcp, which bridges the ScopeBlind protect-mcp library with the AgentMesh framework. The integration adds runtime Cedar policy enforcement and cryptographically verifiable decision receipts to AgentMesh. The PR includes new components such as CedarPolicyBridge, ReceiptVerifier, SpendingGate, and scopeblind_context() for policy evaluation, receipt validation, and spending authority enforcement.

The PR is well-documented, with detailed explanations of the architecture, design principles, and usage examples. The code appears to be well-structured and adheres to Python best practices. However, there are some areas that require attention, especially regarding security, type safety, and potential backward compatibility issues.


🔴 CRITICAL: Security Issues

  1. Replay Protection for Receipts

    • The ReceiptVerifier class implements replay protection using an in-memory OrderedDict to track seen receipt hashes. However, this approach has limitations:
      • Memory Exhaustion: If the max_seen_receipts limit is reached, the oldest entries are evicted, potentially allowing replay attacks for those evicted receipts.
      • Persistence: The replay protection state is not persisted across process restarts, making it ineffective in distributed or long-running systems.
    • Recommendation: Use a persistent storage mechanism (e.g., Redis, database) for tracking seen receipts. This ensures replay protection is effective even in distributed environments or after restarts.
  2. Receipt Signature Verification

    • The ReceiptVerifier class does not perform cryptographic verification of Ed25519 signatures. While the PR mentions that this is delegated to @veritasacta/verify or the protect-mcp runtime, there is no enforcement or validation of this delegation.
    • Recommendation: Add a mechanism to ensure that cryptographic verification is always performed, either by integrating a Python Ed25519 library or by explicitly verifying that the protect-mcp runtime has performed the verification.
  3. Thread Safety

    • The CedarPolicyBridge and ReceiptVerifier classes use threading.Lock for thread safety. However, the use of locks can lead to potential deadlocks or performance bottlenecks if not handled carefully.
    • Recommendation: Consider using thread-safe data structures like queue.Queue or collections.deque for managing history and seen receipts. Alternatively, ensure that lock acquisition and release are properly managed to avoid deadlocks.
  4. Denial of Service (DoS) Risk

    • The validate_structure_only method in ReceiptVerifier processes untrusted input (e.g., receipts) and performs operations like JSON serialization and SHA-256 hashing. This could be exploited for DoS attacks by sending large or malformed receipts.
    • Recommendation: Add input size limits and validation checks to prevent processing excessively large or malformed receipts.

🟡 WARNING: Potential Breaking Changes

  1. Cedar Deny as Authoritative

    • The CedarPolicyBridge enforces Cedar deny decisions as authoritative, overriding any AGT trust scores. This behavior may conflict with existing AGT policies or workflows that rely solely on trust scores.
    • Recommendation: Clearly document this behavior and provide a configuration option to disable it if needed. This ensures backward compatibility for existing users.
  2. Receipt Requirement

    • The require_receipt flag in CedarPolicyBridge enforces the presence of a receipt for policy evaluation. Enabling this flag without proper preparation could break existing workflows.
    • Recommendation: Set require_receipt to False by default to avoid breaking changes. Clearly document the implications of enabling this flag.

💡 Suggestions for Improvement

  1. Type Safety

    • The code lacks type annotations for some method arguments and return values (e.g., CedarPolicyBridge.evaluate, CedarDecision.from_receipt, etc.).
    • Recommendation: Add type annotations to all public methods and classes to improve type safety and compatibility with static type checkers like mypy.
  2. Pydantic Model Validation

    • The CedarDecision and receipt structures are currently validated using custom logic. Using pydantic models would provide more robust validation and better error handling.
    • Recommendation: Replace custom validation logic with pydantic models for CedarDecision and receipt structures.
  3. Testing Coverage

    • While the PR mentions 36 tests, it is unclear if all edge cases are covered, especially for security-critical components like receipt validation and Cedar policy enforcement.
    • Recommendation: Add tests for edge cases, such as:
      • Invalid or malformed receipts.
      • Replay attacks with receipts.
      • Boundary conditions for trust scores and spending limits.
  4. Documentation

    • The documentation is comprehensive, but it could benefit from additional details on:
      • How to configure and deploy the integration in a production environment.
      • Examples of Cedar policies and their expected behavior in AGT.
    • Recommendation: Expand the documentation to include these details.
  5. Performance Optimization

    • The CedarPolicyBridge and ReceiptVerifier classes maintain in-memory logs and history, which could grow large over time.
    • Recommendation: Periodically persist logs to disk or a database and clear in-memory data to reduce memory usage.
  6. Dependency Management

    • The pyproject.toml file does not list protect-mcp as a dependency, even though it is a core part of this integration.
    • Recommendation: Add protect-mcp as a dependency in pyproject.toml to ensure it is installed automatically.

Summary of Actionable Feedback

Critical

  • 🔴 Implement persistent replay protection for receipts.
  • 🔴 Ensure cryptographic verification of Ed25519 signatures is always performed.
  • 🔴 Review and improve thread safety in CedarPolicyBridge and ReceiptVerifier.
  • 🔴 Add input size limits and validation to prevent DoS attacks.

Warnings

  • 🟡 Document and provide configuration options for Cedar deny being authoritative.
  • 🟡 Set require_receipt to False by default to avoid breaking changes.

Suggestions

  • 💡 Add type annotations to all public methods and classes.
  • 💡 Use pydantic models for CedarDecision and receipt validation.
  • 💡 Expand test coverage for edge cases and security-critical components.
  • 💡 Enhance documentation with deployment and policy examples.
  • 💡 Optimize memory usage by persisting logs and clearing in-memory data.
  • 💡 Add protect-mcp as a dependency in pyproject.toml.

Please address the critical issues and consider the warnings and suggestions before merging this PR. Let me know if you need further clarification or assistance!
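
Two of the recommendations in these reviews, an input size limit and a `collections.deque` with `maxlen` for history, are simple enough to sketch. The names `parse_receipt`, `bounded_history`, and the 16 KiB limit are hypothetical and chosen only for illustration; nothing here is taken from the merged package.

```python
import json
from collections import deque

MAX_RECEIPT_BYTES = 16 * 1024  # hypothetical limit, tune per deployment


def parse_receipt(raw: bytes, max_bytes: int = MAX_RECEIPT_BYTES) -> dict:
    """Reject oversized or non-object input before any hashing or logging."""
    if len(raw) > max_bytes:
        raise ValueError("receipt exceeds size limit")
    receipt = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(receipt, dict):
        raise ValueError("receipt must be a JSON object")
    return receipt


def bounded_history(maxlen: int = 10000) -> deque:
    """History buffer that silently discards the oldest entry when full."""
    return deque(maxlen=maxlen)
```

A `deque` with `maxlen` removes the need for manual truncation, though compound operations such as check-then-append would still need a lock.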

Member

@imran-siddique imran-siddique left a comment


v0.1.1 improvements look good: key validation + replay protection. Approving.

@imran-siddique imran-siddique merged commit a30db62 into microsoft:main Apr 3, 2026
5 of 7 checks passed
@tomjwxf
Contributor Author

tomjwxf commented Apr 3, 2026

Thanks @imran-siddique for the thorough review and merge. Looking forward to contributing more to AGT.

