docs: add Google Colab notebooks for zero-friction trial#591
docs: add Google Colab notebooks for zero-friction trial#591imran-siddique merged 4 commits intomicrosoft:mainfrom
Conversation
|
Welcome to the Agent Governance Toolkit! Thanks for your first pull request. |
|
@microsoft-github-policy-service agree |
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary
This pull request introduces three Google Colab notebooks to provide an interactive, zero-friction trial experience for the microsoft/agent-governance-toolkit. The notebooks cover policy enforcement, MCP security proxy, and multi-agent governance. While the notebooks are well-structured and provide clear instructions, there are several areas that require attention to ensure security, correctness, and maintainability.
🔴 CRITICAL
-
Insufficient Input Validation in MCP Security Scanner
- The
scan_tool_definitionfunction uses regular expressions to detect malicious patterns in tool descriptions. However, the regex patterns are not comprehensive and may lead to false negatives, allowing malicious tool definitions to bypass the trust gate. - Recommendation:
- Expand the detection patterns to cover additional attack vectors, such as obfuscated commands or encoded payloads.
- Consider integrating a more robust static analysis tool or library for detecting malicious patterns.
- The
-
Potential for Sandbox Escape in MCP Tool Definitions
- The
scan_tool_definitionfunction does not account for encoded or obfuscated payloads (e.g., Base64-encoded commands). This could allow attackers to bypass the trust gate by encoding malicious commands. - Recommendation: Add decoding and normalization steps to the scanner to handle obfuscated payloads. For example, decode Base64 strings and scan the decoded content.
- The
-
Improper Handling of Circuit Breaker State
- The
check_circuit_breakerfunction opens the circuit if the error rate or latency exceeds the SLO thresholds. However, there is no mechanism to reset the circuit after the issue is resolved, potentially leading to a denial of service. - Recommendation: Implement a circuit breaker reset mechanism, such as a cooldown period or a sliding window for error rate and latency calculations.
- The
-
Thread Safety Concerns
- The
AgentMetricsclass is not thread-safe, and concurrent updates to metrics (e.g.,total_calls,errors,total_latency_ms) could lead to race conditions in multi-threaded environments. - Recommendation: Use thread-safe data structures (e.g.,
threading.Lock) or atomic operations to ensure thread safety.
- The
🟡 WARNING
-
Backward Compatibility Risk
- The notebooks introduce new functionality and examples but do not explicitly test backward compatibility with existing APIs or configurations.
- Recommendation: Add automated tests to ensure that the new notebooks do not introduce breaking changes to the existing API or functionality.
-
Hardcoded Detection Patterns
- The detection patterns in the
scan_tool_definitionfunction are hardcoded, making it difficult to update or extend them without modifying the code. - Recommendation: Externalize the detection patterns into a configuration file (e.g., YAML or JSON) and load them dynamically. This will make it easier to update patterns without code changes.
- The detection patterns in the
💡 SUGGESTIONS
-
Improve Documentation
- While the notebooks are well-documented, consider adding more context about the scenarios being simulated. For example, explain the significance of MCP tool poisoning and how it relates to real-world threats.
- Include a section on how users can customize the detection patterns and SLOs to suit their specific use cases.
-
Add Unit Tests for Notebooks
- The functionality demonstrated in the notebooks (e.g.,
scan_tool_definition,check_circuit_breaker) should be covered by unit tests to ensure correctness and prevent regressions. - Recommendation: Extract reusable functions from the notebooks into a separate module and write unit tests for them using
pytest.
- The functionality demonstrated in the notebooks (e.g.,
-
Enhance Visualization
- The audit trail and health dashboard outputs are text-based, which may not be user-friendly for all users.
- Recommendation: Use visualization libraries like
matplotliborplotlyto create graphical representations of the audit trail and agent health metrics.
-
Optimize Performance
- The
scan_tool_definitionfunction iterates over all patterns for each category, which could become a bottleneck with a large number of patterns or tool definitions. - Recommendation: Optimize the pattern matching logic, possibly by compiling regex patterns in advance or using a more efficient matching algorithm.
- The
-
Add Security Tests
- The notebooks introduce security-critical functionality (e.g., MCP security scanner, trust gate, circuit breaker). However, there are no tests to validate their effectiveness against known attack vectors.
- Recommendation: Create a suite of security tests to validate the detection patterns and circuit breaker logic against a variety of malicious inputs.
Additional Notes
- The addition of the
Open in Colabbadges is a great touch for improving accessibility and user experience. - The notebooks are well-structured and provide clear step-by-step instructions, making them suitable for users who are new to the toolkit.
- The use of
dataclassforAgentSLOandAgentMetricsis a good practice for maintaining type safety and readability.
Action Items
- Address the CRITICAL issues related to input validation, sandbox escape, circuit breaker state, and thread safety.
- Mitigate the WARNING about backward compatibility and hardcoded detection patterns.
- Implement the SUGGESTIONS to improve documentation, testing, visualization, and performance.
Let me know if you need further clarification or assistance!
🤖 AI Agent: security-scanner — Security Review of Pull Request: Google Colab Notebooks for Zero-Friction TrialSecurity Review of Pull Request: Google Colab Notebooks for Zero-Friction TrialFindings:1. Prompt Injection Defense BypassSeverity: 🔴 CRITICAL
Attack Vector: Recommendation:
2. Policy Engine CircumventionSeverity: 🟠 HIGH Attack Vector: Recommendation:
3. Trust Chain WeaknessesSeverity: 🟡 MEDIUM Attack Vector: Recommendation:
4. Credential ExposureSeverity: 🔵 LOW Attack Vector: Recommendation:
5. Sandbox EscapeSeverity: 🔴 CRITICAL Attack Vector: Recommendation:
6. Deserialization AttacksSeverity: 🟠 HIGH Attack Vector: Recommendation:
7. Race ConditionsSeverity: 🟠 HIGH Attack Vector: Recommendation:
8. Supply Chain RisksSeverity: 🟡 MEDIUM Attack Vector: Recommendation:
Summary of Findings:
Final Recommendation:The notebooks provide valuable educational resources but introduce critical security risks that must be addressed before deployment. Prioritize fixes for prompt injection defenses, sandboxing, and race conditions to ensure the governance layer remains secure. |
|
@kanish5 could you please look at the critical issues pointed in the code review above, for each if they dont apply (as this is a playground notebook) please comment on these. |
imran-siddique
left a comment
There was a problem hiding this comment.
LGTM - all mandatory checks pass.
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary
This pull request introduces three Google Colab notebooks designed to provide an interactive, zero-friction trial experience for the Agent Governance Toolkit. The notebooks cover policy enforcement, MCP security proxy, and multi-agent governance. While the notebooks are well-structured and provide valuable demonstrations, there are several areas that require attention to ensure security, correctness, and compliance with best practices.
🔴 CRITICAL
-
Potential Regular Expression Denial of Service (ReDoS) in MCP Security Proxy:
- The
DETECTION_PATTERNSin02_mcp_security_proxy.ipynbinclude regular expressions that could be exploited for ReDoS attacks. For example:These patterns contain nested quantifiers (r"ignore\s+(all\s+)?previous" r"disregard\s+(all\s+)?(above|prior|previous)"
\s+and(all\s+)?), which can lead to catastrophic backtracking if an attacker crafts malicious input. This could result in a denial-of-service vulnerability. - Action: Refactor the regex patterns to avoid nested quantifiers or use regex libraries that are resistant to ReDoS attacks, such as
regex(an alternative to Python'sremodule).
- The
-
Insufficient Validation of GovernancePolicy:
- In
01_policy_enforcement_101.ipynb, theGovernancePolicyobject is instantiated directly without validating theblocked_patternsor other fields. This could lead to runtime errors or unexpected behavior if invalid inputs are provided. - Action: Use Pydantic models to validate the
GovernancePolicyobject. For example:from pydantic import BaseModel, Field, validator from typing import List class GovernancePolicyModel(BaseModel): name: str blocked_patterns: List[str] require_human_approval: bool max_tool_calls: int = Field(ge=1) @validator('blocked_patterns', each_item=True) def validate_patterns(cls, pattern): try: re.compile(pattern) except re.error: raise ValueError(f"Invalid regex pattern: {pattern}") return pattern
- In
-
Hardcoded Detection Patterns:
- The detection patterns in
02_mcp_security_proxy.ipynbare hardcoded in the notebook. This makes it difficult to update or extend the patterns without modifying the notebook itself. - Action: Externalize the detection patterns into a configuration file (e.g.,
mcp-security.yaml) and load them dynamically in the notebook. This will also make it easier to manage and update the patterns in the future.
- The detection patterns in
-
Lack of Cryptographic Validation in MCP Security Proxy:
- The
02_mcp_security_proxy.ipynbnotebook does not include any cryptographic validation for the integrity or authenticity of the MCP tool definitions. This could allow an attacker to tamper with the tool definitions. - Action: Implement cryptographic signing and verification for MCP tool definitions using a library like
cryptography. Ensure that only signed and verified tool definitions are accepted.
- The
🟡 WARNING
-
Backward Compatibility Concerns:
- The notebooks introduce new functionality and examples, but they rely on existing APIs (e.g.,
GovernancePolicy,LangChainKernel,pre_execute, etc.). If these APIs are modified in the future, the notebooks may break. - Action: Add tests to ensure that the APIs used in the notebooks remain backward-compatible. Consider adding integration tests that run the notebooks as part of the CI/CD pipeline.
- The notebooks introduce new functionality and examples, but they rely on existing APIs (e.g.,
-
Thread Safety in Multi-Agent Governance:
- The
03_multi_agent_governance.ipynbnotebook simulates multiple agents but does not address potential thread safety issues in concurrent execution. For example, theAgentMetricsclass is not thread-safe. - Action: If the toolkit is intended to support concurrent execution, ensure that shared state (e.g.,
AgentMetrics) is thread-safe. Use thread-safe data structures or synchronization mechanisms likethreading.Lock.
- The
💡 SUGGESTIONS
-
Add Type Annotations:
- While the code in the notebooks is functional, it lacks type annotations in many places. Adding type annotations will improve code readability and help catch type-related issues during development.
- Example:
def scan_tool_definition(tool_name: str, description: str) -> dict:
-
Improve Documentation:
- The notebooks provide a good overview of the toolkit's capabilities, but they could benefit from additional context and explanations. For example:
- Explain the purpose of each step in more detail.
- Provide links to relevant sections of the toolkit's documentation.
- Include a glossary for terms like MCP, SLO, and circuit breaker.
- The notebooks provide a good overview of the toolkit's capabilities, but they could benefit from additional context and explanations. For example:
-
Add Unit Tests for Notebook Code:
- While the notebooks are designed for interactive use, the code within them should be tested to ensure correctness. Extract reusable functions into separate modules and write unit tests for them.
-
Use Logging Instead of Print Statements:
- The notebooks use
printstatements for output, which is fine for interactive use but not ideal for production code. Consider using theloggingmodule for better control over log levels and formatting.
- The notebooks use
-
Add Security Best Practices Section:
- Since this toolkit is security-focused, consider adding a section in the notebooks that highlights best practices for using the toolkit securely. For example:
- How to define secure governance policies.
- How to handle sensitive data in agent interactions.
- Since this toolkit is security-focused, consider adding a section in the notebooks that highlights best practices for using the toolkit securely. For example:
-
Notebook Execution Validation:
- Add a CI/CD step to validate that the notebooks execute without errors. This will ensure that the examples remain functional as the toolkit evolves.
Final Recommendation
- Address the 🔴 CRITICAL issues immediately, as they pose security risks.
- Resolve the 🟡 WARNING issues to ensure backward compatibility and thread safety.
- Consider implementing the 💡 SUGGESTIONS to improve code quality, usability, and maintainability.
Once these issues are resolved, the notebooks will provide a robust and secure demonstration of the Agent Governance Toolkit's capabilities.
Closes #255
Summary
Added 3 Google Colab notebooks for zero-friction trial experience:
notebooks/01_policy_enforcement_101.ipynb— define capabilities, evaluate actions, see violations blocked with audit trailnotebooks/02_mcp_security_proxy.ipynb— detect MCP tool poisoning patterns, trust gate simulationnotebooks/03_multi_agent_governance.ipynb— SLOs, circuit breakers, chaos testing across agent fleetDetails