fix(dotnet): fix GovernanceMetrics test isolation#417
fix(dotnet): fix GovernanceMetrics test isolation#417imran-siddique merged 5 commits intomicrosoft:mainfrom
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mandatory review rules before merging any PR: - Read actual diff (CI green is not sufficient) - Dependency confusion scan on all install commands - Verify __init__.py for new modules - Verify dependencies declared in pyproject.toml - No hardcoded secrets or plaintext config in pipelines - Verify PR has actual changes (additions > 0) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…crosoft#410, microsoft#409, microsoft#400) - agent_os.compat: NoOp fallbacks for optional toolkit dependency - agent_os.policies.budget: BudgetPolicy + BudgetTracker for token/cost/tool limits - agent_os.audit_logger: GovernanceAuditLogger with pluggable backends 16 tests passing. Closes microsoft#410, microsoft#409, microsoft#400. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…crosoft#398, microsoft#396, microsoft#395, microsoft#394) - event_bus.py: GovernanceEventBus with pub/sub for cross-gate composition - task_outcome.py: TaskOutcomeRecorder with severity scoring + recovery - diff_policy.py: DiffPolicy for git change scope enforcement - sandbox_provider.py: Pluggable SandboxProvider ABC + subprocess impl 22 tests passing. Closes microsoft#398, microsoft#396, microsoft#395, microsoft#394. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…fore baseline RecordDecision_DeniedIncrementsBlockedCounter failed with Expected 1, Actual 7 because the MeterListener captured counter increments from other test classes before the baseline was captured. Fix: call RecordObservableInstruments() before setting the baseline so all pre-existing measurements are flushed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 AI Agent: breaking-change-detector — Summary🔍 API Compatibility ReportSummaryThis pull request introduces several new features and modules to the Findings
Migration GuideNo migration is necessary as no breaking changes were identified. However, downstream users should be aware of the new features and modules introduced, as they may require documentation updates or additional dependencies (e.g., Recommendations
✅ No breaking changes detected. |
🤖 AI Agent: docs-sync-checker — Issues Found📝 Documentation Sync ReportIssues Found
Suggestions
Additional Notes
ConclusionThe documentation is not fully in sync with the code changes. Please address the issues and suggestions listed above. |
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary
This pull request introduces several new components to the agent-governance-toolkit repository, including audit logging, event bus, policy evaluation, budget tracking, and sandboxing functionality. While these additions are valuable, there are critical security concerns, potential breaking changes, and areas for improvement.
🔴 CRITICAL Issues
1. Sandbox Escape Vectors in SubprocessSandboxProvider
- The
SubprocessSandboxProviderimplementation lacks proper security isolation. It explicitly states that it does not provide container isolation, making it vulnerable to sandbox escape attacks. For example:- The subprocess inherits the environment variables, which could leak sensitive information.
- The subprocess runs with the same privileges as the parent process, enabling privilege escalation.
- No restrictions are placed on file system access, allowing the subprocess to modify or access sensitive files.
- Recommendation: Replace
SubprocessSandboxProviderwith a container-based sandbox (e.g., Docker or Firecracker) that enforces strict resource limits, file system isolation, and network restrictions.
2. Unvalidated Input in DiffPolicy
- The
DiffPolicyclass usesfnmatchfor path matching but does not validate inputs. Malicious paths (e.g.,../../etc/passwd) could bypass policy checks. - Recommendation: Sanitize and normalize file paths before evaluating them against
allowed_pathsandblocked_paths.
3. Potential Denial of Service in GovernanceEventBus
- The
GovernanceEventBusstores event history in memory (self._history). Without proper limits, this could lead to memory exhaustion if a large number of events are published. - Recommendation: Enforce stricter limits on
self._max_historyand implement a mechanism to drop old events when the limit is exceeded.
4. Missing Cryptographic Validation in AuditEntry
- The
AuditEntryclass does not include cryptographic integrity checks (e.g., HMAC or digital signatures). This could allow tampering with audit logs. - Recommendation: Add cryptographic signing to audit entries to ensure their integrity and authenticity.
5. Thread Safety in GovernanceEventBus
- The
GovernanceEventBusis not thread-safe. Concurrent access toself._handlersandself._historycould lead to race conditions. - Recommendation: Use thread-safe data structures (e.g.,
queue.Queue) or synchronization primitives (e.g.,threading.Lock) to ensure thread safety.
🟡 WARNING: Potential Breaking Changes
1. Dependency Addition (pydantic>=2.4.0)
- The addition of
pydantic>=2.4.0inpyproject.tomlintroduces a new dependency. If consumers of this library are using older versions of Pydantic, this could cause compatibility issues. - Recommendation: Clearly document the dependency change and test compatibility with older versions.
2. No-Op Fallbacks in agent_os/compat.py
- The fallback mechanism (
NoOpPolicyEvaluator,NoOpGovernanceMiddleware) changes behavior whenagent-os-kernelis not installed. This could lead to unexpected behavior for consumers relying on strict policy enforcement. - Recommendation: Log warnings prominently when falling back to no-op implementations and provide a mechanism for consumers to opt-in explicitly.
💡 Suggestions for Improvement
1. Audit Logging Enhancements
- Add support for structured logging formats (e.g., JSON) to facilitate integration with log aggregation tools.
- Include a configurable log rotation mechanism for the
JsonlFileBackendto prevent unbounded file growth.
2. Policy Engine Improvements
- Enhance
DiffPolicyto support regular expressions for more robust path matching. - Add support for policy chaining or composition to allow more complex policy definitions.
3. Budget Tracker
- Add support for real-time monitoring and alerts when budget limits are close to being exceeded.
- Include a mechanism to reset the tracker for long-running tasks.
4. Event Bus Wildcard Handling
- The wildcard handler in
GovernanceEventBus(self._handlers.get("*", [])) could lead to unintended behavior if misused. Add documentation and examples to clarify its use.
5. Testing
- Add unit tests for edge cases, such as:
- Sandbox execution with malicious commands.
- DiffPolicy evaluation with edge-case paths.
- GovernanceEventBus under high load or concurrent access.
6. Documentation
- Expand documentation for new classes and methods, including usage examples and security considerations.
- Add a security section to the README to highlight best practices for using sandboxing, audit logging, and policy evaluation.
Final Recommendations
- Address the 🔴 CRITICAL issues immediately, especially the sandbox escape vectors and thread safety concerns.
- Consider the 🟡 WARNING items to avoid breaking changes for consumers.
- Implement the 💡 SUGGESTIONS to improve functionality, usability, and security.
This pull request introduces valuable features but requires significant attention to security and backward compatibility before merging.
🤖 AI Agent: test-generator — `audit_logger.py`🧪 Test Coverage Analysis
|
🤖 AI Agent: security-scanner — Security Analysis of the Pull RequestSecurity Analysis of the Pull Request1. Prompt Injection Defense Bypass
2. Policy Engine Circumvention
3. Trust Chain Weaknesses
4. Credential Exposure
5. Sandbox Escape
6. Deserialization Attacks
7. Race Conditions
8. Supply Chain
Summary of Findings
Final RecommendationThe PR introduces critical security risks, particularly in the |
Fix \RecordDecision_DeniedIncrementsBlockedCounter\ test that failed with \Expected 1, Actual 7.
Root cause: MeterListener captured counter increments from other test classes before baseline was set. Fix: flush listener with \RecordObservableInstruments()\ before capturing baseline.
1 file, +2/-1