feat: add graceful degradation, budget policies, and audit logger (#410, #409, #400)#414

Merged
imran-siddique merged 3 commits into microsoft:main from imran-siddique:main
Mar 24, 2026

Conversation

@imran-siddique
Member

3 new features with 16 tests:

#410 — Graceful degradation (agent_os.compat):
NoOp fallbacks so consumers can optionally depend on the toolkit without try/except boilerplate.

#409 — BudgetPolicy (agent_os.policies.budget):
Token/cost/tool-call limits with BudgetTracker for utilization tracking.

#400 — GovernanceAuditLogger (agent_os.audit_logger):
Pluggable audit backends (JSONL file, in-memory, logging). Consolidates duplicate audit patterns.

4 files, +419 lines, 16 tests passing.

imran-siddique and others added 3 commits March 24, 2026 09:32
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mandatory review rules before merging any PR:
- Read actual diff (CI green is not sufficient)
- Dependency confusion scan on all install commands
- Verify __init__.py for new modules
- Verify dependencies declared in pyproject.toml
- No hardcoded secrets or plaintext config in pipelines
- Verify PR has actual changes (additions > 0)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…crosoft#410, microsoft#409, microsoft#400)

- agent_os.compat: NoOp fallbacks for optional toolkit dependency
- agent_os.policies.budget: BudgetPolicy + BudgetTracker for token/cost/tool limits
- agent_os.audit_logger: GovernanceAuditLogger with pluggable backends

16 tests passing. Closes microsoft#410, microsoft#409, microsoft#400.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot added the documentation, dependencies, tests, ci/cd, and size/L (Large PR, < 500 lines) labels on Mar 24, 2026
@github-actions

🤖 AI Agent: breaking-change-detector — Summary

🔍 API Compatibility Report

Summary

This pull request introduces three new features: graceful degradation, budget policies, and an audit logger. The changes are primarily additive, with no evidence of breaking changes or modifications to existing APIs. All new functionality is encapsulated in new modules or classes, and existing functionality remains untouched.

Findings

| Severity | Package | Change | Impact |
|---|---|---|---|
| 🔵 | agent-os | Added GovernanceAuditLogger class | New API, not breaking |
| 🔵 | agent-os | Added BudgetPolicy and BudgetTracker | New API, not breaking |
| 🔵 | agent-os | Added NoOpPolicyEvaluator and NoOpGovernanceMiddleware | New API, not breaking |

Migration Guide

No migration steps are required as no breaking changes were introduced.

Additional Notes

  • The new features are well-documented and include comprehensive tests.
  • The pyproject.toml file was updated to include pydantic>=2.4.0 as a dependency, which is correctly declared.
  • The changes adhere to the repository's contribution guidelines, including licensing and code style.

No breaking changes detected.

@github-actions

🤖 AI Agent: security-scanner — Security Analysis of PR Changes

Security Analysis of PR Changes

1. Prompt Injection Defense Bypass

No vulnerabilities detected. The changes do not involve any user input directly influencing the generation of prompts or bypassing policy guards. The NoOpPolicyEvaluator is a potential concern since it allows all actions, but it is explicitly designed for graceful degradation when the toolkit is unavailable. The get_evaluator function ensures that the real PolicyEvaluator is used when available, mitigating risks.

Rating: 🔵 LOW
Recommendation: Add explicit warnings in the documentation about the risks of using NoOpPolicyEvaluator in production environments.


2. Policy Engine Circumvention

The NoOpPolicyEvaluator and NoOpGovernanceMiddleware allow all actions and bypass governance policies. While this is intentional for graceful degradation, it could be exploited if an attacker disables the toolkit dependency or manipulates the environment to simulate its absence.

Rating: 🟠 HIGH
Attack Vector: An attacker could intentionally remove or disable the agent-os-kernel dependency, causing the system to fall back to the no-op implementations, effectively bypassing all policy enforcement.
Recommendation:

  • Log a critical warning when falling back to no-op implementations.
  • Add runtime checks to detect intentional tampering with dependencies.
  • Provide a configuration option to disable no-op fallbacks in production environments.
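
The first and third recommendations above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the function name `get_evaluator_sketch`, the `module_name` and `strict` parameters, and the `AGENT_OS_STRICT` environment variable are all hypothetical, and the real `get_evaluator` signature is not shown in this conversation.

```python
import importlib
import logging
import os

logger = logging.getLogger("agent_os.compat")


class NoOpPolicyEvaluator:
    """Stand-in evaluator that allows every action, as the PR describes."""

    def evaluate(self, *args, **kwargs):
        return True


def get_evaluator_sketch(module_name="agent_os.policies", strict=None):
    """Return the real evaluator when importable, else a no-op fallback.

    Hypothetical: `module_name`, `strict`, and the AGENT_OS_STRICT variable
    are illustrative and are not taken from the PR.
    """
    if strict is None:
        strict = os.environ.get("AGENT_OS_STRICT") == "1"
    try:
        module = importlib.import_module(module_name)
        return module.PolicyEvaluator()
    except (ImportError, AttributeError) as exc:
        if strict:
            # Fail closed: refuse to run without policy enforcement.
            raise RuntimeError("governance toolkit unavailable") from exc
        # Fail open, but make the downgrade hard to miss in logs.
        logger.critical(
            "Toolkit import failed (%s); NoOpPolicyEvaluator allows ALL actions",
            exc,
        )
        return NoOpPolicyEvaluator()
```

With `strict=True` a missing toolkit raises instead of silently disabling enforcement, which is the behavior a production deployment would likely want; the default keeps the graceful fallback described in #410.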

3. Trust Chain Weaknesses

No issues detected. The changes do not involve SPIFFE/SVID validation, certificate pinning, or other trust chain mechanisms.

Rating: 🔵 LOW


4. Credential Exposure

No hardcoded secrets or sensitive information were found in the code. The JsonlFileBackend writes audit logs to a file, but it does not expose sensitive information inappropriately.

Rating: 🔵 LOW


5. Sandbox Escape

No evidence of sandbox escape vulnerabilities. The changes do not involve container or process isolation mechanisms.

Rating: 🔵 LOW


6. Deserialization Attacks

The AuditEntry class uses json.dumps and json.loads for serialization and deserialization. These are safe operations as long as the input is controlled and not directly exposed to untrusted sources.

Rating: 🔵 LOW
Recommendation: Ensure that all inputs to AuditEntry are sanitized and validated before being serialized or deserialized.


7. Race Conditions

No time-of-check-to-time-of-use (TOCTOU) vulnerabilities were identified. However, the BudgetTracker class uses simple counters and checks that are not thread-safe, so concurrent use from multiple threads could lead to race conditions.

Rating: 🟡 MEDIUM
Attack Vector: Concurrent access to BudgetTracker methods like record_tokens or record_tool_call could lead to inconsistent state, potentially allowing resource limits to be bypassed.
Recommendation: Use thread-safe mechanisms like locks or atomic operations to ensure thread safety in BudgetTracker.
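
A lock-based variant might look like the sketch below. The class name `LockedBudgetTracker`, the `BudgetPolicy` field names (`max_tokens`, `max_tool_calls`), the `snapshot` helper, and the "strictly greater than" exceed rule are assumptions made for illustration; only the method names `record_tokens`, `record_tool_call`, and `is_exceeded` come from the PR discussion.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BudgetPolicy:
    # Hypothetical field names; the PR's actual fields are not shown here.
    max_tokens: Optional[int] = None
    max_tool_calls: Optional[int] = None


class LockedBudgetTracker:
    """Counter updates and limit checks all go through one lock."""

    def __init__(self, policy: BudgetPolicy) -> None:
        self.policy = policy
        self._lock = threading.Lock()
        self._tokens = 0
        self._tool_calls = 0

    def record_tokens(self, count: int) -> None:
        with self._lock:
            self._tokens += count

    def record_tool_call(self) -> None:
        with self._lock:
            self._tool_calls += 1

    def is_exceeded(self) -> bool:
        with self._lock:
            p = self.policy
            over_tokens = p.max_tokens is not None and self._tokens > p.max_tokens
            over_calls = (
                p.max_tool_calls is not None and self._tool_calls > p.max_tool_calls
            )
            return over_tokens or over_calls

    def snapshot(self) -> dict:
        with self._lock:
            return {"tokens": self._tokens, "tool_calls": self._tool_calls}


def hammer(tracker: LockedBudgetTracker, n: int) -> None:
    """Record one token n times; run from several threads at once."""
    for _ in range(n):
        tracker.record_tokens(1)


tracker = LockedBudgetTracker(BudgetPolicy(max_tokens=5000))
threads = [threading.Thread(target=hammer, args=(tracker, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(tracker.snapshot()["tokens"], tracker.is_exceeded())  # 8000 True
```

A single coarse lock is the simplest choice here because every operation is a few integer updates; finer-grained locking would add complexity without a clear benefit at this scale.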


8. Supply Chain

The addition of the pydantic dependency is noted. pydantic is a well-known library, but its version should be pinned to avoid dependency confusion or supply chain attacks.

Rating: 🟠 HIGH
Attack Vector: If a malicious actor uploads a compromised version of pydantic to PyPI with a higher version number, it could be inadvertently installed.
Recommendation: Pin the pydantic dependency to a specific version or range in pyproject.toml (e.g., pydantic>=2.4.0,<2.5.0). Additionally, consider using a dependency scanner to monitor for vulnerabilities in third-party libraries.


Summary of Findings

  1. 🟠 Policy Engine Circumvention: Potential for abuse of NoOpPolicyEvaluator and NoOpGovernanceMiddleware to bypass policies.
  2. 🟡 Race Conditions: BudgetTracker is not thread-safe, which could lead to policy circumvention in multithreaded environments.
  3. 🟠 Supply Chain: pydantic dependency is not version-pinned, exposing the project to potential dependency confusion or malicious updates.

Recommendations

  1. Add critical warnings when falling back to no-op implementations and provide a configuration option to disable them in production.
  2. Implement thread-safety mechanisms in BudgetTracker to prevent race conditions.
  3. Pin the pydantic dependency to a specific version range in pyproject.toml.
  4. Consider adding runtime checks to detect tampering with dependencies.

These changes are critical to ensure the integrity and security of the agent-governance-toolkit, given its role as a security layer for downstream users.

@github-actions

🤖 AI Agent: test-generator — `audit_logger.py`

🧪 Test Coverage Analysis

audit_logger.py

  • ✅ Existing coverage: The tests cover the AuditEntry class's to_json method, the InMemoryBackend write functionality, and the GovernanceAuditLogger's ability to log decisions to multiple backends.
  • ❌ Missing coverage: There is no coverage for edge cases such as handling of malformed AuditEntry data, concurrent writes to the backend, or testing the flush method's behavior.
  • 💡 Suggested test cases:
    1. test_audit_entry_invalid_data — Test how AuditEntry handles invalid data types for its fields.
    2. test_in_memory_backend_concurrent_writes — Simulate concurrent writes to the InMemoryBackend to check for race conditions.
    3. test_jsonl_file_backend_flush_behavior — Ensure that the flush method correctly handles file operations and does not lose data.

compat.py

  • ✅ Existing coverage: The tests verify that the NoOpPolicyEvaluator allows all actions and that the NoOpGovernanceMiddleware passes through calls without modification.
  • ❌ Missing coverage: There is no test for the behavior when the actual PolicyEvaluator is available, nor is there coverage for the fallback mechanism when the toolkit is not installed.
  • 💡 Suggested test cases:
    1. test_get_evaluator_fallback — Ensure that get_evaluator returns a NoOpPolicyEvaluator when the toolkit is not available.
    2. test_get_evaluator_real_evaluator — Test that get_evaluator returns the real PolicyEvaluator when the toolkit is available.
    3. test_no_op_policy_evaluator_behavior — Validate that the NoOpPolicyEvaluator correctly handles edge cases, such as when called with unexpected parameters.
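
The first two suggested cases could be written along these lines. This is a sketch under stated assumptions: `hypothetical_toolkit` is a placeholder module name, and the local `get_evaluator` is a simplified stand-in for the function in agent_os/compat.py, whose real import target is not shown in this conversation. The technique relies on standard-library behavior: a `None` entry in `sys.modules` makes an import of that name raise `ImportError`.

```python
import importlib
import sys
import types
from unittest import mock

TOOLKIT = "hypothetical_toolkit"  # placeholder module name, not from the PR


class NoOpPolicyEvaluator:
    def evaluate(self, *args, **kwargs):
        return True


def get_evaluator():
    """Simplified stand-in: real evaluator if importable, else no-op."""
    try:
        return importlib.import_module(TOOLKIT).PolicyEvaluator()
    except ImportError:
        return NoOpPolicyEvaluator()


def test_get_evaluator_fallback():
    # A None entry in sys.modules makes the import raise ImportError.
    with mock.patch.dict(sys.modules, {TOOLKIT: None}):
        assert isinstance(get_evaluator(), NoOpPolicyEvaluator)


def test_get_evaluator_real_evaluator():
    # Inject a fake module that exposes a PolicyEvaluator class.
    fake = types.ModuleType(TOOLKIT)

    class PolicyEvaluator:
        pass

    fake.PolicyEvaluator = PolicyEvaluator
    with mock.patch.dict(sys.modules, {TOOLKIT: fake}):
        assert isinstance(get_evaluator(), PolicyEvaluator)
```

Because `mock.patch.dict` restores `sys.modules` on exit, the two tests do not leak state into each other or into the rest of the suite.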

budget.py

  • ✅ Existing coverage: The tests cover the basic functionality of BudgetPolicy and BudgetTracker, including limits for tokens, tool calls, costs, and duration.
  • ❌ Missing coverage: There is no coverage for edge cases such as exceeding multiple limits simultaneously, handling of negative values, or the behavior of the remaining and utilization methods when limits are not set.
  • 💡 Suggested test cases:
    1. test_tracker_exceed_multiple_limits — Test the BudgetTracker when exceeding both token and tool call limits at the same time.
    2. test_tracker_negative_values — Validate how BudgetTracker handles negative values for tokens, costs, and tool calls.
    3. test_tracker_remaining_with_no_limits — Ensure that the remaining method behaves correctly when no limits are set in the BudgetPolicy.

@github-actions

🤖 AI Agent: docs-sync-checker — Issues Found

📝 Documentation Sync Report

Issues Found

  • JsonlFileBackend.__init__() in agent_os/audit_logger.py — missing docstring
  • InMemoryBackend.__init__() in agent_os/audit_logger.py — missing docstring
  • LoggingBackend.__init__() in agent_os/audit_logger.py — missing docstring
  • GovernanceAuditLogger.__init__() in agent_os/audit_logger.py — missing docstring
  • GovernanceAuditLogger.add_backend() in agent_os/audit_logger.py — missing docstring
  • GovernanceAuditLogger.log() in agent_os/audit_logger.py — missing docstring
  • GovernanceAuditLogger.log_decision() in agent_os/audit_logger.py — missing docstring
  • GovernanceAuditLogger.flush() in agent_os/audit_logger.py — missing docstring
  • NoOpPolicyEvaluator.__init__() in agent_os/compat.py — missing docstring
  • NoOpPolicyEvaluator.evaluate() in agent_os/compat.py — missing docstring
  • NoOpPolicyEvaluator.load_policies() in agent_os/compat.py — missing docstring
  • NoOpPolicyEvaluator.add_backend() in agent_os/compat.py — missing docstring
  • NoOpGovernanceMiddleware.__init__() in agent_os/compat.py — missing docstring
  • NoOpGovernanceMiddleware.__call__() in agent_os/compat.py — missing docstring
  • NoOpGovernanceMiddleware.wrap() in agent_os/compat.py — missing docstring
  • get_evaluator() in agent_os/compat.py — missing docstring
  • BudgetPolicy class in agent_os/policies/budget.py — missing class-level docstring
  • BudgetTracker.record_tokens() in agent_os/policies/budget.py — missing docstring
  • BudgetTracker.record_tool_call() in agent_os/policies/budget.py — missing docstring
  • BudgetTracker.record_cost() in agent_os/policies/budget.py — missing docstring
  • BudgetTracker.record_duration() in agent_os/policies/budget.py — missing docstring
  • BudgetTracker.is_exceeded() in agent_os/policies/budget.py — missing docstring
  • BudgetTracker.exceeded_reasons() in agent_os/policies/budget.py — missing docstring
  • BudgetTracker.remaining() in agent_os/policies/budget.py — missing docstring
  • BudgetTracker.utilization() in agent_os/policies/budget.py — missing docstring
  • ⚠️ packages/agentmesh-integrations/openclaw-skill/README.md — outdated pip install command; should reflect the new package name agent-governance-toolkit.
  • ⚠️ CHANGELOG.md — no entry for the new features: graceful degradation, budget policies, and audit logger.

Suggestions

  • 💡 Add docstrings for all the missing public methods and classes listed above. For example:
    • JsonlFileBackend.__init__(path: str | Path) -> None: Add a docstring explaining the purpose of the path parameter and what the constructor does.
    • BudgetTracker.record_tokens(count: int) -> None: Add a docstring explaining what count represents and how it updates the tracker.
  • 💡 Update the README.md in packages/agentmesh-integrations/openclaw-skill/ to reflect the correct pip install command for the agent-governance-toolkit package.
  • 💡 Add a new entry in CHANGELOG.md summarizing the new features introduced in this PR:
    • Graceful degradation helpers (compat.py)
    • Budget policies and tracking (policies/budget.py)
    • Governance audit logger with pluggable backends (audit_logger.py)
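
As an illustration of the docstring suggestion, the two examples named above might be documented as below. The method bodies and the `tokens_used` attribute are hypothetical stubs written only so the snippet runs; the wording should be adjusted to match what the real methods do.

```python
from pathlib import Path
from typing import Union


class JsonlFileBackend:
    def __init__(self, path: Union[str, Path]) -> None:
        """Create a backend that appends audit entries to a JSONL file.

        Args:
            path: Location of the log file. Each entry is expected to be
                written as one JSON object per line.
        """
        self.path = Path(path)  # stub body, for illustration only


class BudgetTracker:
    def __init__(self) -> None:
        """Start tracking with zero recorded tokens."""
        self.tokens_used = 0  # hypothetical attribute name

    def record_tokens(self, count: int) -> None:
        """Add ``count`` tokens to the running total.

        Args:
            count: Number of tokens consumed by the most recent call.
        """
        self.tokens_used += count
```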

Type Hints

✅ All new public APIs have complete type annotations.

Example Code

✅ Example code in tests/test_new_features.py is consistent with the new API signatures.

Final Assessment

The documentation is not in sync. Please address the missing docstrings, update the README, and add a CHANGELOG entry before merging.


@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces three significant features: graceful degradation, budget policies, and audit logging. The changes are well-structured, and the provided tests cover a variety of scenarios. However, there are some areas of concern, particularly around security, type safety, and potential breaking changes. Below is a detailed review with actionable feedback.


🔴 CRITICAL: Security Issues

  1. Audit Log File Handling (audit_logger.py):

    • The JsonlFileBackend opens the file in append mode without any file locking mechanism. This could lead to race conditions and data corruption in concurrent environments.
      • Action: Use a file lock (e.g., filelock library) to ensure safe concurrent writes to the log file.
  2. Audit Log Metadata Handling (audit_logger.py):

    • The metadata field in AuditEntry is a free-form dictionary. If this data is user-controlled, it could lead to injection vulnerabilities (e.g., if logs are later parsed or displayed in unsafe ways).
      • Action: Sanitize or validate the metadata dictionary before logging it.
  3. Graceful Degradation (compat.py):

    • The NoOpPolicyEvaluator allows all actions by default. While this is expected behavior for graceful degradation, it could lead to unintended security bypasses if the toolkit is not installed or improperly configured.
      • Action: Emit a warning or error in critical logs when falling back to NoOpPolicyEvaluator.
  4. Audit Log Backend Protocol (audit_logger.py):

    • The AuditBackend protocol does not enforce a close() method, which could lead to resource leaks (e.g., file handles not being closed).
      • Action: Add a close() method to the AuditBackend protocol and ensure all backends implement it.
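
Items 1 and 4 above could be combined roughly as in this sketch. It is not the PR's code: `ClosableAuditBackend` and `LockedJsonlBackend` are hypothetical names, the `write`/`flush` method set is assumed, and entries are modeled as plain dictionaries. It uses `threading.Lock` from the standard library, which serializes writers within one process only; coordinating several processes would need an OS-level file lock such as the one the review suggests.

```python
import json
import threading
from pathlib import Path
from typing import Any, Dict, Protocol, Union


class ClosableAuditBackend(Protocol):
    """Hypothetical protocol that makes close() part of the contract."""

    def write(self, entry: Dict[str, Any]) -> None: ...
    def flush(self) -> None: ...
    def close(self) -> None: ...


class LockedJsonlBackend:
    """Appends one JSON object per line; writes are serialized by a lock."""

    def __init__(self, path: Union[str, Path]) -> None:
        self._lock = threading.Lock()
        self._file = open(path, "a", encoding="utf-8")

    def write(self, entry: Dict[str, Any]) -> None:
        # Serialize outside the lock; only the file write needs protection.
        line = json.dumps(entry, sort_keys=True, default=str)
        with self._lock:
            self._file.write(line + "\n")

    def flush(self) -> None:
        with self._lock:
            self._file.flush()

    def close(self) -> None:
        with self._lock:
            if not self._file.closed:
                self._file.close()

    # Context-manager support so the handle is released even on errors.
    def __enter__(self) -> "LockedJsonlBackend":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()
```

Used as `with LockedJsonlBackend("audit.jsonl") as backend: backend.write({...})`, the file handle is closed when the block exits, which addresses the resource-leak concern without relying on callers to remember `close()`.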

🟡 WARNING: Potential Breaking Changes

  1. Dependency Addition:

    • The addition of pydantic>=2.4.0 in pyproject.toml introduces a new dependency. While this is not inherently breaking, it could cause issues for downstream consumers if they rely on an older version of Pydantic.
      • Action: Clearly document this dependency addition in the release notes.
  2. Audit Logger API:

    • The GovernanceAuditLogger class introduces a new API for audit logging. If existing users have their own audit logging mechanisms, this could lead to conflicts or require migration.
      • Action: Ensure backward compatibility by providing a migration guide or maintaining support for older mechanisms.

💡 Suggestions for Improvement

  1. Thread Safety:

    • The GovernanceAuditLogger and its backends are not explicitly thread-safe. In multi-threaded environments, concurrent writes to the same backend (e.g., InMemoryBackend or JsonlFileBackend) could cause issues.
      • Action: Protect shared resources with a synchronization primitive (e.g., threading.Lock) or use thread-safe data structures.
  2. Type Safety:

    • The metadata field in AuditEntry is a free-form dictionary, which could lead to type-related issues.
      • Action: Use Pydantic models for stricter validation of metadata.
  3. Graceful Degradation Logging:

    • The NoOpGovernanceMiddleware and NoOpPolicyEvaluator log debug messages when initialized. These logs might not be visible in production environments.
      • Action: Consider logging a warning or error in production environments when falling back to no-op implementations.
  4. BudgetTracker Utilization:

    • The utilization() method in BudgetTracker returns None for unconfigured limits. This could lead to confusion when interpreting the results.
      • Action: Return 0.0 for unconfigured limits to indicate no utilization.
  5. Test Coverage:

    • While the tests cover a wide range of scenarios, there are no tests for concurrent usage of the GovernanceAuditLogger and its backends.
      • Action: Add tests to simulate concurrent writes to the audit logger.
  6. Documentation:

    • The new features are not documented in the repository's main README or other user-facing documentation.
      • Action: Update the documentation to include usage examples for the new features.
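
On item 4, one consideration is that None and 0.0 carry different information: None can mean "no limit configured", while 0.0 can mean "limit configured, nothing used yet". The sketch below shows one way a caller might handle the None case explicitly. The free function `utilization`, its parameters, and `format_utilization` are hypothetical and are not the BudgetTracker method from the PR.

```python
from typing import Optional


def utilization(used: float, limit: Optional[float]) -> Optional[float]:
    """Fraction of a limit consumed, or None when no limit is configured."""
    if limit is None:
        return None  # "no limit" is distinct from "limit set, nothing used"
    if limit <= 0:
        raise ValueError("limit must be positive")
    return used / limit


def format_utilization(value: Optional[float]) -> str:
    """Render a utilization value for display, handling the None case."""
    return "unlimited" if value is None else f"{value:.0%}"


print(format_utilization(utilization(0, 1000)))    # 0%
print(format_utilization(utilization(250, 1000)))  # 25%
print(format_utilization(utilization(250, None)))  # unlimited
```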

Additional Observations

  1. JSONL File Backend:

    • The JsonlFileBackend does not handle file rotation or size limits, which could lead to unbounded log file growth.
      • Action: Add support for file rotation or document this limitation.
  2. Default Values in BudgetPolicy:

    • The BudgetPolicy class uses None as the default for all limits. This is fine, but it might be helpful to provide a method to check if a policy is "empty" (i.e., all limits are None).
      • Action: Add an is_empty() method to BudgetPolicy.
  3. AuditEntry Timestamp:

    • The timestamp field in AuditEntry uses datetime.now(timezone.utc), which returns a timezone-aware value. By contrast, datetime.utcnow() returns a naive datetime and is deprecated as of Python 3.12, so switching to it would not improve consistency.
      • Action: Keep datetime.now(timezone.utc).
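
For reference, here is a minimal sketch of how timezone-aware and naive UTC timestamps behave in the standard library. It avoids calling datetime.utcnow() directly, since that call emits a DeprecationWarning on Python 3.12 and later, and instead strips tzinfo to produce the equivalent naive value.

```python
from datetime import datetime, timezone

aware = datetime.now(timezone.utc)   # timezone-aware UTC timestamp
naive = aware.replace(tzinfo=None)   # same wall-clock value, no tzinfo

print(aware.isoformat())  # ends with "+00:00"
print(naive.isoformat())  # no offset suffix

try:
    aware < naive
except TypeError as exc:
    # Ordering comparisons between aware and naive values are rejected.
    print("cannot order aware and naive values:", exc)
```

An aware timestamp keeps its UTC offset when serialized, so a consumer reading the audit log does not have to guess which timezone an entry was recorded in.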

Conclusion

The PR introduces valuable features, but there are critical security issues and areas for improvement. Addressing the identified issues will enhance the robustness, security, and usability of the new features.

@imran-siddique imran-siddique merged commit f3d0824 into microsoft:main Mar 24, 2026
57 checks passed
