
feat: add policy YAML validation command to CLI #665

Merged
imran-siddique merged 2 commits into microsoft:main from kanish5:feat/policy-yaml-validate
Apr 3, 2026

Contributor

@kanish5 kanish5 commented Apr 1, 2026

Closes #529

Summary

Added agentos policy validate <file> CLI command that:

  1. Parses the YAML file (reports syntax errors with line numbers from PyYAML)
  2. Validates against the policy JSON schema via jsonschema (best-effort)
  3. Runs Pydantic structural validation via existing PolicyDocument
  4. Reports errors with field locations (e.g. rules[2] -> action)
  5. Exits with non-zero code on failure (CI-friendly)
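
The field locations in step 4 (for example rules[2] -> action) can be built from the error path that both validators expose: jsonschema's ValidationError.absolute_path and the loc tuple in Pydantic error dicts are sequences of string keys and integer indexes. A minimal sketch, assuming a hypothetical helper named format_location that is not taken from the PR:

```python
from typing import Iterable, Union


def format_location(path: Iterable[Union[str, int]]) -> str:
    """Render a key/index path such as ["rules", 2, "action"] for display."""
    parts: list[str] = []
    for item in path:
        if isinstance(item, int) and parts:
            # Attach list indexes to the preceding key: rules + 2 -> rules[2]
            parts[-1] = f"{parts[-1]}[{item}]"
        else:
            parts.append(str(item))
    # An empty path means the error applies to the document as a whole.
    return " -> ".join(parts) if parts else "(root)"


print(format_location(["rules", 2, "action"]))
```

The same function would accept either validator's path, so both stages could report locations in one format.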

New commands

  • agentos policy validate <file> — validate a single policy file
  • agentos policy test <policy> <scenarios> — run scenario tests
  • agentos policy diff <file1> <file2> — compare two policies

Changes

  • packages/agent-os/src/agent_os/cli/__init__.py:
    • Enhanced cmd_validate with JSON Schema + field location error reporting
    • Added _validate_yaml_with_line_numbers() helper
    • Added _load_json_schema() helper
    • Added cmd_policy() dispatcher
    • Registered agentos policy subcommand in the CLI
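
The subcommand registration described above could be wired with nested argparse subparsers. This is a sketch under assumptions rather than the PR's code: the handler names and their bodies are placeholders, and only the command shapes (policy validate, policy test, policy diff) come from the description.

```python
import argparse
from typing import Optional, Sequence


def cmd_policy_validate(args: argparse.Namespace) -> int:
    # Placeholder: the real command would run the YAML, schema and Pydantic checks.
    print(f"validate {args.file}")
    return 0


def cmd_policy_test(args: argparse.Namespace) -> int:
    print(f"test {args.policy} against {args.scenarios}")
    return 0


def cmd_policy_diff(args: argparse.Namespace) -> int:
    print(f"diff {args.file1} {args.file2}")
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agentos")
    commands = parser.add_subparsers(dest="command", required=True)

    # "policy" is itself a parser that owns a second level of subcommands.
    policy = commands.add_parser("policy", help="policy tooling")
    policy_sub = policy.add_subparsers(dest="policy_command", required=True)

    validate = policy_sub.add_parser("validate", help="validate a policy file")
    validate.add_argument("file")
    validate.set_defaults(handler=cmd_policy_validate)

    test = policy_sub.add_parser("test", help="run scenario tests")
    test.add_argument("policy")
    test.add_argument("scenarios")
    test.set_defaults(handler=cmd_policy_test)

    diff = policy_sub.add_parser("diff", help="compare two policies")
    diff.add_argument("file1")
    diff.add_argument("file2")
    diff.set_defaults(handler=cmd_policy_diff)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # Each leaf parser stored its handler, so dispatch needs no lookup table.
    return args.handler(args)
```

Binding handlers with set_defaults means a command can only be dispatched if its function exists when the parser is built, which avoids a dictionary that names an undefined handler.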

github-actions bot commented Apr 1, 2026

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

github-actions bot commented Apr 1, 2026

🤖 AI Agent: breaking-change-detector — Summary

🔍 API Compatibility Report

Summary

This pull request introduces new CLI commands for policy validation, testing, and comparison. It also enhances the existing cmd_validate function with additional validation capabilities. No breaking changes were detected in the public API of the agent-os package.

Findings

| Severity | Package | Change | Impact |
|---|---|---|---|
| 🔵 | agent-os | Added agentos policy validate <file> | New CLI command, additive |
| 🔵 | agent-os | Added agentos policy test <policy> <scenarios> | New CLI command, additive |
| 🔵 | agent-os | Added agentos policy diff <file1> <file2> | New CLI command, additive |
| 🔵 | agent-os | Enhanced cmd_validate with JSON Schema validation and field location error reporting | Additive |
| 🔵 | agent-os | Added _validate_yaml_with_line_numbers() helper | Internal helper, no impact |
| 🔵 | agent-os | Added _load_json_schema() helper | Internal helper, no impact |
| 🔵 | agent-os | Added cmd_policy() dispatcher | New function, additive |

Migration Guide

No migration is required as no breaking changes were introduced.

Notes

  • The new CLI commands (policy validate, policy test, and policy diff) should be documented in the project's README or user guide.
  • The enhancements to cmd_validate improve error reporting and validation capabilities, which should be highlighted in the documentation.

No breaking changes detected.

@github-actions github-actions bot added the size/XL Extra large PR (500+ lines) label Apr 1, 2026

github-actions bot commented Apr 1, 2026

🤖 AI Agent: docs-sync-checker — Issues Found

📝 Documentation Sync Report

Issues Found

  • _load_json_schema() in agent_os/cli/__init__.py — missing docstring
  • _validate_yaml_with_line_numbers() in agent_os/cli/__init__.py — missing type hints for return values
  • ⚠️ packages/agent-os/README.md — no mention of the new policy subcommand and its subcommands (validate, test, diff)
  • ⚠️ CHANGELOG.md — no entry for the addition of the agentos policy subcommand and its features
  • ⚠️ examples/ — no updates to include examples of using the new policy subcommands

Suggestions

  • 💡 Add a docstring for _load_json_schema() to explain its purpose, parameters, and return value.
  • 💡 Add type hints for the return values of _validate_yaml_with_line_numbers(). For example, specify the types of the tuple elements (e.g., Tuple[List[str], List[str]]).
  • 💡 Update packages/agent-os/README.md to include a new section describing the policy subcommand and its usage, including examples for validate, test, and diff.
  • 💡 Add an entry to CHANGELOG.md summarizing the addition of the agentos policy subcommand and its features.
  • 💡 Add example usage of the policy subcommands (validate, test, diff) to the examples/ directory.

Additional Notes

  • The new cmd_policy() function and its subcommands (validate, test, diff) are well-documented with docstrings and type hints.
  • The cmd_validate() function has a detailed docstring and type hints, which is good.
  • The _validate_yaml_with_line_numbers() helper function has a detailed docstring but is missing type hints for its return values.
  • The _load_json_schema() helper function is missing a docstring entirely.

Action Items

  1. Add missing docstrings and type hints for _load_json_schema() and _validate_yaml_with_line_numbers().
  2. Update packages/agent-os/README.md to document the new policy subcommand and its subcommands.
  3. Add a new entry to CHANGELOG.md summarizing the addition of the agentos policy subcommand and its features.
  4. Add example usage of the policy subcommands to the examples/ directory.

Once these issues are addressed, the documentation will be in sync with the code changes.

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces a new CLI command, agentos policy validate, which validates YAML policy files against a JSON schema and performs Pydantic validation. It also includes new interactive Jupyter notebooks for demonstrating policy enforcement, MCP security, and multi-agent governance. While the PR introduces valuable features, there are some critical security concerns, potential breaking changes, and areas for improvement.


🔴 CRITICAL

  1. Insufficient Validation of YAML Input

    • Issue: The _validate_yaml_with_line_numbers() function does not sanitize or validate the YAML input for potential malicious payloads. YAML files can include arbitrary Python objects, which could lead to code execution vulnerabilities.
    • Impact: This could allow an attacker to execute arbitrary code by injecting malicious payloads into the YAML file.
    • Recommendation: Use yaml.safe_load() instead of yaml.load() to prevent the execution of arbitrary code. If safe_load() is already being used, ensure that it is explicitly mentioned in the code or documentation.
  2. Potential Bypass of JSON Schema Validation

    • Issue: The JSON schema validation is described as "best-effort," which implies that it might not catch all invalid inputs. This could lead to security vulnerabilities if the policy engine relies on the validity of the schema for enforcement.
    • Impact: A malicious or malformed policy file could bypass validation and introduce vulnerabilities into the system.
    • Recommendation: Ensure that the JSON schema is comprehensive and rigorously tested. Consider adding unit tests to validate edge cases and ensure the schema is robust.
  3. Regex Injection in MCP Security Scanner

    • Issue: The scan_tool_definition function uses user-provided input (description) directly in regex operations. If the regex patterns are not properly sanitized, this could lead to regex injection vulnerabilities.
    • Impact: An attacker could craft a malicious tool description to exploit the regex engine, potentially causing denial of service (ReDoS) or other unexpected behavior.
    • Recommendation: Apply re.escape() to user-provided input before interpolating it into a regex pattern.
  4. Lack of Thread Safety in Circuit Breaker Implementation

    • Issue: The simulate_call and check_circuit_breaker functions modify shared AgentMetrics objects without any synchronization mechanisms. This could lead to race conditions in a multi-threaded environment.
    • Impact: Inconsistent or incorrect metrics could lead to incorrect circuit breaker behavior, potentially allowing agents to operate in unsafe conditions.
    • Recommendation: Use thread-safe data structures or synchronization primitives (e.g., threading.Lock) to ensure thread safety when modifying shared state.
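
The locking recommendation in item 4 can be illustrated as follows. This is a sketch only: AgentMetrics and check_circuit_breaker are named in the review, but the fields, thresholds and method names here are assumptions, since the notebook code is not shown in this thread.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class AgentMetrics:
    # Hypothetical fields; the notebook's real attributes may differ.
    calls: int = 0
    failures: int = 0
    _lock: threading.Lock = field(
        default_factory=threading.Lock, repr=False, compare=False
    )

    def record(self, ok: bool) -> None:
        # Both counters change under one lock so readers never see a torn update.
        with self._lock:
            self.calls += 1
            if not ok:
                self.failures += 1

    def snapshot(self) -> tuple[int, int]:
        with self._lock:
            return self.calls, self.failures


def check_circuit_breaker(
    metrics: AgentMetrics, threshold: float = 0.5, min_calls: int = 10
) -> bool:
    """Return True when the breaker should open (assumed failure-rate rule)."""
    calls, failures = metrics.snapshot()
    if calls == 0 or calls < min_calls:
        return False
    return failures / calls > threshold
```

Reading both counters through snapshot() keeps the failure rate consistent even while other threads are recording calls.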

🟡 WARNING

  1. Backward Compatibility
    • Issue: The addition of the agentos policy subcommand introduces new functionality to the CLI. While this is not a breaking change, it is important to ensure that existing commands and workflows are not affected.
    • Recommendation: Verify that the new subcommand does not interfere with existing CLI commands. Add tests to ensure backward compatibility.

💡 SUGGESTIONS

  1. Error Reporting

    • Observation: The error reporting for the agentos policy validate command is functional but could be improved for user experience.
    • Suggestion: Consider adding color-coded output (e.g., using the colorama library) to make errors and warnings more visually distinct. Additionally, provide suggestions for fixing common errors when possible.
  2. Documentation

    • Observation: The new CLI commands and Jupyter notebooks are valuable additions, but the documentation does not appear to have been updated to reflect these changes.
    • Suggestion: Update the README and any relevant documentation to include details about the new CLI commands and how to use the Jupyter notebooks.
  3. Testing

    • Observation: While the PR mentions that the agentos policy validate command is "CI-friendly," there is no evidence of automated tests for this functionality.
    • Suggestion: Add unit tests and integration tests for the new CLI commands, especially for edge cases in YAML parsing, JSON schema validation, and Pydantic validation.
  4. Notebook Security

    • Observation: The Jupyter notebooks include code that simulates potentially dangerous operations (e.g., exec('/bin/sh') in the MCP Security Proxy notebook).
    • Suggestion: Add clear warnings in the notebook headers to indicate that the code is for educational purposes only and should not be used in production environments without proper safeguards.
  5. Code Quality

    • Observation: The _validate_yaml_with_line_numbers() and _load_json_schema() helper functions are not documented.
    • Suggestion: Add docstrings to these functions to improve code readability and maintainability.
  6. Error Handling

    • Observation: The agentos policy validate command exits with a non-zero code on failure, but the specific exit codes are not documented.
    • Suggestion: Document the possible exit codes and their meanings in the CLI help text or documentation.

Conclusion

This PR introduces useful features and enhancements to the agentos CLI and the Jupyter notebooks. However, there are critical security issues that must be addressed before merging. Additionally, improvements to documentation, testing, and error reporting would enhance the overall quality of the contribution.

github-actions bot commented Apr 1, 2026

🤖 AI Agent: test-generator — `packages/agent-os/src/agent_os/cli/__init__.py`

🧪 Test Coverage Analysis

packages/agent-os/src/agent_os/cli/__init__.py

  • Existing coverage:

    • Basic validation of policy YAML files using the cmd_validate function.
    • CLI argument parsing for the validate command.
    • Basic error handling for missing files and invalid YAML syntax.
  • Missing coverage:

    • Validation of policies against the JSON schema using _validate_yaml_with_line_numbers and _load_json_schema.
    • Handling of specific validation errors, such as missing required fields (version, name), invalid rule structures, and invalid actions.
    • Strict mode behavior (treating warnings as errors).
    • Validation of edge cases, such as malformed YAML, empty files, or files with unexpected top-level fields.
    • New agentos policy subcommands (validate, test, diff) and their dispatch logic in cmd_policy.
  • 💡 Suggested test cases:

    1. test_validate_valid_policy — Test cmd_validate with a valid policy file to ensure it passes without errors or warnings.
    2. test_validate_missing_required_fields — Test cmd_validate with a policy file missing required fields (version, name) to ensure errors are reported.
    3. test_validate_invalid_yaml_syntax — Test cmd_validate with a malformed YAML file to ensure syntax errors are caught and reported with line numbers.
    4. test_validate_empty_file — Test cmd_validate with an empty file to ensure it reports an appropriate error.
    5. test_validate_invalid_rule_structure — Test cmd_validate with a policy file containing invalid rule structures (e.g., non-dictionary rules, missing name or action fields) to ensure errors are reported.
    6. test_validate_invalid_action — Test cmd_validate with a policy file containing invalid action values in rules to ensure errors are reported.
    7. test_validate_strict_mode — Test cmd_validate with strict mode enabled to ensure warnings are treated as errors.
    8. test_policy_validate_command — Test the agentos policy validate <file> subcommand to ensure it correctly validates a policy file and reports errors/warnings.
    9. test_policy_test_command — Test the agentos policy test <policy> <scenarios> subcommand to ensure it correctly runs scenario tests against a policy.
    10. test_policy_diff_command — Test the agentos policy diff <file1> <file2> subcommand to ensure it correctly compares two policy files and reports differences.
    11. test_validate_unknown_top_level_fields — Test cmd_validate with a policy file containing unknown top-level fields to ensure warnings are reported.
    12. test_validate_json_schema_errors — Test _validate_yaml_with_line_numbers with a policy file that violates the JSON schema to ensure errors are reported with correct field locations.
    13. test_validate_version_format_warning — Test cmd_validate with a policy file containing an invalid version format to ensure a warning is reported.
    14. test_validate_file_not_found — Test cmd_validate with a non-existent file to ensure an appropriate error is reported.
    15. test_validate_multiple_files — Test cmd_validate with multiple policy files (valid and invalid) to ensure all errors and warnings are reported correctly.
    16. test_policy_command_dispatch — Test cmd_policy to ensure it correctly dispatches to the appropriate subcommand (validate, test, diff) based on CLI arguments.

These test cases will ensure comprehensive coverage of the new agentos policy CLI functionality and the enhanced validation logic.
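
A few of the suggested cases might look like the pytest-style sketch below. Because the signature of cmd_validate is not shown in this thread, the tests target a hypothetical stand-in, validate_policy, and the set of valid actions is an assumption.

```python
from typing import Any

# Assumption: the real list of actions lives in the policy schema.
VALID_ACTIONS = {"allow", "deny"}


def validate_policy(doc: Any) -> list[str]:
    """Stand-in validator that returns error strings with field locations."""
    if not isinstance(doc, dict):
        return ["(root): policy must be a mapping"]
    errors = [
        f"{key}: required field is missing"
        for key in ("version", "name")
        if key not in doc
    ]
    rules = doc.get("rules", [])
    if not isinstance(rules, list):
        return errors + ["rules: must be a list"]
    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            errors.append(f"rules[{i}]: must be a mapping")
        elif rule.get("action") not in VALID_ACTIONS:
            errors.append(
                f"rules[{i}] -> action: invalid value {rule.get('action')!r}"
            )
    return errors


def test_validate_valid_policy():
    doc = {"version": "1.0", "name": "demo",
           "rules": [{"name": "r1", "action": "deny"}]}
    assert validate_policy(doc) == []


def test_validate_missing_required_fields():
    assert validate_policy({"rules": []}) == [
        "version: required field is missing",
        "name: required field is missing",
    ]


def test_validate_invalid_action():
    doc = {"version": "1.0", "name": "demo",
           "rules": [{"name": "r1", "action": "explode"}]}
    assert validate_policy(doc) == ["rules[0] -> action: invalid value 'explode'"]


def test_validate_empty_file():
    # An empty YAML document parses to None rather than to a mapping.
    assert validate_policy(None) == ["(root): policy must be a mapping"]
```

Tests for the real command would follow the same pattern, writing policy files into a temporary directory and asserting on the exit code.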

github-actions bot commented Apr 1, 2026

🤖 AI Agent: security-scanner — Security Review of `feat: add policy YAML validation command to CLI`

Security Review of feat: add policy YAML validation command to CLI

1. Prompt Injection Defense Bypass

  • Risk: The agentos policy validate command parses YAML files and validates them against a JSON schema. If the YAML file contains malicious payloads (e.g., embedded code or malicious strings), there is a risk of prompt injection if these payloads are later used in AI agent prompts without proper sanitization.
  • Severity: 🟠 HIGH
  • Attack Vector: An attacker could craft a malicious policy YAML file with embedded prompt injection payloads. If the validated policy is later used in an AI agent's prompt without sanitization, it could lead to unintended behavior or security breaches.
  • Recommendation: Implement strict sanitization and escaping of all user-provided input fields, especially those that are likely to be used in AI prompts. Consider adding a specific validation step to detect and reject potentially malicious input patterns (e.g., prompt injection attempts).

2. Policy Engine Circumvention

  • Risk: The validation process relies on a JSON schema and additional checks for policy correctness. However, there is no explicit enforcement of schema integrity or prevention of schema tampering.
  • Severity: 🟠 HIGH
  • Attack Vector: If an attacker gains access to the policy_schema.json file or can replace it with a malicious version, they could bypass critical policy validation checks.
  • Recommendation:
    • Ensure the policy_schema.json file is bundled securely and verify its integrity at runtime using a cryptographic hash or signature.
    • Consider embedding the schema directly into the codebase to prevent tampering.
    • Add unit tests to verify that the schema is loaded and applied correctly.
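
The runtime integrity check recommended above could be sketched with a SHA-256 digest comparison. The function name and the idea of shipping the expected digest as a constant are assumptions; the PR's _load_json_schema() helper is not shown here and may work differently.

```python
import hashlib
import hmac
import json
from pathlib import Path


def load_schema_verified(path: Path, expected_sha256: str) -> dict:
    """Load a JSON schema only if its bytes match the expected digest."""
    raw = path.read_bytes()
    actual = hashlib.sha256(raw).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"schema digest mismatch for {path.name}: got {actual}")
    return json.loads(raw)
```

A digest stored next to the schema protects against accidental corruption more than against an attacker who can rewrite both, so the expected value would need to live somewhere harder to modify, such as in the packaged code.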

3. Trust Chain Weaknesses

  • Risk: The jsonschema library is used for JSON Schema validation, but there is no verification of the library's integrity or version.
  • Severity: 🟡 MEDIUM
  • Attack Vector: If a compromised or outdated version of jsonschema is used, it could lead to incorrect validation or vulnerabilities.
  • Recommendation: Pin the jsonschema dependency to a specific, secure version in the requirements.txt or equivalent dependency file. Regularly audit and update dependencies to ensure they are free from known vulnerabilities.

4. Credential Exposure

  • Risk: The code does not appear to handle any sensitive credentials directly. However, the validation process may log sensitive information from the policy files.
  • Severity: 🟡 MEDIUM
  • Attack Vector: If a policy file contains sensitive information (e.g., API keys, secrets), these could be exposed in logs or error messages during validation.
  • Recommendation:
    • Avoid logging the full content of policy files or sensitive fields.
    • Mask or redact sensitive information in error messages and logs.
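
Redaction before logging could look like the sketch below. The key-name pattern and the mask string are assumptions chosen for illustration; a real policy format may mark sensitive fields differently.

```python
import re
from typing import Any

# Assumption: key names that suggest a secret-bearing value.
SENSITIVE_KEY = re.compile(r"api[_-]?key|secret|token|password", re.IGNORECASE)
MASK = "***REDACTED***"


def redact(value: Any) -> Any:
    """Return a copy of parsed policy data with sensitive values masked."""
    if isinstance(value, dict):
        return {
            key: MASK if SENSITIVE_KEY.search(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Scalars under non-sensitive keys pass through unchanged.
    return value
```

The function builds new containers instead of mutating its input, so the original document can still be validated after a redacted copy has been logged.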

5. Sandbox Escape

  • Risk: The YAML parsing uses yaml.safe_load, which is a secure method for loading YAML files. This mitigates the risk of arbitrary code execution during deserialization.
  • Severity: 🔵 LOW
  • Attack Vector: None identified, as yaml.safe_load is used.
  • Recommendation: No immediate action required. Ensure that yaml.safe_load is consistently used and not replaced with yaml.load.

6. Deserialization Attacks

  • Risk: The use of yaml.safe_load mitigates the risk of deserialization attacks.
  • Severity: 🔵 LOW
  • Attack Vector: None identified.
  • Recommendation: Continue using yaml.safe_load for YAML parsing. Avoid using yaml.load or other unsafe deserialization methods.
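
To make the two points above concrete, here is a sketch of parsing with yaml.safe_load and reporting the position PyYAML attaches to its errors. The function name parse_policy_text is hypothetical, and the PR's _validate_yaml_with_line_numbers() may format its messages differently.

```python
from typing import Any, Optional

import yaml  # PyYAML


def parse_policy_text(text: str) -> tuple[Any, Optional[str]]:
    """Return (data, None) on success or (None, error message) on failure."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as exc:
        mark = getattr(exc, "problem_mark", None)
        if mark is not None:
            problem = getattr(exc, "problem", None) or "invalid YAML"
            # PyYAML marks are zero-based; report one-based positions.
            return None, f"line {mark.line + 1}, column {mark.column + 1}: {problem}"
        return None, str(exc)


# safe_load refuses Python-specific tags instead of constructing objects.
data, error = parse_policy_text("!!python/object/apply:os.system ['echo unsafe']")
print(data, error)
```

Syntax errors and rejected tags both arrive as yaml.YAMLError subclasses, so one except clause covers both cases.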

7. Race Conditions

  • Risk: The validation process does not appear to involve concurrent operations or shared resources that could lead to race conditions.
  • Severity: 🔵 LOW
  • Attack Vector: None identified.
  • Recommendation: No immediate action required.

8. Supply Chain Risks

  • Risk: The code introduces dependencies on jsonschema and PyYAML. These dependencies are not pinned to specific versions, which increases the risk of supply chain attacks or compatibility issues.
  • Severity: 🟠 HIGH
  • Attack Vector: An attacker could exploit a vulnerability in an unpinned or outdated version of jsonschema or PyYAML to execute arbitrary code or bypass validation.
  • Recommendation:
    • Pin all dependencies to specific versions in the requirements.txt file.
    • Use a dependency scanning tool (e.g., Dependabot, Snyk, or pip-audit) to detect and address vulnerabilities in third-party libraries.

Additional Observations

  1. Error Handling:

    • The _validate_yaml_with_line_numbers function appends error messages to a list but does not raise exceptions for critical issues. This could lead to silent failures in some cases.
    • Recommendation: Consider raising exceptions for critical errors (e.g., missing required fields) to ensure they are not overlooked.
  2. Strict Mode:

    • The --strict mode treats unknown fields as warnings rather than errors.
    • Recommendation: Consider making strict mode the default behavior or providing a way to enforce stricter validation in production environments.
  3. Testing:

    • The PR does not include any new unit tests for the added functionality.
    • Recommendation: Add comprehensive unit tests to cover all validation scenarios, including edge cases and malformed inputs.

Summary of Findings

| Finding | Severity | Recommendation |
|---|---|---|
| Prompt injection defense bypass | 🟠 HIGH | Sanitize and escape user-provided input to prevent prompt injection. |
| Policy engine circumvention | 🟠 HIGH | Secure the JSON schema file and verify its integrity at runtime. |
| Trust chain weaknesses | 🟡 MEDIUM | Pin jsonschema to a specific, secure version. |
| Credential exposure | 🟡 MEDIUM | Mask sensitive fields in logs and error messages. |
| Sandbox escape | 🔵 LOW | No immediate action required; yaml.safe_load is used. |
| Deserialization attacks | 🔵 LOW | Continue using yaml.safe_load. |
| Race conditions | 🔵 LOW | No immediate action required. |
| Supply chain risks | 🟠 HIGH | Pin dependencies to specific versions and use a dependency scanning tool. |

Overall Risk Assessment: 🟠 HIGH

This PR introduces useful functionality but also presents several security risks, particularly around prompt injection, policy engine circumvention, and supply chain vulnerabilities. Addressing these issues is critical to ensure the security of the toolkit and its downstream users.

Contributor Author

kanish5 commented Apr 1, 2026

@microsoft-github-policy-service agree

@kanish5 kanish5 force-pushed the feat/policy-yaml-validate branch from 059e587 to 415c4af on April 1, 2026 00:34

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Feedback on the Pull Request

🔴 CRITICAL Issues

  1. Improper Handling of jsonschema Import Failure:

    • In _validate_yaml_with_line_numbers, the jsonschema library is imported inside the function, and if the import fails, the code silently skips JSON Schema validation. This could lead to undetected policy violations if the library is not installed.
    • Recommendation: Make jsonschema a required dependency in the setup.py or pyproject.toml file. If this is not feasible, explicitly log an error and exit with a non-zero status if the library is not installed.
  2. Insufficient Validation of Policy Schema:

    • The _validate_yaml_with_line_numbers function performs some basic checks (e.g., required fields, rule structure), but these checks are not comprehensive. For example:
      • The rules field is only checked for being a list, but the individual rule objects are not fully validated against the schema.
      • The action field is validated against a hardcoded list of valid actions, but this list may not stay in sync with the JSON Schema.
    • Recommendation: Use jsonschema exclusively for schema validation and avoid duplicating validation logic in _validate_yaml_with_line_numbers. This ensures consistency and reduces the risk of discrepancies between the schema and the code.
  3. Potential for Sandbox Escape in YAML Parsing:

    • The yaml.safe_load function is used for parsing YAML files. While this is safer than yaml.load, it is still vulnerable to certain attacks, such as those exploiting YAML tags.
    • Recommendation: Use a library like ruamel.yaml or a stricter YAML parser to avoid potential sandbox escape vulnerabilities.
  4. Error Reporting for YAML Parsing:

    • The error messages for YAML parsing issues are not sanitized. If an attacker provides a malicious YAML file, the error message could potentially leak sensitive information or be used for injection attacks.
    • Recommendation: Sanitize error messages before displaying them to the user.
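
The first item's alternative (fail loudly when jsonschema is absent instead of skipping the check silently) could be sketched as below. The helper name and the required flag are assumptions; the PR imports jsonschema inside _validate_yaml_with_line_numbers and its exact fallback is not shown here.

```python
import importlib
import sys
from types import ModuleType
from typing import Optional


def load_schema_backend(
    name: str = "jsonschema", *, required: bool = True
) -> Optional[ModuleType]:
    """Import the schema library, or make its absence visible."""
    try:
        return importlib.import_module(name)
    except ImportError:
        message = f"{name} is not installed; JSON Schema validation cannot run"
        if required:
            # SystemExit with a string prints it to stderr and exits with status 1.
            raise SystemExit(f"error: {message}")
        # Best-effort mode: keep going, but say that a stage was skipped.
        print(f"warning: {message}", file=sys.stderr)
        return None
```

With required=False the command keeps its best-effort behaviour, but the skipped stage is reported on stderr so a CI log shows that schema validation did not run.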

🟡 WARNING Issues

  1. Backward Compatibility Risk:

    • The addition of the agentos policy subcommand introduces a new CLI structure. If users were relying on the existing validate command, this change could break their workflows.
    • Recommendation: Provide a deprecation warning for the old validate command and ensure that it continues to work for a transitional period.
  2. Potential Breaking Changes in Policy Validation:

    • The new validation logic introduces stricter checks (e.g., required fields, rule structure). Policies that previously passed validation may now fail.
    • Recommendation: Clearly document these changes in the release notes and provide a migration guide for users to update their policies.

💡 Suggestions for Improvement

  1. Improve Error Messages:

    • The error messages in _validate_yaml_with_line_numbers are functional but could be more user-friendly. For example, instead of rules[2] -> action, consider using a more descriptive format like Rule 2: Invalid action field.
    • Recommendation: Use a consistent and user-friendly format for error messages.
  2. Add Unit Tests for CLI Commands:

    • The new agentos policy subcommands (validate, test, diff) should be thoroughly tested to ensure they work as expected.
    • Recommendation: Add unit tests for each subcommand, covering both success and failure scenarios.
  3. Thread Safety in Policy Validation:

    • The _validate_yaml_with_line_numbers function and related logic do not appear to have any thread safety issues, but this should be explicitly verified if the CLI is expected to handle concurrent requests (e.g., in a CI/CD pipeline).
    • Recommendation: Add tests to verify thread safety in concurrent execution scenarios.
  4. Support for JSON Input:

    • The agentos policy validate command currently supports YAML files but does not explicitly mention support for JSON files.
    • Recommendation: Add explicit support for JSON files and update the documentation accordingly.
  5. Improve Documentation:

    • The docstrings for the new functions and methods are detailed, but the overall documentation for the agentos policy subcommands could be improved.
    • Recommendation: Add examples and usage instructions for the new subcommands in the CLI documentation.
  6. Use of pathlib for File Handling:

    • The code uses Path.read_text() and Path.write_text() for file I/O, which is good. However, there are some instances where open() is still used (e.g., in cmd_validate).
    • Recommendation: Use pathlib consistently for file handling.
  7. Consider Adding a Dry-Run Mode:

    • For the agentos policy test and agentos policy diff subcommands, a dry-run mode could be useful for users who want to preview the results without making changes.
    • Recommendation: Add a --dry-run flag to these subcommands.
  8. Logging Improvements:

    • The CLI commands use print() for output, which is fine for user-facing messages but not ideal for logging errors or debugging information.
    • Recommendation: Use the logging module for error and debug messages.

Summary of Changes Needed

| Type | Description |
|---|---|
| 🔴 CRITICAL | Ensure jsonschema is a required dependency or handle its absence explicitly. |
| 🔴 CRITICAL | Use jsonschema exclusively for schema validation to avoid duplication. |
| 🔴 CRITICAL | Use a stricter YAML parser to prevent sandbox escape vulnerabilities. |
| 🔴 CRITICAL | Sanitize error messages to prevent information leakage or injection attacks. |
| 🟡 WARNING | Maintain backward compatibility for the validate command. |
| 🟡 WARNING | Document stricter validation rules and provide a migration guide. |
| 💡 SUGGESTION | Improve error messages, documentation, and add unit tests for new commands. |
| 💡 SUGGESTION | Add support for JSON input and a dry-run mode for certain subcommands. |
| 💡 SUGGESTION | Use pathlib consistently and improve logging practices. |

By addressing these issues and suggestions, the new feature will be more robust, secure, and user-friendly.

Member

@imran-siddique imran-siddique left a comment

Thanks for the contribution! Found 3 blocking issues:

  1. cmd_serve is undefined — NameError crashes the entire CLI (not just serve). Remove from commands dict until implemented.
  2. Metrics subparser deleted — agentos metrics is now broken. Restore the metrics subparser registration.
  3. Orphaned code on success path — cmd_validate falls through to old code referencing undefined all_valid. Add explicit return 0 and remove leftover results loop.

Also: --json output support was silently dropped (breaking for CI consumers), and --strict flag is referenced but never registered in argparse.

The YAML validation logic itself is well-done (yaml.safe_load, graceful jsonschema fallback). Just needs the wiring fixes above.
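
The wiring points above (an explicit return value on the success path, --strict, and --json output) can be illustrated with one small function that decides the exit code and renders the report. This is a sketch: the function name, the JSON shape, and the reading of --strict as "warnings fail the run" are assumptions, not the PR's implementation.

```python
import json


def finish_validation(
    errors: list[str],
    warnings: list[str],
    *,
    strict: bool = False,
    as_json: bool = False,
) -> tuple[int, str]:
    """Return (exit_code, report) so every path has an explicit result."""
    failed = bool(errors) or (strict and bool(warnings))
    exit_code = 1 if failed else 0
    if as_json:
        # Hypothetical output shape for CI consumers.
        report = json.dumps(
            {"valid": not failed, "errors": errors, "warnings": warnings},
            indent=2,
        )
    else:
        lines = [f"ERROR: {e}" for e in errors]
        lines += [f"WARNING: {w}" for w in warnings]
        lines.append("FAILED" if failed else "OK")
        report = "\n".join(lines)
    return exit_code, report
```

A command function would print the report and return the exit code directly, which leaves no fall-through path into leftover code after a successful run.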

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces a new agentos policy validate <file> CLI command, along with related subcommands for testing and diffing policy files. The implementation includes YAML parsing, JSON Schema validation, and Pydantic-based structural validation. The PR also includes updates to the crewai_adapter.py and google_adk_adapter.py files to add docstrings and enhance policy enforcement mechanisms.

The changes are well-structured and provide a comprehensive validation pipeline for policy files. However, there are a few areas that require attention, particularly around security, backward compatibility, and potential improvements.


🔴 CRITICAL Issues

  1. JSON Schema Validation Dependency Handling

    • The _validate_yaml_with_line_numbers function silently skips JSON Schema validation if the jsonschema library is not installed. This could lead to incomplete validation and potential security bypasses.
    • Action: Explicitly check for the presence of jsonschema during initialization or command execution and raise an error if it is missing. Alternatively, make it a required dependency.
  2. Error Handling in Policy Validation

    • In the cmd_validate function, exceptions during YAML parsing or validation are caught and logged but not re-raised. This could lead to silent failures in CI pipelines if the error is not properly propagated.
    • Action: Ensure that critical errors (e.g., YAML parsing issues) result in a non-zero exit code and are clearly communicated to the user.
  3. Policy Violation Handling

    • The _default_violation_handler in google_adk_adapter.py only logs policy violations. This could allow violations to go unnoticed in production environments.
    • Action: Consider implementing stricter handling for policy violations, such as halting execution or triggering alerts, depending on the severity of the violation.

🟡 WARNING: Potential Breaking Changes

  1. CLI Argument Changes

    • The addition of new CLI commands (policy validate, policy test, policy diff) and changes to the validate command (e.g., --strict flag) could potentially break existing workflows or scripts that rely on the old CLI behavior.
    • Action: Clearly document these changes in the release notes and provide migration guidance for users.
  2. Behavior of validate Command

    • The validate command now defaults to validating all YAML files in the .agents/ directory if no files are specified. This behavior change could lead to unexpected results for users who previously relied on the command's old behavior.
    • Action: Add a deprecation warning for the old behavior and provide a clear transition path.

💡 Suggestions for Improvement

  1. Error Reporting

    • The error messages generated during validation could include more actionable details, such as suggestions for fixing the issue.
    • Suggestion: Enhance error messages to include links to documentation or examples for resolving common issues.
  2. Test Coverage

    • Ensure that the new CLI commands and validation logic are thoroughly tested, including edge cases like empty files, invalid YAML, and unsupported schema versions.
    • Suggestion: Add unit tests and integration tests for the new functionality.
  3. Strict Mode Behavior

    • The --strict flag currently treats unknown top-level fields as warnings. Consider making this behavior configurable to allow users to decide whether unknown fields should be treated as errors.
    • Suggestion: Add a configuration option to control the behavior of strict mode.
  4. Performance Optimization

    • The cmd_validate function reads and parses each file multiple times (e.g., once for YAML parsing and again for validation). This could be optimized to reduce redundant I/O operations.
    • Suggestion: Parse the file once and reuse the parsed content for validation.
  5. Documentation

    • The new CLI commands and their options should be documented in the project's README or user guide.
    • Suggestion: Update the documentation to include examples and use cases for the new commands.
  6. Backward Compatibility

    • Consider adding a fallback mechanism for users who may not have the policy_schema.json file available.
    • Suggestion: Provide a default schema or a way to fetch it dynamically if it is missing.
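
For suggestion 6, one possible fallback is to treat a missing schema file as "skip the JSON Schema pass" rather than as a failure, which fits the best-effort wording in the PR summary. The helper name `load_schema_or_none` is hypothetical, and the choice to return `None` is an assumption about how the caller would react.

```python
import json
from pathlib import Path
from typing import Optional


def load_schema_or_none(schema_path: Path) -> Optional[dict]:
    """Return the parsed schema, or None when the file is absent (hypothetical helper)."""
    try:
        with schema_path.open(encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        # Assumption: a missing schema means "skip JSON Schema validation",
        # leaving the remaining structural checks to run.
        return None
```

A malformed schema file still raises `json.JSONDecodeError` here, because a broken schema is more likely a packaging bug than something to ignore silently.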

Final Notes

The PR introduces valuable functionality for policy validation and governance, which is critical for maintaining security and compliance. Addressing the critical issues and warnings will ensure that the changes are robust and user-friendly. The suggestions provided can further enhance the usability and maintainability of the code.

@kanish5 kanish5 force-pushed the feat/policy-yaml-validate branch from a108d53 to 4acbfab Compare April 3, 2026 03:01
@github-actions github-actions bot added the size/L Large PR (< 500 lines) label Apr 3, 2026

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review of feat: add policy YAML validation command to CLI

🔴 CRITICAL

  1. Unvalidated Input in YAML Parsing:

    • The _validate_yaml_with_line_numbers() function directly processes the content dictionary parsed from the YAML file without verifying its structure beforehand. This could lead to unexpected behavior or security vulnerabilities if the input is maliciously crafted.
    • Recommendation: Add a pre-validation step to ensure that the content dictionary conforms to expected types and structure before proceeding with further validation. For example:
      if not isinstance(content, dict):
          raise ValueError(f"{filepath}: Top-level content must be a dictionary.")
  2. Potential Sandbox Escape via YAML Parsing:

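    • The yaml.safe_load() function is used for parsing YAML files. It is safer than yaml.load() with an unsafe loader because it recognizes only standard YAML tags and does not construct arbitrary Python objects. It does not, however, check that the parsed content has the structure the policy code expects, so a maliciously crafted file can still feed unexpected input to later steps.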
    • Recommendation: Consider using a stricter YAML parser or explicitly validating the parsed content against a known schema before processing it further.
  3. JSON Schema Validation Error Handling:

    • The jsonschema.Draft7Validator is used for schema validation, but there is no explicit handling for potential exceptions (e.g., jsonschema.exceptions.SchemaError or jsonschema.exceptions.ValidationError) that could occur during validation.
    • Recommendation: Add explicit exception handling for jsonschema errors to ensure that the program does not crash unexpectedly.

🟡 WARNING

  1. Backward Compatibility:
    • The agentos policy subcommand and its subcommands (validate, test, diff) add new CLI functionality. Although this is a feature addition, it could break existing workflows if users rely on the previous behavior of the validate command.
    • Recommendation: Clearly document the changes in the release notes and provide a deprecation warning for any old commands that are being replaced.

💡 SUGGESTIONS

  1. Error Messages for Users:

    • The error messages generated by _validate_yaml_with_line_numbers() are detailed, but they could be overwhelming for users. Consider categorizing errors and warnings more clearly in the output, or providing a summary of the issues at the end of the validation process.
  2. Strict Mode Behavior:

    • In strict mode, unknown top-level fields are reported as warnings. However, the behavior of treating warnings as errors is not implemented. If strict mode is enabled, warnings should also result in a non-zero exit code.
    • Recommendation: Modify the return logic in cmd_validate() to account for warnings when args.strict is True.
  3. Unit Tests:

    • Ensure that there are comprehensive unit tests for the new functionality, including edge cases such as:
      • Invalid YAML syntax
      • Missing required fields
      • Invalid field types
      • Unknown fields in strict mode
      • Valid policies
    • Recommendation: Add tests for these scenarios if they are not already covered.
  4. Performance Considerations:

    • The _validate_yaml_with_line_numbers() function performs multiple passes over the content dictionary. For large policy files, this could impact performance.
    • Recommendation: Optimize the validation logic to minimize redundant iterations over the same data structure.
  5. Error Reporting Consistency:

    • The error messages for JSON Schema validation and manual checks use different formats. For example:
      • JSON Schema: "{filepath}: [{location}] {ve.message}"
      • Manual checks: "{filepath}: Missing required field: '{field}'"
    • Recommendation: Standardize the error message format for consistency.
  6. Logging and Debugging:

    • The current implementation uses print() statements for output. While this is fine for CLI tools, it may be beneficial to integrate a logging framework (e.g., Python's logging module) for better debugging and log management.
    • Recommendation: Replace print() with a logging framework and provide different log levels (e.g., info, warning, error).
  7. Documentation:

    • The new agentos policy subcommands (validate, test, diff) should be documented in the project's README or CLI documentation. This will help users understand the new functionality and how to use it effectively.
  8. Type Annotations:

    • The _validate_yaml_with_line_numbers() function uses type annotations, but the filepath parameter is annotated as Path, while the function uses it as a string in error messages. This could lead to type-related issues.
    • Recommendation: Use str instead of Path for the filepath parameter, or explicitly convert Path to str when constructing error messages.
  9. Default Behavior for agentos policy validate:

    • The cmd_validate() function defaults to validating all YAML files in the .agents/ directory if no files are specified. This behavior might be unexpected for users who are not aware of the default.
    • Recommendation: Prompt the user to confirm the default behavior or provide a clear warning message.
  10. Dependency Management:

    • The jsonschema library is used for JSON Schema validation, but it is not clear if it is listed as a required dependency in the requirements.txt or pyproject.toml.
    • Recommendation: Ensure that jsonschema is listed as a required dependency and document its purpose in the project.
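
For suggestion 2 (strict mode), the return logic could be reduced to a small pure function, which also makes it easy to unit test. This is a sketch of the idea only: `validation_exit_code` is a hypothetical name, and it assumes errors and warnings have already been collected into two separate lists.

```python
from typing import Sequence


def validation_exit_code(
    errors: Sequence[str],
    warnings: Sequence[str],
    strict: bool = False,
) -> int:
    """Map validation results to a process exit code (hypothetical helper)."""
    if errors:
        return 1
    # In strict mode, warnings (e.g. unknown top-level fields) also fail the run.
    if strict and warnings:
        return 1
    return 0
```

Under this scheme, `cmd_validate()` would pass `args.strict` straight through, and a file with only warnings would still exit 0 unless `--strict` is given.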

Summary

  • 🔴 CRITICAL: Address unvalidated input, potential sandbox escape, and JSON Schema error handling.
  • 🟡 WARNING: Ensure backward compatibility and document changes clearly.
  • 💡 SUGGESTIONS: Improve error reporting, add unit tests, optimize performance, and enhance documentation.

Let me know if you'd like me to review specific parts of the code or provide additional recommendations!

@kanish5
Contributor Author

kanish5 commented Apr 3, 2026

Hi @imran-siddique — addressed all 3 blocking issues in the latest commit:

  1. Removed cmd_serve from dispatch table (was causing NameError)
  2. Restored metrics subparser registration (agentos metrics works again)
  3. Removed orphaned results/all_valid code, added explicit return 0 on success path
  4. Added --strict flag to validate_parser (was referenced but unregistered)

Also rebased against the latest main to resolve the adapter file conflicts. Ready for re-review!

Member

@imran-siddique imran-siddique left a comment

All 3 blocking issues addressed. LGTM.

@imran-siddique imran-siddique merged commit 558d1b7 into microsoft:main Apr 3, 2026
8 of 10 checks passed

Labels

size/L Large PR (< 500 lines) size/XL Extra large PR (500+ lines)

2 participants