Skip to content

feat(compliance): add compliance_grade() method to GovernanceAttestation#346

Merged
imran-siddique merged 15 commits intomicrosoft:mainfrom
imran-siddique:test/orchestrator-validation
Mar 23, 2026
Merged

feat(compliance): add compliance_grade() method to GovernanceAttestation#346
imran-siddique merged 15 commits intomicrosoft:mainfrom
imran-siddique:test/orchestrator-validation

Conversation

@imran-siddique
Copy link
Member

What

Adds a \compliance_grade()\ method to \GovernanceAttestation\ that returns a letter grade (A-F) based on OWASP ASI coverage percentage.

Changes

  • \�erify.py: New \compliance_grade()\ method on \GovernanceAttestation\
  • \README.md: Added usage example in Quick Start section

Why

Users want a quick one-word summary of their governance posture — not just a percentage. A letter grade is instantly understandable.

Testing this PR

This PR also validates the new PR review orchestrator (#345). Agent comments should appear:

  • ✅ Collapsed with <details>\ tags
  • ✅ Unified summary table at the top
  • ✅ No duplicate comments on re-push

imran-siddique and others added 13 commits March 20, 2026 10:56
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add EU AI Act, Colorado AI Act, and GPAI obligations timeline with
AGT coverage mapping. Reference Microsoft Purview DSPM for AI as
complementary data governance layer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Scorecard API rejects workflows with write permissions at the
workflow level. id-token: write and security-events: write must be
scoped to the job level only. Restores permissions: read-all at
workflow level while keeping job-level write permissions intact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ft#324)

Add Google-style docstrings with Args, Returns, Raises, Attributes,
and Example sections to MCPMessageType, MCPAdapter, and MCPServer
classes. Also enhances docstrings for key methods including
handle_message, _handle_tools_call, _handle_resources_read, and
_map_tool_to_action.

Fixes microsoft#316
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
…s (dependency confusion) (microsoft#325)

- Replace !pip install agent-os with !pip install -e ../.. in all 6 notebooks;
  agent-os is not on PyPI and installing it from PyPI is a dependency confusion vector
- Replace zendesk-sdk/freshdesk-sdk with zenpy/freshdesk (the real published SDKs)
  in customer-service/requirements.txt
- Remove hashlib-compat from healthcare-hipaa/requirements.txt; hashlib is stdlib
  and hashlib-compat is not a real PyPI package
…stall agent-os with agent-os-kernel

Replace all remaining instances of `pip install agent-os` (unregistered
on PyPI) with `pip install agent-os-kernel` (the actual package) across
docs, examples, TypeScript extensions, CLI source, tests, and SVG assets.

Also fixes `pip install emk` references to point to `agent-os-kernel[full]`
since emk is a submodule, not a standalone PyPI package.

Completes the fix started in PR microsoft#325 which only covered notebooks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Dify 65K→133K, AutoGen 42K→55K, CrewAI 28K→46K, Semantic Kernel
24K→27K, LangGraph 24K→27K, Haystack 22K→24K, Agent Framework
7.6K→8K. Added star counts for OpenAI Agents SDK (20K) and
Google ADK (18K). Sorted by stars descending.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a letter-grade (A-F) method based on OWASP ASI control coverage
percentage. Updates README with usage example.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 22, 2026
@github-actions
Copy link

github-actions bot commented Mar 22, 2026

🤖 AI Agent: breaking-change-detector — Summary

🔍 API Compatibility Report

Summary

This pull request introduces a new public method compliance_grade() to the GovernanceAttestation class in the agent-compliance package. No breaking changes were detected in the diff.

Findings

Severity Package Change Impact
🔵 agent-compliance Added GovernanceAttestation.compliance_grade() New API, not breaking

Migration Guide

No migration steps are required as no breaking changes were introduced.

Notes

  • The new compliance_grade() method provides a letter grade based on OWASP ASI coverage percentage, which is a useful enhancement for users.
  • Ensure that the new method is documented in the official API reference and release notes for visibility.

✅ No breaking changes detected.

@github-actions github-actions bot added the size/S Small PR (< 50 lines) label Mar 22, 2026
@github-actions
Copy link

github-actions bot commented Mar 22, 2026

🤖 AI Agent: docs-sync-checker — Issues Found

📝 Documentation Sync Report

Issues Found

  • ✅ All public APIs have appropriate docstrings.
  • ✅ The packages/agent-compliance/README.md has been updated to include the new compliance_grade() method in the "Quick Start" section.
  • ⚠️ CHANGELOG.md — no entry for the addition of the compliance_grade() method in the GovernanceAttestation class.
  • ✅ Example code in packages/agent-compliance/README.md has been updated to reflect the new compliance_grade() method.
  • ✅ All new public APIs have complete type annotations.

Suggestions

  • 💡 Add an entry to CHANGELOG.md under the appropriate version to document the addition of the compliance_grade() method in the GovernanceAttestation class.

If the CHANGELOG.md is updated to include this change, the documentation will be fully in sync.

@github-actions
Copy link

github-actions bot commented Mar 22, 2026

🤖 AI Agent: test-generator — `packages/agent-compliance/src/agent_compliance/verify.py`

🧪 Test Coverage Analysis

packages/agent-compliance/src/agent_compliance/verify.py

  • Existing coverage:

    • The coverage_pct() method is likely covered by existing tests, as it is a core functionality of the GovernanceAttestation class.
    • The badge_url() method is also likely covered, as it is a utility function that depends on coverage_pct().
  • Missing coverage:

    • The newly added compliance_grade() method does not appear to have any corresponding test cases in the repository.
    • Edge cases for coverage_pct() and its interaction with compliance_grade() may not be fully tested.
  • 💡 Suggested test cases:

    1. test_compliance_grade_A — Verify that compliance_grade() returns "A" for coverage percentages of 90% and above.
    2. test_compliance_grade_B — Verify that compliance_grade() returns "B" for coverage percentages between 80% and 89%.
    3. test_compliance_grade_C — Verify that compliance_grade() returns "C" for coverage percentages between 70% and 79%.
    4. test_compliance_grade_D — Verify that compliance_grade() returns "D" for coverage percentages between 60% and 69%.
    5. test_compliance_grade_F — Verify that compliance_grade() returns "F" for coverage percentages below 60%.
    6. test_compliance_grade_boundary_conditions — Test boundary values (e.g., 89.9%, 90.0%, 79.9%, 80.0%, etc.) to ensure correct grade assignment.
    7. test_compliance_grade_zero_coverage — Test compliance_grade() when controls_passed is 0 and controls_total is greater than 0.
    8. test_compliance_grade_no_controls — Test compliance_grade() when controls_total is 0 to ensure it handles division by zero gracefully.
    9. test_compliance_grade_invalid_inputs — Test compliance_grade() with malformed or unexpected inputs (e.g., negative controls_passed or controls_total values).
    10. test_compliance_grade_large_numbers — Test compliance_grade() with extremely large values for controls_passed and controls_total to ensure no overflow or precision issues.

By implementing these test cases, you can ensure comprehensive coverage for the new compliance_grade() method and its interaction with existing functionality.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Agent: code-reviewer

Review Summary

This PR introduces a compliance_grade() method to the GovernanceAttestation class, which maps OWASP ASI coverage percentages to letter grades (A-F). The feature is well-motivated, as it provides a user-friendly way to interpret governance posture. However, there are some areas that require attention to ensure correctness, security, and maintainability.


🔴 CRITICAL

  1. Lack of Validation for coverage_pct() Output

    • The compliance_grade() method relies on coverage_pct() but does not validate its output. If coverage_pct() returns a value outside the expected range (e.g., negative or above 100 due to a bug elsewhere), the grading logic could behave unpredictably.
    • Action: Add a validation step to ensure coverage_pct() returns a value between 0 and 100 before applying the grading logic. Raise an exception if the value is out of bounds.
    pct = self.coverage_pct()
    if not (0 <= pct <= 100):
        raise ValueError(f"Invalid coverage percentage: {pct}")
  2. Thread Safety Concerns

    • If GovernanceAttestation is used in a concurrent environment, the compliance_grade() method could be affected by race conditions if controls_passed or controls_total are modified concurrently.
    • Action: Ensure that controls_passed and controls_total are immutable or protected by thread-safe mechanisms (e.g., locks) if they are mutable.

🟡 WARNING

  1. Backward Compatibility
    • Adding a new method to GovernanceAttestation is technically a backward-compatible change. However, if users subclass GovernanceAttestation, they may have conflicting method names or expectations.
    • Action: Clearly document the new method in the changelog and ensure it is included in the public API documentation.

💡 SUGGESTIONS

  1. Customizable Grading Thresholds

    • The grading thresholds (90 for A, 80 for B, etc.) are hardcoded. This may not suit all users' governance policies.
    • Action: Allow users to customize the thresholds via optional parameters or configuration.
    def compliance_grade(self, thresholds=None) -> str:
        thresholds = thresholds or {"A": 90, "B": 80, "C": 70, "D": 60}
        pct = self.coverage_pct()
        for grade, threshold in thresholds.items():
            if pct >= threshold:
                return grade
        return "F"
  2. Unit Tests for Edge Cases

    • The compliance_grade() method lacks explicit tests for edge cases (e.g., exactly 90%, 89.9%, negative values, or values over 100).
    • Action: Add unit tests to cover these scenarios.
  3. Type Annotations

    • The compliance_grade() method lacks type annotations for the thresholds parameter in the suggested improvement.
    • Action: Add type annotations for better type safety and developer experience.
    def compliance_grade(self, thresholds: Optional[Dict[str, int]] = None) -> str:
  4. Documentation Improvement

    • The README example is helpful but could include a note about the grading logic (e.g., "Grades are based on OWASP ASI coverage: A ≥ 90%, B ≥ 80%, etc.").
    • Action: Add a brief explanation of the grading logic in the README.
  5. Consider Using Enums for Grades

    • Using string literals for grades (e.g., "A", "B") can lead to typos or inconsistencies.
    • Action: Define an Enum for grades to improve type safety and readability.
    from enum import Enum
    
    class ComplianceGrade(Enum):
        A = "A"
        B = "B"
        C = "C"
        D = "D"
        F = "F"
  6. Badge Markdown Example

    • The README example includes attestation.badge_markdown() but does not explain what it does or how it relates to compliance grades.
    • Action: Add a brief explanation of the badge_markdown() method in the README.

Final Recommendations

  • Address the critical issues to ensure the correctness and thread safety of the compliance_grade() method.
  • Consider the suggestions to improve flexibility, maintainability, and user experience.
  • Add unit tests to validate edge cases and ensure robustness.

Let me know if you need further clarification or assistance!

@github-actions
Copy link

github-actions bot commented Mar 22, 2026

🤖 AI Agent: security-scanner — Security Review of PR: `feat(compliance): add compliance_grade() method to GovernanceAttestation`

Security Review of PR: feat(compliance): add compliance_grade() method to GovernanceAttestation


Summary Table of Findings

ID Category Severity Description Suggested Fix
1 Prompt injection defense bypass 🔵 LOW No direct prompt injection risks identified in the changes. No action needed.
2 Policy engine circumvention 🟡 MEDIUM Lack of validation for coverage_pct() output could allow manipulation. Add validation to ensure coverage_pct() returns a valid percentage (0-100).
3 Trust chain weaknesses 🔵 LOW No trust chain issues identified in the changes. No action needed.
4 Credential exposure 🔵 LOW No credentials or sensitive data exposed in the changes. No action needed.
5 Sandbox escape 🔵 LOW No sandboxing or isolation mechanisms are affected by this change. No action needed.
6 Deserialization attacks 🔵 LOW No deserialization mechanisms are introduced or modified. No action needed.
7 Race conditions 🔵 LOW No concurrency or race condition risks identified in the changes. No action needed.
8 Supply chain 🟠 HIGH Potential risk of dependency confusion due to new exclusions in .lychee.toml. Review and validate excluded domains to ensure they are legitimate and necessary.

Detailed Findings

1. Prompt Injection Defense Bypass

  • Severity: 🔵 LOW
  • Description: The changes do not involve any user-input processing that could lead to prompt injection vulnerabilities. The compliance_grade() method simply evaluates a percentage and returns a corresponding letter grade.
  • Suggested Fix: No action needed.

2. Policy Engine Circumvention

  • Severity: 🟡 MEDIUM
  • Description: The compliance_grade() method relies on the coverage_pct() method to calculate the grade. However, there is no validation to ensure that coverage_pct() always returns a valid percentage (0-100). If coverage_pct() is manipulated or returns an invalid value (e.g., negative or greater than 100), the grading logic could be bypassed or produce incorrect results.
  • Suggested Fix: Add validation to coverage_pct() to ensure it always returns a value between 0 and 100. For example:
    def coverage_pct(self) -> int:
        if self.controls_total == 0:
            return 0
        pct = int(self.controls_passed / self.controls_total * 100)
        return max(0, min(100, pct))  # Ensure percentage is within 0-100

3. Trust Chain Weaknesses

  • Severity: 🔵 LOW
  • Description: The changes do not involve any trust chain mechanisms such as SPIFFE/SVID validation or certificate pinning.
  • Suggested Fix: No action needed.

4. Credential Exposure

  • Severity: 🔵 LOW
  • Description: No credentials, secrets, or sensitive data are exposed in the changes. The changes are limited to adding a new method and updating documentation.
  • Suggested Fix: No action needed.

5. Sandbox Escape

  • Severity: 🔵 LOW
  • Description: The changes do not involve any sandboxing or process isolation mechanisms.
  • Suggested Fix: No action needed.

6. Deserialization Attacks

  • Severity: 🔵 LOW
  • Description: The changes do not introduce or modify any deserialization logic.
  • Suggested Fix: No action needed.

7. Race Conditions

  • Severity: 🔵 LOW
  • Description: The changes do not involve concurrency or time-of-check-to-time-of-use (TOCTOU) vulnerabilities.
  • Suggested Fix: No action needed.

8. Supply Chain

  • Severity: 🟠 HIGH
  • Description: The .lychee.toml file was updated to exclude several domains from link checking. While this change does not directly introduce a vulnerability, excluding domains like nuget.org, nist.gov, and github.com could potentially allow malicious actors to exploit dependency confusion or typosquatting attacks. For example, if a dependency is removed from nuget.org and replaced with a malicious package, the exclusion would prevent detection during automated checks.
  • Suggested Fix: Review the necessity of excluding these domains. If exclusions are required, document the rationale for each exclusion and consider adding additional safeguards, such as manual verification of dependencies from excluded domains.

Additional Notes

  • The compliance_grade() method is a straightforward addition and does not introduce significant security risks. However, the lack of validation in coverage_pct() could lead to incorrect or manipulated grades, which could undermine the trustworthiness of the governance toolkit.
  • The .lychee.toml changes should be carefully reviewed to ensure they do not inadvertently introduce supply chain risks.

Let me know if you need further clarification or additional checks!

- Fix stargazers badge URL in agentmesh-integrations/README.md
- Fix internal links to agent-os and agent-sre integrations
- Add lychee exclusions for nuget (unpublished), nist.gov (flaky),
  stargazers paths, and not-yet-created compliance docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Agent: code-reviewer

Pull Request Review for feat(compliance): add compliance_grade() method to GovernanceAttestation


Summary Table

Category Status Details
Policy Engine Correctness No issues found; logic for grading appears correct.
Trust/Identity No changes related to cryptographic operations or credential handling.
Sandbox Escape Vectors No new sandbox escape vectors introduced.
Thread Safety No threading or concurrency concerns identified.
OWASP Agentic Top 10 No violations detected.
Type Safety 🔴 CRITICAL Missing type annotations for the new compliance_grade() method.
Backward Compatibility No breaking changes detected.

Detailed Feedback

🔴 CRITICAL: Missing Type Annotations

The new compliance_grade() method in GovernanceAttestation lacks type annotations. This is a critical issue for a library that relies on type safety and Pydantic model validation. The method should include type hints for both the return type and any parameters (if applicable).

Suggested Fix:

def compliance_grade(self) -> str:
    """Return a letter grade based on coverage percentage.

    Returns:
        str: A letter grade (A, B, C, D, F) based on the percentage
        of OWASP ASI controls that are covered.
    """
    pct = self.coverage_pct()
    if pct >= 90:
        return "A"
    elif pct >= 80:
        return "B"
    elif pct >= 70:
        return "C"
    elif pct >= 60:
        return "D"
    return "F"

💡 SUGGESTION: Add Unit Tests for compliance_grade()

While the compliance_grade() method is straightforward, it is critical to add unit tests to ensure that the grading logic works as expected. This is especially important for a security-focused library where correctness is paramount.

Suggested Test Cases:

  1. Test compliance_grade() for boundary values (e.g., 90%, 89%, 80%, etc.).
  2. Test edge cases, such as 0% and 100%.
  3. Test behavior when controls_total is 0 (to ensure no division by zero errors).

💡 SUGGESTION: Clarify Behavior for controls_total == 0

The coverage_pct() method currently returns 0 when controls_total is 0. While this avoids division by zero errors, it may not be the most intuitive behavior. Consider documenting this explicitly in the docstring or raising a warning when controls_total is 0.


💡 SUGGESTION: Add Validation for controls_passed and controls_total

The compliance_grade() method relies on controls_passed and controls_total being valid integers. Consider adding validation to ensure these attributes are non-negative integers and that controls_total is not zero.

Example:

from pydantic import PositiveInt, BaseModel

class GovernanceAttestation(BaseModel):
    controls_passed: int
    controls_total: PositiveInt  # Ensures controls_total is > 0

💡 SUGGESTION: Improve Documentation

The new compliance_grade() method is documented in the README.md, but the explanation could be more detailed. For example:

  • Explain the grading scale (e.g., "A: 90% and above, B: 80-89%, etc.").
  • Clarify what "coverage" means in the context of governance.

🟡 WARNING: Potential Breaking Change in README Links

The updated links in agentmesh-integrations/README.md point to new locations for Agent OS and Agent SRE integrations. If users rely on the old links, this change could break their workflows. Consider adding a note in the changelog or ensuring redirects are in place.


Final Recommendation

  • Address the 🔴 CRITICAL issue by adding type annotations to compliance_grade().
  • Implement the suggested unit tests to ensure correctness of the grading logic.
  • Consider the additional suggestions for documentation and validation.
  • Review the potential breaking change in the README links and mitigate its impact.

Once these issues are resolved, this PR will be ready for approval.

…microsoft#320)

There are 20+ pre-existing broken links across docs/, proposals/,
and tutorials/ that predate this PR. Setting fail: false so the
check reports but doesn't block merges. Issue microsoft#320 tracks the
full link cleanup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot added ci/cd CI/CD and workflows size/M Medium PR (< 200 lines) labels Mar 23, 2026
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Agent: code-reviewer

Review Summary for PR: feat(compliance): add compliance_grade() method to GovernanceAttestation

Category Status
Policy Engine Correctness ✅ No issues detected. The compliance_grade() method is straightforward and correctly maps percentages to grades.
Trust/Identity ✅ No trust or cryptographic operations are impacted by this change.
Sandbox Escape Vectors ✅ No sandbox escape vectors introduced.
Thread Safety ✅ The method is read-only and does not introduce concurrency concerns.
OWASP Agentic Top 10 ✅ No new vulnerabilities introduced.
Type Safety 🔴 CRITICAL: Missing type annotations for the new compliance_grade() method.
Backward Compatibility ✅ No breaking changes detected.

Detailed Feedback

🔴 CRITICAL: Missing Type Annotations

The new compliance_grade() method lacks type annotations, which is critical for ensuring type safety and compatibility with tools like mypy. This is especially important in a security-focused library where correctness is paramount.

Suggested Fix:
Add type annotations to the method:

def compliance_grade(self) -> str:

💡 SUGGESTION: Add Unit Tests for compliance_grade()

While the implementation of compliance_grade() is straightforward, it is essential to add unit tests to ensure correctness across edge cases (e.g., 0%, 59%, 60%, 69%, etc.).

Suggested Fix:
Add test cases to validate the following scenarios:

  • coverage_pct() returns 0 → Grade should be "F".
  • coverage_pct() returns 59 → Grade should be "F".
  • coverage_pct() returns 60 → Grade should be "D".
  • coverage_pct() returns 90 → Grade should be "A".
  • coverage_pct() returns 100 → Grade should be "A".

💡 SUGGESTION: Document Edge Cases in compliance_grade() Docstring

The docstring for compliance_grade() could be enhanced to explicitly document the grade boundaries. This will help future maintainers understand the logic without needing to inspect the code.

Suggested Fix:
Update the docstring as follows:

"""
Return a letter grade based on coverage percentage.

Grades are assigned as follows:
- 90-100%: A
- 80-89%: B
- 70-79%: C
- 60-69%: D
- Below 60%: F

Returns:
    A letter grade (A, B, C, D, F) based on the percentage
    of OWASP ASI controls that are covered.
"""

💡 SUGGESTION: Refactor compliance_grade() for Readability

The current implementation of compliance_grade() uses a series of if-elif statements. While functional, this could be refactored for better readability and maintainability.

Suggested Fix:
Use a dictionary to map grade boundaries:

def compliance_grade(self) -> str:
    """Return a letter grade based on coverage percentage."""
    pct = self.coverage_pct()
    grade_boundaries = {
        (90, 100): "A",
        (80, 89): "B",
        (70, 79): "C",
        (60, 69): "D",
        (0, 59): "F",
    }
    for (low, high), grade in grade_boundaries.items():
        if low <= pct <= high:
            return grade
    return "F"  # Fallback, though it should never reach here

💡 SUGGESTION: Add a Badge for Compliance Grade

Since the badge_url() method already exists, consider adding an example of how to embed the compliance grade badge in the README. This will make it easier for users to showcase their governance posture.

Suggested Fix:
Add an example to the README:

### Compliance Badge

Embed a compliance grade badge in your documentation:

```markdown
![Compliance Grade](https://img.shields.io/badge/compliance-A-brightgreen)

---

#### 🟡 WARNING: Potential Breaking Change in `link-check.yml`
The change to `fail: false` in `.github/workflows/link-check.yml` could lead to broken links being ignored, which might degrade the quality of the documentation over time. While this is not a direct breaking change to the library, it could impact users relying on accurate documentation.

**Suggested Fix:**
Add a clear timeline or issue reference for when this will be reverted. For example:
```yaml
fail: false  # Temporary: Non-blocking until pre-existing broken links are fixed (see #320). Revert by MM/DD/YYYY.

Final Recommendation

  • Address the CRITICAL issue with type annotations immediately.
  • Add unit tests and improve documentation for the new method.
  • Consider refactoring compliance_grade() for better readability.
  • Revisit the decision to make link-check non-blocking and ensure there is a plan to address broken links.

Let me know if you need further clarification or assistance!

@imran-siddique imran-siddique merged commit 18eb0c8 into microsoft:main Mar 23, 2026
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/cd CI/CD and workflows documentation Improvements or additions to documentation size/M Medium PR (< 200 lines) size/S Small PR (< 50 lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants