fix(rust-sdk): replace unwrap() with poison-resilient lock handling#567
fix(rust-sdk): replace unwrap() with poison-resilient lock handling#567imran-siddique wants to merge 7 commits intomicrosoft:mainfrom
Conversation
- mcp-proxy: shebang must be line 1 (TS18026) - copilot, mcp-server: typescript ^6.0.2 → ^5.7.0 (eslint <6.0.0) - NuGet: replace ESRP Sign+Release with NuGetCommand@2 push via NuGet.org service connection Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…crosoft#557, microsoft#558) Upstream bug fixes from AzureClaw vendored agentmesh-sdk: TypeScript SDK (identity.ts): - Add stripKeyPrefix() — strips ed25519:/x25519: prefixes before base64 decoding (fixes key decode failures with typed key formats) - Add safeBase64Decode() — convenience wrapper for safe prefix-aware decoding - Update fromJSON() to use safeBase64Decode() so serialized keys with type prefixes round-trip correctly - Export both functions from index.ts Proto (registration.proto): - Add GovernanceService with EvaluatePolicy, RecordAudit, GetTrustScore RPCs for language-agnostic governance integration - Add PolicyRequest, PolicyDecision, AuditEntry, AuditAck, TrustQuery, TrustScoreResult message types Tests: - 12 new tests in key-prefix.test.ts covering prefix stripping, safe decode, and fromJSON round-trip with prefixed keys Closes microsoft#557, Closes microsoft#558 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ration Implements Phase 1 (P0) and Phase 2 (P1) of issue microsoft#556: Phase 1 — Core governance (P0): - policy.rs: YAML-based policy evaluation with 4-way decisions (allow/deny/requires-approval/rate-limit), wildcard patterns, condition matching, and scoped capability rules - trust.rs: Trust scoring (0-1000, 5 tiers) with configurable reward/penalty, threshold checks, and optional JSON persistence - audit.rs: SHA-256 hash-chain audit logging with tamper detection, integrity verification, and filtered queries - types.rs: Shared types (PolicyDecision, TrustTier, TrustScore, AuditEntry, GovernanceResult) Phase 2 — Identity (P1): - identity.rs: Ed25519 agent identity via ed25519-dalek with DID generation, sign/verify, and JSON serialization Unified client (lib.rs): - AgentMeshClient combining all modules with execute_with_governance pipeline (policy → audit → trust update) Includes 30 unit tests + 1 doc-test, all passing. Follows Go SDK patterns and aligns with AzureClaw governance.rs. Closes microsoft#556 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace all non-test unwrap() calls on Mutex/RwLock with unwrap_or_else(|e| e.into_inner()) to prevent cascading panics if a thread panics while holding a lock. Replace try_into().unwrap() in identity.rs verify() methods with let-else patterns that return false on conversion failure instead of panicking. Changes across 4 files (22 lines): - audit.rs: 4 Mutex::lock().unwrap() → unwrap_or_else - policy.rs: 3 RwLock + 1 Mutex unwrap() → unwrap_or_else - trust.rs: 7 RwLock unwrap() → unwrap_or_else - identity.rs: 3 try_into().unwrap() → let-else pattern All 31 tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 AI Agent: security-scanner — Security Review of PR: `fix(rust-sdk): replace unwrap() with poison-resilient lock handling`Security Review of PR:
|
| Finding | Severity | Recommendation |
|---|---|---|
| Potential Race Condition in Governance Policy Evaluation | 🔴 CRITICAL | Implement atomicity in policy evaluation and enforcement. |
Trust Chain Weakness in VerifyPeerTrust RPC |
🟠 HIGH | Add robust identity validation (e.g., SPIFFE/SVID, certificate pinning). |
| Credential Exposure in Audit Logging | 🟠 HIGH | Validate and sanitize metadata field; encrypt sensitive data in audit logs. |
Deserialization Attack Risk in serde_yaml |
🟠 HIGH | Replace serde_yaml with a maintained alternative or update to the latest version. |
| Potential Supply Chain Risk in Dependencies | 🟡 MEDIUM | Audit and update dependencies; use tools to detect vulnerabilities and typosquatting. |
General Observations
- The replacement of
unwrap()withunwrap_or_elseis a positive change that improves the resilience of the Rust SDK against panics caused by poisoned locks. This is a good step toward improving the overall robustness of the system. - The addition of governance-related gRPC messages and services is a significant enhancement, but it introduces new security risks that must be addressed before deployment.
- The
Cargo.lockfile includes a large number of dependencies, which increases the attack surface. Regular dependency audits are essential for maintaining the security of the project.
Action Items
- Address the race condition in
EvaluatePolicyby implementing atomic policy evaluation and enforcement. - Strengthen the trust validation mechanism in
VerifyPeerTrustto prevent impersonation attacks. - Sanitize and validate the
metadatafield inAuditEntryto prevent credential exposure. - Replace or update the
serde_yamldependency to mitigate deserialization attack risks. - Perform a full dependency audit and update outdated or deprecated libraries.
Let me know if you need further assistance or clarification!
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary
This pull request addresses a critical issue by replacing unsafe unwrap() calls with safer alternatives in the Rust SDK. The changes aim to improve the resilience of the codebase against runtime panics caused by poisoned locks or failed conversions. Additionally, the PR introduces new governance-related gRPC services and messages in the registration.proto file. While the changes are generally positive, there are a few areas that require further attention.
🔴 CRITICAL
-
Tamper-Evident Audit Log Implementation:
- The
AuditEntrymessage in theregistration.protofile includes fields forhashandprevious_hashto create a tamper-evident chain. However, there is no information about how these hashes are generated, verified, or stored. Without a clear implementation, this feature might be vulnerable to tampering or replay attacks. - Actionable Recommendation: Provide documentation or implementation details about how the hash chain is constructed and verified. Ensure that cryptographic best practices are followed, such as using a secure hash function (e.g., SHA-256) and salting to prevent hash collisions.
- The
-
Concurrency Safety in Rust SDK:
- While replacing
unwrap()withunwrap_or_elseimproves panic safety, it is unclear if the fallback logic inunwrap_or_elseproperly handles poisoned locks. If the fallback logic simply retries the lock acquisition without addressing the underlying issue, it may lead to undefined behavior or deadlocks. - Actionable Recommendation: Audit the fallback logic in
unwrap_or_elseto ensure it handles poisoned locks appropriately. Consider usingstd::sync::PoisonErrorto recover from poisoned locks safely.
- While replacing
-
Trust Score Calculation:
- The
TrustScoreResultmessage includes anoverall_scorefield (0-1000) and atierfield (e.g., "Untrusted", "Trusted"). However, there is no information about how these scores are calculated or whether they are tamper-proof. - Actionable Recommendation: Document the trust score calculation methodology and ensure that it is resistant to manipulation. If cryptographic techniques are used, ensure they are implemented securely.
- The
🟡 WARNING
- Breaking Changes in gRPC API:
- The addition of the
GovernanceServiceand its associated messages inregistration.protointroduces new RPCs. While this does not break existing functionality, it is a significant change to the public API. - Actionable Recommendation: Clearly communicate these changes in the release notes and ensure that downstream consumers of the gRPC API are aware of the new services.
- The addition of the
💡 SUGGESTIONS
-
Test Coverage:
- The PR mentions that all 31 tests pass, but it is unclear if new tests were added to cover the changes in lock handling and the new governance-related gRPC services.
- Actionable Recommendation: Add unit tests to specifically validate the new
unwrap_or_elselogic for handling poisoned locks. Additionally, create integration tests for the new governance-related gRPC services.
-
Error Handling in gRPC Services:
- The
GovernanceServiceRPCs (e.g.,EvaluatePolicy,RecordAudit,GetTrustScore) do not specify how errors are communicated to clients. For example, what happens if the policy engine fails to evaluate a policy or if the audit log cannot be written? - Actionable Recommendation: Define error codes or messages in the gRPC API to handle failure scenarios gracefully. Consider using gRPC status codes (e.g.,
INVALID_ARGUMENT,INTERNAL, etc.) to standardize error handling.
- The
-
Documentation:
- The new governance-related gRPC services and their associated messages lack detailed documentation. For example, the
contextfield inPolicyRequestis described as "Additional context for policy evaluation," but it is unclear what kind of data is expected. - Actionable Recommendation: Provide detailed documentation for each field in the new gRPC messages and services. Include examples of typical usage scenarios to help developers understand how to use the API.
- The new governance-related gRPC services and their associated messages lack detailed documentation. For example, the
-
Backward Compatibility for Rust SDK:
- The PR states that there are no behavioral changes, but it is important to ensure that the changes to lock handling do not introduce subtle differences in behavior, especially in edge cases.
- Actionable Recommendation: Perform a thorough review of the Rust SDK's public API to confirm that the changes do not unintentionally alter its behavior. Consider adding regression tests for edge cases.
-
Code Style and Consistency:
- While the PR replaces
unwrap()withunwrap_or_else, it is important to ensure that the fallback logic is consistent across the codebase. - Actionable Recommendation: Use a centralized utility function for handling poisoned locks and failed conversions. This will ensure consistency and make future audits easier.
- While the PR replaces
Final Assessment
The PR addresses a critical issue by improving panic safety in the Rust SDK and introduces useful governance-related gRPC services. However, there are critical concerns regarding the implementation of the tamper-evident audit log and concurrency safety in the Rust SDK. Additionally, the changes to the gRPC API should be clearly communicated to downstream consumers to avoid potential integration issues.
- Approval Status: Changes are not approved until the critical issues are addressed.
- Follow-Up: Address the critical issues, add tests for the new functionality, and provide detailed documentation for the gRPC API changes.
Summary
Addresses CI review findings from PR #563 -- eliminates all non-test unwrap() calls in the Rust SDK to prevent cascading panics from poisoned locks.
Fix
All 31 tests pass. No behavioral changes -- only panic safety.
Depends on #563