feat(slo): add SLI persistence backends + CalibrationDeltaSLI (#645)#650
Conversation
🤖 AI Agent: breaking-change-detector — Summary🔍 API Compatibility ReportSummaryThis pull request introduces significant enhancements to the Findings
Migration GuideNo migration is required as the changes are backward-compatible. Existing code using Conclusion✅ No breaking changes detected. This pull request is safe to merge without impacting downstream users. |
🤖 AI Agent: contributor-guide — Welcome to the project! 🎉Welcome to the project! 🎉Hi there, and welcome to the microsoft/agent-governance-toolkit community! Thank you so much for taking the time to contribute to this project. We’re thrilled to have you here and appreciate your effort in improving the toolkit. Your contribution is a valuable addition to the project, and we’re excited to review your work. What you did well ✅
Suggestions for Improvement ✨While your PR is excellent, here are a few suggestions to align it even more closely with our project conventions and best practices:
Next Steps 🚀
If you have any questions or need help with anything, don’t hesitate to ask. We’re here to support you! 😊 Thank you again for your contribution — we’re looking forward to collaborating with you! 🚀 |
🤖 AI Agent: docs-sync-checker — Issues Found📝 Documentation Sync ReportIssues Found
Suggestions
Additional Notes
Action Items
Let me know if you need help drafting any of the suggested updates! |
🤖 AI Agent: test-generator — `packages/agent-sre/src/agent_sre/slo/__init__.py`🧪 Test Coverage Analysis
|
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary
This pull request introduces a persistence mechanism for SLI measurements, enabling durable storage across agent restarts. It also adds a new CalibrationDeltaSLI class to track calibration drift, which is a valuable addition for monitoring agent reliability. The implementation is well-structured, with comprehensive tests covering both the persistence backends and the new SLI functionality. However, there are some areas that require attention, particularly around thread safety, security, and backward compatibility.
🔴 CRITICAL: Security Issues
-
SQLite Injection Risk:
- The
SQLiteMeasurementStoreuses raw SQL queries with parameterized inputs, which is generally safe. However, thedb_pathparameter is directly passed tosqlite3.connect()without validation. If this parameter is user-controlled, it could lead to arbitrary file creation or access. - Action: Validate
db_pathto ensure it is not maliciously crafted (e.g., disallow paths like/etc/passwdorfile:///URLs).
- The
-
Metadata Handling:
- Metadata is serialized as JSON and stored in SQLite. While this is efficient, it could be exploited if untrusted data is injected into the metadata. For example, malicious JSON could be crafted to exploit downstream consumers of this data.
- Action: Sanitize metadata before storage and validate it during retrieval.
-
Thread Safety:
- The
InMemoryMeasurementStoreis explicitly thread-unsafe. This could lead to race conditions if multiple threads attempt to record or query measurements simultaneously. - Action: Add locking mechanisms (e.g.,
threading.Lock) to ensure thread safety for the in-memory store.
- The
🟡 WARNING: Backward Compatibility
-
_measurementsAlias:- The backward-compatible alias
self._measurementsinSLIpoints toself._store._rowsfor theInMemoryMeasurementStore. This is a clever workaround but could lead to subtle bugs if external code relies on direct manipulation of_measurements. - Action: Deprecate direct access to
_measurementsand provide a clear migration path for external code.
- The backward-compatible alias
-
Public API Changes:
- The addition of the
storeparameter to all SLI subclasses is backward-compatible because it defaults toNone. However, this introduces a new optional argument that could affect downstream subclass implementations if they override__init__. - Action: Document this change prominently in the release notes.
- The addition of the
💡 Suggestions for Improvement
-
Performance Optimization:
- The
SQLiteMeasurementStoreopens a new connection for each operation unless the database is in-memory. This could be a performance bottleneck for high-frequency operations. - Suggestion: Implement connection pooling or reuse connections for file-based databases.
- The
-
Type Safety:
- The
_Rowclass uses__slots__for memory efficiency but does not enforce type safety. Consider usingPydanticmodels for_Rowto ensure strict validation of its attributes. - Suggestion: Replace
_Rowwith a Pydantic model.
- The
-
CalibrationDeltaSLI Documentation:
- The
CalibrationDeltaSLIclass is well-documented, but its compliance method could benefit from additional explanation about how the compliance fraction is calculated. - Suggestion: Expand the docstring for
compliance()to clarify its behavior.
- The
-
Test Coverage:
- While the tests are comprehensive, they do not cover concurrent access scenarios for the
SQLiteMeasurementStore. This could be important for multi-threaded or multi-process environments. - Suggestion: Add tests simulating concurrent writes and reads to the SQLite store.
- While the tests are comprehensive, they do not cover concurrent access scenarios for the
Final Assessment
This pull request is a significant improvement to the SLI framework, addressing a critical gap in persistence and introducing valuable new functionality. However, the identified security risks and thread safety concerns must be addressed before merging. Once resolved, this feature will be a robust addition to the toolkit.
Action Items
- 🔴 Address SQLite injection risk and metadata sanitization.
- 🔴 Add thread safety mechanisms to
InMemoryMeasurementStore. - 🟡 Document backward compatibility changes and deprecate
_measurements. - 💡 Optimize SQLite connection handling and improve type safety with Pydantic.
- 💡 Expand test coverage for concurrent access scenarios.
Let me know if you need further clarification or assistance!
🤖 AI Agent: security-scanner — Security Review of PR #645: `feat(slo): add SLI persistence backends + CalibrationDeltaSLI`Security Review of PR #645:
|
| Category | Rating | Action Required |
|---|---|---|
| Prompt Injection Defense Bypass | 🔵 LOW | None |
| Policy Engine Circumvention | 🟡 MEDIUM | Encrypt SQLite database; add integrity checks for stored measurements. |
| Trust Chain Weaknesses | 🟠 HIGH | Restrict database paths to user home or temp directories; remove CWD allowance. |
| Credential Exposure | 🔵 LOW | None |
| Sandbox Escape | 🔵 LOW | None |
| Deserialization Attacks | 🟠 HIGH | Validate JSON metadata using a schema validation library. |
| Race Conditions | 🟡 MEDIUM | Enable SQLite WAL mode; document multi-process limitations. |
| Supply Chain Risks | 🔵 LOW | None |
Recommended Actions
- Encrypt SQLite Database: Use
sqlcipheror similar to encrypt the database file at rest. - Integrity Checks: Add HMAC-based integrity checks for stored measurements.
- Restrict Database Paths: Remove the allowance for CWD in
_validate_db_pathand enforce stricter path validation. - Validate JSON Metadata: Use a schema validation library like
jsonschemato enforce constraints on metadata. - Enable SQLite WAL Mode: Improve concurrency and document limitations for multi-process environments.
By addressing these issues, the changes can be made more robust and secure for downstream users.
…eview feedback - Thread-safe InMemoryMeasurementStore (threading.Lock on all mutations) - db_path validation in SQLiteMeasurementStore (_validate_db_path rejects non-file URIs) - CalibrationDeltaSLI: full docstrings on __init__, record_prediction, current_value, compliance - CalibrationDeltaSLI.current_value() overrides base to return latest aggregate delta (not mean of history) - Tests moved to tests/unit/test_sli_persistence.py (25/25 pass, +3 concurrent-append + validation tests) - CHANGELOG.md [Unreleased] section added - All SLI subclasses accept optional store= kwarg and pass through to super()
775198c to
cbde66f
Compare
|
Thanks for the thorough bot review — addressed all RED and YELLOW items in cbde66f: 🔴 CRITICAL → Fixed:
🟡 WARNING → Fixed:
💡 Suggestions → Partially addressed:
Tests: 25/25 passing ( |
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Pull Request Review: feat(slo): add SLI persistence backends + CalibrationDeltaSLI (#645)
🔴 CRITICAL: Security Issues
-
SQLite Path Validation Weakness:
- The
_validate_db_path()function attempts to validate paths but does not sufficiently guard against malicious inputs. For example, paths likefile:///etc/passwdare allowed, which could lead to unauthorized access or overwriting sensitive files. - Action: Implement stricter validation to ensure paths are confined to safe directories (e.g., user home or application-specific directories). Consider using a whitelist approach for allowed directories.
- The
-
Potential SQL Injection in SQLite Queries:
- While parameterized queries are used (
conn.execute(_INSERT, ...)), thedb_pathitself is passed directly tosqlite3.connect()without sanitization. Ifdb_pathis manipulated, it could lead to unintended database connections or file manipulations. - Action: Sanitize
db_pathfurther to ensure it cannot be exploited.
- While parameterized queries are used (
-
Thread Safety in SQLiteMemoryStore:
- The
SQLiteMeasurementStoreuses a persistent connection for:memory:databases but does not explicitly guard against concurrent access. SQLite'scheck_same_thread=Falsedisables thread checks but does not ensure thread safety. - Action: Add explicit locking mechanisms (e.g.,
threading.Lock) for operations on theself._mem_connconnection.
- The
🟡 WARNING: Breaking Changes
- Backward Compatibility Alias (
_measurements):- The alias
self._measurementspoints toself._store._rowsonly forInMemoryMeasurementStore. This could lead to subtle bugs if external code relies on direct list manipulation for persistence. - Action: Deprecate direct manipulation of
_measurementsand provide a clear migration path for users.
- The alias
💡 Suggestions for Improvement
-
CalibrationDeltaSLI Compliance Logic:
- The compliance calculation (
good / len(values)) assumes all measurements are equally weighted. Consider adding support for weighted compliance based on metadata (e.g., session importance).
- The compliance calculation (
-
SQLite Connection Pooling:
- For file-based SQLite databases, opening a new connection per operation (
_connect()) may lead to performance bottlenecks under high load. Consider implementing connection pooling.
- For file-based SQLite databases, opening a new connection per operation (
-
Test Coverage for Edge Cases:
- Add tests for edge cases in
_validate_db_path():- Paths with unusual characters (e.g.,
file://../etc/passwd). - Extremely long paths.
- Invalid schemes (e.g.,
ftp://).
- Paths with unusual characters (e.g.,
- Add tests for edge cases in
-
Documentation Enhancements:
- Expand the docstring for
CalibrationDeltaSLIto include examples of real-world use cases and implications of calibration drift.
- Expand the docstring for
-
Type Safety:
- The
_Rowclass uses__slots__but does not enforce type safety on its attributes. Consider usingPydanticmodels for stricter validation.
- The
-
Performance Optimization:
- The
InMemoryMeasurementStore.query()method filters rows using a list comprehension. For large datasets, this could become a bottleneck. Consider using a more efficient data structure likecollections.dequeor a sorted list.
- The
Summary
- Security: 🔴 Address SQLite path validation and thread safety issues immediately.
- Backward Compatibility: 🟡 Ensure external code relying on
_measurementsis notified of potential changes. - Improvements: 💡 Optimize performance, enhance documentation, and expand test coverage.
This PR introduces valuable features like persistent SLI storage and calibration tracking but requires critical fixes to ensure security and robustness.
|
@microsoft-github-policy-service agree |
… lock Addresses second AI code-reviewer pass (10:09 UTC): CRITICAL fixes: - _validate_db_path: reject file:///etc/passwd and other file:// URIs that resolve outside safe directories (home, temp, cwd). Also reject remote file://hostname/path forms and paths > 4096 chars. - SQLiteMeasurementStore._mem_conn: add self._mem_lock (threading.Lock) that wraps every _connect() yield for :memory: databases, preventing concurrent-access races under check_same_thread=False. New tests (7 added, 31 total, 31/31 passing): - test_file_uri_etc_passwd_raises - test_file_uri_inside_home_accepted - test_remote_file_uri_raises - test_path_outside_safe_dirs_raises - test_excessively_long_path_raises - test_thread_safety_in_memory_store (8 threads x 50 appends)
|
Thanks for the thorough second pass. Both CRITICAL items from the 10:09 UTC review are addressed in 1. SQLite path validation hardened ( The previous check allowed
New tests: 2. Added New test: Test totals: 31/31 passing (was 25/25). On the non-CRITICAL suggestions:
|
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Pull Request Review: feat(slo): add SLI persistence backends + CalibrationDeltaSLI (#645)
This PR introduces persistence backends for SLI measurements and adds a new CalibrationDeltaSLI class to track calibration drift. Below is the detailed review:
🔴 CRITICAL: Security Issues
-
SQLite Path Validation:
- The
_validate_db_path()function attempts to restrict database paths to safe directories, but the logic is incomplete:- It uses
str(resolved).startswith(str(root))for path validation, which is vulnerable to directory traversal attacks (e.g.,/safe/../unsafe). - It does not account for symbolic links, which could allow paths outside the intended directories.
- It uses
- Fix: Use
Path.is_relative_to()(Python 3.9+) or compare resolved paths directly usingPath.samefile()to ensure the path is strictly confined to the allowed directories.
- The
-
SQLite Injection:
- The
SQLiteMeasurementStoreuses parameterized queries (?placeholders), which is good practice. However, themetadatafield is stored as JSON without validation. Malicious input could exploit JSON parsing vulnerabilities in downstream consumers. - Fix: Validate
metadatabefore storing it. For example, ensure it is a dictionary with expected key-value types.
- The
-
Thread Safety in SQLite:
- The
SQLiteMeasurementStoreopens a new connection per operation for file-based databases. While SQLite itself is thread-safe, concurrent writes could lead to race conditions or database locks. - Fix: Use a connection pool or a threading lock to serialize access to the database.
- The
🟡 WARNING: Potential Breaking Changes
- Backward Compatibility of
_measurements:- The
_measurementsattribute is now an alias pointing to the in-memory store's rows. While this preserves compatibility for direct list access, it could lead to unexpected behavior if external code mutates_measurementsdirectly (e.g.,sli._measurements.append(...)). - Fix: Document this change clearly in the release notes and deprecate direct access to
_measurementsin favor of the newstoreabstraction.
- The
💡 Suggestions for Improvement
-
CalibrationDeltaSLI Compliance Calculation:
- The
compliance()method calculates the fraction of measurements at or belowtarget_delta. This is useful but could be extended to include additional statistics, such as the mean and standard deviation of calibration deltas within the window. - Suggestion: Add optional parameters to
compliance()for returning richer statistics.
- The
-
Test Coverage:
- While the PR includes 23 tests for persistence and
CalibrationDeltaSLI, edge cases like invalidmetadata, concurrent access toSQLiteMeasurementStore, and large-scale performance tests are missing. - Suggestion: Add tests for these scenarios to ensure robustness.
- While the PR includes 23 tests for persistence and
-
Performance Optimization:
- The
InMemoryMeasurementStoreuses a list for storage, which could become inefficient for large datasets due to linear-time filtering inquery(). - Suggestion: Consider using a more efficient data structure, such as a sorted list or a dictionary indexed by name and timestamp.
- The
-
Documentation:
- The PR introduces significant changes to the SLI subsystem, but the documentation does not cover migration paths for existing users or best practices for choosing a persistence backend.
- Suggestion: Update the README and API docs with detailed examples and guidance.
-
Error Handling:
- The
SQLiteMeasurementStoredoes not handle database errors (e.g., disk full, permission issues) gracefully. This could lead to unhandled exceptions in production. - Suggestion: Add error handling for database operations and provide meaningful error messages.
- The
Summary
This PR significantly improves the SLI subsystem by adding persistence and a new calibration metric. However, it introduces critical security issues related to path validation and thread safety, which must be addressed before merging. Additionally, there are minor backward compatibility concerns and opportunities for improvement in performance, documentation, and test coverage.
Action Items:
- Fix
_validate_db_path()to prevent directory traversal and symbolic link issues. - Validate
metadatabefore storing it in SQLite. - Ensure thread-safe access to SQLite databases.
- Document
_measurementschanges and deprecate direct access. - Add tests for edge cases and performance scenarios.
- Improve error handling and documentation.
Once these issues are resolved, the PR will be ready for merge.
Fixes #645
SLI._measurementswas a plain Python list that silently reset on every agent restart, making 7d and 30d SLO windows useless in practice.What this adds
agent_sre/slo/persistence.py(new)MeasurementStoreInMemoryMeasurementStore_measurementslist behaviourSQLiteMeasurementStore:memory:keeps one connection alive; file-based opens per-operationagent_sre/slo/indicators.py(updated)SLI.__init__now accepts an optionalstore: MeasurementStorekwarg (default:InMemoryMeasurementStore)record()routes throughstore.append();values_in_window()throughstore.query()storethrough tosuper().__init__— no call-site changes neededCalibrationDeltaSLI(new): tracks|avg_predicted_confidence − actual_success_rate|per session. Surfaces systematic over- or under-confidence — typically the earliest cross-session drift signal. Maps to the PDRcalibration_deltaaxis (DOI: 10.5281/zenodo.19339987).tests/test_sli_persistence.py(new) — 23 tests, all passingInMemoryMeasurementStore: append, window filter, name isolation, clearSQLiteMeasurementStore: same suite + metadata JSON roundtrip + ascending order guaranteeCalibrationDeltaSLI: perfect calibration, overconfidence detection, compliance threshold, invalid confidence guard, emptycollect(), SQLite backend, registry registrationBackward compatibility
Zero breaking changes. Existing SLI subclasses work unmodified. The
storeparameter is optional everywhere; default behaviour is byte-for-byte identical to the current implementation.Usage