feat(slo): add SLI persistence backends + CalibrationDeltaSLI (#645) #650

Merged
imran-siddique merged 2 commits into microsoft:main from nanookclaw:fix/sli-persistence-645
Mar 31, 2026

@nanookclaw
Contributor

Fixes #645

SLI._measurements was a plain Python list that silently reset on every agent restart, making 7d and 30d SLO windows useless in practice.

What this adds

agent_sre/slo/persistence.py (new)

| Class | Description |
| --- | --- |
| MeasurementStore | Abstract base; pluggable backend contract |
| InMemoryMeasurementStore | Default; identical to the original _measurements list behaviour |
| SQLiteMeasurementStore | Durable; survives restarts. :memory: keeps one connection alive; file-based databases open a connection per operation |
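
As a reading aid, here is a minimal sketch of what a pluggable backend contract of this kind could look like. The method names append, query, and clear come from the PR description and test list; the class names, signatures, and row layout below are assumptions for illustration and are not the merged code.

```python
import time
from abc import ABC, abstractmethod


class MeasurementStoreSketch(ABC):
    """Hypothetical backend contract; signatures are assumed, not the PR's."""

    @abstractmethod
    def append(self, name: str, value: float, timestamp: float) -> None: ...

    @abstractmethod
    def query(self, name: str, since: float) -> list[float]: ...

    @abstractmethod
    def clear(self, name: str) -> None: ...


class ListStoreSketch(MeasurementStoreSketch):
    """In-memory backend: a plain list of (name, value, timestamp) rows."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, float, float]] = []

    def append(self, name, value, timestamp):
        self._rows.append((name, value, timestamp))

    def query(self, name, since):
        # Name isolation plus window filter, in insertion order.
        return [v for n, v, t in self._rows if n == name and t >= since]

    def clear(self, name):
        self._rows = [r for r in self._rows if r[0] != name]


store = ListStoreSketch()
now = time.time()
store.append("task_success_rate", 1.0, now - 10)      # inside a 1h window
store.append("task_success_rate", 0.0, now - 10_000)  # outside a 1h window
store.append("response_latency", 250.0, now - 10)     # different indicator
recent = store.query("task_success_rate", since=now - 3600)
```

A durable backend would implement the same three methods against a database, which is why call sites do not need to change when the store is swapped.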

agent_sre/slo/indicators.py (updated)

  • SLI.__init__ now accepts an optional store: MeasurementStore kwarg (default: InMemoryMeasurementStore)
  • record() routes through store.append(); values_in_window() through store.query()
  • All built-in subclasses pass store through to super().__init__ — no call-site changes needed
  • CalibrationDeltaSLI (new): tracks |avg_predicted_confidence − actual_success_rate| per session. Surfaces systematic over- or under-confidence — typically the earliest cross-session drift signal. Maps to the PDR calibration_delta axis (DOI: 10.5281/zenodo.19339987).
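
The calibration delta formula above can be worked through with a small standalone function. This is a sketch of the arithmetic only; the function name and the (confidence, success) tuple layout are hypothetical and do not reflect the CalibrationDeltaSLI API.

```python
def calibration_delta(predictions: list[tuple[float, bool]]) -> float:
    """|mean predicted confidence - observed success rate| for one session."""
    if not predictions:
        raise ValueError("need at least one prediction")
    for confidence, _ in predictions:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
    avg_confidence = sum(c for c, _ in predictions) / len(predictions)
    success_rate = sum(1 for _, ok in predictions if ok) / len(predictions)
    return abs(avg_confidence - success_rate)


# An agent that claims 90% confidence but succeeds half the time.
overconfident = calibration_delta(
    [(0.9, True), (0.9, False), (0.9, True), (0.9, False)]
)
# Confidence 0.5 with a 50% success rate: no gap.
calibrated = calibration_delta([(0.5, True), (0.5, False)])
```

The first session yields a delta of about 0.4 and the second a delta of 0, so a rising value signals that stated confidence and observed outcomes are drifting apart.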

tests/test_sli_persistence.py (new) — 23 tests, all passing

  • InMemoryMeasurementStore: append, window filter, name isolation, clear
  • SQLiteMeasurementStore: same suite + metadata JSON roundtrip + ascending order guarantee
  • Integration: measurements survive SLI recreation (simulated restart across two store instances)
  • CalibrationDeltaSLI: perfect calibration, overconfidence detection, compliance threshold, invalid confidence guard, empty collect(), SQLite backend, registry registration
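
The "simulated restart" idea in the integration test can be illustrated with the standard library alone. The sketch below assumes a hypothetical measurements table with name, value, and ts columns; the real schema in persistence.py is not shown in this thread.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "sli.db")


def record(path: str, name: str, value: float, ts: float) -> None:
    # One connection per operation, mirroring the file-based mode described above.
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS measurements "
                "(name TEXT, value REAL, ts REAL)"
            )
            conn.execute(
                "INSERT INTO measurements (name, value, ts) VALUES (?, ?, ?)",
                (name, value, ts),
            )
    finally:
        conn.close()


def values_since(path: str, name: str, since: float) -> list[float]:
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT value FROM measurements "
            "WHERE name = ? AND ts >= ? ORDER BY ts ASC",
            (name, since),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


# "Process 1" records two values and goes away.
record(db_path, "task_success_rate", 1.0, 100.0)
record(db_path, "task_success_rate", 0.0, 200.0)
# "Process 2" opens the same file and still sees both, oldest first.
survived = values_since(db_path, "task_success_rate", since=0.0)
```

Nothing is shared between the two calls except the file on disk, which is the property a 7d or 30d window depends on.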

Backward compatibility

Zero breaking changes. Existing SLI subclasses work unmodified. The store parameter is optional everywhere; default behaviour is byte-for-byte identical to the current implementation.

Usage

from agent_sre.slo import TaskSuccessRate, SQLiteMeasurementStore

# Persistent — survives restarts
store = SQLiteMeasurementStore(db_path="~/.agent/sli.db")
sli = TaskSuccessRate(store=store)

# In-memory (existing behaviour, unchanged)
sli = TaskSuccessRate()  # works exactly as before

@github-actions github-actions bot added the tests, agent-sre (agent-sre package), and size/XL (Extra large PR, 500+ lines) labels Mar 31, 2026
@github-actions

github-actions bot commented Mar 31, 2026

🤖 AI Agent: breaking-change-detector — Summary

🔍 API Compatibility Report

Summary

This pull request introduces significant enhancements to the agent-sre package, including new persistence backends for SLI measurements and a new CalibrationDeltaSLI class. The changes are additive and maintain backward compatibility, with no breaking changes identified.

Findings

| Severity | Package | Change | Impact |
| --- | --- | --- | --- |
| 🔵 | agent-sre | Added MeasurementStore ABC and backends | New API for SLI persistence; no impact on existing code. |
| 🔵 | agent-sre | Added CalibrationDeltaSLI class | New SLI type; does not affect existing functionality. |
| 🔵 | agent-sre | Updated SLI.__init__ to accept store | Optional parameter; default behavior unchanged. |
| 🔵 | agent-sre | Updated built-in SLI subclasses to support store | Backward-compatible; default behavior preserved. |
| 🔵 | agent-sre | Added new exports in __init__.py | New exports (MeasurementStore, SQLiteMeasurementStore, etc.); additive. |

Migration Guide

No migration is required as the changes are backward-compatible. Existing code using SLI and its subclasses will continue to work without modification. Developers can optionally use the new persistence backends by passing a store parameter to SLI subclasses.

Conclusion

No breaking changes detected. This pull request is safe to merge without impacting downstream users.

@github-actions

🤖 AI Agent: contributor-guide — Welcome to the project! 🎉

Hi there, and welcome to the microsoft/agent-governance-toolkit community! Thank you so much for taking the time to contribute to this project. We’re thrilled to have you here and appreciate your effort in improving the toolkit. Your contribution is a valuable addition to the project, and we’re excited to review your work.


What you did well ✅

  1. Clear and Detailed PR Description: Your pull request description is thorough and well-structured. You’ve clearly outlined the problem, the solution, and the changes you’ve made. This makes it much easier for reviewers to understand your contribution.

  2. Backward Compatibility: It's fantastic that you ensured zero breaking changes. This is crucial for maintaining a smooth upgrade path for existing users.

  3. Comprehensive Testing: Including 23 tests that cover both unit and integration scenarios is excellent. The tests are well-organized and ensure that the new functionality is robust and reliable.

  4. Documentation: Your usage examples and explanation of the new CalibrationDeltaSLI are clear and informative. The reference to the DOI for further reading is a nice touch.

  5. Code Quality: The code is well-structured, and the use of abstract base classes for the persistence backends is a great design choice. The SQLite implementation is clean and thoughtfully designed, especially the handling of in-memory databases.


Suggestions for Improvement ✨

While your PR is excellent, here are a few suggestions to align it even more closely with our project conventions and best practices:

  1. Linting:
    We use ruff for linting with specific rules (E, F, W). Please ensure your code passes the linter. You can run ruff check . locally to verify. If you encounter any issues, you can use ruff check --fix . to automatically fix many of them.

  2. Test Placement:
    In this project, tests are typically placed in the packages/{name}/tests/ directory. Since your tests are for the agent_sre package, they should be moved to packages/agent-sre/tests/. This ensures consistency across the repository.

  3. Commit Message Style:
    We follow the Conventional Commits specification. Your PR title is already in the correct format (feat(slo): ...), but please ensure all your individual commit messages follow this convention as well. For example:

    • feat(slo): add SLI persistence backends
    • test(slo): add tests for SQLiteMeasurementStore
  4. Security Considerations:
    Since this PR introduces a persistence layer, it’s important to consider security implications. For example:

    • Are there any risks of SQL injection in the SQLiteMeasurementStore implementation? While the use of parameterized queries (?) mitigates this risk, it’s worth double-checking.
    • Are there any sensitive data fields in the metadata that should be encrypted before storage? If so, we might need to add encryption support.
  5. Documentation Updates:
    While your PR description provides a great overview, it would be helpful to update the project documentation to include:

    • Details about the new CalibrationDeltaSLI class in the relevant section of the documentation.
    • Instructions for configuring and using the SQLiteMeasurementStore in the QUICKSTART.md file.

Next Steps 🚀

  1. Address Feedback: Please review the suggestions above and make any necessary changes to your PR.
  2. Run Linter: Ensure your code passes ruff linting (ruff check .).
  3. Update Documentation: Add relevant details about the new features to the documentation files (CONTRIBUTING.md and QUICKSTART.md).
  4. Push Changes: Once you’ve made the updates, push them to your branch. This will automatically update the pull request.
  5. Follow-Up Review: After you’ve addressed the feedback, we’ll review your PR again and provide further guidance if needed.

If you have any questions or need help with anything, don’t hesitate to ask. We’re here to support you! 😊

Thank you again for your contribution — we’re looking forward to collaborating with you! 🚀

@github-actions

github-actions bot commented Mar 31, 2026

🤖 AI Agent: docs-sync-checker — Issues Found

📝 Documentation Sync Report

Issues Found

  • CalibrationDeltaSLI in agent_sre/slo/indicators.py — missing docstring for compliance() method.
  • InMemoryMeasurementStore in agent_sre/slo/persistence.py — missing docstring for __init__() method.
  • SQLiteMeasurementStore in agent_sre/slo/persistence.py — missing docstring for __init__() method.
  • ⚠️ packages/agent-sre/README.md — no mention of the new CalibrationDeltaSLI or persistence backends (MeasurementStore, InMemoryMeasurementStore, SQLiteMeasurementStore).
  • ⚠️ CHANGELOG.md — entry is present but could be improved with more details about the new CalibrationDeltaSLI and its purpose.
  • ⚠️ examples/ — no examples added for CalibrationDeltaSLI or persistence backends.

Suggestions

  • 💡 Add a docstring for CalibrationDeltaSLI.compliance() explaining its purpose, parameters, and return value.
  • 💡 Add a docstring for InMemoryMeasurementStore.__init__() explaining its purpose and thread-safety.
  • 💡 Add a docstring for SQLiteMeasurementStore.__init__() explaining the db_path parameter and its validation rules.
  • 💡 Update packages/agent-sre/README.md to include:
    • A description of the new CalibrationDeltaSLI and its use case.
    • Instructions for using the new persistence backends (MeasurementStore, InMemoryMeasurementStore, SQLiteMeasurementStore).
  • 💡 Expand the CHANGELOG.md entry to include a brief explanation of the purpose and benefits of the new CalibrationDeltaSLI and persistence backends.
  • 💡 Add example code in examples/ demonstrating:
    • How to use CalibrationDeltaSLI in a real-world scenario.
    • How to configure and use the SQLiteMeasurementStore for persistent SLI storage.

Additional Notes

  • The type hints for the new public APIs are complete and well-defined.
  • The new CalibrationDeltaSLI class has a detailed class-level docstring, but individual methods like compliance() lack specific docstrings.
  • The PR includes comprehensive tests for the new functionality, which is excellent.

Action Items

  1. Add missing docstrings for the identified methods.
  2. Update the README.md to reflect the new behavior and features.
  3. Expand the CHANGELOG.md entry for better clarity.
  4. Add example usage for CalibrationDeltaSLI and persistence backends in the examples/ directory.

Let me know if you need help drafting any of the suggested updates!

@github-actions

github-actions bot commented Mar 31, 2026

🤖 AI Agent: test-generator — `packages/agent-sre/src/agent_sre/slo/__init__.py`

🧪 Test Coverage Analysis

packages/agent-sre/src/agent_sre/slo/__init__.py

  • Existing coverage: This file primarily serves as an entry point for importing modules and classes. Since it does not contain any logic beyond imports and __all__ definitions, it does not require direct testing. The functionality of the imported modules and classes is tested elsewhere.
  • Missing coverage: None.
  • 💡 Suggested test cases: None.

packages/agent-sre/src/agent_sre/slo/indicators.py

  • Existing coverage:

    • Core SLI class methods (record, values_in_window, current_value) are indirectly tested through the test cases for the SLI subclasses.
    • All existing SLI subclasses (TaskSuccessRate, ToolCallAccuracy, ResponseLatency, CostPerTask, PolicyCompliance, DelegationChainDepth, HallucinationRate) are covered by tests.
    • New CalibrationDeltaSLI class is explicitly tested in tests/test_sli_persistence.py with cases for perfect calibration, overconfidence detection, compliance threshold, invalid confidence guard, and empty collect().
  • Missing coverage:

    • Edge cases for CalibrationDeltaSLI:
      • Handling of predicted_confidence values outside the valid range [0, 1].
      • Behavior when record_prediction is called with actual_success values other than True or False.
      • Behavior when record_prediction is called with metadata containing unexpected data types or oversized payloads.
      • Handling of extreme values for predicted_confidence and actual_success (e.g., very high or low values, or rapid oscillations).
    • Concurrency scenarios for InMemoryMeasurementStore when used with SLI subclasses.
    • Behavior of values_in_window and current_value when the store is empty or contains only outdated measurements.
  • 💡 Suggested test cases:

    1. test_calibration_delta_sli_invalid_confidence — Verify that CalibrationDeltaSLI.record_prediction raises a ValueError when predicted_confidence is outside the range [0, 1].
    2. test_calibration_delta_sli_invalid_success — Verify that CalibrationDeltaSLI.record_prediction raises a TypeError or ValueError when actual_success is not a boolean.
    3. test_calibration_delta_sli_large_metadata — Test CalibrationDeltaSLI.record_prediction with a large metadata dictionary to ensure it is handled correctly.
    4. test_calibration_delta_sli_extreme_values — Test CalibrationDeltaSLI with extreme values for predicted_confidence (e.g., 0.0, 1.0) and verify the correctness of the computed calibration delta.
    5. test_in_memory_store_concurrency — Simulate concurrent append and query operations on InMemoryMeasurementStore to ensure thread safety.
    6. test_empty_store_values_in_window — Verify that values_in_window returns an empty list when the store is empty or contains only outdated measurements.
    7. test_empty_store_current_value — Verify that current_value returns None when the store is empty or contains only outdated measurements.

packages/agent-sre/src/agent_sre/slo/persistence.py

  • Existing coverage:

    • InMemoryMeasurementStore:
      • Append, query, and clear operations are tested in tests/test_sli_persistence.py.
      • Thread safety is tested.
    • SQLiteMeasurementStore:
      • Append, query, and clear operations are tested.
      • Metadata JSON roundtrip and ascending order guarantee are tested.
      • Integration tests ensure measurements persist across SLI recreation.
    • _validate_db_path:
      • Tests for valid paths, invalid paths, and edge cases are included.
  • Missing coverage:

    • Edge cases for SQLiteMeasurementStore:
      • Handling of database connection errors (e.g., invalid file paths, permission issues).
      • Behavior when the database file is corrupted or inaccessible.
      • Performance under high write/read concurrency.
    • Edge cases for _validate_db_path:
      • Behavior when the path is exactly at the maximum length limit.
      • Behavior when the path contains special characters or unusual encodings.
  • 💡 Suggested test cases:

    1. test_sqlite_store_connection_error — Simulate a database connection error (e.g., invalid file path or permission issue) and verify that the appropriate exception is raised.
    2. test_sqlite_store_corrupted_db — Simulate a corrupted database file and verify that the store handles it gracefully (e.g., raises an exception or attempts recovery).
    3. test_sqlite_store_concurrent_access — Simulate high concurrency with multiple threads performing simultaneous append and query operations on SQLiteMeasurementStore.
    4. test_validate_db_path_max_length — Verify that _validate_db_path correctly handles paths that are exactly at the maximum length limit.
    5. test_validate_db_path_special_characters — Test _validate_db_path with paths containing special characters or unusual encodings to ensure proper handling.

Summary

  • The new functionality introduced in this PR is well-covered by the existing tests in tests/test_sli_persistence.py.
  • However, there are some edge cases and concurrency scenarios that are not yet covered, particularly for CalibrationDeltaSLI and SQLiteMeasurementStore.
  • Implementing the suggested test cases will improve the robustness and reliability of the new features.

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review Summary

This pull request introduces a persistence mechanism for SLI measurements, enabling durable storage across agent restarts. It also adds a new CalibrationDeltaSLI class to track calibration drift, which is a valuable addition for monitoring agent reliability. The implementation is well-structured, with comprehensive tests covering both the persistence backends and the new SLI functionality. However, there are some areas that require attention, particularly around thread safety, security, and backward compatibility.


🔴 CRITICAL: Security Issues

  1. SQLite Injection Risk:

    • The SQLiteMeasurementStore uses raw SQL queries with parameterized inputs, which is generally safe. However, the db_path parameter is directly passed to sqlite3.connect() without validation. If this parameter is user-controlled, it could lead to arbitrary file creation or access.
    • Action: Validate db_path to ensure it is not maliciously crafted (e.g., disallow paths like /etc/passwd or file:/// URLs).
  2. Metadata Handling:

    • Metadata is serialized as JSON and stored in SQLite. While this is efficient, it could be exploited if untrusted data is injected into the metadata. For example, malicious JSON could be crafted to exploit downstream consumers of this data.
    • Action: Sanitize metadata before storage and validate it during retrieval.
  3. Thread Safety:

    • The InMemoryMeasurementStore is explicitly thread-unsafe. This could lead to race conditions if multiple threads attempt to record or query measurements simultaneously.
    • Action: Add locking mechanisms (e.g., threading.Lock) to ensure thread safety for the in-memory store.

🟡 WARNING: Backward Compatibility

  1. _measurements Alias:

    • The backward-compatible alias self._measurements in SLI points to self._store._rows for the InMemoryMeasurementStore. This is a clever workaround but could lead to subtle bugs if external code relies on direct manipulation of _measurements.
    • Action: Deprecate direct access to _measurements and provide a clear migration path for external code.
  2. Public API Changes:

    • The addition of the store parameter to all SLI subclasses is backward-compatible because it defaults to None. However, this introduces a new optional argument that could affect downstream subclass implementations if they override __init__.
    • Action: Document this change prominently in the release notes.
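
The aliasing behaviour described in this warning can be shown with two toy classes. Both class names are hypothetical; only the attribute names _measurements, _store, and _rows are taken from the review text.

```python
class RowStoreSketch:
    """Stand-in for a list-backed store."""

    def __init__(self) -> None:
        self._rows: list[float] = []

    def append(self, value: float) -> None:
        self._rows.append(value)


class IndicatorSketch:
    """Stand-in for an SLI that keeps the legacy attribute name alive."""

    def __init__(self, store=None) -> None:
        self._store = store if store is not None else RowStoreSketch()
        # Alias, not a copy: both names refer to the same list object.
        self._measurements = self._store._rows


sli = IndicatorSketch()
sli._store.append(0.9)         # new code path
sli._measurements.append(0.7)  # legacy direct mutation
same_object = sli._measurements is sli._store._rows
shared = list(sli._store._rows)
```

The alias only holds while the store keeps mutating that one list in place. A store that rebinds its row list, or one that is not list-backed at all (such as a database backend), would leave the legacy attribute pointing at stale or unrelated data, which is the subtle bug the reviewer is describing.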

💡 Suggestions for Improvement

  1. Performance Optimization:

    • The SQLiteMeasurementStore opens a new connection for each operation unless the database is in-memory. This could be a performance bottleneck for high-frequency operations.
    • Suggestion: Implement connection pooling or reuse connections for file-based databases.
  2. Type Safety:

    • The _Row class uses __slots__ for memory efficiency but does not enforce type safety. Consider using Pydantic models for _Row to ensure strict validation of its attributes.
    • Suggestion: Replace _Row with a Pydantic model.
  3. CalibrationDeltaSLI Documentation:

    • The CalibrationDeltaSLI class is well-documented, but its compliance method could benefit from additional explanation about how the compliance fraction is calculated.
    • Suggestion: Expand the docstring for compliance() to clarify its behavior.
  4. Test Coverage:

    • While the tests are comprehensive, they do not cover concurrent access scenarios for the SQLiteMeasurementStore. This could be important for multi-threaded or multi-process environments.
    • Suggestion: Add tests simulating concurrent writes and reads to the SQLite store.

Final Assessment

This pull request is a significant improvement to the SLI framework, addressing a critical gap in persistence and introducing valuable new functionality. However, the identified security risks and thread safety concerns must be addressed before merging. Once resolved, this feature will be a robust addition to the toolkit.


Action Items

  1. 🔴 Address SQLite injection risk and metadata sanitization.
  2. 🔴 Add thread safety mechanisms to InMemoryMeasurementStore.
  3. 🟡 Document backward compatibility changes and deprecate _measurements.
  4. 💡 Optimize SQLite connection handling and improve type safety with Pydantic.
  5. 💡 Expand test coverage for concurrent access scenarios.

Let me know if you need further clarification or assistance!

@github-actions

github-actions bot commented Mar 31, 2026

🤖 AI Agent: security-scanner — Security Review of PR #645: `feat(slo): add SLI persistence backends + CalibrationDeltaSLI`

Security Review of PR #645: feat(slo): add SLI persistence backends + CalibrationDeltaSLI

This pull request introduces a persistence layer for SLI (Service Level Indicator) measurements and a new CalibrationDeltaSLI metric. While the changes are well-documented and include extensive test coverage, the critical nature of this library demands a thorough security review. Below are the findings:


1. Prompt Injection Defense Bypass

No direct prompt injection vectors were introduced in this PR. The changes focus on SLI persistence and calibration metrics, which do not directly interact with user-provided prompts or natural language inputs.

Rating: 🔵 LOW
Action: None required.


2. Policy Engine Circumvention

The changes do not directly impact the policy engine or its enforcement. However, the introduction of persistence backends could indirectly affect policy enforcement if the stored data is tampered with or corrupted.

Rating: 🟡 MEDIUM
Attack Vector: If an attacker gains access to the SQLite database file, they could modify SLI measurements to manipulate compliance metrics or bypass policy thresholds.
Recommendation:

  • Encrypt the SQLite database file at rest using a library like sqlcipher.
  • Add integrity checks (e.g., HMAC) for stored measurements to detect tampering.
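
A minimal sketch of the HMAC idea, using only the standard library. Where the tag would be stored, how the key is provisioned, and the row fields being signed are all assumptions here; the PR itself does not add integrity checks.

```python
import hashlib
import hmac
import json


def sign_row(key: bytes, name: str, value: float, ts: float) -> str:
    # Canonical JSON so the same row always produces the same bytes.
    payload = json.dumps([name, value, ts], separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_row(key: bytes, name: str, value: float, ts: float, tag: str) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_row(key, name, value, ts), tag)


key = b"demo-key-loaded-from-a-secret-store"  # placeholder; never hard-code a key
tag = sign_row(key, "task_success_rate", 0.0, 1711879200.0)

untouched = verify_row(key, "task_success_rate", 0.0, 1711879200.0, tag)
# Someone flips a failure to a success directly in the database file.
tampered = verify_row(key, "task_success_rate", 1.0, 1711879200.0, tag)
```

This detects edits to existing rows but not deleted rows, and it only helps if the key lives somewhere the attacker with file access cannot read.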

3. Trust Chain Weaknesses

No trust chain mechanisms (e.g., SPIFFE/SVID validation, certificate pinning) are directly affected by this PR. However, the _validate_db_path function introduces a potential trust issue.

Rating: 🟠 HIGH
Attack Vector: The _validate_db_path function allows paths within the current working directory (CWD). If an attacker can manipulate the CWD (e.g., via a directory traversal vulnerability), they could redirect the database to an unintended location, potentially exposing sensitive data.
Recommendation:

  • Remove the allowance for paths within the CWD. Restrict database paths to user home or system temp directories only.
  • Consider adding an explicit allowlist for safe directories.

4. Credential Exposure

No credentials or secrets are exposed in the changes. The SQLite backend does not log sensitive data, and the in-memory store is ephemeral.

Rating: 🔵 LOW
Action: None required.


5. Sandbox Escape

The changes do not introduce any new subprocess calls, file system operations, or unsafe code execution paths that could lead to a sandbox escape.

Rating: 🔵 LOW
Action: None required.


6. Deserialization Attacks

The SQLite backend uses JSON to store metadata. While JSON is safer than formats like pickle, it still requires validation.

Rating: 🟠 HIGH
Attack Vector: If an attacker injects malicious JSON into the SQLite database, it could lead to unexpected behavior or crashes when the JSON is deserialized.
Recommendation:

  • Validate the structure and content of deserialized JSON objects.
  • Use a JSON schema validation library like jsonschema to enforce strict typing and constraints on metadata.
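
One way to act on this recommendation without a new dependency is a small hand-written check. Note that this sketch swaps in plain Python validation rather than the jsonschema library the scanner suggests, and the size limit and "flat object of scalars" rule are hypothetical policy choices.

```python
import json

MAX_METADATA_BYTES = 4096  # hypothetical limit
_SCALARS = (str, int, float, bool, type(None))


def load_metadata(raw: str) -> dict:
    """Parse stored metadata and reject anything but a flat JSON object."""
    if len(raw.encode("utf-8")) > MAX_METADATA_BYTES:
        raise ValueError("metadata too large")
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("metadata must be a JSON object")
    for key, value in data.items():
        if not isinstance(value, _SCALARS):
            raise ValueError(f"metadata[{key!r}] must be a scalar")
    return data


def is_rejected(raw: str) -> bool:
    try:
        load_metadata(raw)
    except ValueError:
        return True
    return False


ok = load_metadata('{"session": "abc", "retries": 2}')
```

Parsing JSON with the standard library does not execute code, so the practical risk here is malformed or oversized data reaching downstream consumers, which is what the checks above target.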

7. Race Conditions

The InMemoryMeasurementStore uses a threading.Lock to ensure thread safety, which mitigates race conditions in a single-process, multi-threaded environment. However, the SQLiteMeasurementStore opens a new connection per operation, which could lead to race conditions in multi-threaded or multi-process environments.

Rating: 🟡 MEDIUM
Attack Vector: Concurrent writes to the SQLite database could lead to data corruption or loss of measurements.
Recommendation:

  • Use SQLite's PRAGMA journal_mode=WAL to enable write-ahead logging, which improves concurrency.
  • Document that the SQLite backend is not suitable for multi-process use without additional synchronization.
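
Enabling write-ahead logging is a single pragma on a file-based database. The sketch below uses a throwaway temp file; whether and where the PR's store issues this pragma is not shown in the thread.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "sli.db")

conn = sqlite3.connect(db_path)
# The pragma returns the journal mode actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.close()

# WAL is a property of the database file, so a later connection sees it too.
conn = sqlite3.connect(db_path)
persisted = conn.execute("PRAGMA journal_mode").fetchone()[0]
conn.close()
```

WAL lets readers proceed while one writer is active, but there is still only one writer at a time, and it does not apply to :memory: databases, which report their journal mode as memory.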

8. Supply Chain Risks

The PR does not introduce new dependencies, mitigating the risk of dependency confusion or typosquatting attacks.

Rating: 🔵 LOW
Action: None required.


Summary of Findings

| Category | Rating | Action Required |
| --- | --- | --- |
| Prompt Injection Defense Bypass | 🔵 LOW | None |
| Policy Engine Circumvention | 🟡 MEDIUM | Encrypt SQLite database; add integrity checks for stored measurements. |
| Trust Chain Weaknesses | 🟠 HIGH | Restrict database paths to user home or temp directories; remove CWD allowance. |
| Credential Exposure | 🔵 LOW | None |
| Sandbox Escape | 🔵 LOW | None |
| Deserialization Attacks | 🟠 HIGH | Validate JSON metadata using a schema validation library. |
| Race Conditions | 🟡 MEDIUM | Enable SQLite WAL mode; document multi-process limitations. |
| Supply Chain Risks | 🔵 LOW | None |

Recommended Actions

  1. Encrypt SQLite Database: Use sqlcipher or similar to encrypt the database file at rest.
  2. Integrity Checks: Add HMAC-based integrity checks for stored measurements.
  3. Restrict Database Paths: Remove the allowance for CWD in _validate_db_path and enforce stricter path validation.
  4. Validate JSON Metadata: Use a schema validation library like jsonschema to enforce constraints on metadata.
  5. Enable SQLite WAL Mode: Improve concurrency and document limitations for multi-process environments.

By addressing these issues, the changes can be made more robust and secure for downstream users.

…eview feedback

- Thread-safe InMemoryMeasurementStore (threading.Lock on all mutations)
- db_path validation in SQLiteMeasurementStore (_validate_db_path rejects non-file URIs)
- CalibrationDeltaSLI: full docstrings on __init__, record_prediction, current_value, compliance
- CalibrationDeltaSLI.current_value() overrides base to return latest aggregate delta (not mean of history)
- Tests moved to tests/unit/test_sli_persistence.py (25/25 pass, +3 concurrent-append + validation tests)
- CHANGELOG.md [Unreleased] section added
- All SLI subclasses accept optional store= kwarg and pass through to super()
@nanookclaw nanookclaw force-pushed the fix/sli-persistence-645 branch from 775198c to cbde66f March 31, 2026 10:08
@nanookclaw
Contributor Author

Thanks for the thorough bot review — addressed all RED and YELLOW items in cbde66f:

🔴 CRITICAL → Fixed:

  • Thread safety: InMemoryMeasurementStore now uses threading.Lock on all mutations and reads. Test added: test_thread_safety_concurrent_appends (8 threads × 50 appends, no torn state).
  • db_path validation: Added _validate_db_path() which raises ValueError for non-file URI schemes (e.g. http://, ftp://). file:// and relative paths pass through normally. Tests: test_invalid_uri_scheme_raises, test_ftp_uri_scheme_raises.
  • Metadata handling: Metadata is parameterized via ? placeholders in all SQL queries — no concatenation. JSON serialization is stdlib json.dumps/loads (no eval). No sanitization layer added beyond parameterized queries, which is the correct defense for SQLite.
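
The thread-safety fix and its test shape (8 threads × 50 appends) can be sketched as follows. The class is a hypothetical stand-in for a lock-guarded in-memory store, not the code in cbde66f.

```python
import threading


class LockedStoreSketch:
    """List-backed store with one lock around every mutation and read."""

    def __init__(self) -> None:
        self._rows: list[float] = []
        self._lock = threading.Lock()

    def append(self, value: float) -> None:
        with self._lock:
            self._rows.append(value)

    def snapshot(self) -> list[float]:
        # Copy under the lock so callers never iterate a list that is changing.
        with self._lock:
            return list(self._rows)


store = LockedStoreSketch()


def worker(worker_id: int) -> None:
    for i in range(50):
        store.append(float(worker_id * 50 + i))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

rows = store.snapshot()
```

After all threads join, the store should hold exactly 400 distinct values. Returning a copy from reads matters as much as locking writes, since a window filter that iterates the live list could otherwise observe it mid-update.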

🟡 WARNING → Fixed:

  • _measurements alias: Backward-compat alias documented explicitly in the docstring. External code relying on direct list mutation still works with the default InMemoryMeasurementStore because the alias points into _store._rows.

💡 Suggestions → Partially addressed:

  • Test placement: Tests moved from top-level tests/ to tests/unit/test_sli_persistence.py per project convention.
  • CHANGELOG: [Unreleased] section added to packages/agent-sre/CHANGELOG.md.
  • CalibrationDeltaSLI docstrings: Full docstrings added to __init__, record_prediction, current_value, compliance, and collect.
  • CalibrationDeltaSLI.current_value(): Overrides base SLI to return the latest recorded aggregate delta (not the mean of history). The running aggregate is what converges — averaging it would give a misleadingly high number during a well-calibrated phase.
  • Connection pooling: Not addressed — for the file-based use case, per-operation connections are intentional for multi-process safety (SQLite WAL). For :memory: we already keep a persistent connection.
  • Pydantic for _Row: Not added. _Row is an internal transport type using __slots__ for memory efficiency; Pydantic would add an external dependency to a zero-dep module.
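
The reasoning behind the current_value() override is easier to see with numbers. The sketch below assumes each recorded measurement is the cumulative delta over all predictions so far, which is how the reply above describes the "running aggregate"; the function name is hypothetical.

```python
def running_deltas(predictions: list[tuple[float, bool]]) -> list[float]:
    """Aggregate calibration delta after each prediction (cumulative)."""
    deltas: list[float] = []
    conf_sum = 0.0
    successes = 0
    for n, (confidence, ok) in enumerate(predictions, start=1):
        conf_sum += confidence
        successes += int(ok)
        deltas.append(abs(conf_sum / n - successes / n))
    return deltas


# One early failure, then a run of successes at a stated confidence of 0.8.
history = running_deltas(
    [(0.8, False), (0.8, True), (0.8, True), (0.8, True), (0.8, True)]
)
latest = history[-1]
mean_of_history = sum(history) / len(history)
```

Here the latest aggregate is effectively 0 because four of five tasks succeeded at a stated confidence of 0.8, while the mean of the recorded history is still about 0.26. Averaging values that are themselves cumulative counts the early error repeatedly, which is the overstatement the override avoids.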

Tests: 25/25 passing (python3 -m pytest packages/agent-sre/tests/unit/test_sli_persistence.py). All 22 existing indicator tests still pass.

@github-actions github-actions bot added the documentation (Improvements or additions to documentation) label Mar 31, 2026

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Pull Request Review: feat(slo): add SLI persistence backends + CalibrationDeltaSLI (#645)


🔴 CRITICAL: Security Issues

  1. SQLite Path Validation Weakness:

    • The _validate_db_path() function attempts to validate paths but does not sufficiently guard against malicious inputs. For example, paths like file:///etc/passwd are allowed, which could lead to unauthorized access or overwriting sensitive files.
    • Action: Implement stricter validation to ensure paths are confined to safe directories (e.g., user home or application-specific directories). Consider using a whitelist approach for allowed directories.
  2. Potential SQL Injection in SQLite Queries:

    • While parameterized queries are used (conn.execute(_INSERT, ...)), the db_path itself is passed directly to sqlite3.connect() without sanitization. If db_path is manipulated, it could lead to unintended database connections or file manipulations.
    • Action: Sanitize db_path further to ensure it cannot be exploited.
  3. Thread Safety in SQLiteMeasurementStore (:memory: mode):

    • The SQLiteMeasurementStore uses a persistent connection for :memory: databases but does not explicitly guard against concurrent access. SQLite's check_same_thread=False disables thread checks but does not ensure thread safety.
    • Action: Add explicit locking mechanisms (e.g., threading.Lock) for operations on the self._mem_conn connection.
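
The locking this action item asks for can be sketched with a context manager that hands out the shared connection only while holding a lock. The class, table, and method names are hypothetical; only check_same_thread=False and the idea of one persistent :memory: connection come from the review.

```python
import sqlite3
import threading
from contextlib import contextmanager


class MemoryStoreSketch:
    def __init__(self) -> None:
        # check_same_thread=False lets other threads use the connection;
        # it does not serialize them, so the lock below does that.
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._lock = threading.Lock()
        self._conn.execute("CREATE TABLE m (value REAL)")

    @contextmanager
    def _connect(self):
        with self._lock:
            yield self._conn
            self._conn.commit()  # skipped if the caller's block raised

    def append(self, value: float) -> None:
        with self._connect() as conn:
            conn.execute("INSERT INTO m (value) VALUES (?)", (value,))

    def count(self) -> int:
        with self._connect() as conn:
            return conn.execute("SELECT COUNT(*) FROM m").fetchone()[0]


store = MemoryStoreSketch()


def worker() -> None:
    for _ in range(50):
        store.append(1.0)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = store.count()
```

A single persistent connection is needed in this mode because every new connection to :memory: opens a separate, empty database.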

🟡 WARNING: Breaking Changes

  1. Backward Compatibility Alias (_measurements):
    • The alias self._measurements points to self._store._rows only for InMemoryMeasurementStore. This could lead to subtle bugs if external code relies on direct list manipulation for persistence.
    • Action: Deprecate direct manipulation of _measurements and provide a clear migration path for users.

💡 Suggestions for Improvement

  1. CalibrationDeltaSLI Compliance Logic:

    • The compliance calculation (good / len(values)) assumes all measurements are equally weighted. Consider adding support for weighted compliance based on metadata (e.g., session importance).
  2. SQLite Connection Pooling:

    • For file-based SQLite databases, opening a new connection per operation (_connect()) may lead to performance bottlenecks under high load. Consider implementing connection pooling.
  3. Test Coverage for Edge Cases:

    • Add tests for edge cases in _validate_db_path():
      • Paths with unusual characters (e.g., file://../etc/passwd).
      • Extremely long paths.
      • Invalid schemes (e.g., ftp://).
  4. Documentation Enhancements:

    • Expand the docstring for CalibrationDeltaSLI to include examples of real-world use cases and implications of calibration drift.
  5. Type Safety:

    • The _Row class uses __slots__ but does not enforce type safety on its attributes. Consider using Pydantic models for stricter validation.
  6. Performance Optimization:

    • The InMemoryMeasurementStore.query() method filters rows using a list comprehension. For large datasets, this could become a bottleneck. Consider using a more efficient data structure like collections.deque or a sorted list.
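
Suggestion 6 can be made concrete with a sorted-list variant. The sketch below is not the PR's InMemoryMeasurementStore; SortedWindowStore and its method signatures are hypothetical. It keeps one timestamp-sorted series per SLI name, so a window query becomes a binary search plus a slice instead of a scan over every row.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass
class SortedWindowStore:
    """Hypothetical in-memory store: one timestamp-sorted series per SLI name."""

    _times: dict[str, list[float]] = field(default_factory=dict)
    _values: dict[str, list[float]] = field(default_factory=dict)

    def append(self, name: str, timestamp: float, value: float) -> None:
        times = self._times.setdefault(name, [])
        values = self._values.setdefault(name, [])
        # Insert at the position that keeps the series sorted by timestamp.
        # For in-order arrivals this is an append at the end of the list.
        idx = bisect.bisect_right(times, timestamp)
        times.insert(idx, timestamp)
        values.insert(idx, value)

    def query(self, name: str, since: float) -> list[float]:
        times = self._times.get(name, [])
        # Binary search for the first timestamp >= since, then slice.
        start = bisect.bisect_left(times, since)
        return self._values.get(name, [])[start:]
```

A collections.deque helps with evicting old rows from the front, but it does not by itself speed up window filtering; the binary search is what removes the linear scan.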

Summary

  • Security: 🔴 Address SQLite path validation and thread safety issues immediately.
  • Backward Compatibility: 🟡 Ensure external code relying on _measurements is notified of potential changes.
  • Improvements: 💡 Optimize performance, enhance documentation, and expand test coverage.

This PR introduces valuable features like persistent SLI storage and calibration tracking but requires critical fixes to ensure security and robustness.

@nanookclaw
Contributor Author

@microsoft-github-policy-service agree

… lock

Addresses second AI code-reviewer pass (10:09 UTC):

CRITICAL fixes:
- _validate_db_path: reject file:///etc/passwd and other file:// URIs
  that resolve outside safe directories (home, temp, cwd). Also reject
  remote file://hostname/path forms and paths > 4096 chars.
- SQLiteMeasurementStore._mem_conn: add self._mem_lock (threading.Lock)
  that wraps every _connect() yield for :memory: databases, preventing
  concurrent-access races under check_same_thread=False.

New tests (6 added, 31 total, 31/31 passing):
- test_file_uri_etc_passwd_raises
- test_file_uri_inside_home_accepted
- test_remote_file_uri_raises
- test_path_outside_safe_dirs_raises
- test_excessively_long_path_raises
- test_thread_safety_in_memory_store (8 threads x 50 appends)
@nanookclaw
Contributor Author

Thanks for the thorough second pass. Both CRITICAL items from the 10:09 UTC review are addressed in 7351b2d:

1. SQLite path validation hardened (_validate_db_path)

The previous check allowed file:///etc/passwd because it stripped file:// before resolving. Now:

  • file:// URIs are parsed before Path.resolve(): the path component is extracted and validated independently
  • Remote file://hostname/path forms are rejected outright (not just stripped)
  • The resolved path is checked against an allowlist of safe root directories: Path.home(), tempfile.gettempdir(), and Path.cwd()
  • Paths exceeding 4 096 characters are rejected before any filesystem access

New tests: test_file_uri_etc_passwd_raises, test_file_uri_inside_home_accepted, test_remote_file_uri_raises, test_path_outside_safe_dirs_raises, test_excessively_long_path_raises
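
The checks listed above can be sketched roughly as follows. This is not the code from 7351b2d: the function body, the safe_roots parameter, and the default_safe_roots helper are assumptions made for illustration, and the URI handling assumes POSIX-style paths.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from urllib.parse import unquote, urlparse

MAX_PATH_LEN = 4096  # limit quoted in the comment above


def default_safe_roots() -> list[Path]:
    # The three roots named in the comment: home, temp dir, current directory.
    return [Path.home(), Path(tempfile.gettempdir()), Path.cwd()]


def validate_db_path(db_path: str, safe_roots: list[Path] | None = None) -> str:
    """Return a vetted path string or raise ValueError (illustrative only)."""
    if db_path == ":memory:":
        return db_path
    # Length check happens before any filesystem access.
    if len(db_path) > MAX_PATH_LEN:
        raise ValueError("db_path exceeds 4096 characters")

    raw = db_path
    if db_path.startswith("file:"):
        parsed = urlparse(db_path)
        # file://hostname/path names a remote host: reject it, do not strip it.
        if parsed.netloc:
            raise ValueError("remote file:// URIs are not allowed")
        raw = unquote(parsed.path)

    resolved = Path(raw).resolve()  # collapses ".." and follows symlinks
    roots = default_safe_roots() if safe_roots is None else safe_roots
    for root in roots:
        root = root.resolve()
        # Component-wise containment, not a string-prefix comparison.
        if resolved == root or root in resolved.parents:
            return str(resolved)
    raise ValueError(f"{resolved} is outside the allowed directories")
```

Passing safe_roots explicitly is only a convenience for testing the sketch; the PR describes a fixed allowlist.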

2. :memory: connection thread safety

Added self._mem_lock = threading.Lock() to SQLiteMeasurementStore.__init__. The _connect() context manager acquires this lock for every :memory: yield — reads, writes, and commits are now fully serialised on the shared connection.

New test: test_thread_safety_in_memory_store (8 threads × 50 appends, 400 rows verified, zero races).
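
For readers without the diff open, the locking pattern described above looks roughly like this. LockedSQLiteStore, its table layout, and the append/count methods are hypothetical stand-ins rather than the PR's SQLiteMeasurementStore; only the _mem_lock, _mem_conn, and _connect() names come from the comment.

```python
import sqlite3
import threading
from contextlib import contextmanager
from typing import Iterator, Optional


class LockedSQLiteStore:
    """Hypothetical stand-in for the store described above."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._db_path = db_path
        self._mem_lock = threading.Lock()
        self._mem_conn: Optional[sqlite3.Connection] = None
        if db_path == ":memory:":
            # A second connect(":memory:") would open a separate, empty
            # database, so the in-memory case keeps one shared connection.
            self._mem_conn = sqlite3.connect(":memory:", check_same_thread=False)
        with self._connect() as conn:
            conn.execute("CREATE TABLE IF NOT EXISTS m (name TEXT, ts REAL, value REAL)")

    @contextmanager
    def _connect(self) -> Iterator[sqlite3.Connection]:
        if self._mem_conn is not None:
            # Hold the lock for the whole operation, including the commit.
            with self._mem_lock:
                yield self._mem_conn
                self._mem_conn.commit()
        else:
            # File-backed databases get a fresh connection per operation.
            conn = sqlite3.connect(self._db_path)
            try:
                yield conn
                conn.commit()
            finally:
                conn.close()

    def append(self, name: str, ts: float, value: float) -> None:
        with self._connect() as conn:
            conn.execute("INSERT INTO m VALUES (?, ?, ?)", (name, ts, value))

    def count(self, name: str) -> int:
        with self._connect() as conn:
            row = conn.execute("SELECT COUNT(*) FROM m WHERE name = ?", (name,)).fetchone()
            return row[0]
```

threading.Lock is not re-entrant, so in this sketch a method that already holds _connect() must not call another method that enters it again.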

Test totals: 31/31 passing (was 25/25).

On the non-CRITICAL suggestions:

  • Connection pooling: per-op file connections remain intentional for multi-process safety (no shared state across processes). Noted in docstring.
  • Pydantic for _Row: _Row is an internal transport type; keeping it stdlib-only avoids a hard dependency on Pydantic in the persistence layer. Happy to revisit if the project adopts Pydantic universally.


@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Pull Request Review: feat(slo): add SLI persistence backends + CalibrationDeltaSLI (#645)

This PR introduces persistence backends for SLI measurements and adds a new CalibrationDeltaSLI class to track calibration drift. Below is the detailed review:


🔴 CRITICAL: Security Issues

  1. SQLite Path Validation:

    • The _validate_db_path() function attempts to restrict database paths to safe directories, but the logic is incomplete:
      • It uses str(resolved).startswith(str(root)) for path validation, which compares string prefixes rather than path components, so a sibling directory such as /safe-evil passes a check against /safe.
      • It does not account for symbolic links, which could allow paths outside the intended directories.
    • Fix: Use Path.is_relative_to() (Python 3.9+) on fully resolved paths to ensure the path is strictly confined to the allowed directories.
  2. SQLite Injection:

    • The SQLiteMeasurementStore uses parameterized queries (? placeholders), which is good practice. However, the metadata field is stored as JSON without validation. Malicious input could exploit JSON parsing vulnerabilities in downstream consumers.
    • Fix: Validate metadata before storing it. For example, ensure it is a dictionary with expected key-value types.
  3. Thread Safety in SQLite:

    • The SQLiteMeasurementStore opens a new connection per operation for file-based databases. While SQLite itself is thread-safe, concurrent writes could lead to race conditions or database locks.
    • Fix: Use a connection pool or a threading lock to serialize access to the database.
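
Item 1 recommends Path.is_relative_to() over a string-prefix test. A minimal sketch of the difference, using made-up paths and hypothetical helper names (prefix_check, component_check), is shown here with PurePosixPath so that it touches no real files:

```python
from pathlib import PurePosixPath


def prefix_check(candidate: str, root: str) -> bool:
    # Plain string comparison: it knows nothing about path separators.
    return candidate.startswith(root)


def component_check(candidate: str, root: str) -> bool:
    # Path comparison: root must match whole leading components (Python 3.9+).
    return PurePosixPath(candidate).is_relative_to(PurePosixPath(root))


root = "/home/alice"
inside = "/home/alice/sli.db"
sibling = "/home/alice-evil/sli.db"

print(prefix_check(inside, root), component_check(inside, root))
print(prefix_check(sibling, root), component_check(sibling, root))
```

The sibling path passes the prefix check but fails the component check. Neither check looks at symlinks on its own; for that, both sides would need to go through Path.resolve() on a concrete Path first.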

🟡 WARNING: Potential Breaking Changes

  1. Backward Compatibility of _measurements:
    • The _measurements attribute is now an alias pointing to the in-memory store's rows. While this preserves compatibility for direct list access, it could lead to unexpected behavior if external code mutates _measurements directly (e.g., sli._measurements.append(...)).
    • Fix: Document this change clearly in the release notes and deprecate direct access to _measurements in favor of the new store abstraction.
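
One possible shape for the suggested deprecation, shown on stand-in classes: SLISketch and InMemoryStoreSketch are hypothetical, and the real SLI class may expose the alias differently. The idea is to turn the alias into a read-only property that warns on access while still returning the live list.

```python
import warnings


class InMemoryStoreSketch:
    """Hypothetical stand-in for the in-memory backend."""

    def __init__(self) -> None:
        self._rows: list = []


class SLISketch:
    """Hypothetical stand-in for SLI; only the alias behaviour is modelled."""

    def __init__(self, store=None) -> None:
        self._store = store if store is not None else InMemoryStoreSketch()

    @property
    def _measurements(self) -> list:
        warnings.warn(
            "_measurements is deprecated; go through the store instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Backends without an in-memory row list get a detached empty list.
        return getattr(self._store, "_rows", [])
```

With a durable backend the returned list is detached from storage, which is exactly the situation the warning is meant to flag.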

💡 Suggestions for Improvement

  1. CalibrationDeltaSLI Compliance Calculation:

    • The compliance() method calculates the fraction of measurements at or below target_delta. This is useful but could be extended to include additional statistics, such as the mean and standard deviation of calibration deltas within the window.
    • Suggestion: Add optional parameters to compliance() for returning richer statistics.
  2. Test Coverage:

    • While the PR includes 23 tests for persistence and CalibrationDeltaSLI, edge cases like invalid metadata, concurrent access to SQLiteMeasurementStore, and large-scale performance tests are missing.
    • Suggestion: Add tests for these scenarios to ensure robustness.
  3. Performance Optimization:

    • The InMemoryMeasurementStore uses a list for storage, which could become inefficient for large datasets due to linear-time filtering in query().
    • Suggestion: Consider using a more efficient data structure, such as a sorted list or a dictionary indexed by name and timestamp.
  4. Documentation:

    • The PR introduces significant changes to the SLI subsystem, but the documentation does not cover migration paths for existing users or best practices for choosing a persistence backend.
    • Suggestion: Update the README and API docs with detailed examples and guidance.
  5. Error Handling:

    • The SQLiteMeasurementStore does not handle database errors (e.g., disk full, permission issues) gracefully. This could lead to unhandled exceptions in production.
    • Suggestion: Add error handling for database operations and provide meaningful error messages.
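
Suggestion 1 (richer calibration statistics) could look something like the sketch below. The delta formula follows the PR description, |avg_predicted_confidence − actual_success_rate|; the function names and the returned dictionary are assumptions, not the CalibrationDeltaSLI API.

```python
from __future__ import annotations

from statistics import fmean, pstdev


def calibration_delta(confidences: list[float], outcomes: list[bool]) -> float:
    """|average predicted confidence - actual success rate| for one session."""
    success_rate = fmean(1.0 if ok else 0.0 for ok in outcomes)
    return abs(fmean(confidences) - success_rate)


def calibration_summary(deltas: list[float], target_delta: float) -> dict | None:
    """Compliance plus mean and population standard deviation for a window."""
    if not deltas:
        return None  # nothing recorded in the window
    good = sum(1 for d in deltas if d <= target_delta)
    return {
        "compliance": good / len(deltas),  # fraction at or below the target
        "mean": fmean(deltas),
        "stdev": pstdev(deltas),
    }
```

Returning the extra statistics from a separate helper, rather than through optional parameters on compliance(), would leave the existing return type untouched; either design is compatible with the suggestion.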

Summary

This PR significantly improves the SLI subsystem by adding persistence and a new calibration metric. However, it introduces critical security issues related to path validation and thread safety, which must be addressed before merging. Additionally, there are minor backward compatibility concerns and opportunities for improvement in performance, documentation, and test coverage.

Action Items:

  1. Fix _validate_db_path() to prevent directory traversal and symbolic link issues.
  2. Validate metadata before storing it in SQLite.
  3. Ensure thread-safe access to SQLite databases.
  4. Document _measurements changes and deprecate direct access.
  5. Add tests for edge cases and performance scenarios.
  6. Improve error handling and documentation.

Once these issues are resolved, the PR will be ready for merge.

@imran-siddique imran-siddique merged commit 04b9f4d into microsoft:main Mar 31, 2026
9 of 10 checks passed

Labels

agent-sre: agent-sre package
documentation: Improvements or additions to documentation
size/XL: Extra large PR (500+ lines)
tests

Development

Successfully merging this pull request may close these issues.

feat(slo): SLI measurements are in-memory only — agent restarts silently erase 30d windows; add SLI persistence + CalibrationDeltaSLI

2 participants