feat+test: add drift detection in record_decision and 57 unit tests for FrameTracer/DivergenceDetector by acailic · Pull Request #209 · acailic/agent_debugger

acailic · 2026-06-06T20:53:44Z

Summary

Closes test: add unit tests for FrameTracer and DivergenceDetector (zero coverage) #208
SDK behavior changes (prerequisite for tests):
- Makes evidence keyword-only in record_decision (after *) — callers can now omit it without positional issues
- Adds drift detection inside record_decision: advances past non-decision events in baseline before comparing, uses clamped confidence, and includes both chosen_action and action aliases in drift event dict
- Initializes _drift_events and _drift_compare_index on restore() in TraceContext
Tests: adds tests/test_frame_tracer_divergence.py with 57 unit tests covering two previously untested modules: frame_tracer.py and divergence_detector.py
Every other research SDK module already had coverage; these two were the only gap

Note on scope

The SDK behavior changes are prerequisites for the tests to exercise drift detection correctly. The drift detection changes were also addressed in PR #207; this PR builds on that baseline.

Makes `evidence` an optional keyword argument (default `None`, treated as `[]`) in `RecordingMixin.record_decision`. All existing callers already pass evidence explicitly so this is non-breaking. Also adds lightweight drift-event collection to `record_decision` and wires `_drift_events`/`_drift_compare_index` onto `TraceContext.restore` so the previously-skipped drift-emission test now passes. Closes #205 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ison fixes - Add `*` after `chosen_action` in `record_decision` to make `evidence` and remaining params keyword-only, preventing accidental positional use and protecting existing positional callers - Use clamped `event.confidence` instead of raw `confidence` in drift event_dict to match what is actually persisted - Add `action` alias alongside `chosen_action` in drift event_dict so baselines using either key are matched - Advance `_drift_compare_index` to the next decision event in the baseline (skipping non-decision events) to prevent index misalignment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Covers agent_debugger_sdk/core/frame_tracer.py and divergence_detector.py which previously had zero test coverage despite being 600+ line modules. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Adds missing unit test coverage for two previously untested SDK research modules (FrameTracer + DivergenceDetector), and adjusts drift-tracking plumbing used during checkpoint restore/replay.

Changes:

Add tests/test_frame_tracer_divergence.py with 57 unit tests covering frame_tracer.py and divergence_detector.py.
Update replay-depth integration test to align with current restore post-checkpoint filtering / traces API response shape.
Extend drift tracking state on restore and add drift detection/collection during record_decision.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
tests/test_replay_depth_l3.py	Updates drift replay test to use timestamp filtering + `/traces` response shape and asserts drift is collected on context.
tests/test_frame_tracer_divergence.py	New test suite providing coverage for FrameTracer + DivergenceDetector public helpers and serialization.
agent_debugger_sdk/core/recorders.py	Makes `evidence` optional, adds drift detection during decision recording, and collects drift events.
agent_debugger_sdk/core/context/trace_context.py	Initializes `_drift_events` and `_drift_compare_index` on restore to support drift tracking.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

acailic · 2026-06-06T22:19:32Z

    async def record_decision(
        self,
        reasoning: str,
        confidence: float,
-        evidence: list[dict[str, Any]],
        chosen_action: str,
+        *,
+        evidence: list[dict[str, Any]] | None = None,


The * placement after chosen_action is intentional — it makes evidence and all subsequent params keyword-only while keeping reasoning, confidence, and chosen_action as positional. This matches the fix applied in PR #207 (commit f380508) and is the goal of this change: callers can now omit evidence without reordering. Callers using the old signature (reasoning, confidence, evidence_list, chosen_action) would break, but that signature was already changed in PR #207; this PR carries the same convention.

acailic · 2026-06-06T22:19:28Z

+            drift_index = getattr(self, "_drift_compare_index", 0)
+            event_dict = {
+                "event_type": "decision",
+                "data": {
+                    "chosen_action": chosen_action,
+                    "action": chosen_action,
+                    "confidence": event.confidence,
+                },
+            }
+            drift = drift_detector.compare(event_dict, drift_index)
+            # Advance to the next decision event in the baseline, skipping non-decision events
+            next_index = drift_index + 1
+            original_events = getattr(drift_detector, "original_events", [])
+            while next_index < len(original_events) and original_events[next_index].get("event_type") != "decision":
+                next_index += 1
+            self._drift_compare_index = next_index


Fixed in acdacd5. Before calling compare(), we now advance drift_index forward past any non-decision events in original_events. This ensures the comparison always targets an actual decision event position, preventing silent missed drift when non-decision events appear before the first (or any subsequent) decision in the baseline.

acailic · 2026-06-06T22:19:35Z

+        # Detect drift against the original execution if a detector is active
+        drift_detector = getattr(self, "_drift_detector", None)
+        if drift_detector is not None:
+            drift_index = getattr(self, "_drift_compare_index", 0)
+            event_dict = {


Acknowledged. The SDK behavior changes (evidence keyword-only + drift detection in record_decision) are prerequisites for the tests to exercise correctly — the tests validate this runtime behavior. The PR description has been noted; if desired, the title can be updated to reflect the dual scope (runtime fix + test coverage).

PR title and description updated to reflect the dual scope: the SDK behavior changes (keyword-only evidence, drift detection in record_decision, drift state init on restore()) are now called out explicitly alongside the unit tests.

…ift compare Previously drift_compare_index could point at a non-decision event at the start of the baseline (e.g. after restore), causing compare() to silently miss the first decision's drift. Now advance past non-decision events before comparing, then advance to the next decision for the subsequent call. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

acailic and others added 3 commits June 5, 2026 14:48

test: add 57 unit tests for FrameTracer and DivergenceDetector (#208)

8020e78

Covers agent_debugger_sdk/core/frame_tracer.py and divergence_detector.py which previously had zero test coverage despite being 600+ line modules. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 6, 2026 20:53

Copilot started reviewing on behalf of acailic June 6, 2026 20:53 View session

Copilot AI reviewed Jun 6, 2026

View reviewed changes

acailic changed the title ~~test: add 57 unit tests for FrameTracer and DivergenceDetector~~ feat+test: add drift detection in record_decision and 57 unit tests for FrameTracer/DivergenceDetector Jun 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat+test: add drift detection in record_decision and 57 unit tests for FrameTracer/DivergenceDetector#209

feat+test: add drift detection in record_decision and 57 unit tests for FrameTracer/DivergenceDetector#209
acailic wants to merge 4 commits into
mainfrom
feat/issue-208-frame-tracer-divergence-tests

acailic commented Jun 6, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

acailic Jun 6, 2026

Uh oh!

acailic Jun 6, 2026

Uh oh!

acailic Jun 6, 2026

Uh oh!

acailic Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

acailic commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Note on scope

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

acailic Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

acailic Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

acailic Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

acailic Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

acailic commented Jun 6, 2026 •

edited

Loading