fix(hitl): unify when-to-gate (shadow) + supersede keying across chat/voice by swaroopvarma1 · Pull Request #838 · juspay/clairvoyance

swaroopvarma1 · 2026-06-16T10:34:22Z

What

The two behavior-changing slices of the HITL "one policy, two transports" unification — both chat↔voice divergences resolved for correctness + consistency, with no new machinery. (The additive deny-on-terminal slice ships separately as the chat terminal-sweep PR.)

1. When-to-gate — shadow-gating

The approval config lives on a global function. In a node that defines a same-named per-node function, the LLM calls the per-node one (it shadows the global) — which the author never gated. Chat gated it anyway by flat name; voice didn't (its wrapper gates only globals). A documented divergence.

Fix: chat's partition is now node-aware (_partition_gated_calls) — a name that resolves to a per-node function in the current node stays ungated, matching voice. The gated global is still gated in every node that doesn't shadow it, so non-shadow nodes are byte-for-byte unchanged. build_approval_map's warning now describes the consistent behavior.

2. Supersede keying

Voice's ApprovalManager keyed supersede by function_name, so two distinct parallel calls to the same function (e.g. add_to_cart(A) + add_to_cart(B)) had the second silently drop the first.

Fix: key supersede by function_name + an args fingerprint. A true re-call (same fn, same args) still supersedes — so an async re-call can't double-execute a refund — but distinct parallel calls each keep their own card. (This is the safe middle between "keep function-name keying" = drops distinct calls, and "remove supersede" = allows double-execution.)

Why these choices

The HITL policy is already half-unified (gate_call + the status vocabulary are shared). For each forked decision I picked the option that's correct, fail-safe, and lowest-complexity:

Shadow: don't gate what the author didn't mark, and stay consistent across channels — without node-awareness in voice (out of scope) or un-gating a real global.
Supersede: the args fingerprint distinguishes a re-call from distinct parallel calls — the only signal that actually separates them — so neither failure mode (drop vs double-execute) occurs.

Verification

tests/test_chat_gate_partition.py — shadow / non-shadow / function_name alias / empty node / ungated.
tests/test_approval_manager.py — updated: identical re-call supersedes; distinct parallel args do NOT (the dropped-call fix).
pyrefly 0 errors; full suite 447 passed; field_reference coverage green.

…/voice Two HITL policy decisions that diverged between chat and voice — resolved for correctness + cross-channel consistency, no new machinery. 1. WHEN-TO-GATE (shadow-gating). A gated GLOBAL function whose name is shadowed by a same-named per-node function diverged: chat gated it by flat name (gating the per-node function the author never marked), voice did not (its wrapper gates only globals). Make chat's partition NODE-AWARE (_partition_gated_calls): a name that resolves to a per-node function in the current node stays UNGATED — the gated global is still gated in every node that doesn't shadow it. Chat now matches voice. The build_approval_map warning is updated to describe the new, consistent behavior. Non-shadow nodes are byte-for-byte unchanged. 2. SUPERSEDE keying. The voice ApprovalManager keyed supersede by function_name, so two DISTINCT parallel calls to the same function (e.g. two add_to_cart with different args) had the second silently drop the first. Key by function_name + an args fingerprint instead: a true re-call (same fn, same args) still supersedes — so an async re-call can't double-execute — but distinct parallel calls each keep their own card. Both are part of the HITL "one policy, two transports" unification; the deny-on-terminal slice (chat terminal sweep) ships in its own PR. Tests: tests/test_chat_gate_partition.py (shadow / non-shadow / alias / empty node); tests/test_approval_manager.py updated (identical re-call supersedes; distinct parallel args do NOT). pyrefly 0 errors; full suite 447 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-16T10:34:31Z

Warning

Review limit reached

@swaroopvarma1, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 58 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7d624b9d-866f-498b-b4e1-7ab91c2ec85a

📥 Commits

Reviewing files that changed from the base of the PR and between 58d3fbc and 7410624.

📒 Files selected for processing (5)

app/ai/voice/agents/breeze_buddy/agent/approval.py
app/ai/voice/agents/breeze_buddy/chat/agent.py
app/ai/voice/agents/breeze_buddy/template/approval.py
tests/test_approval_manager.py
tests/test_chat_gate_partition.py

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/hitl-gate-policy-unify

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tara-ag

Review Summary

Files reviewed: 5
New issues: 0

Changes Overview

This PR unifies two behavior-changing aspects of HITL "one policy, two transports":

Shadow-gating fix (_partition_gated_calls): Chat now correctly handles per-node function shadowing of gated globals, matching voice behavior. A per-node function with the same name as a gated global is now correctly left ungated in nodes that define it.
Supersede keying fix (_dedupe_key): Voice's ApprovalManager now keys supersede by function_name + args fingerprint instead of just function_name. This fixes the bug where two distinct parallel calls to the same function (e.g., add_to_cart(A) + add_to_cart(B)) would have the second silently drop the first.

Analysis

Security: ✅ No concerns

No SQL queries in changed code (in-memory dict operations only)
No new endpoints or auth changes
JSON serialization uses default=str fallback consistently with existing codebase patterns

Correctness: ✅ Well-implemented

_dedupe_key() uses stable JSON (sort_keys=True) with defensive fallback to function-name-only on serialization errors
_partition_gated_calls() correctly handles both name and function_name keys for per-node functions
Edge cases handled: empty/missing node functions, ungated calls, function name aliases

Testing: ✅ Comprehensive

test_chat_gate_partition.py: 5 test cases covering shadow/non-shadow/alias/empty/ungated scenarios
test_approval_manager.py: Updated existing test and added new test for distinct parallel args

Documentation: ✅ Clear

Docstrings explain the rationale for both changes
Warning message in build_approval_map updated to reflect unified behavior

Verification

pyrefly: 0 errors (per PR description)
Full suite: 447 passed (per PR description)
No migration files modified (compliant with append-only rule)

Decision: Approve - Clean, focused fix with good test coverage and clear documentation.

narsimhaReddyJuspay · 2026-06-16T15:01:43Z

+    Non-shadow nodes are unaffected — a gated global is still gated. This
+    keeps chat consistent with voice, whose wrapper gates only globals.
+    """
+    node_fn_names = {


🟥 [CRITICAL — functional] The node-aware shadow logic never fires in production — the chat half of this PR is a no-op

node_fn_names only collects functions where isinstance(f, dict). But at runtime the node's functions are FlowsFunctionSchema dataclass objects, not plain dicts — the same file confirms this: _dispatch_tool_call (lines 1137-1141) does isinstance(fn, FlowsFunctionSchema) + fn.name, and _tools_schema (lines 1197-1198) does fn.to_function_schema() if isinstance(fn, FlowsFunctionSchema) else fn. FlowsFunctionSchema has no .get(). So node_fn_names is always empty, c.function_name not in node_fn_names is always True, and partitioning collapses to the old function_name in approval_map behavior.

Net: the chat/voice divergence this PR sets out to fix still exists — a gated global that is shadowed by a same-named per-node function is STILL gated on chat while voice runs it ungated. (Reproduced: with a plain-dict node gated=[]; with a FlowsFunctionSchema node gated=['issue_refund'].)

Fix: match the idiom used by the sibling helpers:

node_fn_names = {fn.name for fn in (node.get("functions") or []) if isinstance(fn, FlowsFunctionSchema)}

(Drop the function_name alias fallback — the function_name→name rename happens earlier in the builder, before the schema exists.)

narsimhaReddyJuspay · 2026-06-16T15:01:44Z

+def test_partition_skips_gated_name_shadowed_by_per_node_function():
+    calls = [_call("issue_refund")]
+    approval_map = {"issue_refund": object()}
+    node = {"functions": [{"name": "issue_refund"}]}  # per-node shadows it


🟧 [MAJOR — functional/test] These tests pass while the production code is broken (they mask the bug above)

Every case builds node from plain dicts ({"functions": [{"name": "issue_refund"}]}), which satisfies the isinstance(f, dict) filter — so the shadow-exclusion path does run here. In production, nodes hold FlowsFunctionSchema objects, where that filter yields an empty set and the exclusion silently never applies. The suite is green (441 passed) precisely because it never exercises the production shape, so nothing catches the CRITICAL no-op on _partition_gated_calls.

Fix: add a case that constructs the node with a real FlowsFunctionSchema(name=..., description=..., properties={}, required=[], handler=...) (or just [FlowsFunctionSchema(name="issue_refund", ...)]); with the current code it will fail, and after the fix in _partition_gated_calls it will pass.

narsimhaReddyJuspay · 2026-06-16T15:01:46Z

PR #838 — `fix(hitl): unify when-to-gate (shadow) + supersede keying across chat/voice`

Verdict: request changes — 1 critical + 1 major. Gates green: black/isort/autoflake/pyrefly ✅, pytest 441 passed — note that the green suite actually confirms the masking problem below (the tests pass despite the production bug).

🟥 Critical / 🟧 Major (inline comments posted)

CRITICAL — _partition_gated_calls node-aware shadow logic is a no-op in production. It filters isinstance(f, dict), but node functions are FlowsFunctionSchema objects (the rest of the same file handles them as such), so node_fn_names is always empty and partitioning falls back to the old behavior. The chat/voice divergence the PR intends to fix still exists. (chat/agent.py:95)
MAJOR — the new tests build nodes from plain dicts, so they pass while the code is broken. Add a case with a real FlowsFunctionSchema node. (tests/test_chat_gate_partition.py)

✅ What's good (verified)

Voice supersede keying is correct. _dedupe_key + _by_request: distinct-args parallel calls each keep their own card; identical re-calls supersede; the finally cleanup preserves the newer owner's _by_request entry and only pops on matching approval_id — no leak, no cross-call collision (ApprovalManager is per-call). Tests (test_manager_supersede_on_identical_recall, test_manager_no_supersede_on_distinct_parallel_args) genuinely assert the new behavior.
Fallback to function_name-only keying on JSON-serialization failure is a safe degradation to the old coarser behavior.
"Shadow" here means a per-node function shadowing a gated global's name — there's no separate "evaluate-but-don't-enforce" flag, so no off-by-one enforcement risk.
Chat HITL is keyed by tool_call_id in DB rows and doesn't use ApprovalManager/dedupe, so the voice dedupe change can't affect chat's resolve/supersede.

Tara-ag approved these changes Jun 16, 2026

View reviewed changes

narsimhaReddyJuspay reviewed Jun 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(hitl): unify when-to-gate (shadow) + supersede keying across chat/voice#838

fix(hitl): unify when-to-gate (shadow) + supersede keying across chat/voice#838
swaroopvarma1 wants to merge 1 commit into
releasefrom
fix/hitl-gate-policy-unify

swaroopvarma1 commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026

Review limit reached

Uh oh!

Tara-ag left a comment

Uh oh!

narsimhaReddyJuspay Jun 16, 2026

Uh oh!

narsimhaReddyJuspay Jun 16, 2026

Uh oh!

narsimhaReddyJuspay commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

swaroopvarma1 commented Jun 16, 2026

What

1. When-to-gate — shadow-gating

2. Supersede keying

Why these choices

Verification

Uh oh!

coderabbitai Bot commented Jun 16, 2026

Review limit reached

Uh oh!

Tara-ag left a comment

Choose a reason for hiding this comment

Review Summary

Changes Overview

Analysis

Verification

Uh oh!

narsimhaReddyJuspay Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

narsimhaReddyJuspay Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

narsimhaReddyJuspay commented Jun 16, 2026

PR #838 — fix(hitl): unify when-to-gate (shadow) + supersede keying across chat/voice

🟥 Critical / 🟧 Major (inline comments posted)

✅ What's good (verified)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

PR #838 — `fix(hitl): unify when-to-gate (shadow) + supersede keying across chat/voice`