fix(hitl): unify when-to-gate (shadow) + supersede keying across chat/voice #838
swaroopvarma1 wants to merge 1 commit into
Conversation
…/voice

Two HITL policy decisions that diverged between chat and voice, resolved for correctness + cross-channel consistency, no new machinery.

1. WHEN-TO-GATE (shadow-gating). A gated GLOBAL function whose name is shadowed by a same-named per-node function diverged: chat gated it by flat name (gating the per-node function the author never marked), voice did not (its wrapper gates only globals). Make chat's partition NODE-AWARE (_partition_gated_calls): a name that resolves to a per-node function in the current node stays UNGATED, while the gated global is still gated in every node that doesn't shadow it. Chat now matches voice. The build_approval_map warning is updated to describe the new, consistent behavior. Non-shadow nodes are byte-for-byte unchanged.

2. SUPERSEDE keying. The voice ApprovalManager keyed supersede by function_name, so two DISTINCT parallel calls to the same function (e.g. two add_to_cart with different args) had the second silently drop the first. Key by function_name + an args fingerprint instead: a true re-call (same fn, same args) still supersedes, so an async re-call can't double-execute, but distinct parallel calls each keep their own card.

Both are part of the HITL "one policy, two transports" unification; the deny-on-terminal slice (chat terminal sweep) ships in its own PR.

Tests: tests/test_chat_gate_partition.py (shadow / non-shadow / alias / empty node); tests/test_approval_manager.py updated (identical re-call supersedes; distinct parallel args do NOT).

pyrefly 0 errors; full suite 447 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tara-ag
left a comment
Review Summary
Files reviewed: 5
New issues: 0
Changes Overview
This PR unifies two behavior-changing aspects of HITL "one policy, two transports":
- Shadow-gating fix (`_partition_gated_calls`): Chat now correctly handles per-node function shadowing of gated globals, matching voice behavior. A per-node function with the same name as a gated global is now correctly left ungated in nodes that define it.
- Supersede keying fix (`_dedupe_key`): Voice's `ApprovalManager` now keys supersede by `function_name + args fingerprint` instead of just `function_name`. This fixes the bug where two distinct parallel calls to the same function (e.g., `add_to_cart(A)` + `add_to_cart(B)`) would have the second silently drop the first.
Analysis
Security: ✅ No concerns
- No SQL queries in changed code (in-memory dict operations only)
- No new endpoints or auth changes
- JSON serialization uses a `default=str` fallback, consistent with existing codebase patterns
Correctness: ✅ Well-implemented
- `_dedupe_key()` uses stable JSON (`sort_keys=True`) with defensive fallback to function-name-only on serialization errors
- `_partition_gated_calls()` correctly handles both `name` and `function_name` keys for per-node functions
- Edge cases handled: empty/missing node functions, ungated calls, function name aliases
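To make the keying described above concrete, here is a minimal sketch of an args-fingerprint key. The function name `dedupe_key`, its signature, and the `name:fingerprint` string layout are assumptions for illustration only; the PR's actual `_dedupe_key` is not shown in this review and may differ.

```python
from __future__ import annotations

import json
from typing import Any, Mapping


def dedupe_key(function_name: str, args: Mapping[str, Any] | None) -> str:
    """Hypothetical stand-in for the PR's _dedupe_key (real body not shown here)."""
    try:
        # sort_keys=True makes the fingerprint independent of dict ordering;
        # default=str stringifies values json cannot encode natively.
        fingerprint = json.dumps(args or {}, sort_keys=True, default=str)
    except (TypeError, ValueError):
        # Defensive fallback: key by function name only if serialization fails
        # (for example, on a circular reference).
        return function_name
    return f"{function_name}:{fingerprint}"
```

Under these assumptions, the same arguments in a different key order produce the same key, while different arguments produce different keys, which is the property the supersede logic relies on.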
Testing: ✅ Comprehensive
- `test_chat_gate_partition.py`: 5 test cases covering shadow/non-shadow/alias/empty/ungated scenarios
- `test_approval_manager.py`: Updated existing test and added new test for distinct parallel args
Documentation: ✅ Clear
- Docstrings explain the rationale for both changes
- Warning message in `build_approval_map` updated to reflect unified behavior
Verification
- pyrefly: 0 errors (per PR description)
- Full suite: 447 passed (per PR description)
- No migration files modified (compliant with append-only rule)
Decision: Approve - Clean, focused fix with good test coverage and clear documentation.
    Non-shadow nodes are unaffected — a gated global is still gated. This
    keeps chat consistent with voice, whose wrapper gates only globals.
    """
    node_fn_names = {
🟥 [CRITICAL — functional] The node-aware shadow logic never fires in production — the chat half of this PR is a no-op
node_fn_names only collects functions where isinstance(f, dict). But at runtime the node's functions are FlowsFunctionSchema dataclass objects, not plain dicts — the same file confirms this: _dispatch_tool_call (lines 1137-1141) does isinstance(fn, FlowsFunctionSchema) + fn.name, and _tools_schema (lines 1197-1198) does fn.to_function_schema() if isinstance(fn, FlowsFunctionSchema) else fn. FlowsFunctionSchema has no .get(). So node_fn_names is always empty, c.function_name not in node_fn_names is always True, and partitioning collapses to the old function_name in approval_map behavior.
Net: the chat/voice divergence this PR sets out to fix still exists — a gated global that is shadowed by a same-named per-node function is STILL gated on chat while voice runs it ungated. (Reproduced: with a plain-dict node gated=[]; with a FlowsFunctionSchema node gated=['issue_refund'].)
Fix: match the idiom used by the sibling helpers:
`node_fn_names = {fn.name for fn in (node.get("functions") or []) if isinstance(fn, FlowsFunctionSchema)}`
(Drop the `function_name` alias fallback: the function_name→name rename happens earlier in the builder, before the schema exists.)
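The difference between the two filters can be reproduced in isolation. The sketch below uses a stand-in `FlowsFunctionSchema` dataclass (the real class has more fields and lives in the flows library), and `names_dict_only` is only an approximation of the PR's current collection logic, which is not fully visible in the diff excerpt.

```python
from dataclasses import dataclass


@dataclass
class FlowsFunctionSchema:
    """Stand-in for the real schema class; only `name` matters for this sketch."""
    name: str
    description: str = ""


def names_dict_only(node: dict) -> set:
    # Approximation of the current filter: only plain dicts contribute names.
    return {
        f.get("name") or f.get("function_name")
        for f in (node.get("functions") or [])
        if isinstance(f, dict)
    }


def names_schema(node: dict) -> set:
    # The reviewer's suggested idiom: read .name from schema objects.
    return {
        fn.name
        for fn in (node.get("functions") or [])
        if isinstance(fn, FlowsFunctionSchema)
    }


production_node = {"functions": [FlowsFunctionSchema(name="issue_refund")]}
```

With a node shaped like `production_node`, the dict-only filter collects nothing, so no name is ever treated as shadowed, while the schema-based filter collects `issue_refund`.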
def test_partition_skips_gated_name_shadowed_by_per_node_function():
    calls = [_call("issue_refund")]
    approval_map = {"issue_refund": object()}
    node = {"functions": [{"name": "issue_refund"}]}  # per-node shadows it
🟧 [MAJOR — functional/test] These tests pass while the production code is broken (they mask the bug above)
Every case builds node from plain dicts ({"functions": [{"name": "issue_refund"}]}), which satisfies the isinstance(f, dict) filter — so the shadow-exclusion path does run here. In production, nodes hold FlowsFunctionSchema objects, where that filter yields an empty set and the exclusion silently never applies. The suite is green (441 passed) precisely because it never exercises the production shape, so nothing catches the CRITICAL no-op on _partition_gated_calls.
Fix: add a case that constructs the node with a real FlowsFunctionSchema(name=..., description=..., properties={}, required=[], handler=...) (or just [FlowsFunctionSchema(name="issue_refund", ...)]); with the current code it will fail, and after the fix in _partition_gated_calls it will pass.
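A regression test along these lines might look like the sketch below. Everything here is a self-contained approximation: `FlowsFunctionSchema` and `Call` are stand-in dataclasses, and `partition_gated_calls` with its `(gated, ungated)` return shape is an assumed signature, since the real `_partition_gated_calls` and the `_call` test helper are not shown in full.

```python
from dataclasses import dataclass, field


@dataclass
class FlowsFunctionSchema:
    """Stand-in for the real schema class (assumption: it exposes `.name`)."""
    name: str
    description: str = ""


@dataclass
class Call:
    """Hypothetical tool-call record; the real type is not shown in the PR."""
    function_name: str
    arguments: dict = field(default_factory=dict)


def partition_gated_calls(calls, approval_map, node):
    """Assumed shape of the fixed helper: returns (gated, ungated)."""
    node_fn_names = {
        fn.name
        for fn in (node.get("functions") or [])
        if isinstance(fn, FlowsFunctionSchema)
    }
    gated, ungated = [], []
    for call in calls:
        shadowed = call.function_name in node_fn_names
        if call.function_name in approval_map and not shadowed:
            gated.append(call)
        else:
            ungated.append(call)
    return gated, ungated


def test_schema_object_shadow_is_ungated():
    node = {"functions": [FlowsFunctionSchema(name="issue_refund")]}
    gated, ungated = partition_gated_calls(
        [Call("issue_refund")], {"issue_refund": object()}, node
    )
    assert gated == []
    assert [c.function_name for c in ungated] == ["issue_refund"]


def test_non_shadow_node_still_gates():
    node = {"functions": [FlowsFunctionSchema(name="lookup_order")]}
    gated, _ = partition_gated_calls(
        [Call("issue_refund")], {"issue_refund": object()}, node
    )
    assert [c.function_name for c in gated] == ["issue_refund"]
```

The first test is the one that exercises the production shape; against a dict-only filter it would report `issue_refund` as gated and fail.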
PR #838 —
What
The two behavior-changing slices of the HITL "one policy, two transports" unification — both chat↔voice divergences resolved for correctness + consistency, with no new machinery. (The additive deny-on-terminal slice ships separately as the chat terminal-sweep PR.)
1. When-to-gate — shadow-gating
The `approval` config lives on a global function. In a node that defines a same-named per-node function, the LLM calls the per-node one (it shadows the global), which the author never gated. Chat gated it anyway by flat name; voice didn't (its wrapper gates only globals). A documented divergence.

Fix: chat's partition is now node-aware (`_partition_gated_calls`): a name that resolves to a per-node function in the current node stays ungated, matching voice. The gated global is still gated in every node that doesn't shadow it, so non-shadow nodes are byte-for-byte unchanged. `build_approval_map`'s warning now describes the consistent behavior.

2. Supersede keying
Voice's `ApprovalManager` keyed supersede by `function_name`, so two distinct parallel calls to the same function (e.g. `add_to_cart(A)` + `add_to_cart(B)`) had the second silently drop the first.

Fix: key supersede by `function_name` + an args fingerprint. A true re-call (same fn, same args) still supersedes, so an async re-call can't double-execute a refund, but distinct parallel calls each keep their own card. (This is the safe middle between "keep function-name keying" = drops distinct calls, and "remove supersede" = allows double-execution.)

Why these choices
The HITL policy is already half-unified (`gate_call` + the status vocabulary are shared). For each forked decision I picked the option that's correct, fail-safe, and lowest-complexity:

Verification
- `tests/test_chat_gate_partition.py`: shadow / non-shadow / `function_name` alias / empty node / ungated.
- `tests/test_approval_manager.py`: updated: identical re-call supersedes; distinct parallel args do NOT (the dropped-call fix).
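The supersede behavior those `test_approval_manager.py` cases describe can be illustrated with a toy pending-approvals store. `PendingApprovals`, `register`, and `pending_ids` are hypothetical names for this sketch and do not reflect the real `ApprovalManager` API, which is not shown in the PR text.

```python
import json


class PendingApprovals:
    """Toy model of supersede keyed by function name + args fingerprint."""

    def __init__(self):
        self._pending = {}  # key -> approval id of the live card

    def register(self, approval_id, function_name, args):
        # Assumed key layout: function name plus order-independent JSON of args.
        key = function_name + ":" + json.dumps(args, sort_keys=True, default=str)
        superseded = self._pending.get(key)  # None when no earlier card matches
        self._pending[key] = approval_id
        return superseded

    def pending_ids(self):
        return sorted(self._pending.values())


manager = PendingApprovals()
first = manager.register("a1", "add_to_cart", {"sku": "A"})
second = manager.register("a2", "add_to_cart", {"sku": "B"})  # distinct args
recall = manager.register("a3", "add_to_cart", {"sku": "A"})  # true re-call
```

In this model the second call keeps its own card because its arguments differ, while the third call replaces the first because function name and arguments match.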