feat: implement intervention primitive in python with cancellation support by mehtarac · Pull Request #2693 · strands-agents/harness-sdk

mehtarac · 2026-06-09T14:33:59Z

Description

Adds the intervention primitive — a composable control layer for agents that enables authorization, guardrails, steering, and operational controls to share a common interface with ordered evaluation, short-circuiting, and typed actions.

This implements Python parity with the TypeScript SDK's intervention primitive (strands-agents/sdk-typescript#883).

Resolves #2667

Public API Changes

`Agent(interventions=...)`

Agent(interventions: list[InterventionHandler] | None = None)

Handlers are evaluated in registration order at each lifecycle event. Cheapest handlers (authorization, guardrails) should be listed first; expensive ones (LLM steering) last.

Public Exports

# Top-level
from strands import InterventionHandler

# Action types and handler interface (for type annotations and construction)
from strands.interventions import (
    InterventionHandler,
    InterventionAction,
    OnError,
    Proceed,
    Deny,
    Guide,
    Confirm,
    Transform,
)

`InterventionHandler`

class InterventionHandler(ABC):
    name: str  # abstract property — unique identifier
    on_error: OnError = "throw"  # error policy

    # Lifecycle methods — override the ones you care about (sync or async)
    def before_invocation(self, event: BeforeInvocationEvent, **kwargs) -> Proceed | Deny | Guide | Transform
    def before_tool_call(self, event: BeforeToolCallEvent, **kwargs) -> Proceed | Deny | Guide | Confirm | Transform
    def after_tool_call(self, event: AfterToolCallEvent, **kwargs) -> Proceed | Transform
    def before_model_call(self, event: BeforeModelCallEvent, **kwargs) -> Proceed | Deny | Guide | Transform
    def after_model_call(self, event: AfterModelCallEvent, **kwargs) -> Proceed | Guide | Transform

All lifecycle methods default to Proceed(). Override only the ones you need — the framework detects overrides at the class level and only registers hooks for those. Handlers can be sync (def) or async (async def).

Action Types

Actions are frozen dataclasses constructed directly:

Proceed(reason: str | None = None)
Deny(reason: str = "")
Guide(feedback: str = "", reason: str | None = None)
Confirm(prompt: str = "", reason: str | None = None, response: Any = None, evaluate: Callable = default_evaluate)
Transform(apply: Callable[[LifecycleEvent], None] = ..., reason: str | None = None)

Action	Description
Proceed	Allow the operation to continue
Deny	Block the operation (sets event.cancel)
Guide	Steer with feedback
Confirm	Pause for human approval (before_tool_call only)
Transform	Modify event content in-place

Action-to-Event Compatibility Matrix

Action	before_invocation	before_tool_call	before_model_call	after_tool_call	after_model_call
Proceed	—	—	—	—	—
Deny	cancel	cancel	cancel	—	—
Guide	cancel+	cancel+	inject	—	inject + retry
Confirm	—	confirm	—	—	—
Transform	apply	apply	apply	apply	apply

— = no-op (warns at runtime)
cancel = sets event.cancel/cancel_tool, short-circuits (remaining handlers skipped)
cancel+ = sets cancel with accumulated feedback from all guiding handlers
confirm = uses preemptive response or interrupt, checks with evaluate, sets cancel if denied
inject = appends accumulated feedback as a user message so the model sees it on this call
inject + retry = appends accumulated feedback and retries so the model sees guidance
apply = calls action.apply(event) for in-place mutation, later handlers see the change

`OnError` Policy

Value	Behavior
`"throw"`	Rethrow (default). Invocation fails.
`"proceed"`	Skip handler, continue to next (fail-open).
`"deny"`	Apply Deny (fail-closed).

Hook Ordering

Before* events: interventions run at INTERVENTION_INPUT (90) — after plugins (0)
After* events: interventions run at INTERVENTION_OUTPUT (-90) — before plugins (0)

Flow: plugins → intervention → [operation] → intervention → plugins

What's NOT exported (internal)

InterventionRegistry — internal dispatch mechanism
Audit log — not included. Will be added when consumption patterns are clear.

Infrastructure Changes (cancellation support)

This PR also adds general-purpose cancellation support to two hook events. These fields are usable by any hook or plugin, not just interventions — but are required for the intervention primitive's Deny action to work.

`BeforeInvocationEvent.cancel`

cancel: bool | str = False

When set by a hook callback, the agent loop:

Creates an assistant message with the cancel text
Fires MessageAddedEvent
Yields EventLoopStopEvent with stop_reason="end_turn"
Fires AfterInvocationEvent
Returns without entering the event loop

`BeforeModelCallEvent.cancel`

cancel: bool | str = False

When set by a hook callback, the event loop:

Creates a synthetic assistant message with the cancel text
Fires AfterModelCallEvent (allows retry via event.retry = True)
Ends the model invoke span
Yields ModelStopReason event
Breaks out of the model retry loop (or continues if retry requested)

`HookOrder` constants

HookOrder.INTERVENTION_INPUT = 90    # After plugins on Before* events
HookOrder.INTERVENTION_OUTPUT = -90  # Before plugins on After* events

Example Usage

from strands import Agent, InterventionHandler, tool
from strands.interventions import Deny, Guide, Proceed


@tool
def send_email(to: str, body: str) -> str:
    """Send an email to a recipient."""
    return f"Email sent to {to}"


@tool
def query_database(query: str) -> str:
    """Run a database query."""
    return f"Results for: {query}"


ALLOWED_TOOLS = {
    "analyst": ["query_database"],
    "admin": ["query_database", "send_email"],
}


class RoleAuth(InterventionHandler):
    name = "role-auth"

    def before_tool_call(self, event):
        role = event.invocation_state.get("role", "")
        tool_name = event.tool_use["name"]
        if tool_name not in ALLOWED_TOOLS.get(role, []):
            return Deny(reason=f"Role '{role}' is not authorized for tool '{tool_name}'")
        return Proceed()


class PoliteGuard(InterventionHandler):
    name = "polite-guard"

    def after_model_call(self, event):
        if event.stop_response and event.stop_response.message:
            text = "".join(
                block.get("text", "")
                for block in event.stop_response.message.get("content", [])
                if "text" in block
            )
            if any(word in text.lower() for word in ["stupid", "idiot", "dumb"]):
                return Guide(feedback="Rephrase your response to be professional and respectful.")
        return Proceed()


agent = Agent(
    tools=[query_database, send_email],
    interventions=[RoleAuth(), PoliteGuard()],  # cheapest first
)

# Analyst can query but not send email
result = agent("Send an email to bob@example.com saying hello", invocation_state={"role": "analyst"})

# Admin can do both
result = agent("Query the database for recent orders", invocation_state={"role": "admin"})

Related Issues

Resolves #2667

Documentation PR

N/A

Type of Change

New feature

Testing

How have you tested the change?

I ran hatch run prepare
74 unit tests covering all action types, short-circuiting, guide accumulation, dispatch ordering, override detection, all three onError modes, sync and async handlers, duplicate name rejection, conflict resolution, native interrupt propagation, confirm (pause/approve/deny/preemptive/custom evaluate/falsy response), transform on all event types, unsupported action warnings, cancel-path integration (deny→end_turn, retry re-entry, plain hook cancel), and full hook integration

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2026-06-09T15:02:13Z

Assessment: Comment

Clean, well-structured intervention primitive that integrates elegantly with the existing hook system. The API is intuitive, the test coverage is comprehensive (67 tests covering all action types and edge cases), and the TypeScript parity goal is well-served.

Review Themes

Observability gap: Error handling in _handle_error silently swallows errors for "proceed" and "deny" modes without the logging promised in the documentation/docstrings.
Safety: Guide-triggered model retries have no cap in the unbounded while True model call loop. Worth documenting convergence requirements or adding a framework-level limit.
Side effects: Direct agent.messages mutation in Guide handlers bypasses session management pipeline (_append_messages).
Type safety: _apply_* methods use the broad LifecycleEvent union while accessing narrow event-specific attributes, which will fail under strict mypy.
Code duplication: tracer.end_model_invoke_span called twice in the cancel path in event_loop.py.

The architecture (registry bridging handlers to hooks, Guide accumulation, Deny short-circuiting, Confirm interrupt integration) is well-thought-out.

agent-of-mkmeral · 2026-06-11T15:22:52Z

Re-review @ `905da50` — all 4 blockers resolved ✅ (one new regression from the factory removal)

Checked out the new head and re-verified everything locally. All four blocking items from the previous summary are fixed:

#	Blocker	Status	Verification
1	`ruff format` on `registry.py:216`	✅ fixed	`ruff format --check` clean; CI Python / Lint passes
2	Cancel-path tests	✅ added	new `test_cancel_paths.py` (7 tests) covers exactly the requested scenarios: Deny→`end_turn` + `DENIED:` text, model not invoked, plain-hook `cancel=True`→default text, cancel+retry→loop re-entry, `before_invocation` deny ×3. 74/74 intervention tests pass
3	Exports	✅ resolved	`Proceed`/`Deny`/`Guide`/`Confirm`/`Transform`/`InterventionAction`/`OnError` all importable from `strands.interventions` (verified). Module docstring example updated to match
4	`**kwargs: Any` on lifecycle methods	✅ added	all 5 methods

Also noted: the factory functions and InterventionActions were removed entirely in favor of direct dataclass construction (Deny(reason="...")), per @mkmeral's open thread. The PR description was updated to match. Direct dataclasses are arguably more Pythonic and the namespacing concern is moot since Deny/Proceed are self-documenting names — but note this supersedes the resolution of @lizradway's namespacing thread (which was resolved via InterventionActions), so flagging for her re-confirmation.

🔴 New regression: removing `confirm()` reintroduced the exact hazard it guarded against

The factory's stated purpose (thread) was coalescing evaluate=None → default_evaluate. With the factory gone, that guard is gone:

Confirm(prompt="ok?", response=True, evaluate=None)
# → TypeError: 'NoneType' object is not callable   (at dispatch time, deep in the registry)

Verified on this head. Since evaluate: Callable[[Any], bool] = field(default=default_evaluate) only protects omission, not explicit None, please add a __post_init__ guard to the frozen dataclass:

def __post_init__(self) -> None:
    if self.evaluate is None:
        object.__setattr__(self, "evaluate", default_evaluate)

(or type it Callable[[Any], bool] | None and resolve at use site — either way, explicit None shouldn't detonate at dispatch time in an approval primitive).

Still open from earlier reviews (maintainer calls, not blockers)

Bare callable in interventions=[...] still dies with AttributeError: 'function' object has no attribute 'name' at registration (verified) — a clear TypeError would help, since hooks=[...] does accept callables.
Instance-assigned handler methods still silently unenforced (_is_overridden checks the class only — verified returns False; TS detects this). Docs-only stance unchanged.
Phantom 0-token span — model_invoke_span still starts before the cancel check, so cancelled calls emit a 0-usage span; TS emits nothing.

CI status

All code gates are green (Lint, unit tests on py3.10–3.14 × linux/windows). The remaining failures are process checks: check-api-review-label (PR carries api/needs-review without api/review-complete — awaiting API-review sign-off), label-size, and check-access-and-checkout.

Verdict: with the Confirm(evaluate=None) guard added, this is mergeable from my side — pending the API-review label sign-off.

github-actions · 2026-06-11T15:40:53Z

Assessment: Comment (approve-leaning)

Re-reviewed against the new HEAD 726255b after re-syncing. The regression flagged on the previous head — Confirm(evaluate=None) detonating at dispatch — is fixed: evaluate is now typed Callable[[Any], bool] | None (actions.py:101) and the registry guards the use-site with action.evaluate if action.evaluate is not None else default_evaluate (registry.py:136). I reproduced the old crash and confirmed it no longer occurs. All four earlier blockers (format gate, cancel-path tests, top-level exports, **kwargs) remain resolved.

Verification @ 726255b

ruff format --check: 15 files already formatted ✅
ruff check: all checks passed ✅
mypy (interventions + event_loop): clean ✅
pytest: 237 passed (interventions + event_loop + agent) ✅

Still open — minor, already have inline threads (replied)

Doc drift (actions.py:1): docstring still says "and factory functions"; module is dataclasses-only now.
Per-field test assertions (test_registry.py:302, :542): could collapse to a single full-object equality.

Both are quick wins, neither blocks merge.

Maintainer-call DevEx items (verified still present, not blocking)

Bare callable in interventions=[fn] dies with AttributeError: 'function' object has no attribute 'name' at registration. Since hooks=[...] accepts callables, users will reasonably try it — a clear TypeError would help.
Instance-assigned methods silently unenforced: _is_overridden checks the class only, so h.before_tool_call = lambda ... returns False and never fires. For a security primitive, a one-line registration warning would turn silent non-enforcement into a loud signal. Currently docs-only.

From a code standpoint this is in good shape — the only merge gate I see is the external api/needs-review → api/review-complete label sign-off.

github-actions · 2026-06-11T17:41:07Z

Assessment: Approve-leaning

Re-reviewed against the new HEAD c150d11 ("fix docstrings"). Both minor nits from the previous round are now resolved in code:

✅ Doc drift — actions.py:1 now reads "Intervention action types." (factory-function reference removed).
✅ Per-field test assertions — test_registry.py:301 and :538 now use a single full-object equality (agent.messages == [{...}]).

Verification @ c150d11

ruff format --check: 15 files already formatted ✅
ruff check: all checks passed ✅
mypy (interventions + event_loop): clean ✅
pytest: 237 passed (interventions + event_loop + agent) ✅

The two remaining items are explicit maintainer calls, not blockers, and are unchanged: bare callable in interventions=[fn] still raises a cryptic AttributeError (vs. a clear TypeError), and instance-assigned lifecycle methods are still silently unenforced (_is_overridden checks the class only). Both are documented/known and fine to defer.

Code-wise this is in good shape — the only outstanding merge gate is the external api/needs-review → api/review-complete label sign-off. Nice, tight iteration.

feat: implement intervention primitive in python

cc754e3

github-actions Bot added the size/xl label Jun 9, 2026

mehtarac requested a deployment to manual-approval June 9, 2026 14:34 — with GitHub Actions Waiting

mehtarac temporarily deployed to manual-approval June 9, 2026 14:34 — with GitHub Actions Inactive

mehtarac added the api/needs-review Makes changes to the public API surface label Jun 9, 2026

github-actions Bot added strands-running and removed strands-running labels Jun 9, 2026

address the mypy lint failures

b5343f3

github-actions Bot added size/xl and removed size/xl labels Jun 9, 2026

mehtarac requested a deployment to manual-approval June 9, 2026 14:51 — with GitHub Actions Waiting

mehtarac marked this pull request as ready for review June 9, 2026 14:53

mehtarac temporarily deployed to auto-approve June 9, 2026 14:54 — with GitHub Actions Inactive

github-actions Bot added the strands-running label Jun 9, 2026