feat: add middleware system for wrapping agent stages #2615

Closed
JackYPCOnline wants to merge 1 commit into strands-agents:main from JackYPCOnline:feat/middleware-pr1068

Conversation

@JackYPCOnline JackYPCOnline commented Jun 3, 2026

Contributor

Migrated to #2681 to cut down on un-needed history


Port of strands-agents/sdk-typescript#1068 into the mono-repo.

Original author: @zastrowm


Summary

Adds a middleware system that wraps agent stages using async generator handlers. Middleware controls flow (retry, cache, transform, short-circuit).

Public API: agent.addMiddleware(stage, handler) — returns a cleanup function.

Three built-in stages: InvokeModelStage, ExecuteToolStage, AgentStreamStage.

Motivation

Hooks let you observe operations and set flags, but they don't let you wrap them. If you want to do something both before and after a model call (timing it, adding a span, catching errors), hooks force you to manage state across two separate callbacks. With middleware, you can wrap the entire invocation and keep that state inside a single callback:

agent.addMiddleware(InvokeModelStage, async function* (context, next) {
  const start = Date.now()
  const result = yield* next(context)
  metrics.record(Date.now() - start)
  return result
})

Beyond the before/after pattern, middleware also makes several other use cases much more natural to express: caching, input transformation, short-circuiting, and error handling. All of these are awkward or impossible with hooks alone.

Public API Changes

import { Agent, InvokeModelStage, ExecuteToolStage, AgentStreamStage, createStage } from '@strands-agents/sdk'
import type { MiddlewareStage, MiddlewareHandler, MiddlewareNext } from '@strands-agents/sdk'

const agent = new Agent({ model, tools })

// Register middleware for any built-in stage
agent.addMiddleware(InvokeModelStage, async function* (context, next) {
  // pre-processing: inspect or transform context
  const modified = { ...context, messages: sanitize(context.messages) }
  // call next layer (or don't, to short-circuit)
  const result = yield* next(modified)
  // post-processing: inspect or transform result
  return result
})

Three stages ship with the SDK:

| Stage | Wraps | Context fields |
| --- | --- | --- |
| InvokeModelStage | Model call (between Before/AfterModelCallEvent) | messages, systemPrompt, toolSpecs, toolChoice, modelState |
| ExecuteToolStage | Single tool execution (between Before/AfterToolCallEvent) | tool, toolUse (name, id, input) |
| AgentStreamStage | Full agent.stream() output | args, options |

Handlers are async generators; a simple pass-through is return yield* next(context). Iterating next() manually allows real-time event filtering or injection, while not calling next at all short-circuits the operation.
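As an illustration of the last two patterns, here is a minimal, self-contained sketch. It does not use the SDK: StreamEvent, Result, Context, Next, Handler, terminal, and collect are hypothetical stand-ins that only mimic the handler shape described above, so the real types will differ.

```typescript
// Hypothetical stand-ins for the SDK's types, so the sketch runs on its own.
type StreamEvent = { type: string; text?: string }
type Result = { stopReason: string }
type Context = { messages: readonly string[] }
type Next = (context: Context) => AsyncGenerator<StreamEvent, Result, undefined>
type Handler = (context: Context, next: Next) => AsyncGenerator<StreamEvent, Result, undefined>

// Stand-in for the wrapped operation (the "terminal").
async function* terminal(_context: Context): AsyncGenerator<StreamEvent, Result, undefined> {
  yield { type: 'debug', text: 'internal detail' }
  yield { type: 'text', text: 'hello' }
  return { stopReason: 'endTurn' }
}

// Manual iteration: drive next() by hand and drop 'debug' events as they arrive.
const dropDebugEvents: Handler = async function* (context, next) {
  const inner = next(context)
  let step = await inner.next()
  while (!step.done) {
    if (step.value.type !== 'debug') yield step.value
    step = await inner.next()
  }
  // Once done, step.value holds the inner generator's return value.
  return step.value
}

// Short-circuit: next() is never called, so the terminal never runs.
const shortCircuit: Handler = async function* (_context, _next) {
  yield { type: 'text', text: 'cached' }
  return { stopReason: 'cached' }
}

// Helper that drains a handler and gathers its events and final result.
async function collect(handler: Handler): Promise<{ events: StreamEvent[]; result: Result }> {
  const events: StreamEvent[] = []
  const gen = handler({ messages: [] }, terminal)
  let step = await gen.next()
  while (!step.done) {
    events.push(step.value)
    step = await gen.next()
  }
  return { events, result: step.value }
}
```

Manual iteration costs a few more lines than yield*, but it is the only form that lets a handler inspect each event before deciding whether to forward it.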

Plugin Examples

class ToolResultCache implements Plugin {
  name = 'tool-result-cache'
  private readonly _cache = new Map<string, ToolResultBlock>()

  initAgent(agent: LocalAgent): void {
    const cache = this._cache
    agent.addMiddleware(ExecuteToolStage, async function* (context, next) {
      const key = `${context.toolUse.name}:${JSON.stringify(context.toolUse.input)}`
      const cached = cache.get(key)
      if (cached) {
        // Cache hit: short-circuit by returning without calling next()
        return {
          result: new ToolResultBlock({
            toolUseId: context.toolUse.toolUseId,
            status: cached.status,
            content: cached.content,
          }),
        }
      }
      const result = yield* next(context)
      cache.set(key, result.result)
      return result
    })
  }
}

class RetryOnThrottle implements Plugin {
  name = 'retry-on-throttle'
  initAgent(agent: LocalAgent): void {
    agent.addMiddleware(InvokeModelStage, async function* (context, next) {
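      const maxAttempts = 3
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return yield* next(context)
        } catch (e) {
          // Retry only throttling errors; rethrow anything else, or once attempts run out
          const throttled = e instanceof Error && e.message.includes('ThrottlingException')
          if (!throttled || attempt === maxAttempts) throw e
        }
      }
      // Unreachable: the last attempt either returns or rethrows
      throw new Error('exhausted retries')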
    })
  }
}

class SystemPromptInjector implements Plugin {
  name = 'system-prompt-injector'
  constructor(private readonly _suffix: string) {}

  initAgent(agent: LocalAgent): void {
    const suffix = this._suffix
    agent.addMiddleware(InvokeModelStage.Input, async (context) => ({
      ...context,
      systemPrompt: `${context.systemPrompt ?? ''}\n\n${suffix}`.trim(),
    }))
  }
}

// Usage: inject safety guidelines into every model call
const agent = new Agent({
  model,
  systemPrompt: 'You are a helpful assistant.',
  plugins: [new SystemPromptInjector('Always cite your sources.')],
})

Middleware Interrupts

Middleware contexts expose interrupt() for human-in-the-loop gating:

agent.addMiddleware(ExecuteToolStage, async function* (context, next) {
  const { response } = context.interrupt<string>({ name: 'approve', reason: 'Confirm?' })
  if (response !== 'yes') return { result: new ToolResultBlock({ ... }) }
  return yield* next(context)
})

interrupt() returns a MiddlewareInterruptResult<T> wrapper rather than the bare response, which allows non-breaking additions (cached data, metadata) as the interrupt system evolves.

Phase Sub-Stages (Input / Around / Output)

Each stage exposes three phases with a fixed execution order, which solves the middleware ordering problem without explicit priority numbers:

  • Input: Modifies the input of the stage as a pure function: takes the original input and returns the modified input
  • Output: Modifies the output of the stage as a pure function: takes the original output and returns the modified output
  • Around: Wraps the entire invocation of the stage, which is closer to traditional middleware in Express and similar frameworks

// Input: transform context before execution (plain async function)
agent.addMiddleware(InvokeModelStage.Input, async (context) => ({
  ...context,
  systemPrompt: injectToSystemPrompt(context),
}))

// Output: transform result after execution (plain async function)
agent.addMiddleware(InvokeModelStage.Output, async (result) => {
  log(`stopReason=${result.result.stopReason}`)
  return result
})

// Around: full async generator wrap (same as just providing `InvokeModelStage`)
agent.addMiddleware(InvokeModelStage.Around, async function* (context, next) {
  return yield* next(context)
})

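Phase order is fixed. Handlers compose as all Input (outermost) → all Output → all Around (innermost) → terminal, so the order a handler author observes is Input → Around → Output. A retry plugin on .Around re-invokes everything inside it using the context that Input handlers have already transformed. A response logger on .Output never needs to coordinate registration order with a system prompt injector on .Input.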
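The following self-contained sketch shows how a fixed phase layering can yield a deterministic order. It follows the layering described in the review comments further down this thread (Input outermost, then Output, then Around closest to the terminal). Everything in it (Ctx, Res, fromInput, fromOutput, compose, run) is a hypothetical stand-in written for illustration, not the SDK's registry.

```typescript
// Hypothetical stand-ins; the SDK's real context and result types differ.
type Ctx = { systemPrompt: string }
type Res = { text: string }
type Next = (ctx: Ctx) => AsyncGenerator<string, Res, undefined>
type Around = (ctx: Ctx, next: Next) => AsyncGenerator<string, Res, undefined>

const order: string[] = []

// Adapt a plain async input transform to the generator handler shape.
const fromInput = (fn: (ctx: Ctx) => Promise<Ctx>): Around =>
  async function* (ctx, next) {
    return yield* next(await fn(ctx))
  }

// Adapt a plain async output transform to the generator handler shape.
const fromOutput = (fn: (res: Res) => Promise<Res>): Around =>
  async function* (ctx, next) {
    const res = yield* next(ctx)
    return await fn(res)
  }

// Build right-to-left so layers[0] ends up outermost.
function compose(layers: Around[], terminal: Next): Next {
  return layers.reduceRight<Next>((next, layer) => (ctx) => layer(ctx, next), terminal)
}

const terminal: Next = async function* (ctx) {
  order.push(`terminal(${ctx.systemPrompt})`)
  yield 'chunk'
  return { text: 'raw' }
}

const input = fromInput(async (ctx) => {
  order.push('input')
  return { ...ctx, systemPrompt: `${ctx.systemPrompt}+rule` }
})

const output = fromOutput(async (res) => {
  order.push('output')
  return { text: res.text.toUpperCase() }
})

const around: Around = async function* (ctx, next) {
  order.push('around:before')
  const res = yield* next(ctx)
  order.push('around:after')
  return res
}

// Layer order is fixed by phase, regardless of registration order.
async function run(): Promise<{ result: Res; order: string[] }> {
  order.length = 0
  const chain = compose([input, output, around], terminal)
  const gen = chain({ systemPrompt: 'base' })
  let step = await gen.next()
  while (!step.done) step = await gen.next()
  return { result: step.value, order: [...order] }
}
```

In this sketch the Output layer sits outside the Around layer, yet its transform runs last because it acts only after next() returns. That is why composition order and observed execution order differ.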

Key Decisions

  • First registered = outermost: follows existing middleware conventions (from Express/Koa)
  • Phase ordering is fixed: Input/Output/Around reduces the need for explicit priority management (for now - we'll probably need to add it later)
  • Hooks fire unconditionally before/after middleware: existing behavior is unchanged
  • Result wrapper types (InvokeModelResult, etc.): allows future extension of middleware return values, including returning values "up the chain"
  • readonly arrays on InvokeModelContext: enforce immutable-context pattern at the type level
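A minimal sketch of the first decision, combined with the cleanup function that addMiddleware returns. TinyRegistry, tag, and terminal are hypothetical names invented for this example; the SDK's actual registry is keyed by stage and is not shown here.

```typescript
// Hypothetical stand-ins: the context and result are plain string arrays that record the path taken.
type Next = (ctx: string[]) => AsyncGenerator<string, string[], undefined>
type Handler = (ctx: string[], next: Next) => AsyncGenerator<string, string[], undefined>

class TinyRegistry {
  private readonly handlers: Handler[] = []

  // Registers a handler and returns a cleanup function that removes it by reference.
  add(handler: Handler): () => void {
    this.handlers.push(handler)
    return () => {
      const index = this.handlers.indexOf(handler)
      if (index !== -1) this.handlers.splice(index, 1)
    }
  }

  // Folds from the right so the first registered handler becomes the outermost layer.
  async invoke(ctx: string[], terminal: Next): Promise<string[]> {
    const chain = this.handlers.reduceRight<Next>((next, handler) => (c) => handler(c, next), terminal)
    const gen = chain(ctx)
    let step = await gen.next()
    while (!step.done) step = await gen.next()
    return step.value
  }
}

// Handler factory that marks the path on the way in and on the way out.
const tag = (name: string): Handler =>
  async function* (ctx, next) {
    const result = yield* next([...ctx, `${name}:in`])
    return [...result, `${name}:out`]
  }

const terminal: Next = async function* (ctx) {
  return [...ctx, 'terminal']
}
```

Registering tag('a') and then tag('b') produces a:in, b:in, terminal, b:out, a:out. After calling the cleanup function returned for b, only a wraps the terminal.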

Checklist

  • I have read the CONTRIBUTING document
  • Tests prove the fix is effective / feature works
  • No new warnings
  • Documentation update (pending)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Comment thread strands-ts/src/agent/agent.ts
Comment on lines 1438 to 1449
projectedInputTokens = await this._estimateInputTokens(streamOptions)
} catch (e) {
logger.debug(`error=<${e}> | token estimation failed, proceeding without estimate`)
}

const beforeModelCallEvent = new BeforeModelCallEvent({
agent: this,
model: this.model,
invocationState,
...(projectedInputTokens !== undefined && { projectedInputTokens }),
})
yield beforeModelCallEvent

Contributor

Do we need to worry about middleware mutating what the model actually sees, whereas these operations occur before those mutations happen? Should we at least document this if this is accepted?

Member

Do you mean hooks running first, or the projectedInputTokens?

projectedInputTokens

I think we need to add documentation and possibly [in the future] expose a way to change this later on.

This is technically a problem with hooks right now, right?

@opieter-aws opieter-aws Jun 4, 2026

Contributor

You're right that we have the same problem with hooks today too, yeah

I meant projectedInputTokens initially, but anything observed or computed at the BeforeModelCall boundary reflects pre-middleware state and can diverge from what the model receives

Documenting sounds acceptable to me!

this._interruptState.activate()
return new AgentResult({
stopReason: 'interrupt',
lastMessage: new Message({ role: 'assistant', content: [] }),

Contributor

For a tool/hook interrupt, content is set differently. Why don't we call createInterruptResult(options?.invocationState ?? {}) here to avoid drift?

Comment on lines +685 to +687
const interruptResponses = this._extractInterruptResponses(args)
if (interruptResponses.length > 0) {
this._interruptState.resume(interruptResponses)

Contributor

resume() now runs only here, once, against the top-level args. When a hook sets AfterInvocationEvent.resume to interrupt-response blocks, the resume-loop re-enters _stream with them but never re-applies resume() (line 980 only reads them to gate a check), so they’re silently dropped. Is this intentional?

Comment on lines +731 to +732
for (const interrupt of error.interrupts) {
this._interruptState.getOrCreateInterrupt(interrupt.id, interrupt.name, interrupt.reason)

Contributor

Unlike _stream catch, this path registers the interrupts but never yields InterruptEvents, so stream consumers see interrupt events for tool/hook interrupts but not for AgentStreamStage ones. A single shared helper should fix this and the comment below

@poshinchen

Contributor

Iterate on this comment: https://github.com/strands-agents/sdk-typescript/pull/1068/changes#r3343982119

Can we just create a custom span?

@zastrowm zastrowm force-pushed the feat/middleware-pr1068 branch from 2ba7b60 to 493e9b1 Compare June 4, 2026 19:28
@github-actions github-actions Bot added size/xl and removed size/xl labels Jun 4, 2026
)
}

private async *_executeToolCore(

@zastrowm zastrowm Jun 4, 2026

Member

@poshinchen said

Iterate on this comment: ...

Can we just create a custom span?

Issue: Telemetry asymmetry between InvokeModelStage and ExecuteToolStage when middleware short-circuits.

For model calls, the span/metrics wrap the middleware chain — startModelInvokeSpan is called at line 1548 (before middleware runs) and endModelInvokeSpan at line 1563 (after middleware returns). So a cache-hit middleware still appears in traces.

For tool calls, the span/metrics live inside _executeToolCore (started here at line 2212). When ExecuteToolStage middleware short-circuits (e.g., returns a mock result), _executeToolCore never runs, so:

  • No tool span is created
  • No endToolCallSpan is called
  • _meter.endToolCall() never fires

This means tool cache hits, mocked results, and other short-circuit patterns are invisible to telemetry consumers.

Suggestion: Consider moving the tool span start/end outside _executeToolCore to wrap the middleware call (matching the model pattern). Something like:

private async *_executeToolWithMiddleware(...) {
  const toolSpan = this._tracer.startToolCallSpan({ tool: toolUse })
  const toolStartTime = Date.now()
  try {
    const result = yield* this._middlewareRegistry.invoke(ExecuteToolStage, context, terminal)
    this._tracer.endToolCallSpan(toolSpan, { toolResult: result.result })
    this._meter.endToolCall({ tool: toolUse, duration: Date.now() - toolStartTime, success: result.result.status === 'success' })
    return result
  } catch (error) {
    this._tracer.endToolCallSpan(toolSpan, { error })
    throw error
  }
}

This ensures short-circuited tool calls still generate telemetry. The model stage already follows this pattern.

Member

I think I'd like to address this as an immediate follow-up.

What are your thoughts on:

  • Every middleware should have a custom span
  • Native Spans (model call, tool call) should always occur inside of the custom span

Is it weird or normal/expected to have the native spans inside of custom spans? Let's discuss offline to refine this a bit

Comment thread strands-ts/src/agent/agent.ts Outdated
*
* @param stage - The stage token identifying the interception point
* @param handler - The middleware handler function (async generator)
*/

Contributor

Issue: Per AGENTS.md, exported methods on main SDK entry points (like Agent) should include @example in TSDoc. addHook above has one, but addMiddleware does not.

Suggestion: Add an @example block:

/**
 * @example
 * ```typescript
 * agent.addMiddleware(InvokeModelStage, async function* (context, next) {
 *   const start = Date.now()
 *   const result = yield* next(context)
 *   console.log(`Model call took ${Date.now() - start}ms`)
 *   return result
 * })
 * ```
 */

const interruptId = `${idPrefix}:${params.name}`
const existing = interruptState.interrupts[interruptId]
if (existing?.response !== undefined) {
return { response: existing.response as T }

Contributor

Issue: The existing.response as T cast (line 272) is unsafe — middleware authors pass a generic T but the stored response is JSONValue. If the response shape doesn't match T, this will silently produce incorrect types at runtime.

Suggestion: Consider adding runtime validation or at minimum a TSDoc warning that the caller is responsible for ensuring the response type matches T. Alternatively, narrow the generic constraint: interrupt<T extends JSONValue>(...) to make the unsafety more visible.
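One way the runtime-validation option could look, as a hedged sketch: the caller supplies a type guard and the stored value is checked before it is handed back as T. The JSONValue definition below is an assumed shape, and readInterruptResponse and isString are hypothetical helpers that do not exist in the SDK.

```typescript
// Assumed shape of JSONValue; the SDK's own definition may differ.
type JSONValue = string | number | boolean | null | JSONValue[] | { [key: string]: JSONValue }

// Hypothetical helper: narrows a stored response with a caller-supplied guard instead of an `as T` cast.
function readInterruptResponse<T extends JSONValue>(
  stored: JSONValue | undefined,
  isExpected: (value: JSONValue) => value is T
): { response: T } | undefined {
  // No response recorded yet: the caller still needs to raise the interrupt.
  if (stored === undefined) return undefined
  // Fail loudly on a shape mismatch rather than returning a mistyped value.
  if (!isExpected(stored)) {
    throw new TypeError(`interrupt response has unexpected shape: ${JSON.stringify(stored)}`)
  }
  return { response: stored }
}

// Example guard for the interrupt<string>(...) case.
const isString = (value: JSONValue): value is string => typeof value === 'string'
```

Constraining T to JSONValue also rules out requesting types that could never have been stored, such as Date or a class instance.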

github-actions Bot commented Jun 5, 2026

Contributor

Issue: This is a significant new public API surface (addMiddleware, Stage, MiddlewareHandler, createStage, three built-in stage tokens, MiddlewareInterruptible, etc.) that will be used by all plugin/middleware authors. Per the project's API bar-raising process, this should have the needs-api-review label to ensure designated reviewers evaluate the API design before merge.

Key API design questions worth discussing:

  1. Symmetry with hooks: addHook returns a cleanup function; addMiddleware does not. Should it?
  2. Plugin ergonomics: The Plugin.initAgent(agent) pattern works, but Plugin has no first-class middleware property (unlike getTools() for tools). Is that intentional?
  3. Event type: All three built-in stages share AgentStreamEvent as their TEvent type parameter. Is there value in stage-specific event types for better type narrowing?

Comment thread strands-ts/src/middleware/registry.ts Outdated
handler: MiddlewareHandler<TContext, TEvent, TResult>
}

/** Phase compose order: input (outermost) → output → around (innermost, closest to terminal). */

Contributor

Issue: The comment "Phase compose order: input (outermost) → output → around (innermost, closest to terminal)" describes the composition/wrapping order but may mislead middleware authors about execution order.

The actual execution order experienced by a middleware author is: Input → Around → Output (confirmed by the test at line 1420 in agent-middleware.test.ts). The difference is:

  • Input: transforms context first (runs before everything)
  • Around: wraps the terminal (runs next, with before/after control)
  • Output: transforms result last (runs after around completes)

Suggestion: Clarify the comment to describe both perspectives:

/** 
 * Phase compose order (how handlers are layered around the terminal):
 * input (outermost) → output → around (innermost, closest to terminal).
 * 
 * Effective execution order: Input transforms context → Around wraps terminal → Output transforms result.
 */

Comment thread strands-ts/src/middleware/types.ts Outdated
readonly Output: MiddlewareOutputPhase<TContext, TEvent, TResult>
}

/** Phase ordering constant. Input runs outermost, then Output, then Around (closest to terminal). */

Contributor

Issue: Same comment as on registry.ts — this TSDoc says "Input runs outermost, then Output, then Around" which accurately describes composition layering, but the execution order developers observe is Input → Around → Output. Middleware authors reading this doc will likely think Output runs before Around.

Suggestion: Reword to:

/** Phase ordering constant. Composition: input (outermost) → output → around (innermost). Execution: input → around → output. */

)
return () => this._middlewareRegistry.remove(stage, adapted)
}
}

Contributor

Issue: If InvokeModelStage.Around is passed to the implementation (bypassing type overloads via as any or future refactoring), it enters the '_phase' in stageOrPhase branch but falls through both 'input' and 'output' checks, landing at line 597 where it's cast as MiddlewareStage. Since MiddlewareAroundPhase is a different shape, registry.add() would use the phase sub-token as the map key instead of the parent stage — silently registering the handler under the wrong key.

TypeScript's overloads prevent this today (no overload accepts MiddlewareAroundPhase), so this is a defensive coding concern rather than a runtime bug.

Suggestion: Add an explicit 'around' case for robustness (and as documentation that the path was considered):

if (stageOrPhase._phase === 'around') {
  const stage = stageOrPhase._stage
  const aroundHandler = handler as MiddlewareHandler<TContext, TEvent, TResult>
  this._middlewareRegistry.add(stage, aroundHandler)
  return () => this._middlewareRegistry.remove(stage, aroundHandler)
}

Or a throw new Error('Use the stage directly for Around phase') to fail fast.

github-actions Bot commented Jun 8, 2026

Contributor

Assessment: Comment

Solid implementation of the Input/Around/Output phase system on top of the already-reviewed middleware core. The new commit adds clean phase ordering with appropriate compose semantics. Existing feedback from prior reviewers is largely addressed.

New issues (this review pass)

  • Important: Phase-related types (MiddlewareInputHandler, MiddlewareOutputHandler, MiddlewareInputPhase, MiddlewareOutputPhase) are not exported from src/index.ts, so consumers can't explicitly type-annotate their phase handlers
  • Suggestion: Phase ordering documentation describes composition order but can mislead about execution order (Input → Around → Output)
  • Suggestion: Defensive handling for MiddlewareAroundPhase falling through in addMiddleware implementation

Prior feedback (still tracked, not re-raised)

  • Telemetry asymmetry (deferred to follow-up)
  • const self = this × 3 (acknowledged, comment added)
  • collect() helper duplication (already raised)
  • needs-api-review label (process question)
  • Unsafe as T cast in createMiddlewareInterrupt (documented)

The phase composition logic is correct — compose() builds right-to-left from the sorted array (input outermost → output → around innermost), producing the expected Input → Around → Output execution flow. Test coverage for the new phase system is good with ordering verification, cleanup, and cross-stage tests.

Comment thread strands-ts/src/types/agent.ts Outdated
* Middleware wraps stage execution and can intercept, transform, or short-circuit operations.
*
* @param stage - The stage token identifying the interception point
* @param handler - The middleware handler function (async generator)

Contributor

Issue: The TSDoc here describes only the Around overload (@param stage, handler function (async generator)) but applies to all three overloads. The Input/Output overloads use phase as the parameter name and accept plain async functions, not generators.

Suggestion: Either add per-overload TSDoc (matching the Agent class implementation at lines 515-568) or update this to be generic:

/**
 * Register a middleware handler for a given stage or phase.
 * Middleware wraps stage execution and can intercept, transform, or short-circuit operations.
 *
 * @param stageOrPhase - A stage token (Around) or phase sub-token (Input/Output)
 * @param handler - The middleware handler (async generator for Around, plain async function for Input/Output)
 * @returns A cleanup function that removes the middleware when called
 */

Comment thread strands-ts/src/agent/agent.ts Outdated
): () => void
/**
* Register an Output phase handler that transforms the result after execution.
* Output handlers run after Input but before Around handlers in the compose chain

Contributor

Issue: The Output phase handler TSDoc says "Output handlers run after Input but before Around handlers in the compose chain" — this is describing the composition layering, but could be misread as execution order. The same ambiguity flagged elsewhere.

Since this is the user-facing TSDoc that appears in IDE tooltips, the phrasing matters for developers.

Suggestion: Clarify with execution-order language:

/**
 * Register an Output phase handler that transforms the result after execution.
 * Output handlers see the result after Around handlers complete (execution order: Input → Around → Output).
 */


await agent.invoke('Second')
expect(inputCalled).toBe(false)
})

Contributor

Issue: The cleanup test only covers the Input phase (this test). The Output phase uses the same mechanism (addOutput returns adapted handler → remove by reference), so it's very likely correct, but per TESTING.md guidelines, the Output phase cleanup path should also be exercised.

Suggestion: Add a parallel test:

it('cleanup removes Output handler', async () => {
  const model = new MockMessageModel()
    .addTurn({ type: 'textBlock', text: 'First' })
    .addTurn({ type: 'textBlock', text: 'Second' })

  const agent = new Agent({ model, printer: false })

  let outputCalled = false
  const cleanup = agent.addMiddleware(InvokeModelStage.Output, async (result) => {
    outputCalled = true
    return result
  })

  await agent.invoke('First')
  expect(outputCalled).toBe(true)

  outputCalled = false
  cleanup()

  await agent.invoke('Second')
  expect(outputCalled).toBe(false)
})

github-actions Bot commented Jun 8, 2026

Contributor

Assessment: Comment

Well-designed middleware system with comprehensive test coverage (84+ tests) and clean async-generator composition. The core design — stage tokens, phase sub-tokens, registry with compose — is sound and aligns well with the SDK's extensibility tenet. Prior review feedback has been thoroughly addressed.

New findings (this pass)

  • Important: LocalAgent interface TSDoc describes only the Around overload but applies to all three (Input/Output/Around), which is confusing for IDE tooltip consumers
  • Suggestion: Output phase handler TSDoc on Agent class uses "compose chain" language that can be misread as execution order
  • Suggestion: Missing test for Output phase cleanup (only Input phase cleanup is tested)

Prior feedback status

All critical items from earlier reviews are resolved in the current commit:

  • addMiddleware returns cleanup function
  • readonly Message[] / readonly ToolSpec[] enforced
  • @example on main entry point
  • InterruptEvent yielded for AgentStreamStage interrupts
  • Resume loop applies interrupt responses from AfterInvocationEvent.resume

Still tracked (not blocking, previously discussed):

  • Phase-related type exports from src/index.ts (usability gap)
  • Telemetry asymmetry for tool spans (deferred to follow-up)
  • collect() helper duplication across test files
  • needs-api-review label (process question)

The middleware architecture is well thought-out — hooks fire unconditionally around middleware, phase ordering is deterministic (Input → Around → Output), and the createStage factory enables third-party extensibility without coupling to SDK internals.

Adds a middleware system that wraps agent stages (model calls, tool
execution, agent streaming) using async generator handlers.

Public API: agent.addMiddleware(stage, handler) — returns a cleanup function.
Three built-in stages: InvokeModelStage, ExecuteToolStage, AgentStreamStage.
Each stage exposes Input/Around/Output phase sub-tokens with fixed execution
order, solving the middleware ordering problem without explicit priorities.

Key design points:
- Async generators for streaming compatibility
- First registered = outermost within each phase
- Phase ordering: Input → Around → Output (fixed)
- Middleware interrupt() returns { response: T } wrapper for extensibility
- Result wrapper types (InvokeModelResult, etc.) for future extension
- readonly arrays on InvokeModelContext enforce immutable-context pattern
- Hooks fire unconditionally around the middleware chain

Co-authored-by: Jack Yuan <jackypc@amazon.com>

zastrowm commented Jun 8, 2026

Member

Migrating to #2681 to cut down on unnecessary history/back & forth

Labels

needs-api-review (Makes changes to the public API surface), size/xl

4 participants