feat: add middleware system for wrapping agent stages#2615
feat: add middleware system for wrapping agent stages#2615JackYPCOnline wants to merge 1 commit into
Conversation
| projectedInputTokens = await this._estimateInputTokens(streamOptions) | ||
| } catch (e) { | ||
| logger.debug(`error=<${e}> | token estimation failed, proceeding without estimate`) | ||
| } | ||
|
|
||
| const beforeModelCallEvent = new BeforeModelCallEvent({ | ||
| agent: this, | ||
| model: this.model, | ||
| invocationState, | ||
| ...(projectedInputTokens !== undefined && { projectedInputTokens }), | ||
| }) | ||
| yield beforeModelCallEvent |
There was a problem hiding this comment.
Do we need to worry about middleware mutating what the model actually sees, whereas these operations occur before those mutations happen? Should we at least document this if this is accepted?
There was a problem hiding this comment.
Do you mean hooks running first, or the projectedInputTokens?
projectedInputTokens
I think we need to add documentation and possibly [in the future] expose a way to change this later on.
This is technically a problem with hooks right now, right?
There was a problem hiding this comment.
You're right that we have the same problem with hooks today too, yeah
I meant projectedInputTokens initially, but anything observed or computed at the BeforeModelCall boundary reflects pre-middleware state and can diverge from what the model receives
Documenting sounds acceptable to me!
| this._interruptState.activate() | ||
| return new AgentResult({ | ||
| stopReason: 'interrupt', | ||
| lastMessage: new Message({ role: 'assistant', content: [] }), |
There was a problem hiding this comment.
For a tool/hook interrupt, content is set differently. Why don't we call createInterruptResult(options?.invocationState ?? {}) here to avoid drift?
| const interruptResponses = this._extractInterruptResponses(args) | ||
| if (interruptResponses.length > 0) { | ||
| this._interruptState.resume(interruptResponses) |
There was a problem hiding this comment.
resume() now runs only here, once, against the top-level args. When a hook sets AfterInvocationEvent.resume to interrupt-response blocks, the resume-loop re-enters _stream with them but never re-applies resume() (line 980 only reads them to gate a check), so they’re silently dropped. Is this intentional?
| for (const interrupt of error.interrupts) { | ||
| this._interruptState.getOrCreateInterrupt(interrupt.id, interrupt.name, interrupt.reason) |
There was a problem hiding this comment.
Unlike _stream catch, this path registers the interrupts but never yields InterruptEvents, so stream consumers see interrupt events for tool/hook interrupts but not for AgentStreamStage ones. A single shared helper should fix this and the comment below
|
Iterate on this comment: https://github.com/strands-agents/sdk-typescript/pull/1068/changes#r3343982119 Can we just create a custom span? |
2ba7b60 to
493e9b1
Compare
| ) | ||
| } | ||
|
|
||
| private async *_executeToolCore( |
There was a problem hiding this comment.
@poshinchen said
Iterate on this comment: ...
Can we just create a custom span?
Issue: Telemetry asymmetry between
InvokeModelStageandExecuteToolStagewhen middleware short-circuits.For model calls, the span/metrics wrap the middleware chain —
startModelInvokeSpanis called at line 1548 (before middleware runs) andendModelInvokeSpanat line 1563 (after middleware returns). So a cache-hit middleware still appears in traces.For tool calls, the span/metrics live inside
_executeToolCore(started here at line 2212). WhenExecuteToolStagemiddleware short-circuits (e.g., returns a mock result),_executeToolCorenever runs, so:
- No tool span is created
- No
endToolCallSpanis called_meter.endToolCall()never firesThis means tool cache hits, mocked results, and other short-circuit patterns are invisible to telemetry consumers.
Suggestion: Consider moving the tool span start/end outside
_executeToolCoreto wrap the middleware call (matching the model pattern). Something like:private async *_executeToolWithMiddleware(...) { const toolSpan = this._tracer.startToolCallSpan({ tool: toolUse }) const toolStartTime = Date.now() try { const result = yield* this._middlewareRegistry.invoke(ExecuteToolStage, context, terminal) this._tracer.endToolCallSpan(toolSpan, { toolResult: result.result }) this._meter.endToolCall({ tool: toolUse, duration: Date.now() - toolStartTime, success: result.result.status === 'success' }) return result } catch (error) { this._tracer.endToolCallSpan(toolSpan, { error }) throw error } }This ensures short-circuited tool calls still generate telemetry. The model stage already follows this pattern.
There was a problem hiding this comment.
I think I'd like to address this as an immediate follow-up.
What are your thoughts on:
- Every middleware should have a custom span
- Native Spans (model call, tool call) should always occur inside of the custom span
Is it weird or normal/expected to have the native spans inside of custom spans? Let's discuss offline to refine this a bit
| * | ||
| * @param stage - The stage token identifying the interception point | ||
| * @param handler - The middleware handler function (async generator) | ||
| */ |
There was a problem hiding this comment.
Issue: Per AGENTS.md, exported methods on main SDK entry points (like Agent) should include @example in TSDoc. addHook above has one, but addMiddleware does not.
Suggestion: Add an @example block:
/**
* @example
* ```typescript
* agent.addMiddleware(InvokeModelStage, async function* (context, next) {
* const start = Date.now()
* const result = yield* next(context)
* console.log(`Model call took ${Date.now() - start}ms`)
* return result
* })
* ```
*/| const interruptId = `${idPrefix}:${params.name}` | ||
| const existing = interruptState.interrupts[interruptId] | ||
| if (existing?.response !== undefined) { | ||
| return { response: existing.response as T } |
There was a problem hiding this comment.
Issue: The existing.response as T cast (line 272) is unsafe — middleware authors pass a generic T but the stored response is JSONValue. If the response shape doesn't match T, this will silently produce incorrect types at runtime.
Suggestion: Consider adding runtime validation or at minimum a TSDoc warning that the caller is responsible for ensuring the response type matches T. Alternatively, narrow the generic constraint: interrupt<T extends JSONValue>(...) to make the unsafety more visible.
|
Issue: This is a significant new public API surface ( Key API design questions worth discussing:
|
| handler: MiddlewareHandler<TContext, TEvent, TResult> | ||
| } | ||
|
|
||
| /** Phase compose order: input (outermost) → output → around (innermost, closest to terminal). */ |
There was a problem hiding this comment.
Issue: The comment "Phase compose order: input (outermost) → output → around (innermost, closest to terminal)" describes the composition/wrapping order but may mislead middleware authors about execution order.
The actual execution order experienced by a middleware author is: Input → Around → Output (confirmed by the test at line 1420 in agent-middleware.test.ts). The difference is:
- Input: transforms context first (runs before everything)
- Around: wraps the terminal (runs next, with before/after control)
- Output: transforms result last (runs after around completes)
Suggestion: Clarify the comment to describe both perspectives:
/**
* Phase compose order (how handlers are layered around the terminal):
* input (outermost) → output → around (innermost, closest to terminal).
*
* Effective execution order: Input transforms context → Around wraps terminal → Output transforms result.
*/| readonly Output: MiddlewareOutputPhase<TContext, TEvent, TResult> | ||
| } | ||
|
|
||
| /** Phase ordering constant. Input runs outermost, then Output, then Around (closest to terminal). */ |
There was a problem hiding this comment.
Issue: Same comment as on registry.ts — this TSDoc says "Input runs outermost, then Output, then Around" which accurately describes composition layering, but the execution order developers observe is Input → Around → Output. Middleware authors reading this doc will likely think Output runs before Around.
Suggestion: Reword to:
/** Phase ordering constant. Composition: input (outermost) → output → around (innermost). Execution: input → around → output. */| ) | ||
| return () => this._middlewareRegistry.remove(stage, adapted) | ||
| } | ||
| } |
There was a problem hiding this comment.
Issue: If InvokeModelStage.Around is passed to the implementation (bypassing type overloads via as any or future refactoring), it enters the '_phase' in stageOrPhase branch but falls through both 'input' and 'output' checks, landing at line 597 where it's cast as MiddlewareStage. Since MiddlewareAroundPhase is a different shape, registry.add() would use the phase sub-token as the map key instead of the parent stage — silently registering the handler under the wrong key.
TypeScript's overloads prevent this today (no overload accepts MiddlewareAroundPhase), so this is a defensive coding concern rather than a runtime bug.
Suggestion: Add an explicit 'around' case for robustness (and as documentation that the path was considered):
if (stageOrPhase._phase === 'around') {
const stage = stageOrPhase._stage
const aroundHandler = handler as MiddlewareHandler<TContext, TEvent, TResult>
this._middlewareRegistry.add(stage, aroundHandler)
return () => this._middlewareRegistry.remove(stage, aroundHandler)
}Or a throw new Error('Use the stage directly for Around phase') to fail fast.
|
Assessment: Comment Solid implementation of the Input/Around/Output phase system on top of the already-reviewed middleware core. The new commit adds clean phase ordering with appropriate compose semantics. Existing feedback from prior reviewers is largely addressed. New issues (this review pass)
Prior feedback (still tracked, not re-raised)
The phase composition logic is correct — |
| * Middleware wraps stage execution and can intercept, transform, or short-circuit operations. | ||
| * | ||
| * @param stage - The stage token identifying the interception point | ||
| * @param handler - The middleware handler function (async generator) |
There was a problem hiding this comment.
Issue: The TSDoc here describes only the Around overload (@param stage, handler function (async generator)) but applies to all three overloads. The Input/Output overloads use phase as the parameter name and accept plain async functions, not generators.
Suggestion: Either add per-overload TSDoc (matching the Agent class implementation at lines 515-568) or update this to be generic:
/**
* Register a middleware handler for a given stage or phase.
* Middleware wraps stage execution and can intercept, transform, or short-circuit operations.
*
* @param stageOrPhase - A stage token (Around) or phase sub-token (Input/Output)
* @param handler - The middleware handler (async generator for Around, plain async function for Input/Output)
* @returns A cleanup function that removes the middleware when called
*/| ): () => void | ||
| /** | ||
| * Register an Output phase handler that transforms the result after execution. | ||
| * Output handlers run after Input but before Around handlers in the compose chain |
There was a problem hiding this comment.
Issue: The Output phase handler TSDoc says "Output handlers run after Input but before Around handlers in the compose chain" — this is describing the composition layering, but could be misread as execution order. The same ambiguity flagged elsewhere.
Since this is the user-facing TSDoc that appears in IDE tooltips, the phrasing matters for developers.
Suggestion: Clarify with execution-order language:
/**
* Register an Output phase handler that transforms the result after execution.
* Output handlers see the result after Around handlers complete (execution order: Input → Around → Output).
*/|
|
||
| await agent.invoke('Second') | ||
| expect(inputCalled).toBe(false) | ||
| }) |
There was a problem hiding this comment.
Issue: The cleanup test only covers the Input phase (this test). The Output phase uses the same mechanism (addOutput returns adapted handler → remove by reference), so it's very likely correct, but per TESTING.md guidelines, the Output phase cleanup path should also be exercised.
Suggestion: Add a parallel test:
it('cleanup removes Output handler', async () => {
const model = new MockMessageModel()
.addTurn({ type: 'textBlock', text: 'First' })
.addTurn({ type: 'textBlock', text: 'Second' })
const agent = new Agent({ model, printer: false })
let outputCalled = false
const cleanup = agent.addMiddleware(InvokeModelStage.Output, async (result) => {
outputCalled = true
return result
})
await agent.invoke('First')
expect(outputCalled).toBe(true)
outputCalled = false
cleanup()
await agent.invoke('Second')
expect(outputCalled).toBe(false)
})|
Assessment: Comment Well-designed middleware system with comprehensive test coverage (84+ tests) and clean async-generator composition. The core design — stage tokens, phase sub-tokens, registry with compose — is sound and aligns well with the SDK's extensibility tenet. Prior review feedback has been thoroughly addressed. New findings (this pass)
Prior feedback statusAll critical items from earlier reviews are resolved in the current commit:
Still tracked (not blocking, previously discussed):
The middleware architecture is well thought-out — hooks fire unconditionally around middleware, phase ordering is deterministic (Input → Around → Output), and the |
Adds a middleware system that wraps agent stages (model calls, tool
execution, agent streaming) using async generator handlers.
Public API: agent.addMiddleware(stage, handler) — returns a cleanup function.
Three built-in stages: InvokeModelStage, ExecuteToolStage, AgentStreamStage.
Each stage exposes Input/Around/Output phase sub-tokens with fixed execution
order, solving the middleware ordering problem without explicit priorities.
Key design points:
- Async generators for streaming compatibility
- First registered = outermost within each phase
- Phase ordering: Input → Around → Output (fixed)
- Middleware interrupt() returns { response: T } wrapper for extensibility
- Result wrapper types (InvokeModelResult, etc.) for future extension
- readonly arrays on InvokeModelContext enforce immutable-context pattern
- Hooks fire unconditionally around the middleware chain
Co-authored-by: Jack Yuan <jackypc@amazon.com>
|
Migrating to #2681 to cut down on necessary history/back & forth |
Migrated to #2681 to cut down on un-needed history
Port of strands-agents/sdk-typescript#1068 into the mono-repo.
Original author: @zastrowm
Summary
Adds a middleware system that wraps agent stages using async generator handlers. Middleware controls flow (retry, cache, transform, short-circuit).
Public API:
agent.addMiddleware(stage, handler)— returns a cleanup function.Three built-in stages:
InvokeModelStage,ExecuteToolStage,AgentStreamStage.Motivation
Hooks let you observe operations and set flags, but they don't let you wrap them. If you want to do something both before and after a model call (timing it, adding a span, catching errors), hooks force you to manage state across two separate callbacks. With middleware you can wrap the entire invocation keeping state within your callback:
Beyond the before/after pattern, middleware also makes several other use cases much more natural to express: caching, input transformation, short-circuiting, and error handling. All of these are awkward or impossible with hooks alone.
Public API Changes
Three stages ship with the SDK:
InvokeModelStageExecuteToolStageAgentStreamStageagent.stream()outputHandlers are async generators and simple pass-through is
return yield* next(context). Manual iteration ofnext()allows real-time event filtering or injection while not callingnextat all short-circuits the operation.Plugin Examples
Middleware Interrupts
Middleware contexts expose
interrupt()for human-in-the-loop gating:Returns
MiddlewareInterruptResult<T>(wrapper) — allows non-breaking additions (cached data, metadata) as the interrupt system evolves.Phase Sub-Stages (Input / Around / Output)
Each stage exposes three phases with a fixed execution order in order to solve the middleware ordering problem without explicit priority numbers:
Input: Modifies the input of the phase as a pure function - take in the original input, return the modified inputOutput: Modifies the output of the phase as a pure function - take in the original output, return the modified outputAround: Wrape the entire invocation of the stage - this is more akin to traditional middleware from express or other frameworksExecution order is fixed: all Input → all Output → all Around → terminal. A retry plugin on
.Aroundalways retries the full chain including Input transforms. A response logger on.Outputnever needs to coordinate registration order with a system prompt injector on.Input.Key Decisions
InvokeModelResult, etc.): allows future extension of middleware return values, including returning values "up the chain"readonlyarrays onInvokeModelContext: enforce immutable-context pattern at the type levelChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.