feat: add memory injection#2797
Conversation
0cf9b28 to
29fd13f
Compare
29fd13f to
7533bb6
Compare
|
Assessment: Request Changes Solid, well-documented port — the ephemerality contract is cleanly modeled on the middleware seam, fail-open semantics are consistent, XML escaping protects the default Two things should be resolved before merge, both around the Review themes
Nice work on the fold-rule handling (prepend on a user ask, append after a tool result) and the matching |
agent-of-mkmeral
left a comment
There was a problem hiding this comment.
Review — Python port of memory injection (#2631)
Assessment: Approve (no blockers). I checked this out, read every new/changed file, ran the suite, type-checked, and ran my own adversarial edge probes. This is a faithful, well-documented port and the earlier blocking concerns (description-vs-code default mismatch; TS scope) have been resolved — the description now correctly states default-on and calls out the TS behavior change under Breaking Changes, and both prior bot threads are resolved.
Verification (local, head 7533bb6)
135 passedacrosstests/strands/injection,tests/strands/vended_plugins/context_injector,test_injection_integration.py,test_memory_manager.py- mypy clean on the 8 new/changed source files
- Integration test directly proves the ephemerality contract: injected
<memory>reachesmodel.stream(...)for one call but never appears inagent.messages.
What's good
- Ephemerality via the middleware seam is correct.
InvokeModelContext.messagesis already a defensive copy of agent state; the handler doesreplace(context, messages=fold_into_last_user_message(...))andfold_*returns a brand-new list without mutating inputs (testtest_returns_new_list_and_does_not_mutate_input+test_does_not_mutate_original_context_messagesconfirm). Durable history is structurally untouched — not just by convention. - Fold rule is the subtle part and it's right + tested. Prepend on a plain user ask (keeps the user's ask in the recency slot); append on a tool-result turn (keeps the tool result the first block, which providers require). I verified the mixed
[toolResult, text]case appends correctly and toolResult stays first. - Fail-open is consistent across all four callback seams —
triggerpredicate,render_content, memoryquery, and memoryformateach catch, logreason=<…>, and skip injection so the model call proceeds. Matches TS exactly. - XML escaping protects the default
<memory>block from both structural breakage and stored-prompt-injection.&is escaped first; attr-escaping adds"/'. My probe ona & b < c > </entry>andhe said "hi"escaped correctly. - TS parity is line-for-line on the engine, resolver, fold logic,
_provide_memory_context,_resolve_injection_queryadaptive default (user text on user turn, else most-recent assistant text), and_default_injection_format.
Python-specific touches worth calling out (all good)
- The
Callable | Protocol-with-**kwargsunions for the callback fields (vs. TS's bare function type) is the right call under the STYLE_GUIDE — keeps bare lambdas ergonomic under mypy strict while leaving room for the calling convention to grow kwargs. Mirrors theEdgeConditionprecedent. strands.memoryre-exportingInjectionConfig/InjectionContext/InjectionTriggeris a nice single-import ergonomic.
Non-blocking observations
max_entriesdouble-cap is correct but worth a one-line note.search(...)caps each store atmax_results, then[:max_results]caps the concatenation. With N stores you fetch up to N×max before slicing, and sincesearchconcatenates in store-registration order with no cross-store ranking, the slice can systematically favor earlier-registered stores. This is already documented in theMemoryInjectionConfig.max_entriesdocstring ("can favor entries from earlier-registered stores") — good. No change needed; just flagging it's an intentional, documented tradeoff.is_user_turn/ fold assume well-formed messages (every message has acontentlist). A message dict missingcontentraisesKeyErrorrather than failing open. In practice the agent loop only ever produces well-formed messages so this is not reachable from supported paths, and matches TS (which would also misbehave on malformed data). Not worth defensive code, but noting it since the rest of the module is so carefully fail-open.- Default-on is a real behavior change for existing TS
MemoryManagerusers (extra search + injected block + tokens per user turn). It's now correctly documented and is a deliberate "obvious-path default" decision — but it does warrant the API reviewer's explicit sign-off, which the description is well-prepared for.
Nice work — the hard parts (fold placement, ephemerality, fail-open uniformity, escaping) are all handled and covered by tests that mirror the TS describe blocks.
7533bb6 to
a874938
Compare
|
Assessment: Comment (re-review) Re-reviewed at Status of prior feedback
Nice tightening of the public surface in the follow-up commit, and the description rewrite is thorough. Once the |
|
Non-blocking nit (don't hold the merge for this) — nice follow-up underscoring the internal functions in
The functions inside are now The repo's own convention is to underscore the module/package, not just its contents — e.g.
Why now rather than later: once released, the module path could become load-bearing for someone, so renaming gets more expensive post-merge. But this is purely a convention/consistency nit — the exported |
Description
Python port of #2631
Motivation
The TypeScript SDK ships context injection: a way to fold just-in-time text into the model's input for a single call without persisting it to the durable conversation. The
MemoryManagerbuilds on it to surface relevant memories to the model automatically, so the agent doesn't have to spend a tool call searching. Python had neither —MemoryManager(#2740) landed with extraction and search tools but no injection — so the PythonMemoryManagercould only recall memory reactively via thesearch_memorytool, never proactively.This ports the injection primitive to Python and wires it into
MemoryManager, bringing the two SDKs to parity. The design follows the TS source: a small, reusablestrands.injectionengine delivered as anInvokeModelStage.Inputmiddleware, a thinContextInjectorvended plugin for arbitrary content, and aMemoryManagerconsumer for memory-specific retrieval. Behavior, fail-open semantics, the default<memory>format, the prepend-vs-append fold rule, and trigger policies all mirror the TS implementation.Ephemerality is the core contract: injected text reaches the model for one call but never enters
agent.messagesor the session. This falls out of the middleware seam — the engine rewrites the per-callInvokeModelContext.messages(already a defensive copy of agent state), so durable history is untouched.Memory injection is on by default in both SDKs: if you configure a
MemoryManagerwith stores, the point is to use them, so proactive recall is the obvious-path default rather than a flag you have to discover. Because this also changes the already-shipped TSMemoryManager, it is a behavior change there — see Breaking Changes. Opt out withinjection: false/injection=False.Public API Changes
1. New
strands.injectionpackage — shared configuration types for context injection. Delivery primitives are intentionally internal; the public surface is config only:2. New
ContextInjectorvended plugin — folds arbitrary just-in-time text (current time, retrieved docs, request metadata, …) into the model input:render_contentis the only required argument (sync or async; returns the text, orNone/""to skip). Optional keyword-onlyname(default"strands:context-injector") andtrigger("userTurn"default,"everyTurn", or a predicate over theInjectionContext). A callback that raises fails open — injection is skipped and the model call proceeds.3.
MemoryManagergains aninjectionoption — on by default:MemoryInjectionConfigextendsInjectionConfig. When enabled,MemoryManagerderives a query from the conversation (the latest user text on a user turn, else the most recent assistant text), searches its stores, and folds the top entries into the model input.injectiondefaults toTrue(Falsedisables it).strands.memoryalso re-exportsInjectionConfig/InjectionContext/InjectionTriggerso injection can be configured from a single import.The same default flip is applied to the TypeScript
MemoryManager(injection?: boolean | MemoryInjectionConfig, now defaulting to enabled) to keep the two SDKs aligned.One deliberate cross-SDK divergence worth a reviewer's eye: the callback config fields (
trigger,render_content,query,format) are typed asCallable | Protocol-with-**kwargsunions rather than the bareCallablethe TS side uses. This follows the STYLE_GUIDE's "avoidCallablefor extensible interfaces" rule (theProtocolarm lets the calling convention grow keyword arguments later) while theCallablearm keeps the plain-lambda happy path ergonomic under mypy strict — a pureProtocolrejects bare lambdas. This mirrors the existingEdgeConditionpattern inmultiagent/graph.py.Breaking Changes
For the Python
MemoryManager, memory injection is new in this PR, so default-on is a fresh default, not a change. For the TypeScriptMemoryManager(shipped in #2631), this flipsinjectionfrom opt-in (false) to opt-out (enabled). Existing TS agents that configure aMemoryManagerwill now, on each model call, derive a query from the conversation, search their stores, and prepend a<memory>block to the model input — adding tokens and a search round-trip per call, and potentially surfacing irrelevant recall. This is a behavior change under the "pay for play" tenet, called out here for explicit reviewer sign-off; it is not an API-signature break (the field and its type are unchanged).Migration
Documentation PR
Follow-up: a docs page for
ContextInjectorandMemoryManagerinjection is not included here.Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce new warnings.
hatch run prepareUnit tests mirror the TS describe blocks (fold/trigger/query/format/fail-open), plus real-Agent integration tests on both the
ContextInjectorandMemoryManagerpaths asserting the injected text reaches the model but neveragent.messages.Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.