Added git-memory-store design doc #2895
| MemoryManager already handles extraction (promoting observations into long-term memory) and retrieval (injecting stored knowledge back into context). What it lacks is a maintenance layer and a shared substrate: | ||
|
|
||
| - **No maintenance** — there is no built-in way to deduplicate, resolve contradictions, or restructure stored knowledge after it's written. MemoryManager writes and retrieves, but never improves what's stored. | ||
| - **No unified timeline** — when L1 and L2 use separate backends, there is no single view that shows when sessions happened, when facts were extracted, and how knowledge evolved. Debugging requires checking multiple systems independently. |
Would be helpful to define L1 and L2 before referencing them
|
|
||
| --- | ||
|
|
||
| ## Decision |
doc nit: It feels weird that we are saying "Decision" right after "Context"
I think it's time to update our template; https://github.com/maisieyanz/harness-sdk/blob/017553452bb81d618f46ea221bb86cf9abf259a1/team/designs/README.md
Weighing of Pros & Cons needs to be in there too
|
|
||
| ## Decision | ||
|
|
||
| `GitMemoryStore` is a unified, git-backed storage layer that implements both the `Storage` interface (for `ContextManager`, L1) and the `MemoryStore` interface (for `MemoryManager`, L2) against a single git repository. It serves as a single versioned, inspectable, diffable repository containing everything an agent has learned and experienced (session history, extracted facts, and learned skills). |
so this is a new memory implementation? did we even have a file based memory implementation before moving onto timeline based issues?
Yes this is a new memory implementation built on top of Thomas' memory manager
| MemoryManager already handles extraction (promoting observations into long-term memory) and retrieval (injecting stored knowledge back into context). What it lacks is a maintenance layer and a shared substrate: | ||
|
|
||
| - **No maintenance** — there is no built-in way to deduplicate, resolve contradictions, or restructure stored knowledge after it's written. MemoryManager writes and retrieves, but never improves what's stored. | ||
| - **No unified timeline** — when L1 and L2 use separate backends, there is no single view that shows when sessions happened, when facts were extracted, and how knowledge evolved. Debugging requires checking multiple systems independently. |
is this for episodic memory? why do we need timeline? what do we use it for?
It helps with debugging especially if an agent gives a wrong response etc
| constructor(config: GitMemoryStoreConfig) | ||
|
|
||
| // --- Storage (ContextManager L1) --- | ||
| async store(key: string, content: Uint8Array, contentType?: string): Promise<string> |
could consider just defining one store/write method and specifying the storage location on the function, i.e. store for both L1 and L2 is the same function with a different arg since storage is unified.
curious to hear arguments for or against. I'm curious what thomas thinks
I'm wondering if we should conflate the MemoryStore with the L1 storage. Since you already propose an agnostic write system, and L1 storage should be its own construct (do we use session manager today?), I'd expect to use that same pattern:
- I create a `Storage` from `FileStoreConfig`/
- I create a `FileMemoryStorage` that I plug into my `MemoryManager`
- I create `L1Storage` that I plug into my `L1Manager`/`SessionManager`
|
|
||
| const agent = new Agent({ | ||
| model, | ||
| contextManager: new ContextManager({ |
Is ContextManager something we already have?
yes this project is building off of the existing context manager and memory manager implementations
no the class doesn't exist yet, but working on it.
at minimum if we do not want to block this on an elegant context solution, what this requires is a storage param added to the contextManager that writes out to L1 on eviction which should be super simple.
from my perspective, if context work somehow gets dropped the memory manager implementation with consolidation is enough for this project. context manager L1 is nice to have session based data for, but less interesting imo than the memory consolidation aspect.
|
|
||
| MemoryManager already handles extraction (promoting observations into long-term memory) and retrieval (injecting stored knowledge back into context). What it lacks is a maintenance layer and a shared substrate: | ||
|
|
||
| - **No maintenance** — there is no built-in way to deduplicate, resolve contradictions, or restructure stored knowledge after it's written. MemoryManager writes and retrieves, but never improves what's stored. |
Will the LLM handle the update? Any examples of contradictions?
|
|
||
| ## Consolidation | ||
|
|
||
| Consolidation improves memory quality after facts accumulate. It is a developer-invoked Strands agent exposed as a method on `MemoryManager` (defined under `src/memory/consolidation/`). It reads stored knowledge, reasons across files, and writes changes directly to the git repo via filesystem tools (`readFile`, `writeFile`, `deleteFile`, `gitCommit`). Every change is a git commit, so bad consolidation is trivially reversible with `git revert`. |
When we say "It is a developer-invoked", are we just saying that it's up to callers to control when it's enacted?
I'm guessing we do want some sort of "automatic" behavior for agents more generally
The developer can control the consolidation frequency. This is explicitly set in the GitHub action that triggers the consolidation.
|
|
||
| The `metadata` fields come from the `ModelExtractor` when automatic extraction is configured — its system prompt instructs it to produce a title and description for each extracted fact (see [Appendix A](#appendix-a-extraction-configuration) for the configuration example). When the agent uses the `store_memory` tool instead (explicit write), no extractor is involved — `add()` receives raw content with no metadata and falls back to deriving both from the content. | ||
|
|
||
| **`search(query, options?)`** |
I want us to focus more in here. Behind "git memory" there is a filesystem based memory implementation proposal here, that we do not spend enough time explaining. We need to dive into there
|
|
||
| ### Architecture | ||
|
|
||
| `GitMemoryStore` is a single class that implements both the `Storage` interface (for `ContextManager`) and the `MemoryStore` interface (for `MemoryManager`). This dual implementation is what enables the unified timeline — both layers write commits to the same repo, so `git log` shows the complete history of sessions and knowledge together. |
Do we need an interface that combines them? Will all customers need to implement both? cc @lizradway @opieter-aws
this is an ongoing question in which we have begged for a unified storage interface, but haven't had the time to whip up a good design.
if i remember correctly, i believe we discussed this with @JackYPCOnline / you / jonathan the other day and decided we would prioritize the unified storage interface in the next sprint?
transcripts are a data type/definition. not necessarily storage. so if we need to fix/improve this, let's do :)
|
|
||
| **`search(query, options?)`** | ||
|
|
||
| Required by the `MemoryStore` interface. Performs keyword matching (grep) against filenames, `description` frontmatter, and file content, excluding `knowledge/system/` (already loaded in full). Returns the top matches as `MemoryEntry[]`, ranked by term frequency. No model call, no embeddings. |
do we know keyword matching is good enough?
progressive disclosure is the primary retrieval / search mechanism. The keyword matching search() implementation is more of a fallback
currently what our context offloader does for search is pretty much grep/regex, and that has performed pretty well.
i actually experimented briefly with some semantic similarity/relevance based mechanisms as well, and they performed a tad worse (but barely).
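For illustration, the fallback `search()` described in the quoted hunk could look roughly like the sketch below. It is a minimal sketch under stated assumptions, not the proposed implementation: it works on an in-memory map instead of real files, and `KnowledgeFile`, `SearchHit`, `keywordSearch`, and the `limit` parameter are hypothetical names that do not come from the design doc.

```typescript
// Hypothetical shapes for illustration; the real MemoryEntry type may differ.
interface KnowledgeFile {
  description: string; // `description` frontmatter
  content: string; // file body
}

interface SearchHit {
  path: string;
  score: number; // total occurrences of all query terms
}

// Count non-overlapping occurrences of `term` in `text` (both already lowercase).
function countOccurrences(text: string, term: string): number {
  if (term.length === 0) return 0;
  let count = 0;
  let index = text.indexOf(term);
  while (index !== -1) {
    count += 1;
    index = text.indexOf(term, index + term.length);
  }
  return count;
}

// Keyword search over filename, description, and content. Files under
// knowledge/system/ are skipped because they are already loaded in full.
function keywordSearch(
  files: Record<string, KnowledgeFile>,
  query: string,
  limit: number = 5,
): SearchHit[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const hits: SearchHit[] = [];
  for (const path of Object.keys(files)) {
    if (path.indexOf("knowledge/system/") === 0) continue;
    const file = files[path];
    const haystack = `${path}\n${file.description}\n${file.content}`.toLowerCase();
    let score = 0;
    for (const term of terms) score += countOccurrences(haystack, term);
    if (score > 0) hits.push({ path, score });
  }
  // Highest term frequency first; ties broken by path so results are deterministic.
  hits.sort((a, b) => b.score - a.score || a.path.localeCompare(b.path));
  return hits.slice(0, limit);
}
```

No model call or embedding is involved, which matches the "No model call, no embeddings" constraint in the hunk. Whether raw term frequency is a good enough ranking is the open question raised in this thread.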
|
|
||
| `GitMemoryStore` is a unified, git-backed storage layer that implements both the `Storage` interface (for `ContextManager`, L1) and the `MemoryStore` interface (for `MemoryManager`, L2) against a single git repository. It serves as a single versioned, inspectable, diffable repository containing everything an agent has learned and experienced (session history, extracted facts, and learned skills). | ||
|
|
||
| The existing Strands API remains unchanged. `ContextManager` still owns L0 <--> L1, `MemoryManager` still owns L1 --> L2. What changes is the physical storage: instead of separate, disconnected backends for each layer, both write to the same git repo. Every write from any layer produces an informative git commit, giving developers a complete audit trail using standard git tooling (`git log`, `git diff`, `git revert`). |
So two questions:
- I assume that I can use this as just context storage or just as a memory store right?
- Are there changes being proposed to the MemoryManager in this doc? It seems like we're pointing out shortcomings with the current design?
I wouldn't be changing the actual MemoryManager interface this would just be an alternative memory storage for agents.
| }); | ||
| ``` | ||
|
|
||
| Scheduling frequency is controlled externally by the GitHub Action or cron job. See [Appendix B](#appendix-b-github-action-yaml) for a GitHub Action example. |
I think we're talking about two things in this doc? 1) the feature overall (git based memory) and 2) Strands' usage of it?
I think (1) should be the focus with (2) being a dog-fooding exercise which is worth talking about, but ultimately is a specific use case that can be addressed in a couple of different ways
|
|
||
| ### Context Loading | ||
|
|
||
| Files in `knowledge/system/` are always loaded in full into the system prompt. This is where core context lives (persona, key preferences, critical project facts). Everything outside `system/` is visible by filename + description only, loaded when the agent reads it. |
Is it up to the user to explain the file structure, or is that handled by the primitive?
The primitive would handle this
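As a rough illustration of what "handled by the primitive" could mean, the sketch below renders the memory section of a system prompt: files under `knowledge/system/` are included in full, and every other file under `knowledge/` appears only as filename plus description. The names `KnowledgeDoc` and `buildMemoryPrompt` and the section headings are assumptions made for this example, not part of the design doc.

```typescript
// Hypothetical file shape: frontmatter description plus body.
interface KnowledgeDoc {
  description: string;
  content: string;
}

// Render the memory portion of the system prompt from a path -> file map.
function buildMemoryPrompt(files: Record<string, KnowledgeDoc>): string {
  const alwaysLoaded: string[] = [];
  const onDemand: string[] = [];
  for (const path of Object.keys(files).sort()) {
    const doc = files[path];
    if (path.indexOf("knowledge/system/") === 0) {
      // Core context: full content goes into the prompt.
      alwaysLoaded.push(`### ${path}\n${doc.content}`);
    } else if (path.indexOf("knowledge/") === 0) {
      // Everything else: filename and description only.
      onDemand.push(`- ${path}: ${doc.description}`);
    }
  }
  return ["## Always loaded", ...alwaysLoaded, "## Available on demand", ...onDemand].join("\n");
}
```

With this shape the agent sees what exists without paying for the content, and the prompt cost of `system/` grows with the total size of the files placed there.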
|
|
||
| // Option 2: Standalone script (no agent session needed — for cron, GitHub Action, CLI) | ||
| const memoryManager = new MemoryManager({ stores: [myStore] }); | ||
| await memoryManager.consolidate({ |
consolidate doesn't exist on the manager now. If it's specific to this store, it might make more sense to expose it there
i would imagine this would be exposed on other stores eventually (ie. file system at minimum)
Agree with @notowen333 , this is a store functionality
|
|
||
| Each session writes to its own branch, merges back to `main` on close. | ||
|
|
||
| **Why rejected:** Path-based isolation (`sessions/{id}.md`) achieves the same separation without branch management overhead or merge conflicts. |
What happens when 2 agents try to modify the same file for things like consolidation?
|
|
||
| All extracted facts land in `knowledge/facts/` by default — `GitMemoryStore.add()` writes there unconditionally for simplicity and to avoid a classification model call on every extraction. Consolidation is therefore responsible for reorganizing files into appropriate subdirectories (`skills/`, `system/`, etc.) during offline maintenance, when it has full cross-file context to make informed categorization decisions. | ||
|
|
||
| ### How It Works |
what do we actually generate with consolidation though?
| }); | ||
| ``` | ||
|
|
||
| Scheduling frequency is controlled externally by the GitHub Action or cron job. See [Appendix B](#appendix-b-github-action-yaml) for a GitHub Action example. |
Is this a feature for the SDK, or a feature for a specific agent? How scheduling is done would depend on that answer
| - **No maintenance** — there is no built-in way to deduplicate, resolve contradictions, or restructure stored knowledge after it's written. MemoryManager writes and retrieves, but never improves what's stored. | ||
| - **No unified timeline** — when L1 and L2 use separate backends, there is no single view that shows when sessions happened, when facts were extracted, and how knowledge evolved. Debugging requires checking multiple systems independently. | ||
|
|
||
| A git-based approach addresses these issues: version-controlled memory provides built-in history, diffing, and rollback. A shared git repository for both layers provides a unified timeline across sessions and knowledge in a single `git log`. And a developer-invoked consolidation agent provides the missing maintenance mechanism. Since Strands is a client-side SDK with no server process, a scheduled GitHub Action is the natural trigger for consolidation. |
How important are most of these properties for the memory? I think the biggest win that we get from git is the snapshot of the FS at a specific time; the other items just seem like auditability rather than concrete items that we actually need. I'm curious if this generalizes to something larger like "Snapshottable environment/filesystem", which is what the abstraction should be built on
+1 ^ the way i read this proposal is "we want a filesystem based memory, and git can be a nice way to audit it"
I am not sure how git plugs into this thing functionally
i think @opieter-aws has filebased memory on his roadmap. i worry if a filesystem based memory interface is over-engineered though and if we should instead just implement Snapshottable or something like that that this can extend
| interface GitMemoryStoreConfig { | ||
| // Required | ||
| name: string; | ||
| repoPath: string; |
So this seems like we need the git repo to be accessible locally in the same filesystem as the top level agent runtime? Is that correct?
Where does the GitHub assumption start and stop in this design?
|
|
||
| The `operations` config controls which directives go into the agent's system prompt. They are prompt instructions — the LLM decides how to apply them. | ||
|
|
||
| | Operation | Agent behavior | Example | |
Would these operations run sequentially in the order that they appear in the operations list in the config? Is there any sort of underlying priority of operations? I'd imagine the order will lead to varying qualities of consolidation
The priority would be decided by the agent
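To make the "operations are prompt instructions" mechanism concrete, here is a minimal sketch of how an `operations` list might be turned into system prompt directives. The operation names (`deduplicate`, `resolveContradictions`, `restructure`), the directive wording, and `buildConsolidationPrompt` are all hypothetical; the doc's actual operations table is not visible in this hunk.

```typescript
// Hypothetical operation names, taken from the maintenance goals named in the doc.
type ConsolidationOperation = "deduplicate" | "resolveContradictions" | "restructure";

// One prompt directive per operation (wording is illustrative only).
const DIRECTIVES: Record<ConsolidationOperation, string> = {
  deduplicate: "Merge files that state the same fact and delete the redundant copies.",
  resolveContradictions: "When two files disagree, keep the most recent claim and note the change.",
  restructure: "Move files into the subdirectory that best matches their content.",
};

// Build the consolidation agent's system prompt from the configured operations.
function buildConsolidationPrompt(operations: ConsolidationOperation[]): string {
  // Drop duplicates while preserving the configured order.
  const unique = operations.filter((op, i) => operations.indexOf(op) === i);
  const lines = unique.map((op) => `- ${DIRECTIVES[op]}`);
  return [
    "You maintain the agent's long-term memory.",
    "Apply the following directives, in whatever order you judge best:",
    ...lines,
  ].join("\n");
}
```

Because the directives are only text in the prompt, list order does not enforce execution order; the model decides, which is consistent with the reply above.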
|
|
||
|
|
||
|
|
||
| ### Method Behavior |
You had mentioned git log earlier in the doc for getting a full session history - would be interested to see what git APIs you envision being used for each of these methods
| ``` | ||
| --- | ||
|
|
||
| ## Consolidation |
how does this fit into extraction logic from memory? can't we use that one?
|
|
||
| LLM-powered agents struggle with maintaining and managing long-term memory effectively. As memory accumulates over extended interactions, memory quality degrades. In the Strands SDK, there is no built-in maintenance mechanism that can combine, deduplicate, resolve contradictions, or restructure isolated facts. Over long-horizon use, memory files accumulate redundancy and lose coherent structure, making retrieval less reliable and context windows less efficient. | ||
|
|
||
| MemoryManager already handles extraction (promoting observations into long-term memory) and retrieval (injecting stored knowledge back into context). What it lacks is a maintenance layer and a shared substrate: |
You can position this slightly differently: a maintenance layer wouldn't necessarily be a functionality of the MM, but of the store implementation. There exist managed backends that serve this purpose, but these are server-based. What you introduce here is a local alternative to a managed memory store. This allows for both a maintenance layer, but also outside-of-the-agent-loop indexing (which enables semantic search rather than lexical search)
|
|
||
| A file-based memory system addresses these issues by organizing knowledge as a structured file hierarchy that the agent can navigate directly. By abstracting the storage layer behind a `FileBackend` interface, the same file-based memory system can be backed by a local filesystem, a git repository, S3, or any other store that supports basic file operations. A developer-invoked consolidation agent provides the missing maintenance mechanism — it reads accumulated knowledge, deduplicates redundant entries, resolves contradictions, and reorganizes files, running offline so it doesn't add latency to agent sessions. | ||
|
|
||
| The existing `BedrockKnowledgeBaseStore` addresses retrieval via managed vector search, but requires provisioned AWS infrastructure (Bedrock Knowledge Base, credentials, optional S3). This is well-suited for production and enterprise deployments where teams already have AWS infrastructure. `FileMemoryStore` targets the other end: individual developers, prototyping, and environments where standing up a managed service is unnecessary overhead. It requires zero external infrastructure, just a filesystem. |
I'd move this up, as IMO this is the main point, and it avoids conflating the memory manager and memory store concepts
|
|
||
| The storage backend is abstracted behind a `FileBackend` interface — any system that can read, write, list, and delete files can serve as the underlying store. This enables the same memory system to run against a local directory, a git repository, S3, or a custom implementation without changing the memory logic. | ||
|
|
||
| The existing Strands API remains unchanged. `ContextManager` still owns L0 <--> L1, `MemoryManager` still owns L1 --> L2. What changes is the physical storage: instead of separate, disconnected backends for each layer, both write to the same file hierarchy. Every write from any layer is routed through the `FileBackend`, which determines how persistence, history, and atomicity are handled. |
Can you devote a sentence on the state of L1 today? What are the current solutions?
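For readers following the `FileBackend` abstraction in the hunk above, a minimal sketch of the idea follows. The doc only states that a backend must be able to read, write, list, and delete files, so the exact signatures here are assumptions. The sketch is synchronous for brevity (a real backend would most likely be async) and leaves out the `history()`/`rollback()` methods mentioned elsewhere in the doc. `InMemoryFileBackend` and `countFacts` are hypothetical names.

```typescript
// Assumed shape: the doc names the four capabilities but not the signatures.
interface FileBackend {
  read(path: string): string | undefined;
  write(path: string, content: string): void;
  list(prefix: string): string[];
  delete(path: string): boolean;
}

// Simplest possible backend, useful for tests and prototyping.
class InMemoryFileBackend implements FileBackend {
  private files: Record<string, string> = {};

  read(path: string): string | undefined {
    return Object.prototype.hasOwnProperty.call(this.files, path)
      ? this.files[path]
      : undefined;
  }

  write(path: string, content: string): void {
    this.files[path] = content;
  }

  // Return all paths under `prefix`, sorted for stable output.
  list(prefix: string): string[] {
    return Object.keys(this.files)
      .filter((p) => p.indexOf(prefix) === 0)
      .sort();
  }

  // Returns true if a file was removed, false if it did not exist.
  delete(path: string): boolean {
    if (this.read(path) === undefined) return false;
    delete this.files[path];
    return true;
  }
}

// Memory logic depends only on the interface, so backends are interchangeable.
function countFacts(backend: FileBackend): number {
  return backend.list("knowledge/facts/").length;
}
```

A git-backed or S3-backed implementation would satisfy the same interface, and callers such as `countFacts` would not need to change.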
|
|
||
| ### File Hierarchy | ||
|
|
||
| Both the `ContextManager` and `MemoryManager` write to the same file hierarchy but are isolated by path: L1 writes to `sessions/`, while L2 writes to `knowledge/`. Consolidation metadata lives in `consolidation/`. |
Be mindful when you use MemoryManager and MemoryStore: the MemoryManager never writes anything, it just calls a MemoryStore's methods
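The path-based isolation in the quoted hunk can be sketched as a small routing helper that the store (not the manager) would consult before writing. This is an assumption-laden illustration: `Layer`, `LAYER_ROOTS`, `sessionPath`, and `canWrite` are hypothetical names, and the session ID format is invented for the example.

```typescript
// Each layer owns exactly one root directory in the shared hierarchy.
type Layer = "L1" | "L2" | "consolidation";

const LAYER_ROOTS: Record<Layer, string> = {
  L1: "sessions/",
  L2: "knowledge/",
  consolidation: "consolidation/",
};

// Build the per-session file path, e.g. sessions/{id}.md.
function sessionPath(sessionId: string): string {
  // Restrict IDs to a safe character set so they cannot escape sessions/.
  if (!/^[A-Za-z0-9_-]+$/.test(sessionId)) {
    throw new Error(`invalid session id: ${sessionId}`);
  }
  return `${LAYER_ROOTS.L1}${sessionId}.md`;
}

// A layer may only write under its own root, and never via "..".
function canWrite(layer: Layer, path: string): boolean {
  return path.indexOf(LAYER_ROOTS[layer]) === 0 && path.indexOf("..") === -1;
}
```

Since every session gets its own file, two sessions never write to the same path; files under `knowledge/` are still shared, so concurrent writers there would need separate handling.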
|
|
||
| ## Progressive Disclosure | ||
|
|
||
| Not everything loads into context every turn. The agent retrieves relevant knowledge on demand by navigating the file hierarchy directly. LLMs are precise and accurate at scoped filesystem calls (listing directories, grepping for keywords, reading specific files), and progressive disclosure leverages this skill as the primary retrieval mechanism. |
The agent retrieves relevant knowledge on demand by navigating the file hierarchy directly
Can you make this more concrete? Currently, when the agent searches or adds is mandated by the MemoryManager, this likely wouldn't be in control of the store itself
| The full directory listing of `knowledge/` with each file's `description` frontmatter is injected into the agent's system prompt every turn. The agent always knows what knowledge exists without loading the content: | ||
|
|
||
| ``` | ||
| knowledge/ |
Are the sub directories static? How does the agent get context about what's in the sub-dir? Does it need that?
|
|
||
| Files in `knowledge/system/` are always loaded in full into the system prompt. This is where core context lives (persona, key preferences, critical project facts). Everything outside `system/` is visible by filename + description only, loaded when the agent reads it. | ||
|
|
||
| **Who manages `system/`:** Developers seed `system/` at repo creation with anything the agent always needs (persona, core preferences). The consolidation agent promotes and demotes files during offline maintenance, analyzing cross-session patterns to move broadly relevant files into `system/` and overly specific ones out. The main agent never writes to `system/` during a session. |
How do we avoid system prompt bloat here if all files with all content in this directory go into the system prompt? Would be interesting to do some testing to see the difference with just context injection based on query.
| `FileMemoryStore` is the class that implements both the `Storage` interface (for `ContextManager`) and the `MemoryStore` interface (for `MemoryManager`). It operates on the file hierarchy through whichever `FileBackend` is provided. This dual implementation enables the unified timeline — both layers write to the same hierarchy, so the agent sees sessions and knowledge together. | ||
|
|
||
| ```typescript | ||
| interface FileMemoryStoreConfig { |
Consider if you want to add namespaces here to allow for multi-tenancy
|
|
||
| **`search(query, options?)`** | ||
|
|
||
| Required by the `MemoryStore` interface. Performs keyword matching against filenames, `description` frontmatter, and file content, excluding `knowledge/system/` (already loaded in full). Returns the top matches as `MemoryEntry[]`, ranked by term frequency. No model call, no embeddings. |
May be out of scope, but think about how we can extend with an indexing layer to support semantic search
|
|
||
| Consolidation improves memory quality after facts accumulate. It is a developer-invoked Strands agent exposed as a method on `MemoryManager` (defined under `src/memory/consolidation/`). It reads stored knowledge, reasons across files, and writes changes through the `FileBackend`. Every change is versioned by the backend (via `history()`/`rollback()`), so bad consolidation is trivially reversible. | ||
|
|
||
| All extracted facts land in `knowledge/facts/` by default — `FileMemoryStore.add()` writes there unconditionally for simplicity and to avoid a classification model call on every extraction. Consolidation is therefore responsible for reorganizing files into appropriate subdirectories (`skills/`, `system/`, etc.) during offline maintenance, when it has full cross-file context to make informed categorization decisions. |
Does the agent here create the files and add the descriptions?
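One way to picture the write path is the sketch below, where the store creates the file itself and falls back to deriving a title and description when no extractor metadata is supplied. It is a sketch under assumptions: `FactMetadata`, `deriveMetadata`, `slugify`, and `renderFact`, the truncation lengths, and the frontmatter layout are hypothetical and are not taken from the doc.

```typescript
// Hypothetical metadata shape (title + description, as produced by an extractor).
interface FactMetadata {
  title: string;
  description: string;
}

// Fallback used when add() receives raw content with no metadata.
function deriveMetadata(content: string): FactMetadata {
  const trimmed = content.trim();
  const firstLine = trimmed.split("\n")[0].trim();
  const title = firstLine.length > 60 ? firstLine.slice(0, 57) + "..." : firstLine;
  const flat = trimmed.replace(/\s+/g, " ");
  const description = flat.length > 120 ? flat.slice(0, 117) + "..." : flat;
  return { title, description };
}

// Turn a title into a filesystem-friendly file name.
function slugify(title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return slug.length > 0 ? slug : "untitled";
}

// Produce the path and file body for a new fact. Every fact lands in
// knowledge/facts/; no classification happens at write time.
function renderFact(
  content: string,
  metadata?: FactMetadata,
): { path: string; body: string } {
  const meta = metadata !== undefined ? metadata : deriveMetadata(content);
  const body = [
    "---",
    `title: ${JSON.stringify(meta.title)}`, // JSON strings are valid YAML scalars
    `description: ${JSON.stringify(meta.description)}`,
    "---",
    "",
    content.trim(),
    "",
  ].join("\n");
  return { path: `knowledge/facts/${slugify(meta.title)}.md`, body };
}
```

Under this reading, neither the agent nor the developer writes the description by hand: it comes from the extractor when one is configured, and from the content otherwise.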
| This serves as both an audit log and the cursor for the next "since-last" run. | ||
| ``` | ||
|
|
||
| ### Operations |
I don't understand how the operations integrate with the agent / system prompt injection. Can you explain the mechanism?
Description
Related Issues
Documentation PR
Type of Change
Bug fix
New feature
Breaking change
Documentation update
Other (please describe):
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce new warnings.
hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.