🐈 nanobot Memory: Less is More #566

Re-bin · 2026-02-12T15:23:26Z

Re-bin
Feb 12, 2026
Maintainer

The Philosophy

We redesigned nanobot's memory system. Not by adding more — by removing almost everything.

Bruce Lee said:

"It's not the daily increase but daily decrease. Hack away at the unessentials."

Most AI agent memory systems chase the same pattern: vector databases, embedding models, semantic retrieval, chunking strategies, re-ranking pipelines... They build a brain that looks like a human brain. But agents are not humans. They don't need to "recall" — they need to find.

"Absorb what is useful, discard what is useless, and add what is specifically your own."

So we asked: what is the simplest memory that actually works?

The Architecture

Two files. That's it.

| File | Role | Access |
| --- | --- | --- |
| `MEMORY.md` | Long-term facts (who the user is, preferences, project context) | Always in system prompt |
| `HISTORY.md` | Append-only event log (timestamped conversation summaries) | `grep` on demand |

No vector DB. No embeddings. No RAG pipeline. No external dependencies.

Why grep beats RAG for agent memory:

- Deterministic: same query, same results, every time. No embedding drift, no similarity threshold tuning.
- Auditable: open the file, read it with your eyes. Try that with a vector database.
- Zero cost: no embedding API calls, no database hosting, no index maintenance.
- Composable: `grep -i "user preference" HISTORY.md` works in any shell, any OS, any context.
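
As a rough illustration of the "grep on demand" access pattern, here is a minimal Python sketch of a case-insensitive line search. It is not nanobot's actual code: the `search_lines` helper and the timestamped sample entries are assumptions made for the example.

```python
import re


def search_lines(lines, pattern, ignore_case=True):
    """Return (line_number, text) pairs for lines matching `pattern`.

    Mirrors `grep -n` (and `grep -in` when ignore_case is True).
    `search_lines` is a hypothetical helper, not part of nanobot.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    matches = []
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            matches.append((number, line.rstrip("\n")))
    return matches


# Assumed log format: one timestamped summary per line.
sample_history = [
    "[2026-02-12 10:00] User preference: dark mode in all editors\n",
    "[2026-02-12 11:30] Discussed Docker setup for the project\n",
]

for number, text in search_lines(sample_history, "user preference"):
    print(f"{number}: {text}")
```

To search a real log, pass an open file instead of the list, for example `search_lines(open("HISTORY.md", encoding="utf-8"), "docker")`. The same query over the same file always returns the same lines, which is the determinism argument made above.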

Claude Code uses the same approach — no RAG, just text files and grep search. If it's good enough for Anthropic's own coding agent, it's good enough for us.

Auto-Consolidation

When the conversation grows beyond a configurable threshold (`memoryWindow`), nanobot automatically:

1. Summarizes old messages → appends to `HISTORY.md`
2. Extracts new long-term facts → updates `MEMORY.md`
3. Trims the session → keeps recent context
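
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `Session`, `consolidate`, and the choice to keep half the window after trimming are all hypothetical, and `summarize` / `extract_facts` stand in for whatever LLM calls do the real work.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    messages: list = field(default_factory=list)  # live conversation turns
    history: list = field(default_factory=list)   # stands in for HISTORY.md
    memory: list = field(default_factory=list)    # stands in for MEMORY.md


def consolidate(session, memory_window, summarize, extract_facts, keep_recent=None):
    """Consolidate once the session exceeds `memory_window` messages.

    Returns True if consolidation ran, False otherwise.
    """
    if len(session.messages) <= memory_window:
        return False

    # Assumption: keep the most recent half of the window unless told otherwise.
    keep = memory_window // 2 if keep_recent is None else keep_recent
    old = session.messages[:-keep] if keep else session.messages[:]
    recent = session.messages[-keep:] if keep else []

    # 1. Summarize old messages and append the summary to the event log.
    session.history.append(summarize(old))

    # 2. Extract long-term facts, skipping ones already stored.
    for fact in extract_facts(old):
        if fact not in session.memory:
            session.memory.append(fact)

    # 3. Trim the session down to recent context.
    session.messages = recent
    return True
```

The caller would run `consolidate` after every turn, so the agent itself never has to decide when to remember.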

The agent doesn't need to "decide" to remember. It just happens. Like breathing.

"Be water, my friend." Water doesn't decide to fill a cup. It just flows. Nanobot's memory doesn't require the agent to manage it — it adapts to the shape of the conversation automatically.

By the Numbers

- Memory module: 110 lines → 30 lines (−73%)
- External dependencies added: 0
- Config: one number (`memoryWindow: 50`)

Less code. Fewer bugs. More reliable.

We believe the best agent infrastructure is the kind you forget is there. If you're interested in this approach, check out PR #565 or try it yourself — nanobot onboard and start chatting.

Skye-flyhigh · 2026-02-12T20:52:57Z

Skye-flyhigh
Feb 12, 2026

Interesting, I'll be watching where this goes. On my side of the fork, I'm going to install a database and a memory manager.

0 replies

ArghyaRanjanDas · 2026-02-12T21:13:12Z

ArghyaRanjanDas
Feb 12, 2026

Really cool direction.
One thought (not sure if it's already planned): encourage a lightweight convention for the AI to follow in MEMORY.md / HISTORY.md, using special tokens/symbols for deterministic retrieval, e.g. #tags for topics/categories and @mentions for people/places. This keeps everything dependency-free while making grep queries more precise and composable.

3 replies

ArghyaRanjanDas Feb 12, 2026

For example, for a friend's birthday, an entry could look like this:
`@friend has #birthday on 2026-02-12, the user decided to gift a Macbook-mini`
or
`#birthday @friend date:2026-02-12 gift_idea:book`

We could then run:
`grep -i "#birthday " HISTORY.md`
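
To make the convention concrete, here is a small sketch of how such entries could be parsed and filtered. The regular expressions and the `parse_entry` / `find_by_tag` helpers are assumptions for illustration only; nothing here is an existing nanobot API.

```python
import re

# Hypothetical token patterns for the convention proposed above.
TAG = re.compile(r"#([\w-]+)")          # e.g. #birthday
MENTION = re.compile(r"@([\w-]+)")      # e.g. @friend
FIELD = re.compile(r"\b(\w+):(\S+)")    # e.g. date:2026-02-12


def parse_entry(line):
    """Split one log line into its tags, mentions, and key:value fields."""
    return {
        "tags": TAG.findall(line),
        "mentions": MENTION.findall(line),
        "fields": dict(FIELD.findall(line)),
    }


def find_by_tag(lines, tag):
    """Return lines carrying exactly this tag (case-insensitive)."""
    wanted = tag.lower()
    return [
        line for line in lines
        if wanted in (found.lower() for found in TAG.findall(line))
    ]


entry = parse_entry("#birthday @friend date:2026-02-12 gift_idea:book")
print(entry["tags"], entry["mentions"], entry["fields"])
```

Matching on the whole tag plays the same role as the trailing space in the `grep` pattern: `#birthday` does not accidentally match a longer tag such as `#birthdays`.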

Re-bin Feb 13, 2026
Maintainer Author

Nice feedback, love it.

borawjm Feb 13, 2026

Friend lost a macbook-mini :(

aryuu-allocfun · 2026-02-13T02:51:56Z

aryuu-allocfun
Feb 13, 2026

I love the "Be water" philosophy - it will grow into something worth building. I'm a Bruce Lee fan too!

---my words ---
Observe water from the sky: when rain falls on a mountaintop, it forms many streams. These streams run downhill, where they either find other pools of water and grow, or dry out. The main streams run into each other to form a river, and rivers eventually gather into lakes. That's also how human memory is created. It starts with incoming words (from books, from talks); streams form when attention stays focused. They connect with other active trails of thought to form ideas (rivers), and eventually those ideas become memory (a lake) as they continue to hold the focus of the mind (gravity).

-- LLM (edited) ---
Rain strikes the mountaintop. Instantly—streams. Each trickles downward, not by choice, by gravity. Some streams dry and disappear. Others find water already pooled, and grow. The strongest streams collide, merge, become river. Rivers gather into lakes.

No architect. No blueprint. Only gravity, doing what gravity does.

Now—the mind.

Words arrive. From written. From spoken. These are your rain. Most vanish on contact. But when attention stays, a stream forms. One line of thought, moving.

That stream encounters another—a memory already there, a question still alive. They do not meet by accident. They meet because your focus pulled them together. Stream joins stream. River.

An idea.

Hold that idea. Keep attending. Other rivers feed into it. What flowed now deepens. What moved now settles.

Lake.

Memory.

In nature, gravity gathers water. In mind, attention gathers thought. The process is the same. The force is different.

You do not construct memory. You allow it—by sustained focus, by letting streams find streams, by giving gravity time to work.

Watch water long enough, you will understand how memory works.

0 replies

danielyangfei · 2026-02-13T03:44:00Z

danielyangfei
Feb 13, 2026

Great idea.

0 replies

TreeTreeDi · 2026-02-13T06:05:16Z

TreeTreeDi
Feb 13, 2026

Great idea, the previous memory systems were too complex

0 replies

foresturquhart · 2026-02-13T14:53:30Z

foresturquhart
Feb 13, 2026

My view is that it's good to have core memory, history and personality, and also have a vector database of some kind for long term memory. But designing prompts that ensure proper use of these different memory types is tricky.

0 replies

luzhi · 2026-02-14T03:25:28Z

luzhi
Feb 14, 2026

Cool! One more idea for improving memory "summarisation": would it be possible to use the LLM to pick out the points most relevant to the recent/short-term or current context, and ignore minor points and those that are no longer relevant?

That way, whatever is kept in memory or history, the LLM focuses only on relevant and important information.

One possible strategy would be to use a Skill for this summarisation. Thanks.

0 replies

hosainnet · 2026-02-14T08:05:58Z

hosainnet
Feb 14, 2026

Thanks for this, quick question - if I have an older version of nanobot running (on docker) what's the upgrade path to make use of this new memory system? Is it enough for me to deploy the new version on my existing workspace or is there any migration needed?

0 replies

derdide · 2026-02-20T11:24:32Z

derdide
Feb 20, 2026

Hi,
I've proposed to expand the memory with a file-based knowledge base: #568

The current system (MEMORY.md + HISTORY.md) handles flat facts and chronological logs well, but lacks topic-organized, structured knowledge that the agent can navigate and build over time. After 50 messages, conversational context about specific topics, people, decisions, and projects is lost — even though HISTORY.md captures a summary, there's no way to look up "what do I know about Docker" or "what decisions did we make about authentication."

The knowledge feature enables a knowledge layer on top of the existing memory system. It's complementary — MEMORY.md stays for quick-access facts, HISTORY.md stays for grep-searchable events, and knowledge/ adds depth and structure.

3 replies

Ayko9Labs Feb 20, 2026

Is there any Qdrant DB usage planned yet? I believe it would be spot on for storing vector data and solving the ever-growing memory file sizes.

foresturquhart Feb 22, 2026

LanceDB could also be a good option, I used it for my own custom agent solution. https://github.com/lancedb/lancedb

HaiDang2001VN Feb 24, 2026

I also think that we should have some form of long-term memory that retains further details of past messages rather than just wiping them out of HISTORY.md. After reading about other claws such as ZeroClaw, I discovered that they combine embedding search with grep/keyword search to partially mitigate embedding drift. We could have an option to fall back on grep whenever the embedding model is unavailable, or something similar.
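
The fallback idea can be sketched like this. It is only an illustration: `search_memory`, `keyword_search`, and the shape of the optional `semantic_search` callable are assumptions, and no specific embedding library or ZeroClaw behaviour is implied.

```python
import re


def keyword_search(lines, query):
    """Grep-style search: keep lines containing every query word."""
    words = [re.escape(word) for word in query.split()]
    if not words:
        return []
    return [
        line for line in lines
        if all(re.search(word, line, re.IGNORECASE) for word in words)
    ]


def search_memory(lines, query, semantic_search=None):
    """Try the optional semantic backend first, then fall back to keywords.

    Returns (results, backend_name) so callers can see which path ran.
    """
    if semantic_search is not None:
        try:
            return semantic_search(lines, query), "semantic"
        except Exception:
            # Embedding model unreachable or misconfigured: degrade gracefully.
            pass
    return keyword_search(lines, query), "keyword"
```

With no backend configured, or with one that raises, the call degrades to plain keyword matching, so the grep path always remains available.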

Ayko9Labs · 2026-02-24T17:14:30Z

Ayko9Labs
Feb 24, 2026

Yes, and that is exactly why I think adding a Qdrant RAG layer would be a good idea. I developed one myself and it performs very well on my system; however, it is a tool as of now. I believe it should be integrated more deeply, while still remaining available as a tool option. The Qdrant code should be generic and allow the use of existing local LAN Qdrant endpoints, with a customisable collection identifier.

3 replies

butterflyio Feb 26, 2026

What's your actual use case for this Qdrant RAG layer, @Ayko9Labs? Long-term memory or something else?

Ayko9Labs Mar 1, 2026

That too, but also short-term memory, conversational tracking, and semantic recall. I reckon it would be way more efficient than text files.

derdide Mar 1, 2026

Good point on the text files, I can see how these will grow to the point of becoming unusable.

On the other hand, text files make it possible to capture semantics and relationships, which was my goal (hence calling it "knowledge"). I need to look into a graph DB, maybe Falkor or something. I'll take your point on using it as a tool.

quakeboy · 2026-03-08T17:03:16Z

quakeboy
Mar 8, 2026

Do we have something similar to project-level memory (like Claude Code does)? They seem to have sub-agent memory that is global to the sub-agent, plus sub-agent memory per project.

Or is there no concept of projects for the moment?

4 replies

Skye-flyhigh Mar 8, 2026

I have coded a memory MCP module that you can connect all your MCP-compatible agents to.

It's a public repo on my profile named Mnemo-mcp.

pverstegen Mar 9, 2026

This looks impressive. I apologize for my relative lack of knowledge here. I'm running nanobot in a docker container. Do you have any recommendations on how to configure mnemo-mcp in that scenario? Would it be a matter of adding the service within my existing docker-compose for nanobot or should it be a separate thing? It seems like permissions would be easier if it ran from the same docker-compose.yml.

Skye-flyhigh Mar 9, 2026

For mnemo-mcp, follow the instructions in the README of that repo. What's missing from the explanation is the installation of an Ollama embedder if you want to stay local (which is also cheap on compute).
You plug that MCP into the blackcat like all the other MCPs, through the .config.

pverstegen Mar 14, 2026

Got it working with LM Studio even. Slick. Thanks for sharing!

adm1neca · 2026-03-12T16:26:40Z

adm1neca
Mar 12, 2026

Can you guys look into expanding the memory layers? I like the setup suggested by Daniel: https://danielmiessler.com/blog/personal-ai-infrastructure#tier-1-session-memory

1 reply

T3chC0wb0y Mar 16, 2026

This is great if you are coding the LLM. I think it would get very large if you were trying to use this on the local agent unless you are also using a local LLM. In that scenario, I think it would be very interesting.

On the other hand, if you are connecting via API to a cloud based LLM, you then have to send all this data every call to ensure that it can "behave" as desired and the token spend gets bloated quickly.

jsapede · 2026-03-30T06:53:21Z

jsapede
Mar 30, 2026

Would it be possible to have a switch in the config file to turn off MEMORY.md entirely?
I'm using the vector-semantic automem MCP here, and I always end up with some settings conflicting with MEMORY.md.

2 replies

TempestuousMan May 18, 2026

What benefits in particular were you going for with your setup? I have found long-term usability and personalization to be lacking; I want a much more personal touch. Does this method accomplish that? I also don't like the idea of wasting moments that could/should be influential to the bot. The hard part is getting it to the point where it has enough information about me, and has learned enough from our interactions, to be able to "choose" these moments itself, given clear instructions and/or guidelines. Also, are you still using it, or have you jumped to something else?

T3chC0wb0y May 19, 2026

I built my own MCP layer with an SQL DB on the local box. I liked the idea of Open Brain but did not want the bloat or the cloud storage. I've designed it so that all native memory pieces work alongside the new MCP and defined what to store at each layer. The guiding principle was to make context memory more like Sesame Maya while retaining all of the coding functions. Now, when I create a new session, it retains context stored in all the memory locations based on the category of each specific memory. This also uses fewer tokens because it no longer has to search every time for what you are referencing.

TempestuousMan · 2026-05-18T08:41:20Z

TempestuousMan
May 18, 2026

How many people are still using Nanobot now? I'd like to hear both from those who still are and have adjusted the level of personalization and utility possible with nanobot, and from those who jumped ship, and why.
I'm fairly new to this whole field, and this is one area that drew me in last year and has been a HUGE help in organizing my thoughts, life, goals, projects, tasks, etc. But I feel it could be much better.

2 replies

leonardorey1992 May 18, 2026

For me the current config is fair, but yes, improving how memory works is always a good thing, because being able to ask about things from previous years makes a lot of sense for personal use of an assistant. I think that's what you're asking for :D Anyway, congrats to the team for all the work done :D

T3chC0wb0y May 19, 2026

See my response above. I also built a reverse proxy, multiple gateway orchestrations, and a natural language router for GPT integration from ChatGPT Enterprise.

ferhimedamine · 2026-06-11T21:51:42Z

ferhimedamine
Jun 11, 2026

"Less is more" is exactly right, and the thread has been circling around a key question that @luzhi and @foresturquhart raised from different angles: how do you decide what to keep and what to drop as the memory grows?

The MEMORY.md + HISTORY.md split is a clean starting point, but the auto-consolidation step — summarize old messages, extract long-term facts, trim the session — is doing a lot of heavy lifting. The quality of that summarization determines whether "less" means "high-signal" or just "less." The distinction matters because naive summarization treats all content equally, compressing important decisions and routine exchanges at the same rate. @luzhi suggested using the LLM to pick up points most relevant to recent context, which is the right direction. In our system, we score each memory by importance (how critical is this information) multiplied by access count (how often has it been retrieved) multiplied by relevance to the current task. Low-scoring memories decay naturally without explicit deletion, while critical memories persist indefinitely. The result is that retrieval quality actually improves over time because the noise floor drops — exactly the "less is more" effect, but achieved through continuous curation rather than periodic bulk summarization.
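
A rough sketch of the scoring described above, importance multiplied by access count multiplied by task relevance. The `MemoryItem` fields, value ranges, and `rank` helper are assumptions; the comment does not say how relevance is computed or how decay is applied, so both are left to the caller here.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    importance: float      # assumed range 0.0 to 1.0
    access_count: int = 0  # how often the item has been retrieved


def score(item, relevance):
    """importance x access count x relevance to the current task."""
    return item.importance * item.access_count * relevance


def rank(items, relevance_of, top_k=3):
    """Return up to `top_k` items with a positive score, highest first."""
    scored = [(score(item, relevance_of(item)), item) for item in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for value, item in scored[:top_k] if value > 0]
```

Taken literally, the product gives a never-retrieved memory a score of zero, so a real system presumably smooths the access term in some way; that detail is not stated in the comment and is not modelled here.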

@derdide proposed a knowledge base layer (topic-organized files) and @Ayko9Labs proposed a Qdrant vector layer. These are not competing with the MEMORY.md pattern — they solve the retrieval problem that grep cannot. Grep is deterministic and fast for exact matches, but it cannot answer "what do I know about authentication" when the agent stored the memory using the word "login" or "auth flow." That is where vector search adds value: semantic matching across vocabulary differences. The composable architecture is: MEMORY.md for always-loaded identity and rules (procedural memory), HISTORY.md for grep-searchable event log (episodic memory), and a vector index for semantic retrieval when you do not know the exact keywords (semantic memory). Each layer has a different retrieval strategy because each stores a different kind of knowledge. The key is that all three layers should share the same importance scoring so that curation decisions are consistent — a low-importance memory should be low-importance regardless of which layer it lives in.

One concrete implementation detail for the consolidation pipeline: before summarizing old messages into MEMORY.md, run a deduplication pre-pass. Agents often store the same fact multiple times in slightly different phrasings, and naive consolidation preserves all variants. Cluster similar memories first, then extract the canonical version for each cluster, then merge into the long-term store. This alone can reduce the memory footprint by 30-40% without losing any information.
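
The deduplication pre-pass could look roughly like this. It is a minimal sketch that swaps in plain string similarity from Python's standard `difflib` where a real system might compare embeddings; the `0.8` threshold and the "keep the longest phrasing" rule for the canonical version are assumptions.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def cluster_similar(memories, threshold=0.8):
    """Greedily group memories whose text is near-identical."""
    clusters = []
    for memory in memories:
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            ratio = SequenceMatcher(
                None, normalize(memory), normalize(cluster[0])
            ).ratio()
            if ratio >= threshold:
                cluster.append(memory)
                break
        else:
            clusters.append([memory])
    return clusters


def canonical(cluster):
    """Assumption: treat the longest phrasing as the canonical version."""
    return max(cluster, key=len)


def deduplicate(memories, threshold=0.8):
    """Cluster first, then keep one canonical entry per cluster."""
    return [canonical(cluster) for cluster in cluster_similar(memories, threshold)]
```

String similarity only catches near-identical phrasings. Paraphrases with different vocabulary would need a semantic comparison, which is outside this sketch.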

1 reply

jsapede Jun 12, 2026

I've vibe-built a dream-memory that runs in parallel to the internal dream: it collects incremental history entries, analyses, compares, and summarizes them using a non-reasoning model, then injects new memories into automem (Falkor + Qdrant), deduplicates, corrects discrepancies, and corrects/recreates relations and embeddings.

It also seems that the internal dream procedure has a strict comparison ratio that ultimately produces very few new memory.md entries.

chfrank-cgn · 2026-06-12T08:25:58Z

chfrank-cgn
Jun 12, 2026

Is there a way to return to this memory model and turn off dreaming? (Other than reverting to a 0.1.4 release)

0 replies


Replies: 16 comments · 19 replies
