Problem Statement
Currently, SympoziumInstance memory uses a ConfigMap with a flat MEMORY.md file (max 256KB). While this provides basic persistence between agent runs, it has significant limitations:
- No search capability — Agents must read the entire MEMORY.md at the start of each run, consuming context window tokens regardless of relevance
- Doesn't scale — As memory grows, the full file becomes too large for the context window
- No semantic retrieval — Agents can't find past investigations based on similarity to the current problem
- Each run starts from scratch — Even if the agent diagnosed a nearly identical issue 2 weeks ago, it has no efficient way to find and reuse that knowledge
Use Case: Learning from Past Investigations
In SRE troubleshooting, many issues are recurring or similar. An agent that investigated a Kafka queue problem last week should be able to:
- Receive a new alert about Kafka consumer lag
- Search memory for past Kafka-related investigations
- Find the previous root cause analysis and resolution steps
- Skip the dead ends from the first investigation
- Resolve the issue faster
Without semantic search, the agent either:
- Reads the entire memory (expensive, hits context limits)
- Starts from zero every time (wasteful, slower)
This is especially critical for Sympozium because agents are ephemeral (pod-per-run) — there's no conversation history to fall back on. Memory is the only continuity mechanism.
Proposed Solution
Option A: Built-in Embedding Support (Recommended)
Add an optional vector search layer to the existing memory system:
```yaml
apiVersion: sympozium.ai/v1alpha1
kind: SympoziumInstance
spec:
  memory:
    enabled: true
    maxSizeKB: 1024
    search:
      enabled: true
      provider: ollama                  # or "openai"
      embeddingModel: nomic-embed-text  # lightweight, runs on Ollama
      baseUrl: http://ollama:11434      # embedding model endpoint
      vectorStore: persistent-volume    # or "qdrant", "chromadb"
      chunkSize: 512                    # tokens per chunk
      topK: 5                           # results per search
  systemPrompt: |
    You have access to a search_memory tool.
    Before starting any investigation, search memory for similar past issues.
    After completing an investigation, store key findings for future reference.
```
How it works:
- Controller watches MEMORY.md ConfigMap for changes
- On update: chunks the text → generates embeddings via configured provider → stores vectors
- Agent pods get a `search_memory` tool injected (similar to how MCP tools are mounted)
- Agent searches before investigating, writes findings after
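The controller-side pipeline above (chunk text → generate embeddings) can be sketched roughly as follows. This is a minimal illustration, not a proposed implementation: the request shape follows Ollama's embeddings endpoint, `OLLAMA_URL` mirrors the `baseUrl` field, and `chunk_text` is a deliberately naive word-based stand-in for token-aware chunking.

```python
# Sketch: chunk MEMORY.md and embed each chunk via Ollama.
# All names are illustrative; a real controller would chunk by tokens
# (per spec.memory.search.chunkSize) and handle errors/retries.
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"  # from spec.memory.search.baseUrl


def chunk_text(text: str, max_words: int = 400) -> list[str]:
    """Naive word-based chunking; stand-in for token-aware chunking."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def embed(chunk: str, model: str = "nomic-embed-text") -> list[float]:
    """Request an embedding vector for one chunk from Ollama."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps({"model": model, "prompt": chunk}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```

The resulting vectors would then be written to whichever `vectorStore` backend is configured.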
Option B: MCP-Based Memory Server
A dedicated MCP server that handles memory storage and retrieval:
```yaml
spec:
  mcpServers:
    - name: memory
      toolsPrefix: memory
      url: http://memory-server.sympozium-system:8080
```
The memory MCP server would expose:
- `memory_search(query, topK)` — semantic search over past entries
- `memory_store(content, tags)` — store new findings
- `memory_list(tags, limit)` — list recent entries
Option C: SkillPack Approach
A reusable SkillPack that any instance can mount:
```yaml
spec:
  skills:
    - skillPackRef: memory-search
      params:
        EMBEDDING_MODEL: nomic-embed-text
        OLLAMA_URL: http://ollama:11434
        STORAGE: /data/memory-vectors
```
Architecture Considerations
Embedding Model Options
| Model | Size | Speed | Quality | Where |
|---|---|---|---|---|
| nomic-embed-text | 274MB | Fast | Good | Ollama (local) |
| mxbai-embed-large | 670MB | Medium | Better | Ollama (local) |
| text-embedding-3-small | API | Fastest | Good | OpenAI (cloud) |
For air-gapped / local-model users, Ollama-based embeddings are essential — this aligns with Sympozium's strength of working with local models.
Vector Storage Options
- PersistentVolume with embedded DB (simplest) — Use something like SQLite with `sqlite-vss` or `hnswlib` bundled into the controller
- In-cluster Qdrant/ChromaDB — More scalable, but adds infrastructure
- ConfigMap-based (current pattern extended) — Store serialized vectors in a ConfigMap; simplest but limited by ConfigMap size (1MB)
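To make the simplest option concrete: the sketch below stores vectors as JSON in a plain SQLite file and does brute-force cosine search in Python. `sqlite-vss` would push the similarity search into SQL with a proper index; this shows the zero-dependency fallback. Schema and function names are illustrative, not a proposed API.

```python
# Sketch of the "PersistentVolume with embedded DB" option using only
# the Python standard library. Vectors are serialized as JSON text;
# search is brute-force cosine similarity over all rows.
import json
import math
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)  # path would live on the PersistentVolume
    db.execute("CREATE TABLE IF NOT EXISTS chunks (text TEXT, vec TEXT)")
    return db


def add_chunk(db: sqlite3.Connection, text: str, vec: list[float]) -> None:
    db.execute("INSERT INTO chunks VALUES (?, ?)", (text, json.dumps(vec)))


def search(db: sqlite3.Connection, query: list[float], top_k: int = 5) -> list[str]:
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        return dot / norm if norm else 0.0

    rows = [(cos(query, json.loads(v)), t)
            for t, v in db.execute("SELECT text, vec FROM chunks")]
    return [t for _, t in sorted(rows, reverse=True)[:top_k]]
```

Brute force is fine at this scale: even a 1MB memory file yields only a few hundred chunks, well below the point where an ANN index pays off.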
Data Flow
```
Agent Run Completes
        ↓
Controller detects MEMORY.md update
        ↓
Chunk text → Generate embeddings → Store vectors
        ↓
Next Agent Run starts
        ↓
Agent calls search_memory("kafka consumer lag")
        ↓
Vector similarity search → Top-K results returned
        ↓
Agent uses past findings to accelerate investigation
```
Benchmark Evidence
From our AI Agent Benchmark comparing kagent, Sympozium, and HolmesGPT across 13 scenarios:
- Agents frequently encounter similar failure patterns across scenarios (e.g., multiple scenarios involve feature flag misconfigurations, Kafka issues, or pod crashloops)
- An agent with memory search could have reused diagnostic steps from scenario 1 (basic crashloop) when encountering scenario 7 (complex crashloop with timeout)
- Token consumption could be reduced by 30-50% on recurring patterns if the agent finds relevant past context instead of re-discovering it
Prior Art
- OpenClaw uses MEMORY.md + daily notes with Voyage AI embeddings for semantic search across memory files. Agents call `memory_search(query)` before answering questions about prior work.
- LangChain/LangGraph have built-in memory stores with vector retrieval
- CrewAI supports long-term memory with embedding-based search
Summary
| Approach | Complexity | Scalability | Local Model Support |
|---|---|---|---|
| A: Built-in | Medium | High | ✅ Ollama embeddings |
| B: MCP Server | Medium | High | ✅ Any embedding API |
| C: SkillPack | Low | Medium | ✅ Ollama embeddings |