Feature Request: Embedding-Based Memory Search for Persistent Agent Learning #45

@henrikrexed

Description

Problem Statement

Currently, SympoziumInstance memory uses a ConfigMap with a flat MEMORY.md file (max 256KB). While this provides basic persistence between agent runs, it has significant limitations:

  1. No search capability — Agents must read the entire MEMORY.md at the start of each run, consuming context window tokens regardless of relevance
  2. Doesn't scale — As memory grows, the full file becomes too large for the context window
  3. No semantic retrieval — Agents can't find past investigations based on similarity to the current problem
  4. Each run starts from scratch — Even if the agent diagnosed a nearly identical issue 2 weeks ago, it has no efficient way to find and reuse that knowledge

Use Case: Learning from Past Investigations

In SRE troubleshooting, many issues are recurring or similar. An agent that investigated a Kafka queue problem last week should be able to:

  1. Receive a new alert about Kafka consumer lag
  2. Search memory for past Kafka-related investigations
  3. Find the previous root cause analysis and resolution steps
  4. Skip the dead ends from the first investigation
  5. Resolve the issue faster

Without semantic search, the agent either:

  • Reads the entire memory (expensive, hits context limits)
  • Starts from zero every time (wasteful, slower)

This is especially critical for Sympozium because agents are ephemeral (pod-per-run) — there's no conversation history to fall back on. Memory is the only continuity mechanism.

Proposed Solution

Option A: Built-in Embedding Support (Recommended)

Add an optional vector search layer to the existing memory system:

```yaml
apiVersion: sympozium.ai/v1alpha1
kind: SympoziumInstance
spec:
  memory:
    enabled: true
    maxSizeKB: 1024
    search:
      enabled: true
      provider: ollama                    # or "openai"
      embeddingModel: nomic-embed-text    # lightweight, runs on Ollama
      baseUrl: http://ollama:11434        # embedding model endpoint
      vectorStore: persistent-volume      # or "qdrant", "chromadb"
      chunkSize: 512                      # tokens per chunk
      topK: 5                             # results per search
    systemPrompt: |
      You have access to a search_memory tool.
      Before starting any investigation, search memory for similar past issues.
      After completing an investigation, store key findings for future reference.
```

How it works:

  1. Controller watches MEMORY.md ConfigMap for changes
  2. On update: chunks the text → generates embeddings via configured provider → stores vectors
  3. Agent pods get a search_memory tool injected (similar to how MCP tools are mounted)
  4. Agent searches before investigating, writes findings after
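
Steps 1–2 above can be sketched as follows. This is a minimal illustration, assuming Ollama's `/api/embeddings` endpoint and naive whitespace-based chunking (a real implementation would chunk by model tokens, per `chunkSize` in the spec); the URL and model name mirror the example config:

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"   # assumption: matches spec.memory.search.baseUrl
MODEL = "nomic-embed-text"           # assumption: matches spec.memory.search.embeddingModel

def chunk_text(text: str, chunk_size: int = 512) -> list[str]:
    """Split MEMORY.md into fixed-size chunks (naive: whitespace words, not tokens)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def embed(chunk: str) -> list[float]:
    """Request an embedding vector for one chunk from Ollama."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps({"model": MODEL, "prompt": chunk}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["embedding"]
```

The controller would run this on each ConfigMap update and persist the resulting `(chunk, vector)` pairs to the configured vector store.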

Option B: MCP-Based Memory Server

A dedicated MCP server that handles memory storage and retrieval:

```yaml
spec:
  mcpServers:
    - name: memory
      toolsPrefix: memory
      url: http://memory-server.sympozium-system:8080
```

The memory MCP server would expose:

  • memory_search(query, topK) — semantic search over past entries
  • memory_store(content, tags) — store new findings
  • memory_list(tags, limit) — list recent entries
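
The core of `memory_search` is a similarity ranking over stored vectors. A minimal sketch (brute-force cosine similarity; a production server would delegate to a vector index instead):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def memory_search(query_vec: list[float],
                  entries: list[tuple[str, list[float]]],
                  top_k: int = 5) -> list[str]:
    """Return the top_k stored texts most similar to the query embedding."""
    ranked = sorted(entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

`memory_store` would embed the new content and append it to `entries`; `memory_list` is a tag-filtered scan over the same store.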

Option C: SkillPack Approach

A reusable SkillPack that any instance can mount:

```yaml
spec:
  skills:
    - skillPackRef: memory-search
      params:
        EMBEDDING_MODEL: nomic-embed-text
        OLLAMA_URL: http://ollama:11434
        STORAGE: /data/memory-vectors
```

Architecture Considerations

Embedding Model Options

| Model | Size | Speed | Quality | Where |
|---|---|---|---|---|
| nomic-embed-text | 274 MB | Fast | Good | Ollama (local) |
| mxbai-embed-large | 670 MB | Medium | Better | Ollama (local) |
| text-embedding-3-small | API | Fastest | Good | OpenAI (cloud) |

For air-gapped / local-model users, Ollama-based embeddings are essential — this aligns with Sympozium's strength of working with local models.

Vector Storage Options

  1. PersistentVolume with embedded DB (simplest) — Use something like SQLite with sqlite-vss or hnswlib bundled into the controller
  2. In-cluster Qdrant/ChromaDB — More scalable, but adds infrastructure
  3. ConfigMap-based (current pattern extended) — Store serialized vectors in a ConfigMap; simplest but limited by ConfigMap size (1MB)
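
For option 1, even before pulling in `sqlite-vss` or `hnswlib`, a plain SQLite file on the PersistentVolume can hold chunks and their vectors. A minimal sketch (vectors serialized as JSON text; a hypothetical schema, not the proposed implementation):

```python
import json
import sqlite3

def init_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the vector store; path would be a file on the PV."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS vectors ("
        "id INTEGER PRIMARY KEY, chunk TEXT, embedding TEXT)"
    )
    return db

def store_vector(db: sqlite3.Connection, chunk: str, embedding: list[float]) -> None:
    """Persist one (chunk, vector) pair."""
    db.execute("INSERT INTO vectors (chunk, embedding) VALUES (?, ?)",
               (chunk, json.dumps(embedding)))
    db.commit()

def load_vectors(db: sqlite3.Connection) -> list[tuple[str, list[float]]]:
    """Load all pairs for brute-force search (fine at MEMORY.md scale)."""
    return [(chunk, json.loads(emb)) for chunk, emb in
            db.execute("SELECT chunk, embedding FROM vectors")]
```

Brute-force search over a store this small is cheap; swapping in an ANN index later would not change the controller-facing interface.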

Data Flow

Agent Run Completes
      ↓
Controller detects MEMORY.md update
      ↓
Chunk text → Generate embeddings → Store vectors
      ↓
Next Agent Run starts
      ↓
Agent calls search_memory("kafka consumer lag")
      ↓
Vector similarity search → Top-K results returned
      ↓
Agent uses past findings to accelerate investigation

Benchmark Evidence

From our AI Agent Benchmark comparing kagent, Sympozium, and HolmesGPT across 13 scenarios:

  • Agents frequently encounter similar failure patterns across scenarios (e.g., multiple scenarios involve feature flag misconfigurations, Kafka issues, or pod crashloops)
  • An agent with memory search could have reused diagnostic steps from scenario 1 (basic crashloop) when encountering scenario 7 (complex crashloop with timeout)
  • Token consumption could be reduced by 30-50% on recurring patterns if the agent finds relevant past context instead of re-discovering it

Prior Art

  • OpenClaw uses MEMORY.md + daily notes with Voyage AI embeddings for semantic search across memory files. Agents call memory_search(query) before answering questions about prior work.
  • LangChain/LangGraph have built-in memory stores with vector retrieval
  • CrewAI supports long-term memory with embedding-based search

Summary

| Approach | Complexity | Scalability | Local Model Support |
|---|---|---|---|
| A: Built-in | Medium | High | ✅ Ollama embeddings |
| B: MCP Server | Medium | High | ✅ Any embedding API |
| C: SkillPack | Low | Medium | ✅ Ollama embeddings |
