Feature Request: Embedding-Based Memory Search for Persistent Agent Learning #45

@henrikrexed

Description

Problem Statement

Currently, SympoziumInstance memory uses a ConfigMap with a flat MEMORY.md file (max 256KB). While this provides basic persistence between agent runs, it has significant limitations:

  1. No search capability — Agents must read the entire MEMORY.md at the start of each run, consuming context window tokens regardless of relevance
  2. Doesn't scale — As memory grows, the full file becomes too large for the context window
  3. No semantic retrieval — Agents can't find past investigations based on similarity to the current problem
  4. Each run starts from scratch — Even if the agent diagnosed a nearly identical issue 2 weeks ago, it has no efficient way to find and reuse that knowledge

Use Case: Learning from Past Investigations

In SRE troubleshooting, many issues are recurring or similar. An agent that investigated a Kafka queue problem last week should be able to:

  1. Receive a new alert about Kafka consumer lag
  2. Search memory for past Kafka-related investigations
  3. Find the previous root cause analysis and resolution steps
  4. Skip the dead ends from the first investigation
  5. Resolve the issue faster

Without semantic search, the agent either:

  • Reads the entire memory (expensive, hits context limits)
  • Starts from zero every time (wasteful, slower)

This is especially critical for Sympozium because agents are ephemeral (pod-per-run) — there's no conversation history to fall back on. Memory is the only continuity mechanism.

Proposed Solution

Option A: Built-in Embedding Support (Recommended)

Add an optional vector search layer to the existing memory system:

```yaml
apiVersion: sympozium.ai/v1alpha1
kind: SympoziumInstance
spec:
  memory:
    enabled: true
    maxSizeKB: 1024
    search:
      enabled: true
      provider: ollama                    # or "openai"
      embeddingModel: nomic-embed-text    # lightweight, runs on Ollama
      baseUrl: http://ollama:11434        # embedding model endpoint
      vectorStore: persistent-volume      # or "qdrant", "chromadb"
      chunkSize: 512                      # tokens per chunk
      topK: 5                             # results per search
    systemPrompt: |
      You have access to a search_memory tool.
      Before starting any investigation, search memory for similar past issues.
      After completing an investigation, store key findings for future reference.
```

How it works:

  1. Controller watches MEMORY.md ConfigMap for changes
  2. On update: chunks the text → generates embeddings via configured provider → stores vectors
  3. Agent pods get a search_memory tool injected (similar to how MCP tools are mounted)
  4. Agent searches before investigating, writes findings after
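
Steps 1–2 above can be sketched as follows. This is a minimal illustration, assuming Ollama's `/api/embeddings` endpoint and naive whitespace-based chunking (a real implementation would chunk by model tokens, per `chunkSize` in the spec); the URL and model name mirror the example config:

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"   # assumption: matches spec.memory.search.baseUrl
MODEL = "nomic-embed-text"           # assumption: matches spec.memory.search.embeddingModel

def chunk_text(text: str, chunk_size: int = 512) -> list[str]:
    """Split MEMORY.md into fixed-size chunks (naive: whitespace words, not tokens)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def embed(chunk: str) -> list[float]:
    """Request an embedding vector for one chunk from Ollama."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps({"model": MODEL, "prompt": chunk}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["embedding"]
```

The controller would run this on each ConfigMap update and persist the resulting `(chunk, vector)` pairs to the configured vector store.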

Option B: MCP-Based Memory Server

A dedicated MCP server that handles memory storage and retrieval:

```yaml
spec:
  mcpServers:
    - name: memory
      toolsPrefix: memory
      url: http://memory-server.sympozium-system:8080
```

The memory MCP server would expose:

  • memory_search(query, topK) — semantic search over past entries
  • memory_store(content, tags) — store new findings
  • memory_list(tags, limit) — list recent entries
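
The core of `memory_search` is a similarity ranking over stored vectors. A minimal sketch (brute-force cosine similarity; a production server would delegate to a vector index instead):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def memory_search(query_vec: list[float],
                  entries: list[tuple[str, list[float]]],
                  top_k: int = 5) -> list[str]:
    """Return the top_k stored texts most similar to the query embedding."""
    ranked = sorted(entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

`memory_store` would embed the new content and append it to `entries`; `memory_list` is a tag-filtered scan over the same store.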

Option C: SkillPack Approach

A reusable SkillPack that any instance can mount:

```yaml
spec:
  skills:
    - skillPackRef: memory-search
      params:
        EMBEDDING_MODEL: nomic-embed-text
        OLLAMA_URL: http://ollama:11434
        STORAGE: /data/memory-vectors
```

Architecture Considerations

Embedding Model Options

| Model | Size | Speed | Quality | Where |
|---|---|---|---|---|
| nomic-embed-text | 274 MB | Fast | Good | Ollama (local) |
| mxbai-embed-large | 670 MB | Medium | Better | Ollama (local) |
| text-embedding-3-small | API | Fastest | Good | OpenAI (cloud) |

For air-gapped / local-model users, Ollama-based embeddings are essential — this aligns with Sympozium's strength of working with local models.

Vector Storage Options

  1. PersistentVolume with embedded DB (simplest) — Use something like SQLite with sqlite-vss or hnswlib bundled into the controller
  2. In-cluster Qdrant/ChromaDB — More scalable, but adds infrastructure
  3. ConfigMap-based (current pattern extended) — Store serialized vectors in a ConfigMap; simplest but limited by ConfigMap size (1MB)
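
For option 1, even before pulling in `sqlite-vss` or `hnswlib`, a plain SQLite file on the PersistentVolume can hold chunks and their vectors. A minimal sketch (vectors serialized as JSON text; a hypothetical schema, not the proposed implementation):

```python
import json
import sqlite3

def init_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the vector store; path would be a file on the PV."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS vectors ("
        "id INTEGER PRIMARY KEY, chunk TEXT, embedding TEXT)"
    )
    return db

def store_vector(db: sqlite3.Connection, chunk: str, embedding: list[float]) -> None:
    """Persist one (chunk, vector) pair."""
    db.execute("INSERT INTO vectors (chunk, embedding) VALUES (?, ?)",
               (chunk, json.dumps(embedding)))
    db.commit()

def load_vectors(db: sqlite3.Connection) -> list[tuple[str, list[float]]]:
    """Load all pairs for brute-force search (fine at MEMORY.md scale)."""
    return [(chunk, json.loads(emb)) for chunk, emb in
            db.execute("SELECT chunk, embedding FROM vectors")]
```

Brute-force search over a store this small is cheap; swapping in an ANN index later would not change the controller-facing interface.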

Data Flow

Agent Run Completes
      ↓
Controller detects MEMORY.md update
      ↓
Chunk text → Generate embeddings → Store vectors
      ↓
Next Agent Run starts
      ↓
Agent calls search_memory("kafka consumer lag")
      ↓
Vector similarity search → Top-K results returned
      ↓
Agent uses past findings to accelerate investigation

Benchmark Evidence

From our AI Agent Benchmark comparing kagent, Sympozium, and HolmesGPT across 13 scenarios:

  • Agents frequently encounter similar failure patterns across scenarios (e.g., multiple scenarios involve feature flag misconfigurations, Kafka issues, or pod crashloops)
  • An agent with memory search could have reused diagnostic steps from scenario 1 (basic crashloop) when encountering scenario 7 (complex crashloop with timeout)
  • Token consumption could be reduced by 30-50% on recurring patterns if the agent finds relevant past context instead of re-discovering it

Prior Art

  • OpenClaw uses MEMORY.md + daily notes with Voyage AI embeddings for semantic search across memory files. Agents call memory_search(query) before answering questions about prior work.
  • LangChain/LangGraph have built-in memory stores with vector retrieval
  • CrewAI supports long-term memory with embedding-based search

Summary

| Approach | Complexity | Scalability | Local Model Support |
|---|---|---|---|
| A: Built-in | Medium | High | ✅ Ollama embeddings |
| B: MCP Server | Medium | High | ✅ Any embedding API |
| C: SkillPack | Low | Medium | ✅ Ollama embeddings |
