# Retrieval Strategies
This guide explains the different retrieval strategies available in EverCore and when to use each one.
- Overview
- Lightweight Retrieval
- Agentic Retrieval
- Choosing a Strategy
- API Examples
- Performance Comparison
- Best Practices
## Overview

EverCore provides two main retrieval strategies:
- Lightweight Retrieval - Fast, efficient retrieval for latency-sensitive scenarios
- Agentic Retrieval - Intelligent, multi-round retrieval for complex queries
Both strategies are built on the Memory Perception layer. Lightweight retrieval recalls relevant memories quickly by fusing keyword and vector results, while agentic retrieval adds multi-round reasoning and intelligent fusion on top for precise contextual awareness.
## Lightweight Retrieval

Fast retrieval modes that skip LLM calls for minimum latency.
### Keyword Search

Pure keyword-based search using Elasticsearch BM25.
Characteristics:
- Fastest retrieval mode
- No embedding required
- Best for exact keyword matches
- Lower accuracy for semantic queries
When to use:
- Exact phrase or keyword search
- Latency is critical (< 100ms)
- No semantic understanding needed
Example:

```json
{
  "query": "soccer weekend",
  "retrieve_method": "keyword"
}
```

### Vector Search

Pure vector-based search using Milvus.
Characteristics:
- Semantic understanding
- Finds similar meaning, not just keywords
- Requires embedding model
- Moderate latency (~200-500ms)
When to use:
- Semantic similarity important
- Query phrasing differs from stored content
- Need conceptual matches
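To make "semantic similarity" concrete, here is a toy sketch of what vector search does: rank stored memories by cosine similarity to a query embedding. The 3-dimensional vectors are made up for illustration; in EverCore this step is delegated to Milvus and a real embedding model.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" (invented for illustration only)
memories = {
    "user plays soccer on weekends": [0.9, 0.1, 0.0],
    "user likes cooking pasta": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "What sports does the user enjoy?"

# Vector search returns the memory whose embedding is closest to the query
best = max(memories, key=lambda m: cosine(query_vec, memories[m]))
```

The query never mentions the word "soccer", yet the soccer memory ranks first because its embedding points in a similar direction.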
Example:

```json
{
  "query": "What sports does the user enjoy?",
  "retrieve_method": "vector"
}
```

### RRF (Hybrid) Search

Reciprocal Rank Fusion (RRF) of BM25 and embedding results.
Characteristics:
- Best of both worlds
- Parallel execution of BM25 and embedding search
- Fuses results using RRF algorithm
- Balanced accuracy and speed
When to use:
- Default choice for most scenarios
- Want both keyword and semantic matching
- Need robust retrieval across query types
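The fusion step itself is small: in Reciprocal Rank Fusion, each ranked list contributes `1 / (k + rank)` to a document's score, with `k` (commonly 60) damping the influence of top ranks. A minimal sketch, not EverCore's actual implementation:

```python
def rrf_fuse(result_lists, k=60):
    """Fuse several ranked lists of memory ids with Reciprocal Rank Fusion."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # A document seen in multiple lists accumulates score from each
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["m3", "m1", "m7"]     # keyword (BM25) results
vector_ranking = ["m1", "m9", "m3"]   # embedding results
fused = rrf_fuse([bm25_ranking, vector_ranking])
```

`m1` wins even though neither list ranked it the way the other did, because appearing near the top of both lists beats a single first-place finish.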
Example:

```json
{
  "query": "What are the user's weekend activities?",
  "retrieve_method": "rrf"
}
```

### Reranking

Optional reranking step to improve result relevance:
- Batch concurrent processing with exponential backoff retry
- Deep relevance scoring using reranker models
- Prioritization of most critical information
- High throughput stability
Reranking is automatically applied for hybrid and agentic retrieval methods. For programmatic control, see the Agentic Retrieval Guide.
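The batch-plus-backoff behavior described above can be sketched as follows. `score_batch` is a stand-in for a real reranker-model call and is an assumption, not EverCore's API:

```python
import time

def rerank_with_retry(query, batches, score_batch, max_retries=3, base_delay=0.5):
    """Score candidate batches against the query, retrying transient failures
    with exponential backoff (0.5s, 1.0s, 2.0s, ...)."""
    scored = []
    for batch in batches:
        for attempt in range(max_retries):
            try:
                # score_batch returns (memory, relevance) pairs for the batch
                scored.extend(score_batch(query, batch))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the final attempt
                time.sleep(base_delay * (2 ** attempt))
    # Most relevant memories first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In production the batches would be scored concurrently; the sequential loop here keeps the retry logic easy to follow.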
## Agentic Retrieval

Intelligent, multi-round retrieval using an LLM for query expansion and fusion.

How it works:

1. Query Analysis - LLM analyzes the user query
2. Query Expansion - Generates 2-3 complementary queries
3. Parallel Retrieval - Retrieves memories for each query
4. RRF Fusion - Fuses results using multi-path RRF
5. Context Integration - Concatenates memories with current conversation
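The pipeline above can be sketched end to end. `expand_query` and `retrieve` are placeholder callables (an LLM expansion step and a lightweight retrieval call), not EverCore internals, and retrieval runs sequentially here rather than in parallel for brevity:

```python
def agentic_retrieve(query, expand_query, retrieve, top_k=10, k=60):
    # Steps 1-2: analyze the query and generate complementary queries
    queries = [query] + expand_query(query)
    # Step 3: retrieve a ranked list of memories for each query
    ranked_lists = [retrieve(q) for q in queries]
    # Step 4: multi-path RRF fusion across all retrieval paths
    scores = {}
    for results in ranked_lists:
        for rank, memory in enumerate(results, start=1):
            scores[memory] = scores.get(memory, 0.0) + 1.0 / (k + rank)
    # Step 5 (context integration with the conversation) happens downstream
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```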
Characteristics:

- Higher latency (~2-5 seconds with LLM calls)
- Better coverage for complex intents
- Multi-aspect queries handled effectively
- Adaptive to query complexity
When to use:

- Complex, multi-faceted queries
- Queries requiring context understanding
- When accuracy is more important than speed
- Insufficient results from lightweight modes
Example:

User Query: "Tell me about my work-life balance"
Step 1 - Query Expansion:
- Original: "Tell me about my work-life balance"
- Expanded 1: "work schedule and working hours"
- Expanded 2: "hobbies and leisure activities"
- Expanded 3: "stress and relaxation"
Step 2 - Parallel Retrieval: Each query retrieves top-k memories using RRF
Step 3 - Fusion: Results merged using multi-path RRF
Step 4 - Response: LLM generates response based on retrieved memories
## Choosing a Strategy

```
Is latency critical (< 100ms)?
├─ Yes → Use Keyword
└─ No → Continue

Do you need semantic understanding?
├─ No → Use Keyword
└─ Yes → Continue

Is the query complex or multi-faceted?
├─ Yes → Use Agentic
└─ No → Continue

Default choice → Use RRF
```
| Use Case | Recommended Strategy | Reason |
|---|---|---|
| Exact phrase search | Keyword | Fast, precise keyword matching |
| Product search by name | Keyword or RRF | Keywords important |
| Conversational queries | RRF or Agentic | Semantic understanding needed |
| Complex analysis questions | Agentic | Multi-aspect coverage |
| Real-time chat | RRF | Balance of speed and accuracy |
| Background indexing | Any | No latency constraints |
| Autocomplete/suggestions | Keyword | Speed critical |
| Research/analysis | Agentic | Accuracy critical |
## API Examples

Keyword search:

```bash
curl -X GET http://localhost:1995/api/v0/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "soccer",
    "user_id": "user_001",
    "memory_types": ["episodic_memory"],
    "retrieve_method": "keyword",
    "top_k": 5
  }'
```

Vector search:

```bash
curl -X GET http://localhost:1995/api/v0/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What sports does the user like?",
    "user_id": "user_001",
    "memory_types": ["episodic_memory"],
    "retrieve_method": "vector",
    "top_k": 5
  }'
```

RRF search:

```bash
curl -X GET http://localhost:1995/api/v0/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Tell me about the user hobbies",
    "user_id": "user_001",
    "memory_types": ["episodic_memory"],
    "retrieve_method": "rrf",
    "top_k": 5
  }'
```

Agentic search:

```bash
curl -X GET http://localhost:1995/api/v0/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is my work-life balance like?",
    "user_id": "user_001",
    "memory_types": ["episodic_memory"],
    "retrieve_method": "agentic",
    "top_k": 10
  }'
```

## Performance Comparison

| Strategy | Typical Latency | Notes |
|---|---|---|
| Keyword | 50-100ms | Fastest |
| Vector | 200-500ms | Depends on Milvus performance |
| RRF | 200-600ms | Parallel keyword + vector |
| Agentic | 2-5 seconds | Includes LLM query expansion |
Accuracy, measured on the LoCoMo benchmark:
| Strategy | Precision | Recall | F1 Score |
|---|---|---|---|
| Keyword | 0.72 | 0.68 | 0.70 |
| Vector | 0.78 | 0.75 | 0.77 |
| RRF | 0.85 | 0.82 | 0.84 |
| Agentic | 0.91 | 0.89 | 0.90 |
Note: actual performance varies by query type and data.
Resource usage:

| Strategy | CPU | Memory | Network |
|---|---|---|---|
| Keyword | Low | Low | Minimal |
| Vector | Medium | Medium | Moderate (embedding API) |
| RRF | Medium | Medium | Moderate |
| Agentic | Medium-High | Medium | High (multiple LLM calls) |
## Best Practices

Use RRF as the default. For most applications, it provides the best balance:
- Good accuracy
- Reasonable latency
- Robust across query types
Use keyword search when users search for specific keywords or phrases:
- Product names
- Exact quotes
- Technical terms
Use agentic retrieval when:
- User query is vague or complex
- Standard retrieval returns insufficient results
- Analysis or reasoning required
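One way to act on the "insufficient results" trigger is an escalation wrapper: try fast RRF retrieval first and pay the agentic latency cost only when results are sparse. Here `search` is a stand-in for a call to the search endpoint:

```python
def search_with_fallback(search, query, min_results=3):
    """Try fast RRF retrieval; escalate to agentic retrieval if sparse."""
    results = search(query, retrieve_method="rrf", top_k=5)
    if len(results) >= min_results:
        return results
    # Too few hits: accept higher latency for better coverage
    return search(query, retrieve_method="agentic", top_k=10)
```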
Tune top_k per strategy:

- Keyword: Lower top_k (3-5) for precise matches
- Vector/RRF: Medium top_k (5-10) for coverage
- Agentic: Higher top_k (10-20) for comprehensive results
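These ranges can be encoded as simple defaults (illustrative values chosen from the ranges above, not EverCore-mandated):

```python
# Per-strategy top_k defaults: narrow for precise keyword hits,
# wide for comprehensive agentic coverage.
TOP_K_DEFAULTS = {
    "keyword": 5,
    "vector": 8,
    "rrf": 8,
    "agentic": 15,
}

def default_top_k(retrieve_method):
    # Fall back to a middle-of-the-road value for unknown methods
    return TOP_K_DEFAULTS.get(retrieve_method, 8)
```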
Monitor and adapt:

- Track query latency and adjust strategy
- Monitor result relevance and switch modes
- Consider caching frequent queries
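A minimal TTL cache sketch for the caching suggestion; tune `ttl` to how quickly memories change in your deployment:

```python
import time

class QueryCache:
    """Cache search results for frequent queries, expiring after `ttl` seconds."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}  # key -> (value, insertion_time)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and time.monotonic() - entry[1] < self.ttl:
            return entry[0]
        return None  # miss or expired

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())
```

A natural cache key is the `(query, retrieve_method, top_k)` tuple, so the same query issued under different strategies is cached separately.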
Use different strategies for different query types:
```python
def select_strategy(query):
    # Exact phrase (in quotes)
    if query.startswith('"') and query.endswith('"'):
        return "keyword"
    # Complex question
    if any(word in query.lower() for word in ["why", "how", "explain", "analyze"]):
        return "agentic"
    # Default
    return "rrf"
```

Related documentation:

- Architecture: Memory Perception - Technical architecture
- API Documentation - Complete API reference
- Agentic Retrieval Guide - In-depth agentic retrieval
- Evaluation Guide - Benchmarking retrieval strategies
- Usage Examples - Practical examples