# Retrieval Strategies
This guide explains the different retrieval strategies available in EverCore and when to use each one.
- Overview
- Lightweight Retrieval
- Agentic Retrieval
- Choosing a Strategy
- API Examples
- Performance Comparison
- Best Practices
## Overview

EverCore provides two main retrieval strategies:
- Lightweight Retrieval - Fast, efficient retrieval for latency-sensitive scenarios
- Agentic Retrieval - Intelligent, multi-round retrieval for complex queries
Both strategies are built on the Memory Perception layer. Lightweight retrieval recalls relevant memories quickly by fusing keyword and vector results, while agentic retrieval adds multi-round reasoning and intelligent fusion on top for precise contextual awareness.
## Lightweight Retrieval

Fast retrieval modes that skip LLM calls for minimum latency.
### Keyword Search

Pure keyword-based search using Elasticsearch BM25.
Characteristics:
- Fastest retrieval mode
- No embedding required
- Best for exact keyword matches
- Lower accuracy for semantic queries
When to use:
- Exact phrase or keyword search
- Latency is critical (< 100ms)
- No semantic understanding needed
Example:

```json
{
  "query": "soccer weekend",
  "retrieve_method": "keyword"
}
```

### Vector Search

Pure vector-based search using Milvus.
Characteristics:
- Semantic understanding
- Finds similar meaning, not just keywords
- Requires embedding model
- Moderate latency (~200-500ms)
When to use:
- Semantic similarity important
- Query phrasing differs from stored content
- Need conceptual matches
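To make "semantic similarity" concrete, here is a toy sketch of what vector search does: rank stored memories by cosine similarity to a query embedding. The 3-dimensional vectors are made up for illustration; in EverCore this step is delegated to Milvus and a real embedding model.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" (invented for illustration only)
memories = {
    "user plays soccer on weekends": [0.9, 0.1, 0.0],
    "user likes cooking pasta": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "What sports does the user enjoy?"

# Vector search returns the memory whose embedding is closest to the query
best = max(memories, key=lambda m: cosine(query_vec, memories[m]))
```

The query never mentions the word "soccer", yet the soccer memory ranks first because its embedding points in a similar direction.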
Example:

```json
{
  "query": "What sports does the user enjoy?",
  "retrieve_method": "vector"
}
```

### RRF (Hybrid) Search

Reciprocal Rank Fusion (RRF) of BM25 and embedding results.
Characteristics:
- Best of both worlds
- Parallel execution of BM25 and embedding search
- Fuses results using RRF algorithm
- Balanced accuracy and speed
When to use:
- Default choice for most scenarios
- Want both keyword and semantic matching
- Need robust retrieval across query types
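The fusion step itself is small: in Reciprocal Rank Fusion, each ranked list contributes `1 / (k + rank)` to a document's score, with `k` (commonly 60) damping the influence of top ranks. A minimal sketch, not EverCore's actual implementation:

```python
def rrf_fuse(result_lists, k=60):
    """Fuse several ranked lists of memory ids with Reciprocal Rank Fusion."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # A document seen in multiple lists accumulates score from each
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["m3", "m1", "m7"]     # keyword (BM25) results
vector_ranking = ["m1", "m9", "m3"]   # embedding results
fused = rrf_fuse([bm25_ranking, vector_ranking])
```

`m1` wins even though neither list ranked it the way the other did, because appearing near the top of both lists beats a single first-place finish.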
Example:

```json
{
  "query": "What are the user's weekend activities?",
  "retrieve_method": "rrf"
}
```

### Reranking

Optional reranking step to improve result relevance:
- Batch concurrent processing with exponential backoff retry
- Deep relevance scoring using reranker models
- Prioritization of most critical information
- High throughput stability
Reranking is automatically applied for hybrid and agentic retrieval methods. For programmatic control, see the Agentic Retrieval Guide.
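The batch-plus-backoff behavior described above can be sketched as follows. `score_batch` is a stand-in for a real reranker-model call and is an assumption, not EverCore's API:

```python
import time

def rerank_with_retry(query, batches, score_batch, max_retries=3, base_delay=0.5):
    """Score candidate batches against the query, retrying transient failures
    with exponential backoff (0.5s, 1.0s, 2.0s, ...)."""
    scored = []
    for batch in batches:
        for attempt in range(max_retries):
            try:
                # score_batch returns (memory, relevance) pairs for the batch
                scored.extend(score_batch(query, batch))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the final attempt
                time.sleep(base_delay * (2 ** attempt))
    # Most relevant memories first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In production the batches would be scored concurrently; the sequential loop here keeps the retry logic easy to follow.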
## Agentic Retrieval

Intelligent, multi-round retrieval using an LLM for query expansion and fusion.

How it works:

1. Query Analysis - LLM analyzes the user query
2. Query Expansion - Generates 2-3 complementary queries
3. Parallel Retrieval - Retrieves memories for each query
4. RRF Fusion - Fuses results using multi-path RRF
5. Context Integration - Concatenates memories with current conversation
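The pipeline above can be sketched end to end. `expand_query` and `retrieve` are placeholder callables (an LLM expansion step and a lightweight retrieval call), not EverCore internals, and retrieval runs sequentially here rather than in parallel for brevity:

```python
def agentic_retrieve(query, expand_query, retrieve, top_k=10, k=60):
    # Steps 1-2: analyze the query and generate complementary queries
    queries = [query] + expand_query(query)
    # Step 3: retrieve a ranked list of memories for each query
    ranked_lists = [retrieve(q) for q in queries]
    # Step 4: multi-path RRF fusion across all retrieval paths
    scores = {}
    for results in ranked_lists:
        for rank, memory in enumerate(results, start=1):
            scores[memory] = scores.get(memory, 0.0) + 1.0 / (k + rank)
    # Step 5 (context integration with the conversation) happens downstream
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```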
Characteristics:

- Higher latency (~2-5 seconds with LLM calls)
- Better coverage for complex intents
- Multi-aspect queries handled effectively
- Adaptive to query complexity
When to use:

- Complex, multi-faceted queries
- Queries requiring context understanding
- When accuracy is more important than speed
- Insufficient results from lightweight modes
Example:

User Query: "Tell me about my work-life balance"
Step 1 - Query Expansion:
- Original: "Tell me about my work-life balance"
- Expanded 1: "work schedule and working hours"
- Expanded 2: "hobbies and leisure activities"
- Expanded 3: "stress and relaxation"
Step 2 - Parallel Retrieval: Each query retrieves top-k memories using RRF
Step 3 - Fusion: Results merged using multi-path RRF
Step 4 - Response: LLM generates response based on retrieved memories
## Choosing a Strategy

```
Is latency critical (< 100ms)?
├─ Yes → Use Keyword
└─ No → Continue

Do you need semantic understanding?
├─ No → Use Keyword
└─ Yes → Continue

Is the query complex or multi-faceted?
├─ Yes → Use Agentic
└─ No → Continue

Default choice → Use RRF
```
| Use Case | Recommended Strategy | Reason |
|---|---|---|
| Exact phrase search | Keyword | Fast, precise keyword matching |
| Product search by name | Keyword or RRF | Keywords important |
| Conversational queries | RRF or Agentic | Semantic understanding needed |
| Complex analysis questions | Agentic | Multi-aspect coverage |
| Real-time chat | RRF | Balance of speed and accuracy |
| Background indexing | Any | No latency constraints |
| Autocomplete/suggestions | Keyword | Speed critical |
| Research/analysis | Agentic | Accuracy critical |
## API Examples

Keyword search:

```bash
curl -X GET http://localhost:1995/api/v0/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "soccer",
    "user_id": "user_001",
    "memory_types": ["episodic_memory"],
    "retrieve_method": "keyword",
    "top_k": 5
  }'
```

Vector search:

```bash
curl -X GET http://localhost:1995/api/v0/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What sports does the user like?",
    "user_id": "user_001",
    "memory_types": ["episodic_memory"],
    "retrieve_method": "vector",
    "top_k": 5
  }'
```

RRF search:

```bash
curl -X GET http://localhost:1995/api/v0/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Tell me about the user hobbies",
    "user_id": "user_001",
    "memory_types": ["episodic_memory"],
    "retrieve_method": "rrf",
    "top_k": 5
  }'
```

Agentic search:

```bash
curl -X GET http://localhost:1995/api/v0/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is my work-life balance like?",
    "user_id": "user_001",
    "memory_types": ["episodic_memory"],
    "retrieve_method": "agentic",
    "top_k": 10
  }'
```

## Performance Comparison

| Strategy | Typical Latency | Notes |
|---|---|---|
| Keyword | 50-100ms | Fastest |
| Vector | 200-500ms | Depends on Milvus performance |
| RRF | 200-600ms | Parallel keyword + vector |
| Agentic | 2-5 seconds | Includes LLM query expansion |
Accuracy, measured on the LoCoMo benchmark:
| Strategy | Precision | Recall | F1 Score |
|---|---|---|---|
| Keyword | 0.72 | 0.68 | 0.70 |
| Vector | 0.78 | 0.75 | 0.77 |
| RRF | 0.85 | 0.82 | 0.84 |
| Agentic | 0.91 | 0.89 | 0.90 |
Note: actual performance varies by query type and data.
Resource usage:

| Strategy | CPU | Memory | Network |
|---|---|---|---|
| Keyword | Low | Low | Minimal |
| Vector | Medium | Medium | Moderate (embedding API) |
| RRF | Medium | Medium | Moderate |
| Agentic | Medium-High | Medium | High (multiple LLM calls) |
## Best Practices

Use RRF as the default. For most applications, it provides the best balance:
- Good accuracy
- Reasonable latency
- Robust across query types
Use keyword search when users search for specific keywords or phrases:
- Product names
- Exact quotes
- Technical terms
Use agentic retrieval when:
- User query is vague or complex
- Standard retrieval returns insufficient results
- Analysis or reasoning required
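One way to act on the "insufficient results" trigger is an escalation wrapper: try fast RRF retrieval first and pay the agentic latency cost only when results are sparse. Here `search` is a stand-in for a call to the search endpoint:

```python
def search_with_fallback(search, query, min_results=3):
    """Try fast RRF retrieval; escalate to agentic retrieval if sparse."""
    results = search(query, retrieve_method="rrf", top_k=5)
    if len(results) >= min_results:
        return results
    # Too few hits: accept higher latency for better coverage
    return search(query, retrieve_method="agentic", top_k=10)
```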
Tune top_k per strategy:

- Keyword: Lower top_k (3-5) for precise matches
- Vector/RRF: Medium top_k (5-10) for coverage
- Agentic: Higher top_k (10-20) for comprehensive results
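These ranges can be encoded as simple defaults (illustrative values chosen from the ranges above, not EverCore-mandated):

```python
# Per-strategy top_k defaults: narrow for precise keyword hits,
# wide for comprehensive agentic coverage.
TOP_K_DEFAULTS = {
    "keyword": 5,
    "vector": 8,
    "rrf": 8,
    "agentic": 15,
}

def default_top_k(retrieve_method):
    # Fall back to a middle-of-the-road value for unknown methods
    return TOP_K_DEFAULTS.get(retrieve_method, 8)
```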
Monitor and adapt:

- Track query latency and adjust strategy
- Monitor result relevance and switch modes
- Consider caching frequent queries
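A minimal TTL cache sketch for the caching suggestion; tune `ttl` to how quickly memories change in your deployment:

```python
import time

class QueryCache:
    """Cache search results for frequent queries, expiring after `ttl` seconds."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}  # key -> (value, insertion_time)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and time.monotonic() - entry[1] < self.ttl:
            return entry[0]
        return None  # miss or expired

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())
```

A natural cache key is the `(query, retrieve_method, top_k)` tuple, so the same query issued under different strategies is cached separately.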
Use different strategies for different query types:
```python
def select_strategy(query):
    # Exact phrase (in quotes)
    if query.startswith('"') and query.endswith('"'):
        return "keyword"
    # Complex question
    if any(word in query.lower() for word in ["why", "how", "explain", "analyze"]):
        return "agentic"
    # Default
    return "rrf"
```

Related documentation:

- Architecture: Memory Perception - Technical architecture
- API Documentation - Complete API reference
- Agentic Retrieval Guide - In-depth agentic retrieval
- Evaluation Guide - Benchmarking retrieval strategies
- Usage Examples - Practical examples