Semantic search enables finding context entries based on meaning rather than exact keyword matching. Using vector embeddings from multiple providers, the MCP Context Server can understand semantic similarity between queries and stored context, making it powerful for:
- Finding related work across different threads
- Discovering similar contexts without shared keywords
- Concept-based retrieval from large context collections
- Cross-agent knowledge discovery
This feature is optional and supports multiple embedding providers via LangChain integration.
The server supports 5 embedding providers:
| Provider | Default Model | Dimensions | Cost | Best For |
|---|---|---|---|---|
| Ollama (default) | qwen3-embedding:0.6b | 1024 | Free (local) | Development, privacy-focused |
| OpenAI | text-embedding-3-small | 1536 | $0.02/1M tokens | Production, high quality |
| Azure OpenAI | text-embedding-ada-002 | 1536 | Pay-as-you-go | Enterprise, compliance |
| HuggingFace | sentence-transformers/all-MiniLM-L6-v2 | 384 | Free (API) | Open source, experimentation |
| Voyage AI | voyage-3 | 1024 | $0.06/1M tokens | RAG optimization, long context |
Select a provider via the EMBEDDING_PROVIDER environment variable.
- Python: 3.12+ (already required by MCP Context Server)
- SQLite: 3.35+ minimum, 3.41+ recommended (for SQLite backend)
- PostgreSQL: 14+ with pgvector extension (for PostgreSQL backend)
- RAM: 4GB minimum, 8GB recommended for local models
| Provider | Requirements |
|---|---|
| Ollama | Ollama installed locally, ~1GB storage per model |
| OpenAI | OpenAI API key, internet access |
| Azure OpenAI | Azure subscription, deployed embedding model |
| HuggingFace | HuggingFace API token (optional for some models) |
| Voyage AI | Voyage AI API key |
Each provider has its own optional dependencies:
# Ollama provider (default)
uv sync --extra embeddings-ollama
# OpenAI provider
uv sync --extra embeddings-openai
# Azure OpenAI provider
uv sync --extra embeddings-azure
# HuggingFace provider
uv sync --extra embeddings-huggingface
# Voyage AI provider
uv sync --extra embeddings-voyage
# All providers
uv sync --extra embeddings-allSee the provider-specific sections below for detailed setup instructions.
Ollama runs embedding models locally with no API costs.
-
Install Ollama from ollama.com/download
Windows:
- Download and run the .exe installer
- Verify Windows Defender allows port 11434
macOS:
- Download the .dmg from ollama.com/download/mac
- Drag Ollama to Applications and launch
Important macOS Note: The default macOS Python lacks SQLite extension support. You must use Homebrew Python:
brew install python /opt/homebrew/bin/python3 -m venv .venv source .venv/bin/activateLinux:
curl -fsSL https://ollama.com/install.sh | sh -
Pull embedding model:
ollama pull qwen3-embedding:0.6b
-
Verify installation:
ollama list curl http://localhost:11434
{
"mcpServers": {
"context-server": {
"type": "stdio",
"command": "uvx",
"args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-ollama,reranking]", "mcp-context-server"],
"env": {
"ENABLE_SEMANTIC_SEARCH": "true",
"EMBEDDING_PROVIDER": "ollama",
"EMBEDDING_MODEL": "qwen3-embedding:0.6b",
"EMBEDDING_DIM": "1024",
"OLLAMA_HOST": "http://localhost:11434"
}
}
}
}Note: The --extra reranking is necessary to enable reranking.
| Variable | Default | Description |
|---|---|---|
EMBEDDING_PROVIDER |
ollama |
Set to ollama |
EMBEDDING_MODEL |
qwen3-embedding:0.6b |
Ollama model name |
EMBEDDING_DIM |
1024 |
Vector dimensions |
OLLAMA_HOST |
http://localhost:11434 |
Ollama API URL |
EMBEDDING_OLLAMA_TRUNCATE |
false |
Truncation mode: false (default) returns error when context exceeded, true enables silent truncation |
EMBEDDING_OLLAMA_NUM_CTX |
4096 |
Context window size in tokens (range: 512-2097152) |
Docker Networking: Use host.docker.internal:11434 (Windows/macOS) or 172.17.0.1:11434 (Linux) when running in containers.
| Model | Dimensions | Notes |
|---|---|---|
| qwen3-embedding:0.6b | 1024 | Default, MTEB-R 61.82, 32K context, 100+ languages |
| nomic-embed-text | 768 | Strong performance, English-focused |
| mxbai-embed-large | 1024 | Higher quality, slower |
| all-minilm | 384 | Very fast, good for large-scale |
OpenAI provides high-quality embeddings via API.
-
Get API key from platform.openai.com/api-keys
-
Install dependencies:
uv sync --extra embeddings-openai
{
"mcpServers": {
"context-server": {
"type": "stdio",
"command": "uvx",
"args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-openai,reranking]", "mcp-context-server"],
"env": {
"ENABLE_SEMANTIC_SEARCH": "true",
"EMBEDDING_PROVIDER": "openai",
"EMBEDDING_MODEL": "text-embedding-3-small",
"EMBEDDING_DIM": "1536",
"OPENAI_API_KEY": "${OPENAI_API_KEY}"
}
}
}
}Note: The --extra reranking is necessary to enable reranking.
| Variable | Default | Description |
|---|---|---|
EMBEDDING_PROVIDER |
- | Set to openai |
EMBEDDING_MODEL |
text-embedding-3-small |
OpenAI model name |
EMBEDDING_DIM |
1536 |
Vector dimensions |
OPENAI_API_KEY |
- | Required: OpenAI API key |
OPENAI_API_BASE |
- | Custom base URL (optional) |
OPENAI_ORGANIZATION |
- | Organization ID (optional) |
| Model | Dimensions | Price | Notes |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02/1M | Recommended, cost-effective |
| text-embedding-3-large | 3072 | $0.13/1M | Higher quality |
| text-embedding-ada-002 | 1536 | $0.10/1M | Legacy model |
Azure OpenAI provides enterprise-grade embeddings with compliance features.
-
Create Azure OpenAI resource in Azure Portal
-
Deploy embedding model in Azure OpenAI Studio
-
Get credentials:
- Endpoint URL
- API key
- Deployment name
-
Install dependencies:
uv sync --extra embeddings-azure
{
"mcpServers": {
"context-server": {
"type": "stdio",
"command": "uvx",
"args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-azure,reranking]", "mcp-context-server"],
"env": {
"ENABLE_SEMANTIC_SEARCH": "true",
"EMBEDDING_PROVIDER": "azure",
"EMBEDDING_MODEL": "text-embedding-ada-002",
"EMBEDDING_DIM": "1536",
"AZURE_OPENAI_API_KEY": "${AZURE_OPENAI_API_KEY}",
"AZURE_OPENAI_ENDPOINT": "https://your-resource.openai.azure.com",
"AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME": "your-deployment-name",
"AZURE_OPENAI_API_VERSION": "2024-02-01"
}
}
}
}Note: The --extra reranking is necessary to enable reranking.
| Variable | Default | Description |
|---|---|---|
EMBEDDING_PROVIDER |
- | Set to azure |
EMBEDDING_MODEL |
- | Model name |
EMBEDDING_DIM |
1536 |
Vector dimensions |
AZURE_OPENAI_API_KEY |
- | Required: Azure API key |
AZURE_OPENAI_ENDPOINT |
- | Required: Azure endpoint URL |
AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME |
- | Required: Deployment name |
AZURE_OPENAI_API_VERSION |
2024-02-01 |
API version |
HuggingFace provides access to open-source embedding models.
-
Get API token (optional) from huggingface.co/settings/tokens
-
Install dependencies:
uv sync --extra embeddings-huggingface
{
"mcpServers": {
"context-server": {
"type": "stdio",
"command": "uvx",
"args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-huggingface,reranking]", "mcp-context-server"],
"env": {
"ENABLE_SEMANTIC_SEARCH": "true",
"EMBEDDING_PROVIDER": "huggingface",
"EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
"EMBEDDING_DIM": "384",
"HUGGINGFACEHUB_API_TOKEN": "${HUGGINGFACEHUB_API_TOKEN}"
}
}
}
}Note: The --extra reranking is necessary to enable reranking.
| Variable | Default | Description |
|---|---|---|
EMBEDDING_PROVIDER |
- | Set to huggingface |
EMBEDDING_MODEL |
sentence-transformers/all-MiniLM-L6-v2 |
Model identifier |
EMBEDDING_DIM |
384 |
Vector dimensions |
HUGGINGFACEHUB_API_TOKEN |
- | HuggingFace API token (optional for some models) |
| Model | Dimensions | Notes |
|---|---|---|
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Fast, good quality |
| sentence-transformers/all-mpnet-base-v2 | 768 | Higher quality |
| BAAI/bge-small-en-v1.5 | 384 | BGE series, excellent quality |
| BAAI/bge-base-en-v1.5 | 768 | BGE series, balanced |
Voyage AI specializes in RAG-optimized embeddings with long context support.
-
Get API key from dash.voyageai.com
-
Install dependencies:
uv sync --extra embeddings-voyage
{
"mcpServers": {
"context-server": {
"type": "stdio",
"command": "uvx",
"args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-voyage,reranking]", "mcp-context-server"],
"env": {
"ENABLE_SEMANTIC_SEARCH": "true",
"EMBEDDING_PROVIDER": "voyage",
"EMBEDDING_MODEL": "voyage-3",
"EMBEDDING_DIM": "1024",
"VOYAGE_API_KEY": "${VOYAGE_API_KEY}"
}
}
}
}Note: The --extra reranking is necessary to enable reranking.
| Variable | Default | Description |
|---|---|---|
EMBEDDING_PROVIDER |
- | Set to voyage |
EMBEDDING_MODEL |
voyage-3 |
Voyage model name |
EMBEDDING_DIM |
1024 |
Vector dimensions |
VOYAGE_API_KEY |
- | Required: Voyage AI API key |
VOYAGE_TRUNCATION |
false |
Truncation mode: false (default) returns error when context exceeded, true enables silent truncation |
VOYAGE_BATCH_SIZE |
7 |
Texts per API call (1-128) |
| Model | Dimensions | Context | Notes |
|---|---|---|---|
| voyage-3 | 1024 | 32K | Latest, recommended |
| voyage-3-lite | 512 | 32K | Faster, lower cost |
| voyage-code-3 | 1024 | 32K | Optimized for code |
| voyage-finance-2 | 1024 | 32K | Financial domain |
| voyage-law-2 | 1024 | 32K | Legal domain |
LangSmith provides observability for embedding operations, including:
- Cost tracking per request
- Latency monitoring
- Error debugging
- Usage analytics
LangSmith tracing requires the optional langsmith dependency:
# For local development
uv sync --extra langsmith
# For uvx (combined with embeddings, e.g., Ollama)
uvx --python 3.12 --with "mcp-context-server[embeddings-ollama,langsmith,reranking]" mcp-context-serverGraceful Degradation: If LANGSMITH_TRACING=true but the langsmith package is not installed, the server will log a warning once and continue operating without tracing.
-
Create account at smith.langchain.com
-
Get API key from settings
-
Install dependencies:
uv sync --extra langsmith -
Enable tracing in your configuration:
{
"env": {
"LANGSMITH_TRACING": "true",
"LANGSMITH_API_KEY": "${LANGSMITH_API_KEY}",
"LANGSMITH_PROJECT": "mcp-context-server"
}
}| Variable | Default | Description |
|---|---|---|
LANGSMITH_TRACING |
false |
Enable/disable tracing |
LANGSMITH_API_KEY |
- | LangSmith API key |
LANGSMITH_ENDPOINT |
https://api.smith.langchain.com |
API endpoint |
LANGSMITH_PROJECT |
mcp-context-server |
Project name for grouping traces |
- Navigate to smith.langchain.com
- Select your project
- View embedding operations with timing and cost data
Text chunking splits long context entries into smaller chunks for embedding generation. This improves semantic search quality for long documents by ensuring each chunk fits within the embedding model's optimal context window.
Key point: Chunking is enabled by default. No configuration is required.
| Variable | Default | Description |
|---|---|---|
ENABLE_CHUNKING |
true |
Enable text chunking for embedding generation |
CHUNK_SIZE |
1500 |
Target chunk size in characters |
CHUNK_OVERLAP |
150 |
Overlap between consecutive chunks |
CHUNK_AGGREGATION |
max |
How to aggregate chunk scores (only 'max' supported; avg, sum planned for future) |
CHUNK_DEDUP_OVERFETCH |
5 |
Multiplier for over-fetching chunks before deduplication |
- Text Splitting: Long text is split using RecursiveCharacterTextSplitter with separators:
['\n\n', '\n', '. ', ' ', ''] - Embedding Generation: Each chunk is embedded independently
- Storage: Multiple embeddings stored per context entry (1:N relationship)
- Search Deduplication: Results are deduplicated by context_id using the configured aggregation method
Currently only max aggregation is supported. The avg and sum methods will be added in a future release.
- Increase CHUNK_SIZE: For documents with long cohesive sections
- Increase CHUNK_OVERLAP: To preserve context across chunk boundaries
Reranking uses a cross-encoder model (FlashRank) to improve search result precision by re-scoring candidates using the full query-document context.
Key point: Reranking is enabled by default. The FlashRank model (~34MB) downloads automatically during server startup.
Reranking requires the reranking extra which contains the flashrank package:
# For local development
uv sync --extra reranking
# For uvx (combined with embeddings, e.g., Ollama)
uvx --python 3.12 --with "mcp-context-server[embeddings-ollama,reranking]" mcp-context-serverImportant: Since ENABLE_RERANKING=true by default, you must install the reranking extra for the server to start successfully. To run without reranking, set ENABLE_RERANKING=false.
| Variable | Default | Description |
|---|---|---|
ENABLE_RERANKING |
true |
Enable cross-encoder reranking |
RERANKING_PROVIDER |
flashrank |
Reranking provider name |
RERANKING_MODEL |
ms-marco-MiniLM-L-12-v2 |
FlashRank model (~34MB) |
RERANKING_MAX_LENGTH |
512 |
Maximum input length in tokens |
RERANKING_OVERFETCH |
4 |
Multiplier for over-fetching before reranking |
RERANKING_CACHE_DIR |
None | Custom cache directory for model files |
- Over-fetching: Search retrieves
limit * RERANKING_OVERFETCHcandidates - Cross-Encoder Scoring: FlashRank scores each candidate against the query
- Re-ordering: Results sorted by cross-encoder score (higher = more relevant)
- Limiting: Top
limitresults returned
| Model | Size | Notes |
|---|---|---|
ms-marco-MiniLM-L-12-v2 (default) |
34MB | Fast, good quality |
ms-marco-MiniLM-L-6-v2 |
23MB | Faster, slightly lower quality |
rank_zephyr_7b_v1_full |
14GB | Highest quality, requires GPU |
- Startup: Model downloads during server initialization (~34MB for default model)
- Latency: Adds 20-100ms per search depending on candidate count
- Memory: ~200MB RAM for default model
- Cache: Model cached in
~/.cache/flashrank/by default
Embedding models have maximum context windows (measured in tokens). Text exceeding this limit must either be truncated or rejected with an error.
| Provider | Truncation Control | Default Behavior | Configuration |
|---|---|---|---|
| Ollama | Configurable | Error on context exceed | EMBEDDING_OLLAMA_TRUNCATE=false |
| Voyage | Configurable | Error on context exceed | VOYAGE_TRUNCATION=false |
| OpenAI | Always error | Returns error if input exceeds limit | N/A |
| Azure | Always error | Returns error if input exceeds limit | N/A |
| HuggingFace | Always truncate | Silently truncates (cannot be disabled) | N/A |
For production use, keep truncation disabled (false) and enable chunking (ENABLE_CHUNKING=true, default):
# Recommended: Chunking handles long documents, errors on unexpected overflows
ENABLE_CHUNKING=true # Default
EMBEDDING_OLLAMA_TRUNCATE=false # Default - errors prevent silent quality degradationSilent truncation (EMBEDDING_OLLAMA_TRUNCATE=true) is only recommended when:
- You accept potential embedding quality degradation for very long documents
- Chunking is disabled and you want graceful handling of edge cases
When truncation is disabled (EMBEDDING_OLLAMA_TRUNCATE=false or VOYAGE_TRUNCATION=false):
- Before calling the embedding API, text length is estimated using heuristic: 1 token ~ 3 characters
- If estimated tokens exceed the model's context limit, an error is raised before the API call
- This provides fail-fast behavior with clear error messages
Example error:
ValueError: Text length (5000 chars, ~1666 estimated tokens) may exceed context window
(1000 tokens from EMBEDDING_OLLAMA_NUM_CTX) for model qwen3-embedding:0.6b.
Options: 1) Enable chunking (ENABLE_CHUNKING=true, default),
2) Increase EMBEDDING_OLLAMA_NUM_CTX,
3) Set EMBEDDING_OLLAMA_TRUNCATE=true to allow silent truncation.
Common embedding models and their context limits:
| Model | Provider | Max Tokens | Notes |
|---|---|---|---|
| qwen3-embedding:0.6b | Ollama | 32,000 | Default model |
| nomic-embed-text | Ollama | 8,192 | Limited by EMBEDDING_OLLAMA_NUM_CTX |
| text-embedding-3-small | OpenAI | 8,191 | Always error on exceed |
| text-embedding-3-large | OpenAI | 8,191 | Always error on exceed |
| voyage-3 | Voyage AI | 32,000 | Configurable via truncation |
| all-MiniLM-L6-v2 | HuggingFace | 256 | Always silently truncates |
Note: For Ollama models, the effective context limit is min(model_max_tokens, EMBEDDING_OLLAMA_NUM_CTX).
On server startup, the universal validator checks:
- When chunking enabled: Warns if
CHUNK_SIZEmay exceed model's context limit - When chunking disabled: Warns about potential issues with large documents
- Unknown models: Uses provider default and logs a warning
Example startup warning:
[EMBEDDING CONFIG] CHUNK_SIZE (100000 chars, ~33333 tokens estimate) exceeds
model context limit (32000 tokens from model spec for qwen3-embedding:0.6b).
Chunks will cause embedding errors.
Recommendation: Reduce CHUNK_SIZE to ~76800 chars (80% of context window).
All providers support these settings:
| Variable | Default | Description |
|---|---|---|
EMBEDDING_TIMEOUT_S |
240.0 |
API timeout in seconds |
EMBEDDING_RETRY_MAX_ATTEMPTS |
5 |
Max retry attempts |
EMBEDDING_RETRY_BASE_DELAY_S |
1.0 |
Base delay between retries |
{
"mcpServers": {
"context-server": {
"type": "stdio",
"command": "uvx",
"args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-openai,reranking]", "mcp-context-server"],
"env": {
"ENABLE_SEMANTIC_SEARCH": "true",
"EMBEDDING_PROVIDER": "openai",
"EMBEDDING_MODEL": "text-embedding-3-small",
"EMBEDDING_DIM": "1536",
"OPENAI_API_KEY": "${OPENAI_API_KEY}",
"EMBEDDING_TIMEOUT_S": "240",
"EMBEDDING_RETRY_MAX_ATTEMPTS": "5",
"LANGSMITH_TRACING": "true",
"LANGSMITH_API_KEY": "${LANGSMITH_API_KEY}"
}
}
}
}When using PostgreSQL backend (including Supabase), ensure the pgvector extension is enabled.
The pgvector/pgvector Docker image has pgvector pre-installed:
docker run --name pgvector18 \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=mcp_context \
-p 5432:5432 \
-d pgvector/pgvector:pg18-trixie
# Configure and run
export STORAGE_BACKEND=postgresql
export ENABLE_SEMANTIC_SEARCH=true
uv run mcp-context-serverSupabase provides pgvector but requires manual enablement:
Method A: Via Dashboard
- Navigate to Database > Extensions in Supabase Dashboard
- Search for "vector" and enable it
Method B: Via SQL
CREATE EXTENSION IF NOT EXISTS vector;CREATE EXTENSION IF NOT EXISTS vector;
SELECT * FROM pg_available_extensions WHERE name = 'vector';IMPORTANT: Changing embedding dimensions requires database migration and loss of existing embeddings.
| Model | Dimension |
|---|---|
| qwen3-embedding:0.6b | 1024 |
| text-embedding-3-small | 1536 |
| text-embedding-3-large | 3072 |
| all-MiniLM-L6-v2 | 384 |
| voyage-3 | 1024 |
When you change EMBEDDING_DIM and restart, you'll see:
RuntimeError: Embedding dimension mismatch detected!
Existing database dimension: 1024
Configured EMBEDDING_DIM: 1536
Steps:
-
Back up database:
# Windows copy %USERPROFILE%\.mcp\context_storage.db %USERPROFILE%\.mcp\context_storage.backup.db # Linux/macOS cp ~/.mcp/context_storage.db ~/.mcp/context_storage.backup.db
-
Update configuration with new dimension and model
-
Delete database:
# Windows del %USERPROFILE%\.mcp\context_storage.db # Linux/macOS rm ~/.mcp/context_storage.db
-
Restart server - new tables created automatically
-
Re-store data - embeddings are generated at
store_contextandstore_context_batchtime (and regenerated when text changes viaupdate_contextorupdate_context_batch)
When enabled, the semantic_search_context MCP tool becomes available.
Parameters:
query(str, required): Natural language search querylimit(int, optional): Maximum results (1-100, default: 5)offset(int, optional): Pagination offset (default: 0)thread_id(str, optional): Filter by threadsource(str, optional): Filter by source ('user' or 'agent')tags(list, optional): Filter by tags (OR logic)content_type(str, optional): Filter by content typestart_date(str, optional): Filter by creation date (ISO 8601)end_date(str, optional): Filter by creation date (ISO 8601)metadata(dict, optional): Simple metadata filtersmetadata_filters(list, optional): Advanced metadata filtersinclude_images(bool, optional): Include image data (default: false)explain_query(bool, optional): Include statistics (default: false)
Metadata Filtering: See the Metadata Guide for operators and examples.
Returns:
{
"query": "original search query",
"results": [
{
"id": 123,
"thread_id": "thread-abc",
"text_content": "matching context...",
"summary": "Summary of the matching context entry with key semantic information.",
"is_text_content_truncated": true,
"scores": {
"semantic_distance": 0.234,
"semantic_rank": null,
"rerank_score": 0.92
},
"tags": ["tag1", "tag2"]
}
],
"count": 5,
"model": "text-embedding-3-small",
"stats": {
"execution_time_ms": 85.3,
"embedding_generation_ms": 45.1,
"filters_applied": 2
}
}Scores Object:
semantic_distance: L2 Euclidean distance (LOWER = more similar)semantic_rank: Always null for standalone semantic search (no ranking)rerank_score: Cross-encoder relevance score (HIGHER = better, 0.0-1.0), present when reranking is enabled
Embeddings are generated automatically:
- On
store_context: Embeddings generated in background - On
store_context_batch: Embeddings generated for each entry - On
update_context: Embeddings regenerated when text changes - On
update_context_batch: Embeddings regenerated for entries with text changes - On
delete_context: Embeddings cascade deleted
CRITICAL: When ENABLE_EMBEDDING_GENERATION=true (default when semantic search is enabled), embedding generation follows an "embedding-first" pattern:
- Embeddings are generated FIRST (outside any database transaction)
- If embedding generation fails, NO data is saved (operation returns error immediately)
- If embedding generation succeeds, ALL database operations occur in a SINGLE atomic transaction
This ensures data consistency: you never have context entries without their corresponding embeddings when embedding generation is enabled.
Key Implications:
- Embedding failures prevent data storage (not graceful degradation)
- Network issues with embedding provider will cause store_context to fail
- Consider retry logic in client code for transient embedding failures
-
Cross-thread discovery:
semantic_search_context(query="authentication implementation", limit=10)
-
Agent collaboration:
semantic_search_context(query="API rate limiting solutions", source="agent")
-
Metadata-filtered search:
semantic_search_context( query="performance optimization", metadata={"status": "completed"}, metadata_filters=[{"key": "priority", "operator": "gte", "value": 7}] )
-
Time-bounded search:
semantic_search_context(query="database optimization", start_date="2025-11-01")
- Embedding Generation: 50-150ms per text (varies by provider)
- Similarity Search: O(n * d) where n = filtered entries, d = dimensions
- Acceptable Scale: <100K context entries
- Storage Impact: ~4 bytes per dimension per entry
-
Verify provider installation:
# For Ollama ollama list curl http://localhost:11434 # For API providers, check credentials are set echo $OPENAI_API_KEY
-
Verify dependencies:
# Check LangChain packages installed python -c "from langchain_openai import OpenAIEmbeddings; print('OK')"
-
Start server:
export ENABLE_SEMANTIC_SEARCH=true export EMBEDDING_PROVIDER=openai export OPENAI_API_KEY=sk-... uv run mcp-context-server
-
Check logs for:
[OK] Embedding provider initialized: openai [OK] Semantic search enabled -
Test functionality:
semantic_search_context(query="test", limit=5)
The server classifies startup errors into two categories with different exit codes:
Configuration Errors (Exit Code 78)
These errors require human intervention to fix. The server will NOT recover by restarting:
| Error Type | Example | Fix |
|---|---|---|
| Missing API key | OPENAI_API_KEY environment variable is not set |
Set the required environment variable |
| Package not installed | langchain-openai package not installed |
Run uv sync --extra embeddings-openai |
| Invalid provider | Unknown provider: 'invalid' |
Use valid EMBEDDING_PROVIDER value |
| Vector storage dependencies | vector storage dependencies not available |
Install required packages |
| Model not found | model 'model' not found (status code: 404) |
Pull model: ollama pull <model> or fix EMBEDDING_MODEL |
Dependency Errors (Exit Code 69)
These errors may resolve with time. Supervisor may retry with backoff:
| Error Type | Example | Fix |
|---|---|---|
| Service not running | Ollama service not accessible: Connection refused |
Start Ollama: ollama serve |
| Network issues | Connection timed out |
Check network connectivity |
| Temporary unavailability | Service returned status 503 |
Wait and retry |
The server logs startup errors with clear indicators:
# Configuration Error (will NOT retry)
[FATAL] Configuration error (will not retry): ENABLE_EMBEDDING_GENERATION=true but langchain-openai package not installed...
# Dependency Error (may retry)
[ERROR] Dependency unavailable (may retry): ENABLE_EMBEDDING_GENERATION=true but ollama is not available (service may be down)...
Error: Failed to connect to Ollama: Connection refused
Solutions:
- Start Ollama:
ollama serve - Check firewall allows port 11434
- For Docker: use
OLLAMA_HOST=http://host.docker.internal:11434
Error: OpenAI API key not found
Solutions:
- Set
OPENAI_API_KEYenvironment variable - Verify key is valid at platform.openai.com
Error: Embedding dimension mismatch: expected 1536, got 1024
Cause: Model produces different dimensions than configured
Solution: Update EMBEDDING_DIM to match model output, or use correct model
Error: Model 'qwen3-embedding:0.6b' not found
Solution (Ollama):
ollama pull qwen3-embedding:0.6bError: ImportError: langchain_openai not installed
Solution:
uv sync --extra embeddings-openaiError: AttributeError: 'sqlite3.Connection' object has no attribute 'enable_load_extension'
Cause: Default macOS Python lacks SQLite extension support
Solution:
brew install python
/opt/homebrew/bin/python3 -m venv .venv
source .venv/bin/activate
uv sync --extra embeddings-ollama| Error | Cause | Solution |
|---|---|---|
Connection refused |
Provider not running | Start Ollama or check API credentials |
API key invalid |
Wrong credentials | Verify API key is correct |
Model not found |
Model not available | Pull model or check model name |
Dimension mismatch |
Wrong EMBEDDING_DIM | Match dimension to model output |
Rate limit exceeded |
API rate limiting | Reduce request rate or upgrade plan |
| Use Case | Recommended Provider |
|---|---|
| Development | Ollama (free, local) |
| Production (general) | OpenAI (balanced) |
| Enterprise | Azure OpenAI (compliance) |
| Cost-sensitive | HuggingFace (free API) |
| RAG optimization | Voyage AI (specialized) |
Adjust batch size for throughput vs latency:
VOYAGE_BATCH_SIZE=20 # Higher for throughput, lower for latencyIncrease timeout for slow connections:
EMBEDDING_TIMEOUT_S=300- LangChain Embeddings: python.langchain.com/docs/integrations/text_embedding
- Ollama: ollama.com
- OpenAI Embeddings: platform.openai.com/docs/guides/embeddings
- Azure OpenAI: learn.microsoft.com/azure/ai-services/openai
- Voyage AI: docs.voyageai.com
- LangSmith: smith.langchain.com
- API Reference: API Reference - complete tool documentation
- Summary Generation: Summary Generation Guide - LLM-based automatic summarization for search results
- Database Backends: Database Backends Guide - database configuration
- Full-Text Search: Full-Text Search Guide - linguistic search with stemming
- Hybrid Search: Hybrid Search Guide - combined FTS + semantic search
- Metadata Filtering: Metadata Guide - filtering with operators
- Docker Deployment: Docker Deployment Guide - containerized deployment
- Authentication: Authentication Guide - HTTP transport authentication
- Main Documentation: README.md - overview and quick start