Semantic Search Guide

Introduction

Semantic search enables finding context entries based on meaning rather than exact keyword matching. Using vector embeddings from multiple providers, the MCP Context Server can understand semantic similarity between queries and stored context, making it powerful for:

Finding related work across different threads
Discovering similar contexts without shared keywords
Concept-based retrieval from large context collections
Cross-agent knowledge discovery

This feature is optional and supports multiple embedding providers via LangChain integration.

Embedding Providers

The server supports 5 embedding providers:

Provider	Default Model	Dimensions	Cost	Best For
Ollama (default)	qwen3-embedding:0.6b	1024	Free (local)	Development, privacy-focused
OpenAI	text-embedding-3-small	1536	$0.02/1M tokens	Production, high quality
Azure OpenAI	text-embedding-ada-002	1536	Pay-as-you-go	Enterprise, compliance
HuggingFace	sentence-transformers/all-MiniLM-L6-v2	384	Free (API)	Open source, experimentation
Voyage AI	voyage-3	1024	$0.06/1M tokens	RAG optimization, long context

Select a provider via the EMBEDDING_PROVIDER environment variable.

Prerequisites

Python: 3.12+ (already required by MCP Context Server)
SQLite: 3.35+ minimum, 3.41+ recommended (for SQLite backend)
PostgreSQL: 14+ with pgvector extension (for PostgreSQL backend)
RAM: 4GB minimum, 8GB recommended for local models

Provider-Specific Requirements

Provider	Requirements
Ollama	Ollama installed locally, ~1GB storage per model
OpenAI	OpenAI API key, internet access
Azure OpenAI	Azure subscription, deployed embedding model
HuggingFace	HuggingFace API token (optional for some models)
Voyage AI	Voyage AI API key

Installation

Step 1: Install Provider Dependencies

Each provider has its own optional dependencies:

# Ollama provider (default)
uv sync --extra embeddings-ollama

# OpenAI provider
uv sync --extra embeddings-openai

# Azure OpenAI provider
uv sync --extra embeddings-azure

# HuggingFace provider
uv sync --extra embeddings-huggingface

# Voyage AI provider
uv sync --extra embeddings-voyage

# All providers
uv sync --extra embeddings-all

Step 2: Provider-Specific Setup

See the provider-specific sections below for detailed setup instructions.

Provider Configuration

Ollama (Default)

Ollama runs embedding models locally with no API costs.

Setup

Install Ollama from ollama.com/download

Windows:
- Download and run the .exe installer
- Verify Windows Defender allows port 11434
macOS:
- Download the .dmg from ollama.com/download/mac
- Drag Ollama to Applications and launch
Important macOS Note: The default macOS Python lacks SQLite extension support. You must use Homebrew Python:
```
brew install python
/opt/homebrew/bin/python3 -m venv .venv
source .venv/bin/activate
```
Linux:
```
curl -fsSL https://ollama.com/install.sh | sh
```
Pull embedding model:
```
ollama pull qwen3-embedding:0.6b
```
Verify installation:
```
ollama list
curl http://localhost:11434
```

Configuration

{
  "mcpServers": {
    "context-server": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-ollama,reranking]", "mcp-context-server"],
      "env": {
        "ENABLE_SEMANTIC_SEARCH": "true",
        "EMBEDDING_PROVIDER": "ollama",
        "EMBEDDING_MODEL": "qwen3-embedding:0.6b",
        "EMBEDDING_DIM": "1024",
        "OLLAMA_HOST": "http://localhost:11434"
      }
    }
  }
}

Note: The --extra reranking is necessary to enable reranking.

Environment Variables

Variable	Default	Description
`EMBEDDING_PROVIDER`	`ollama`	Set to `ollama`
`EMBEDDING_MODEL`	`qwen3-embedding:0.6b`	Ollama model name
`EMBEDDING_DIM`	`1024`	Vector dimensions
`OLLAMA_HOST`	`http://localhost:11434`	Ollama API URL
`EMBEDDING_OLLAMA_TRUNCATE`	`false`	Truncation mode: false (default) returns error when context exceeded, true enables silent truncation
`EMBEDDING_OLLAMA_NUM_CTX`	`4096`	Context window size in tokens (range: 512-2097152)

Docker Networking: Use host.docker.internal:11434 (Windows/macOS) or 172.17.0.1:11434 (Linux) when running in containers.

Alternative Ollama Models

Model	Dimensions	Notes
qwen3-embedding:0.6b	1024	Default, MTEB-R 61.82, 32K context, 100+ languages
nomic-embed-text	768	Strong performance, English-focused
mxbai-embed-large	1024	Higher quality, slower
all-minilm	384	Very fast, good for large-scale

OpenAI

OpenAI provides high-quality embeddings via API.

Setup

Get API key from platform.openai.com/api-keys
Install dependencies:
```
uv sync --extra embeddings-openai
```

Configuration

{
  "mcpServers": {
    "context-server": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-openai,reranking]", "mcp-context-server"],
      "env": {
        "ENABLE_SEMANTIC_SEARCH": "true",
        "EMBEDDING_PROVIDER": "openai",
        "EMBEDDING_MODEL": "text-embedding-3-small",
        "EMBEDDING_DIM": "1536",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}"
      }
    }
  }
}

Note: The --extra reranking is necessary to enable reranking.

Environment Variables

Variable	Default	Description
`EMBEDDING_PROVIDER`	-	Set to `openai`
`EMBEDDING_MODEL`	`text-embedding-3-small`	OpenAI model name
`EMBEDDING_DIM`	`1536`	Vector dimensions
`OPENAI_API_KEY`	-	Required: OpenAI API key
`OPENAI_API_BASE`	-	Custom base URL (optional)
`OPENAI_ORGANIZATION`	-	Organization ID (optional)

Available OpenAI Models

Model	Dimensions	Price	Notes
text-embedding-3-small	1536	$0.02/1M	Recommended, cost-effective
text-embedding-3-large	3072	$0.13/1M	Higher quality
text-embedding-ada-002	1536	$0.10/1M	Legacy model

Azure OpenAI

Azure OpenAI provides enterprise-grade embeddings with compliance features.

Setup

Create Azure OpenAI resource in Azure Portal
Deploy embedding model in Azure OpenAI Studio
Get credentials:
- Endpoint URL
- API key
- Deployment name
Install dependencies:
```
uv sync --extra embeddings-azure
```

Configuration

{
  "mcpServers": {
    "context-server": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-azure,reranking]", "mcp-context-server"],
      "env": {
        "ENABLE_SEMANTIC_SEARCH": "true",
        "EMBEDDING_PROVIDER": "azure",
        "EMBEDDING_MODEL": "text-embedding-ada-002",
        "EMBEDDING_DIM": "1536",
        "AZURE_OPENAI_API_KEY": "${AZURE_OPENAI_API_KEY}",
        "AZURE_OPENAI_ENDPOINT": "https://your-resource.openai.azure.com",
        "AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME": "your-deployment-name",
        "AZURE_OPENAI_API_VERSION": "2024-02-01"
      }
    }
  }
}

Note: The --extra reranking is necessary to enable reranking.

Environment Variables

Variable	Default	Description
`EMBEDDING_PROVIDER`	-	Set to `azure`
`EMBEDDING_MODEL`	-	Model name
`EMBEDDING_DIM`	`1536`	Vector dimensions
`AZURE_OPENAI_API_KEY`	-	Required: Azure API key
`AZURE_OPENAI_ENDPOINT`	-	Required: Azure endpoint URL
`AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME`	-	Required: Deployment name
`AZURE_OPENAI_API_VERSION`	`2024-02-01`	API version

HuggingFace

HuggingFace provides access to open-source embedding models.

Setup

Get API token (optional) from huggingface.co/settings/tokens
Install dependencies:
```
uv sync --extra embeddings-huggingface
```

Configuration

{
  "mcpServers": {
    "context-server": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-huggingface,reranking]", "mcp-context-server"],
      "env": {
        "ENABLE_SEMANTIC_SEARCH": "true",
        "EMBEDDING_PROVIDER": "huggingface",
        "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
        "EMBEDDING_DIM": "384",
        "HUGGINGFACEHUB_API_TOKEN": "${HUGGINGFACEHUB_API_TOKEN}"
      }
    }
  }
}

Note: The --extra reranking is necessary to enable reranking.

Environment Variables

Variable	Default	Description
`EMBEDDING_PROVIDER`	-	Set to `huggingface`
`EMBEDDING_MODEL`	`sentence-transformers/all-MiniLM-L6-v2`	Model identifier
`EMBEDDING_DIM`	`384`	Vector dimensions
`HUGGINGFACEHUB_API_TOKEN`	-	HuggingFace API token (optional for some models)

Recommended HuggingFace Models

Model	Dimensions	Notes
sentence-transformers/all-MiniLM-L6-v2	384	Fast, good quality
sentence-transformers/all-mpnet-base-v2	768	Higher quality
BAAI/bge-small-en-v1.5	384	BGE series, excellent quality
BAAI/bge-base-en-v1.5	768	BGE series, balanced

Voyage AI

Voyage AI specializes in RAG-optimized embeddings with long context support.

Setup

Get API key from dash.voyageai.com
Install dependencies:
```
uv sync --extra embeddings-voyage
```

Configuration

{
  "mcpServers": {
    "context-server": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-voyage,reranking]", "mcp-context-server"],
      "env": {
        "ENABLE_SEMANTIC_SEARCH": "true",
        "EMBEDDING_PROVIDER": "voyage",
        "EMBEDDING_MODEL": "voyage-3",
        "EMBEDDING_DIM": "1024",
        "VOYAGE_API_KEY": "${VOYAGE_API_KEY}"
      }
    }
  }
}

Note: The --extra reranking is necessary to enable reranking.

Environment Variables

Variable	Default	Description
`EMBEDDING_PROVIDER`	-	Set to `voyage`
`EMBEDDING_MODEL`	`voyage-3`	Voyage model name
`EMBEDDING_DIM`	`1024`	Vector dimensions
`VOYAGE_API_KEY`	-	Required: Voyage AI API key
`VOYAGE_TRUNCATION`	`false`	Truncation mode: false (default) returns error when context exceeded, true enables silent truncation
`VOYAGE_BATCH_SIZE`	`7`	Texts per API call (1-128)

Available Voyage Models

Model	Dimensions	Context	Notes
voyage-3	1024	32K	Latest, recommended
voyage-3-lite	512	32K	Faster, lower cost
voyage-code-3	1024	32K	Optimized for code
voyage-finance-2	1024	32K	Financial domain
voyage-law-2	1024	32K	Legal domain

LangSmith Tracing

LangSmith provides observability for embedding operations, including:

Cost tracking per request
Latency monitoring
Error debugging
Usage analytics

Installation

LangSmith tracing requires the optional langsmith dependency:

# For local development
uv sync --extra langsmith

# For uvx (combined with embeddings, e.g., Ollama)
uvx --python 3.12 --with "mcp-context-server[embeddings-ollama,langsmith,reranking]" mcp-context-server

Graceful Degradation: If LANGSMITH_TRACING=true but the langsmith package is not installed, the server will log a warning once and continue operating without tracing.

Setup

Create account at smith.langchain.com
Get API key from settings
Install dependencies: uv sync --extra langsmith
Enable tracing in your configuration:

{
  "env": {
    "LANGSMITH_TRACING": "true",
    "LANGSMITH_API_KEY": "${LANGSMITH_API_KEY}",
    "LANGSMITH_PROJECT": "mcp-context-server"
  }
}

Environment Variables

Variable	Default	Description
`LANGSMITH_TRACING`	`false`	Enable/disable tracing
`LANGSMITH_API_KEY`	-	LangSmith API key
`LANGSMITH_ENDPOINT`	`https://api.smith.langchain.com`	API endpoint
`LANGSMITH_PROJECT`	`mcp-context-server`	Project name for grouping traces

Viewing Traces

Navigate to smith.langchain.com
Select your project
View embedding operations with timing and cost data

Text Chunking

Text chunking splits long context entries into smaller chunks for embedding generation. This improves semantic search quality for long documents by ensuring each chunk fits within the embedding model's optimal context window.

Key point: Chunking is enabled by default. No configuration is required.

Configuration

Variable	Default	Description
`ENABLE_CHUNKING`	`true`	Enable text chunking for embedding generation
`CHUNK_SIZE`	`1500`	Target chunk size in characters
`CHUNK_OVERLAP`	`150`	Overlap between consecutive chunks
`CHUNK_AGGREGATION`	`max`	How to aggregate chunk scores (only 'max' supported; avg, sum planned for future)
`CHUNK_DEDUP_OVERFETCH`	`5`	Multiplier for over-fetching chunks before deduplication

How It Works

Text Splitting: Long text is split using RecursiveCharacterTextSplitter with separators: ['\n\n', '\n', '. ', ' ', '']
Embedding Generation: Each chunk is embedded independently
Storage: Multiple embeddings stored per context entry (1:N relationship)
Search Deduplication: Results are deduplicated by context_id using the configured aggregation method

Aggregation Methods

Currently only max aggregation is supported. The avg and sum methods will be added in a future release.

When to Adjust Settings

Increase CHUNK_SIZE: For documents with long cohesive sections
Increase CHUNK_OVERLAP: To preserve context across chunk boundaries

Cross-Encoder Reranking

Reranking uses a cross-encoder model (FlashRank) to improve search result precision by re-scoring candidates using the full query-document context.

Key point: Reranking is enabled by default. The FlashRank model (~34MB) downloads automatically during server startup.

Installation

Reranking requires the reranking extra which contains the flashrank package:

# For local development
uv sync --extra reranking

# For uvx (combined with embeddings, e.g., Ollama)
uvx --python 3.12 --with "mcp-context-server[embeddings-ollama,reranking]" mcp-context-server

Important: Since ENABLE_RERANKING=true by default, you must install the reranking extra for the server to start successfully. To run without reranking, set ENABLE_RERANKING=false.

Configuration

Variable	Default	Description
`ENABLE_RERANKING`	`true`	Enable cross-encoder reranking
`RERANKING_PROVIDER`	`flashrank`	Reranking provider name
`RERANKING_MODEL`	`ms-marco-MiniLM-L-12-v2`	FlashRank model (~34MB)
`RERANKING_MAX_LENGTH`	`512`	Maximum input length in tokens
`RERANKING_OVERFETCH`	`4`	Multiplier for over-fetching before reranking
`RERANKING_CACHE_DIR`	None	Custom cache directory for model files

How It Works

Over-fetching: Search retrieves limit * RERANKING_OVERFETCH candidates
Cross-Encoder Scoring: FlashRank scores each candidate against the query
Re-ordering: Results sorted by cross-encoder score (higher = more relevant)
Limiting: Top limit results returned

Available Models

Model	Size	Notes
`ms-marco-MiniLM-L-12-v2` (default)	34MB	Fast, good quality
`ms-marco-MiniLM-L-6-v2`	23MB	Faster, slightly lower quality
`rank_zephyr_7b_v1_full`	14GB	Highest quality, requires GPU

Performance Considerations

Startup: Model downloads during server initialization (~34MB for default model)
Latency: Adds 20-100ms per search depending on candidate count
Memory: ~200MB RAM for default model
Cache: Model cached in ~/.cache/flashrank/ by default

Context Length and Truncation Control

Embedding models have maximum context windows (measured in tokens). Text exceeding this limit must either be truncated or rejected with an error.

Truncation Behavior by Provider

Provider	Truncation Control	Default Behavior	Configuration
Ollama	Configurable	Error on context exceed	`EMBEDDING_OLLAMA_TRUNCATE=false`
Voyage	Configurable	Error on context exceed	`VOYAGE_TRUNCATION=false`
OpenAI	Always error	Returns error if input exceeds limit	N/A
Azure	Always error	Returns error if input exceeds limit	N/A
HuggingFace	Always truncate	Silently truncates (cannot be disabled)	N/A

Recommended Configuration

For production use, keep truncation disabled (false) and enable chunking (ENABLE_CHUNKING=true, default):

# Recommended: Chunking handles long documents, errors on unexpected overflows
ENABLE_CHUNKING=true        # Default
EMBEDDING_OLLAMA_TRUNCATE=false       # Default - errors prevent silent quality degradation

Silent truncation (EMBEDDING_OLLAMA_TRUNCATE=true) is only recommended when:

You accept potential embedding quality degradation for very long documents
Chunking is disabled and you want graceful handling of edge cases

How Pre-Validation Works

When truncation is disabled (EMBEDDING_OLLAMA_TRUNCATE=false or VOYAGE_TRUNCATION=false):

Before calling the embedding API, text length is estimated using heuristic: 1 token ~ 3 characters
If estimated tokens exceed the model's context limit, an error is raised before the API call
This provides fail-fast behavior with clear error messages

Example error:

ValueError: Text length (5000 chars, ~1666 estimated tokens) may exceed context window
(1000 tokens from EMBEDDING_OLLAMA_NUM_CTX) for model qwen3-embedding:0.6b.
Options: 1) Enable chunking (ENABLE_CHUNKING=true, default),
         2) Increase EMBEDDING_OLLAMA_NUM_CTX,
         3) Set EMBEDDING_OLLAMA_TRUNCATE=true to allow silent truncation.

Context Limits by Model

Common embedding models and their context limits:

Model	Provider	Max Tokens	Notes
qwen3-embedding:0.6b	Ollama	32,000	Default model
nomic-embed-text	Ollama	8,192	Limited by EMBEDDING_OLLAMA_NUM_CTX
text-embedding-3-small	OpenAI	8,191	Always error on exceed
text-embedding-3-large	OpenAI	8,191	Always error on exceed
voyage-3	Voyage AI	32,000	Configurable via truncation
all-MiniLM-L6-v2	HuggingFace	256	Always silently truncates

Note: For Ollama models, the effective context limit is min(model_max_tokens, EMBEDDING_OLLAMA_NUM_CTX).

Startup Validation

On server startup, the universal validator checks:

When chunking enabled: Warns if CHUNK_SIZE may exceed model's context limit
When chunking disabled: Warns about potential issues with large documents
Unknown models: Uses provider default and logs a warning

Example startup warning:

[EMBEDDING CONFIG] CHUNK_SIZE (100000 chars, ~33333 tokens estimate) exceeds
model context limit (32000 tokens from model spec for qwen3-embedding:0.6b).
Chunks will cause embedding errors.
Recommendation: Reduce CHUNK_SIZE to ~76800 chars (80% of context window).

Common Configuration Settings

Timeout and Retry Settings

All providers support these settings:

Variable	Default	Description
`EMBEDDING_TIMEOUT_S`	`240.0`	API timeout in seconds
`EMBEDDING_RETRY_MAX_ATTEMPTS`	`5`	Max retry attempts
`EMBEDDING_RETRY_BASE_DELAY_S`	`1.0`	Base delay between retries

Full Configuration Example

{
  "mcpServers": {
    "context-server": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--python", "3.12", "--with", "mcp-context-server[embeddings-openai,reranking]", "mcp-context-server"],
      "env": {
        "ENABLE_SEMANTIC_SEARCH": "true",
        "EMBEDDING_PROVIDER": "openai",
        "EMBEDDING_MODEL": "text-embedding-3-small",
        "EMBEDDING_DIM": "1536",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "EMBEDDING_TIMEOUT_S": "240",
        "EMBEDDING_RETRY_MAX_ATTEMPTS": "5",
        "LANGSMITH_TRACING": "true",
        "LANGSMITH_API_KEY": "${LANGSMITH_API_KEY}"
      }
    }
  }
}

PostgreSQL Backend Setup

When using PostgreSQL backend (including Supabase), ensure the pgvector extension is enabled.

Docker PostgreSQL (pgvector/pgvector image)

The pgvector/pgvector Docker image has pgvector pre-installed:

docker run --name pgvector18 \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=mcp_context \
  -p 5432:5432 \
  -d pgvector/pgvector:pg18-trixie

# Configure and run
export STORAGE_BACKEND=postgresql
export ENABLE_SEMANTIC_SEARCH=true
uv run mcp-context-server

Supabase Setup

Supabase provides pgvector but requires manual enablement:

Method A: Via Dashboard

Navigate to Database > Extensions in Supabase Dashboard
Search for "vector" and enable it

Method B: Via SQL

CREATE EXTENSION IF NOT EXISTS vector;

Changing Embedding Dimensions

IMPORTANT: Changing embedding dimensions requires database migration and loss of existing embeddings.

Common Dimensions by Model

Model	Dimension
qwen3-embedding:0.6b	1024
text-embedding-3-small	1536
text-embedding-3-large	3072
all-MiniLM-L6-v2	384
voyage-3	1024

Migration Procedure

When you change EMBEDDING_DIM and restart, you'll see:

RuntimeError: Embedding dimension mismatch detected!
  Existing database dimension: 1024
  Configured EMBEDDING_DIM: 1536

Steps:

Back up database:

# Windows
copy %USERPROFILE%\.mcp\context_storage.db %USERPROFILE%\.mcp\context_storage.backup.db

# Linux/macOS
cp ~/.mcp/context_storage.db ~/.mcp/context_storage.backup.db

Update configuration with new dimension and model

Delete database:

# Windows
del %USERPROFILE%\.mcp\context_storage.db

# Linux/macOS
rm ~/.mcp/context_storage.db

Restart server - new tables created automatically
Re-store data - embeddings are generated at store_context and store_context_batch time (and regenerated when text changes via update_context or update_context_batch)

Usage

semantic_search_context Tool

When enabled, the semantic_search_context MCP tool becomes available.

Parameters:

query (str, required): Natural language search query
limit (int, optional): Maximum results (1-100, default: 5)
offset (int, optional): Pagination offset (default: 0)
thread_id (str, optional): Filter by thread
source (str, optional): Filter by source ('user' or 'agent')
tags (list, optional): Filter by tags (OR logic)
content_type (str, optional): Filter by content type
start_date (str, optional): Filter by creation date (ISO 8601)
end_date (str, optional): Filter by creation date (ISO 8601)
metadata (dict, optional): Simple metadata filters
metadata_filters (list, optional): Advanced metadata filters
include_images (bool, optional): Include image data (default: false)
explain_query (bool, optional): Include statistics (default: false)

Metadata Filtering: See the Metadata Guide for operators and examples.

Returns:

{
  "query": "original search query",
  "results": [
    {
      "id": 123,
      "thread_id": "thread-abc",
      "text_content": "matching context...",
      "summary": "Summary of the matching context entry with key semantic information.",
      "is_text_content_truncated": true,
      "scores": {
        "semantic_distance": 0.234,
        "semantic_rank": null,
        "rerank_score": 0.92
      },
      "tags": ["tag1", "tag2"]
    }
  ],
  "count": 5,
  "model": "text-embedding-3-small",
  "stats": {
    "execution_time_ms": 85.3,
    "embedding_generation_ms": 45.1,
    "filters_applied": 2
  }
}

Scores Object:

semantic_distance: L2 Euclidean distance (LOWER = more similar)
semantic_rank: Always null for standalone semantic search (no ranking)
rerank_score: Cross-encoder relevance score (HIGHER = better, 0.0-1.0), present when reranking is enabled

Automatic Embedding Generation

Embeddings are generated automatically:

On store_context: Embeddings generated in background
On store_context_batch: Embeddings generated for each entry
On update_context: Embeddings regenerated when text changes
On update_context_batch: Embeddings regenerated for entries with text changes
On delete_context: Embeddings cascade deleted

Embedding-First Transactional Integrity

CRITICAL: When ENABLE_EMBEDDING_GENERATION=true (default when semantic search is enabled), embedding generation follows an "embedding-first" pattern:

Embeddings are generated FIRST (outside any database transaction)
If embedding generation fails, NO data is saved (operation returns error immediately)
If embedding generation succeeds, ALL database operations occur in a SINGLE atomic transaction

This ensures data consistency: you never have context entries without their corresponding embeddings when embedding generation is enabled.

Key Implications:

Embedding failures prevent data storage (not graceful degradation)
Network issues with embedding provider will cause store_context to fail
Consider retry logic in client code for transient embedding failures

Example Use Cases

Cross-thread discovery:

semantic_search_context(query="authentication implementation", limit=10)

Agent collaboration:

semantic_search_context(query="API rate limiting solutions", source="agent")

Metadata-filtered search:

semantic_search_context(
    query="performance optimization",
    metadata={"status": "completed"},
    metadata_filters=[{"key": "priority", "operator": "gte", "value": 7}]
)

Time-bounded search:

semantic_search_context(query="database optimization", start_date="2025-11-01")

Performance Characteristics

Embedding Generation: 50-150ms per text (varies by provider)
Similarity Search: O(n * d) where n = filtered entries, d = dimensions
Acceptable Scale: <100K context entries
Storage Impact: ~4 bytes per dimension per entry

Verification

Setup Checklist

Verify provider installation:

# For Ollama
ollama list
curl http://localhost:11434

# For API providers, check credentials are set
echo $OPENAI_API_KEY

Verify dependencies:

# Check LangChain packages installed
python -c "from langchain_openai import OpenAIEmbeddings; print('OK')"

Start server:

export ENABLE_SEMANTIC_SEARCH=true
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
uv run mcp-context-server

Check logs for:

[OK] Embedding provider initialized: openai
[OK] Semantic search enabled

Test functionality:

semantic_search_context(query="test", limit=5)

Troubleshooting

Startup Error Classification

The server classifies startup errors into two categories with different exit codes:

Configuration Errors (Exit Code 78)

These errors require human intervention to fix. The server will NOT recover by restarting:

Error Type	Example	Fix
Missing API key	`OPENAI_API_KEY environment variable is not set`	Set the required environment variable
Package not installed	`langchain-openai package not installed`	Run `uv sync --extra embeddings-openai`
Invalid provider	`Unknown provider: 'invalid'`	Use valid EMBEDDING_PROVIDER value
Vector storage dependencies	`vector storage dependencies not available`	Install required packages
Model not found	`model 'model' not found (status code: 404)`	Pull model: `ollama pull <model>` or fix EMBEDDING_MODEL

Dependency Errors (Exit Code 69)

These errors may resolve with time. Supervisor may retry with backoff:

Error Type	Example	Fix
Service not running	`Ollama service not accessible: Connection refused`	Start Ollama: `ollama serve`
Network issues	`Connection timed out`	Check network connectivity
Temporary unavailability	`Service returned status 503`	Wait and retry

Log Messages

The server logs startup errors with clear indicators:

# Configuration Error (will NOT retry)
[FATAL] Configuration error (will not retry): ENABLE_EMBEDDING_GENERATION=true but langchain-openai package not installed...

# Dependency Error (may retry)
[ERROR] Dependency unavailable (may retry): ENABLE_EMBEDDING_GENERATION=true but ollama is not available (service may be down)...

Provider Connection Errors

Error: Failed to connect to Ollama: Connection refused

Solutions:

Start Ollama: ollama serve
Check firewall allows port 11434
For Docker: use OLLAMA_HOST=http://host.docker.internal:11434

Error: OpenAI API key not found

Solutions:

Set OPENAI_API_KEY environment variable
Verify key is valid at platform.openai.com

Dimension Mismatch

Error: Embedding dimension mismatch: expected 1536, got 1024

Cause: Model produces different dimensions than configured

Solution: Update EMBEDDING_DIM to match model output, or use correct model

Model Not Found

Error: Model 'qwen3-embedding:0.6b' not found

Solution (Ollama):

ollama pull qwen3-embedding:0.6b

Dependencies Not Installed

Error: ImportError: langchain_openai not installed

Solution:

uv sync --extra embeddings-openai

macOS SQLite Extension Error

Error: AttributeError: 'sqlite3.Connection' object has no attribute 'enable_load_extension'

Cause: Default macOS Python lacks SQLite extension support

Solution:

brew install python
/opt/homebrew/bin/python3 -m venv .venv
source .venv/bin/activate
uv sync --extra embeddings-ollama

Common Error Messages

Error	Cause	Solution
`Connection refused`	Provider not running	Start Ollama or check API credentials
`API key invalid`	Wrong credentials	Verify API key is correct
`Model not found`	Model not available	Pull model or check model name
`Dimension mismatch`	Wrong EMBEDDING_DIM	Match dimension to model output
`Rate limit exceeded`	API rate limiting	Reduce request rate or upgrade plan

Performance Optimization

Provider Selection by Use Case

Use Case	Recommended Provider
Development	Ollama (free, local)
Production (general)	OpenAI (balanced)
Enterprise	Azure OpenAI (compliance)
Cost-sensitive	HuggingFace (free API)
RAG optimization	Voyage AI (specialized)

Batch Size (Voyage AI)

Adjust batch size for throughput vs latency:

VOYAGE_BATCH_SIZE=20  # Higher for throughput, lower for latency

Timeouts

Increase timeout for slow connections:

EMBEDDING_TIMEOUT_S=300

Additional Resources

Official Documentation

LangChain Embeddings: python.langchain.com/docs/integrations/text_embedding
Ollama: ollama.com
OpenAI Embeddings: platform.openai.com/docs/guides/embeddings
Azure OpenAI: learn.microsoft.com/azure/ai-services/openai
Voyage AI: docs.voyageai.com
LangSmith: smith.langchain.com

FilesExpand file tree

semantic-search.md

Latest commit

History

semantic-search.md

File metadata and controls

Semantic Search Guide

Introduction

Embedding Providers

Prerequisites

Provider-Specific Requirements

Installation

Step 1: Install Provider Dependencies

Step 2: Provider-Specific Setup

Provider Configuration

Ollama (Default)

Setup

Configuration

Environment Variables

Alternative Ollama Models

OpenAI

Setup

Configuration

Environment Variables

Available OpenAI Models

Azure OpenAI

Setup

Configuration

Environment Variables

HuggingFace

Setup

Configuration

Environment Variables

Recommended HuggingFace Models

Voyage AI

Setup

Configuration

Environment Variables

Available Voyage Models

LangSmith Tracing

Installation

Setup

Environment Variables

Viewing Traces

Text Chunking

Configuration

How It Works

Aggregation Methods

When to Adjust Settings

Cross-Encoder Reranking

Installation

Configuration

How It Works

Available Models

Performance Considerations

Context Length and Truncation Control

Truncation Behavior by Provider

Recommended Configuration

How Pre-Validation Works

Context Limits by Model

Startup Validation

Common Configuration Settings

Timeout and Retry Settings

Full Configuration Example

PostgreSQL Backend Setup

Docker PostgreSQL (pgvector/pgvector image)

Supabase Setup

Other PostgreSQL Deployments

Changing Embedding Dimensions

Common Dimensions by Model

Migration Procedure

Usage

semantic_search_context Tool

Automatic Embedding Generation

Embedding-First Transactional Integrity

Example Use Cases

Performance Characteristics

Verification

Setup Checklist

Troubleshooting