The search package (pkg/search/) implements a hybrid search system combining:
- Vector Search - Cosine similarity on 1024-dimensional embeddings
- BM25 Full-Text Search - Keyword search with TF-IDF scoring
- RRF (Reciprocal Rank Fusion) - Industry-standard algorithm to merge rankings
This is the same approach used by:
- Azure AI Search
- Elasticsearch
- Weaviate
- Google Cloud Search
┌─────────────────────────────────────────────────────────────┐
│ Search Service │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ Vector Index │ │ Fulltext Index │ │ Storage │ │
│ │ (Cosine Sim) │ │ (BM25) │ │ Engine │ │
│ └────────┬────────┘ └────────┬────────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────┬───────┘ │ │
│ │ │ │
│ ┌─────────▼─────────┐ │ │
│ │ RRF Fusion │─────────────────┘ │
│ │ score = Σ(w/(k+r))│ │
│ └─────────┬─────────┘ │
│ │ │
│ ┌─────────▼─────────┐ │
│ │ Ranked Results │ │
│ └───────────────────┘ │
└─────────────────────────────────────────────────────────────┘
The BM25 search indexes these node properties (matching Mimir's Neo4j node_search index):
| Property | Description |
|---|---|
content |
Main content field |
text |
Text content (file chunks) |
title |
Node titles |
name |
Node names |
description |
Node descriptions |
path |
File paths |
workerRole |
Agent worker roles |
requirements |
Task requirements |
All properties are concatenated and indexed together - a search for "docker configuration" will match nodes where any of these fields contain those terms.
Formula: RRF_score(doc) = Σ (weight_i / (k + rank_i))
Where:
k= constant (default: 60)rank_i= rank of document in result set i (1-indexed)weight_i= importance weight for result set i
The system automatically adjusts weights based on query characteristics:
| Query Type | Words | Vector Weight | BM25 Weight | Rationale |
|---|---|---|---|---|
| Short | 1-2 | 0.5 | 1.5 | Exact keyword matching |
| Medium | 3-5 | 1.0 | 1.0 | Balanced |
| Long | 6+ | 1.5 | 0.5 | Semantic understanding |
import "github.com/orneryd/nornicdb/pkg/search"
// Create search service
engine := storage.NewMemoryEngine()
svc := search.NewService(engine)
// Build indexes from storage
ctx := context.Background()
svc.BuildIndexes(ctx)
// Hybrid RRF search
opts := search.DefaultSearchOptions()
opts.Limit = 10
opts.MinSimilarity = 0.5
response, err := svc.Search(ctx, "authentication security", embedding, opts)
// Results include RRF metadata
for _, r := range response.Results {
fmt.Printf("%s: rrf=%.4f vectorRank=%d bm25Rank=%d\n",
r.ID, r.RRFScore, r.VectorRank, r.BM25Rank)
}type SearchOptions struct {
Limit int // Max results (default: 50)
MinSimilarity float64 // Vector threshold (default: 0.5)
Types []string // Filter by node labels
// RRF configuration
RRFK float64 // RRF constant (default: 60)
VectorWeight float64 // Vector weight (default: 1.0)
BM25Weight float64 // BM25 weight (default: 1.0)
MinRRFScore float64 // Min RRF score (default: 0.01)
}The search automatically falls back when needed:
- RRF Hybrid (if embedding provided)
- Vector Only (if BM25 returns no results)
- Full-Text Only (if no embedding or vector search fails)
Search results are cached by the unified search service so that repeated identical requests (same query and options) return immediately from memory. The cache is shared by all entry points (HTTP /nornicdb/search, Cypher vector procedures, MCP, etc.).
- Key: Query text + options (limit, types/labels, rerank, MMR settings). Same inputs ⇒ cache hit.
- Size: Up to 1000 entries (LRU); entries expire after 5 minutes (TTL).
- Invalidation: The cache is cleared whenever the index changes (
IndexNodeorRemoveNode), so results stay correct after updates.
Use the same query and options for repeated calls to benefit from the cache (e.g. same search box query and limit).
| Operation | Scale | Time |
|---|---|---|
| Vector Search | 10K vectors | ~8.5ms |
| BM25 Search | 10K documents | ~255µs |
| RRF Fusion | 100 candidates | ~27µs |
| Index Build | 38K nodes | ~5.4s |
# Run all search tests
go test -v ./pkg/search/...
# Run with real Neo4j data
go test -v ./pkg/search/... -run RealData
# Run benchmarks
go test -bench=. ./pkg/search/...search.go- Main service with RRF fusionvector_index.go- Cosine similarity searchfulltext_index.go- BM25 inverted indexsearch_test.go- Comprehensive tests
- HNSW Index - O(log n) vector search vs current O(n)
- GPU Acceleration - Metal/CUDA for vector operations
- Stemming - Better text normalization
- Field Boosting - Weight title matches higher
- Phrase Search - Exact phrase matching