feat: hierarchical memory retrieval #1374
Signed-off-by: Huamin Chen <hchen@redhat.com>
Pull request overview
This PR implements a hierarchical memory retrieval system with hybrid scoring for the semantic router, enabling multi-tier memory organization with category-based search and BM25/n-gram/vector score fusion.
Changes:
- Adds hierarchical memory structure with category nodes, parent-child relationships, and multi-level summaries (L0 abstract, L1 overview, L2 content)
- Implements hybrid scoring that combines vector similarity, BM25 keyword matching, and character n-gram Jaccard similarity
- Adds group-level memory sharing with visibility controls (user/group/public) and cross-memory relations
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/make/build-run-test.mk | Adds makefile targets for running hierarchical memory tests |
| src/semantic-router/pkg/memory/types.go | Defines hierarchical memory types including visibility, relations, and hybrid config |
| src/semantic-router/pkg/memory/testdata/generate_source_dataset.go | Generator script for source code evaluation dataset |
| src/semantic-router/pkg/memory/testdata/evaluation_dataset.json | Evaluation dataset with 30 memories across 6 clusters |
| src/semantic-router/pkg/memory/store.go | Adds HierarchicalStore interface and type assertions |
| src/semantic-router/pkg/memory/relations.go | Implements automatic bidirectional memory relation linking |
| src/semantic-router/pkg/memory/milvus_store.go | Updates Milvus schema with hierarchical fields |
| src/semantic-router/pkg/memory/milvus_hierarchical.go | Milvus implementation of hierarchical retrieval |
| src/semantic-router/pkg/memory/inmemory_store.go | Adds relations map to in-memory store |
| src/semantic-router/pkg/memory/inmemory_hierarchical.go | In-memory implementation of hierarchical retrieval |
| src/semantic-router/pkg/memory/hybrid_score.go | BM25, n-gram indexing, and score fusion logic |
| src/semantic-router/pkg/memory/hybrid_hierarchical_comparison_test.go | Three-way comparison tests showing +6.2% improvement |
| src/semantic-router/pkg/memory/hierarchical_test.go | Unit tests for hierarchical retrieval components |
| src/semantic-router/pkg/memory/hierarchical_retrieve.go | Generic two-phase hierarchical retrieval algorithm |
| src/semantic-router/pkg/memory/hierarchical_comparison_test.go | Precision/recall comparison tests |
| src/semantic-router/pkg/memory/hierarchical_benchmark_test.go | Performance benchmarks for retrieval operations |
| src/semantic-router/pkg/memory/categorizer.go | Auto-categorization and summary generation logic |
| src/semantic-router/pkg/extproc/req_filter_memory_test.go | Tests for hybrid config wiring |
| src/semantic-router/pkg/extproc/req_filter_memory.go | Integrates hybrid scoring into request filter |
| src/semantic-router/pkg/extproc/processor_req_body.go | Adds hierarchical/group retrieval to request processor |
| src/semantic-router/pkg/config/config_test.go | Tests for new hierarchical/hybrid config fields |
| src/semantic-router/pkg/config/config.go | Adds hierarchical and hybrid config fields |
| scripts/test-retrieval-api.sh | End-to-end test script for hierarchical retrieval |
| e2e/testing/mock-vllm-echo.py | Echo mock for memory injection verification |
| config/testing/envoy-retrieval-test.yaml | Envoy config for retrieval tests |
| config/testing/config.memory-hierarchical.yaml | Router config for hierarchical testing |
    MemoryTypeSemantic MemoryType = "semantic"

    // MemoryTypeProcedural represents instructions, how-to, steps.
    // Example: "To deploy payment-service: run npm build, then docker push"
    MemoryTypeProcedural MemoryType = "procedural"

    // MemoryTypeEpisodic represents session summaries, past events.
    // Example: "On Dec 29 2024, user planned Hawaii vacation with $10K budget"
    MemoryTypeEpisodic MemoryType = "episodic"
The example comments for these memory types were removed. Consider restoring brief inline examples to help developers understand the difference between semantic, procedural, and episodic memory types.
     schema := &entity.Schema{
         CollectionName: m.collectionName,
    -    Description: "Agentic Memory storage for cross-session context",
    +    Description: "Agentic Memory storage with hierarchical organization and group sharing",
Corrected spelling of 'Agentic' to 'Agentive' or 'Agent-based'. 'Agentic' is not a standard English term in technical contexts.
    SKIP_SEED="${SKIP_SEED:-0}"
    EXTRACTION_WAIT="${EXTRACTION_WAIT:-12}"

    USER_ID="retrieval_test_$(date +%s)"
The USER_ID uses epoch timestamp which may collide if tests run in parallel within the same second. Consider adding a random suffix or using date +%s%N for nanosecond precision.
    -USER_ID="retrieval_test_$(date +%s)"
    +USER_ID="retrieval_test_$(date +%s%N)"
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 8 comments.
    // FusedScore computes the hybrid score for a single memory given its cosine
    // similarity and the query string. Returns the fused score.
    func (s *MemHybridScorer) FusedScore(memID string, cosineSim float32, query string) float32 {
        bm25Scores := s.bm25.Score(query, s.cfg.BM25K1, s.cfg.BM25B)
        ngramScores := s.ngram.Score(query)
        return s.fuseOne(memID, cosineSim, bm25Scores, ngramScores)
    }
MemHybridScorer.FusedScore recomputes BM25 and n-gram scores for the entire corpus on every call, making callers that score many memories (e.g., hierarchical retrieval loops) effectively O(N²) per query. Cache per-query score maps inside MemHybridScorer or use the existing FusedScores batch API in callers to compute BM25/ngram once per query.
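A minimal sketch of the per-query caching idea, assuming the corpus-wide BM25 and n-gram passes can be keyed by the query string. Every name here (`lexicalScores`, `cachingScorer`, `scoresFor`) is hypothetical and stands in for the PR's `MemHybridScorer`; the canned score maps replace the real index calls, and the 0.7/0.2/0.1 weights are the defaults quoted elsewhere in this PR.

```go
package main

import "fmt"

// lexicalScores holds corpus-wide keyword scores for one query.
// Every name in this sketch is hypothetical, not the PR's actual API.
type lexicalScores struct {
	bm25  map[string]float32
	ngram map[string]float32
}

// cachingScorer remembers the lexical scores of the most recent query.
// It is not safe for concurrent use; a real version would need a lock.
type cachingScorer struct {
	corpusPasses int // how many times the corpus-wide pass ran
	lastQuery    string
	last         *lexicalScores
}

// scoresFor runs the corpus-wide pass only when the query changes.
func (s *cachingScorer) scoresFor(query string) *lexicalScores {
	if s.last != nil && s.lastQuery == query {
		return s.last
	}
	s.corpusPasses++ // stand-in for the BM25 and n-gram index scoring calls
	s.last = &lexicalScores{
		bm25:  map[string]float32{"m1": 0.5, "m2": 0.1},
		ngram: map[string]float32{"m1": 0.4, "m2": 0.2},
	}
	s.lastQuery = query
	return s.last
}

// fusedScore fuses one memory's signals with assumed weights 0.7/0.2/0.1.
func (s *cachingScorer) fusedScore(memID string, cosine float32, query string) float32 {
	ls := s.scoresFor(query)
	return 0.7*cosine + 0.2*ls.bm25[memID] + 0.1*ls.ngram[memID]
}

func main() {
	s := &cachingScorer{}
	for _, id := range []string{"m1", "m2"} {
		fmt.Printf("%s fused=%.3f\n", id, s.fusedScore(id, 0.8, "helm charts"))
	}
	fmt.Println("corpus passes:", s.corpusPasses)
}
```

With this shape, a retrieval loop that scores N memories for one query triggers a single corpus pass instead of N. Calling the existing `FusedScores` batch API from the loop would achieve the same effect without a cache.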
    func BuildGroupFilter(userID string, groupIDs []string, includeGroup bool) string {
        if !includeGroup || len(groupIDs) == 0 {
            return fmt.Sprintf("user_id == \"%s\"", userID)
        }
BuildGroupFilter drops public memories when includeGroup=true but groupIDs is empty because it falls back to user_id == .... That conflicts with the VisibilityPublic semantics ("any user") and with InMemoryStore.passesAccessFilter (which admits public when IncludeGroupLevel is true). Consider adding an explicit || visibility == "public" clause when includeGroup is enabled, regardless of groupIDs.
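One way the suggested public clause could be wired in, as a rough sketch. `buildGroupFilterSketch` is a hypothetical variant, not the PR's function; the expression syntax (`==`, `||`, `&&`, `in`) and the `%q` quoting are assumptions about what the Milvus filter accepts and should be checked against the real store.

```go
package main

import (
	"fmt"
	"strings"
)

// buildGroupFilterSketch is a hypothetical variant of BuildGroupFilter that
// admits public memories whenever includeGroup is set, even with no groups.
func buildGroupFilterSketch(userID string, groupIDs []string, includeGroup bool) string {
	owner := fmt.Sprintf("user_id == %q", userID)
	if !includeGroup {
		return owner
	}
	// Public visibility does not depend on group membership.
	clauses := []string{owner, `visibility == "public"`}
	if len(groupIDs) > 0 {
		quoted := make([]string, len(groupIDs))
		for i, g := range groupIDs {
			quoted[i] = fmt.Sprintf("%q", g)
		}
		clauses = append(clauses, fmt.Sprintf(
			`(group_id in [%s] && visibility == "group")`,
			strings.Join(quoted, ", ")))
	}
	return "(" + strings.Join(clauses, " || ") + ")"
}

func main() {
	fmt.Println(buildGroupFilterSketch("alice", nil, false))
	fmt.Println(buildGroupFilterSketch("alice", nil, true))
	fmt.Println(buildGroupFilterSketch("alice", []string{"team-backend"}, true))
}
```

Note that `%q` applies Go string escaping, which may not match the store's escaping rules for IDs containing quotes or backslashes.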

    // CrossGroup allows linking memories across group boundaries when both are
    // visible to each other (group or public visibility).
    CrossGroup bool
AutoLinkOptions.CrossGroup is currently unused (AutoLinkNewMemory always searches with RetrieveOptions{UserID: newMem.UserID} and never broadens scope/filters for cross-group visibility). Either implement the cross-group retrieval logic or remove the option to avoid a misleading configuration knob.
    -// CrossGroup allows linking memories across group boundaries when both are
    -// visible to each other (group or public visibility).
    -CrossGroup bool
    // +build ignore

This generator uses the legacy // +build ignore tag only. For Go 1.17+ compatibility and go vet/tooling, add the new build constraint form too (i.e., //go:build ignore plus the existing // +build ignore).
     // Define schema for agentic memory (v2: includes hierarchical + group fields)
     schema := &entity.Schema{
         CollectionName: m.collectionName,
    -    Description: "Agentic Memory storage for cross-session context",
    +    Description: "Agentic Memory storage with hierarchical organization and group sharing",
         AutoID: false,
ensureCollection now defines a v2 schema with new hierarchical fields, but when the collection already exists the function returns without validating/migrating the schema. Upgrades against an existing collection can silently miss required fields (group_id/parent_id/is_category/visibility/abstract), breaking hierarchical retrieval. Consider a versioned collection name, a schema check with a clear error, or an explicit migration path.
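The "schema check with a clear error" option could be sketched as follows. The required field names come from the comment above; `missingFields` and `checkSchema` are hypothetical helpers, and how the existing field names are obtained from Milvus is deliberately left out.

```go
package main

import "fmt"

// requiredV2Fields lists the fields the review comment names as required
// for hierarchical retrieval.
var requiredV2Fields = []string{"group_id", "parent_id", "is_category", "visibility", "abstract"}

// missingFields returns the required fields absent from an existing schema.
func missingFields(existing []string) []string {
	have := make(map[string]bool, len(existing))
	for _, f := range existing {
		have[f] = true
	}
	var missing []string
	for _, f := range requiredV2Fields {
		if !have[f] {
			missing = append(missing, f)
		}
	}
	return missing
}

// checkSchema fails loudly instead of letting retrieval break later.
// It is a hypothetical helper, not part of the PR.
func checkSchema(collection string, existing []string) error {
	if m := missingFields(existing); len(m) > 0 {
		return fmt.Errorf("collection %q lacks v2 fields %v; migrate it or use a new collection name", collection, m)
	}
	return nil
}

func main() {
	oldSchema := []string{"id", "content", "embedding", "user_id"}
	fmt.Println(checkSchema("agentic_memory", oldSchema))
}
```

A check like this would run in the "collection already exists" branch, before the function returns.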
    s.mu.RLock()
    defer s.mu.RUnlock()

    queryEmbedding, err := GenerateEmbedding(opts.Query, s.embeddingConfig)
HierarchicalRetrieveWithConfig always generates queryEmbedding using s.embeddingConfig, and link expansion reuses that embedding. opts.LinkEmbeddingConfig is never honored in the InMemory hierarchical implementation, so callers can’t control the embedding model used for scoring linked memories as advertised by HierarchicalRetrieveOptions. Consider selecting the embedding config based on opts.LinkEmbeddingConfig when FollowLinks is enabled (or removing the option for in-memory).
    -queryEmbedding, err := GenerateEmbedding(opts.Query, s.embeddingConfig)
    +embeddingConfig := s.embeddingConfig
    +if opts.FollowLinks {
    +    embeddingConfig = opts.LinkEmbeddingConfig
    +}
    +queryEmbedding, err := GenerateEmbedding(opts.Query, embeddingConfig)
        metadata["overview"] = memory.Overview
    }
    metadataJSON, err := json.Marshal(metadata)
    if err != nil {
        return fmt.Errorf("failed to marshal metadata: %w", err)
    }

Relations won’t persist in Milvus: StoreRelation/appendRelatedID only updates Memory.RelatedIDs and then calls upsert, but Store/upsert don’t serialize RelatedIDs into any Milvus column (metadata JSON also omits related_ids). Either add a dedicated RelatedIDs field in the collection schema or include RelatedIDs in the metadata JSON and make Get/Retrieve populate it consistently.
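The second option (carrying `RelatedIDs` in the metadata JSON) might look like this. It is only a sketch: the `memory` struct is a trimmed, hypothetical stand-in for the PR's `Memory` type, and the `related_ids` key follows the name used in the comment above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// memory is a trimmed, hypothetical stand-in for the PR's Memory struct.
type memory struct {
	Overview   string
	RelatedIDs []string
}

// marshalMetadata includes related_ids so that relations survive an upsert.
func marshalMetadata(m memory) (string, error) {
	metadata := map[string]interface{}{}
	if m.Overview != "" {
		metadata["overview"] = m.Overview
	}
	if len(m.RelatedIDs) > 0 {
		metadata["related_ids"] = m.RelatedIDs
	}
	b, err := json.Marshal(metadata)
	if err != nil {
		return "", fmt.Errorf("failed to marshal metadata: %w", err)
	}
	return string(b), nil
}

// relatedIDsFromMetadata is the read-side counterpart for Get/Retrieve.
func relatedIDsFromMetadata(raw string) ([]string, error) {
	var decoded struct {
		RelatedIDs []string `json:"related_ids"`
	}
	if err := json.Unmarshal([]byte(raw), &decoded); err != nil {
		return nil, fmt.Errorf("failed to unmarshal metadata: %w", err)
	}
	return decoded.RelatedIDs, nil
}

func main() {
	raw, _ := marshalMetadata(memory{Overview: "helm deploy", RelatedIDs: []string{"finance-budget"}})
	ids, _ := relatedIDsFromMetadata(raw)
	fmt.Println(raw, ids)
}
```

A dedicated column would make relations filterable on the server side; the metadata route avoids another schema change but means links can only be read after fetching the row.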
    func (m *MilvusStore) HierarchicalRetrieveWithConfig(ctx context.Context, opts HierarchicalRetrieveOptions, cfg MilvusHierarchicalConfig) ([]*RetrieveResult, error) {
        if !m.enabled {
            return nil, fmt.Errorf("milvus store is not enabled")
        }

        opts.ApplyDefaults()
        cfg.ApplyDefaults()

        limit := opts.Limit
        if limit <= 0 {
            limit = m.config.DefaultRetrievalLimit
        }
        threshold := opts.Threshold
        if threshold <= 0 {
            threshold = m.config.DefaultSimilarityThreshold
        }

        if opts.Query == "" {
            return nil, fmt.Errorf("query is required")
        }
        if opts.UserID == "" && !opts.IncludeGroupLevel {
            return nil, fmt.Errorf("user id or group ids required")
        }

        logging.Debugf("MilvusStore.HierarchicalRetrieve: query='%s', user_id='%s', groups=%v, limit=%d",
            truncateForLog(opts.Query, 60), opts.UserID, opts.GroupIDs, limit)

        queryEmbedding, err := GenerateEmbedding(opts.Query, m.embeddingConfig)
        if err != nil {
            return nil, fmt.Errorf("failed to generate embedding: %w", err)
        }

        baseFilter := BuildGroupFilter(opts.UserID, opts.GroupIDs, opts.IncludeGroupLevel)

        if len(opts.Types) > 0 {
            typeFilter := "("
            for i, memType := range opts.Types {
                if i > 0 {
MilvusStore.HierarchicalRetrieveWithConfig ignores key options (opts.Hybrid and opts.FollowLinks/MaxLinkDepth/LinkEmbeddingConfig). Because MilvusStore implements HierarchicalStore, callers will take this path and never get hybrid fusion or link expansion in production. Either implement these features here (or explicitly reject them with an error) so behavior matches InMemory/generic retrieval and the plugin config wiring.
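If implementing the features is deferred, the "explicitly reject" route could be as small as the following sketch. The option types here are hypothetical, reduced versions of `HierarchicalRetrieveOptions`; only the field names `Hybrid` and `FollowLinks` are taken from the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// hybridOptions and hierarchicalOptions are hypothetical, trimmed stand-ins
// for the PR's option types.
type hybridOptions struct{ Mode string }

type hierarchicalOptions struct {
	Hybrid      *hybridOptions
	FollowLinks bool
}

// rejectUnsupported returns an error naming every option the backend would
// otherwise silently ignore.
func rejectUnsupported(opts hierarchicalOptions) error {
	var unsupported []string
	if opts.Hybrid != nil {
		unsupported = append(unsupported, "Hybrid")
	}
	if opts.FollowLinks {
		unsupported = append(unsupported, "FollowLinks")
	}
	if len(unsupported) == 0 {
		return nil
	}
	return fmt.Errorf("milvus hierarchical retrieval does not support: %s",
		strings.Join(unsupported, ", "))
}

func main() {
	err := rejectUnsupported(hierarchicalOptions{Hybrid: &hybridOptions{Mode: "rrf"}, FollowLinks: true})
	fmt.Println(err)
}
```

Failing fast makes the behavior gap visible in configuration tests rather than in retrieval quality.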
1. Graph Concepts Without Neo4j
You're implementing graph concepts here without the complexity of running a new database like Neo4j. Maybe we'll need to implement that in the future for multi-hop visualization or other graph benefits, but for now this is a great improvement. Note: Tracking issue #1293 has Neo4j as P3.

2. Retention Scoring Integration
Future work mentioned in tracking issue #1293 is adding quality feedback to distinguish "accessed often" from "actually useful". That's complementary and doesn't need to block this PR.

3. Memory Type Routing
The

Critical Issues

1. Category Creation Race Condition
Location:
There's a race here where two concurrent requests can both create the same category, causing Milvus primary key violations. This will cause production failures under load. Related: This is the "Concurrency handling" item from tracking issue #1293 (currently P2). For now I think adding a lock for category creation will solve the issue.

2. Category Pruning Edge Case
Categories are naturally protected (broad semantic match → high access count).

Performance Question
This relates to "Load testing at scale" in tracking issue #1293.
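For the category creation race raised in this review, a lock around find-or-create could look roughly like the sketch below. `categoryRegistry` and its in-process map are hypothetical; the real code would look up and insert category nodes in the store while holding the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// categoryRegistry is a hypothetical in-process guard around category
// creation; it only protects a single router instance.
type categoryRegistry struct {
	mu      sync.Mutex
	ids     map[string]string
	creates int // number of times a category was actually created
}

func newCategoryRegistry() *categoryRegistry {
	return &categoryRegistry{ids: map[string]string{}}
}

// getOrCreate makes the check and the insert one atomic step, so two
// concurrent requests cannot both create the same category.
func (r *categoryRegistry) getOrCreate(userID, topic string) string {
	key := userID + "/" + topic
	r.mu.Lock()
	defer r.mu.Unlock()
	if id, ok := r.ids[key]; ok {
		return id
	}
	r.creates++
	id := fmt.Sprintf("cat-%d", r.creates)
	r.ids[key] = id
	return id
}

// createConcurrently asks for the same category from many goroutines and
// returns the set of distinct IDs they observed.
func createConcurrently(r *categoryRegistry, workers int) map[string]bool {
	var wg sync.WaitGroup
	var mu sync.Mutex
	seen := map[string]bool{}
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			id := r.getOrCreate("alice", "programming")
			mu.Lock()
			seen[id] = true
			mu.Unlock()
		}()
	}
	wg.Wait()
	return seen
}

func main() {
	r := newCategoryRegistry()
	seen := createConcurrently(r, 50)
	fmt.Println("distinct IDs:", len(seen), "creates:", r.creates)
}
```

A process-local mutex does not help when several router replicas share one Milvus collection; in that setup a deterministic category ID with idempotent upsert would be needed as well.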
@yehudit1987 yes adding additional memory management is a good point. We can break it down into two approaches:
I feel the 2nd approach is more scalable, especially now that vector, file, and memory are unified in #1383. That way, we can build more memory structures, such as episodic and graph memory. This sidecar could also be exposed as a Claude skill.
Hierarchical Memory with Hybrid Retrieval
End-to-End Pipeline
    flowchart LR
        subgraph Request["Request Path"]
            direction TB
            A[User Message] --> B[ExtProc: Request Body]
            B --> C[Query Rewrite<br/><i>LLM call</i>]
            C --> D[Memory Retrieval<br/><i>hierarchical + hybrid</i>]
            D --> E[Inject into<br/>System Prompt]
            E --> F[Route to LLM]
        end
        subgraph Response["Response Path"]
            direction TB
            G[LLM Response] --> H[ExtProc: Response Body]
            H --> I[Memory Extraction<br/><i>async, LLM call</i>]
            I --> J[Deduplication]
            J --> K[Categorize +<br/>Generate Embedding]
            K --> L[(Milvus Store)]
        end
        Request --> Response
        style D fill:#2d6a4f,color:#fff
        style I fill:#d4a373,color:#000
        style L fill:#264653,color:#fff

Hierarchical Memory Tree
    graph TD
        Root["👤 User Memory Space"]
        Root --> Cat1["📁 Programming<br/><small>IsCategory=true</small><br/><small>L0 abstract: <i>Rust, Go, systems</i></small>"]
        Root --> Cat2["📁 Cooking<br/><small>IsCategory=true</small><br/><small>L0 abstract: <i>Italian, pasta, herbs</i></small>"]
        Root --> Cat3["📁 Travel<br/><small>IsCategory=true</small><br/><small>L0 abstract: <i>Japan, Asia</i></small>"]
        Cat1 --> Leaf1["📝 Rust facts<br/><small>L2: User learns Rust,<br/>uses cargo, likes borrow checker</small>"]
        Cat1 --> Leaf2["📝 Go facts<br/><small>L2: User deploys Go<br/>microservices on K8s</small>"]
        Cat2 --> Leaf3["📝 Pesto recipe<br/><small>L2: Signature dish is<br/>pesto pasta, every Friday</small>"]
        Cat2 --> Leaf4["📝 Bread baking<br/><small>L2: Bakes sourdough<br/>on weekends</small>"]
        Cat3 --> Leaf5["📝 Tokyo trip<br/><small>L2: Visited Shibuya,<br/>Tsukiji market</small>"]
        Cat3 --> Leaf6["📝 Kyoto trip<br/><small>L2: Visited temples,<br/>bamboo forest</small>"]
        Leaf1 -. "RelatedIDs<br/>(cross-link)" .-> Leaf5
        style Root fill:#1b263b,color:#fff
        style Cat1 fill:#2d6a4f,color:#fff
        style Cat2 fill:#2d6a4f,color:#fff
        style Cat3 fill:#2d6a4f,color:#fff
        style Leaf1 fill:#457b9d,color:#fff
        style Leaf2 fill:#457b9d,color:#fff
        style Leaf3 fill:#457b9d,color:#fff
        style Leaf4 fill:#457b9d,color:#fff
        style Leaf5 fill:#457b9d,color:#fff
        style Leaf6 fill:#457b9d,color:#fff

Multi-Tier Summaries
    graph LR
        L0["<b>L0: Abstract</b><br/>Short phrase<br/><i>Fast candidate scoring</i>"] --> L1["<b>L1: Overview</b><br/>Paragraph<br/><i>Reranking & navigation</i>"] --> L2["<b>L2: Content</b><br/>Full detail<br/><i>Injected into LLM context</i>"]
        style L0 fill:#e9c46a,color:#000
        style L1 fill:#f4a261,color:#000
        style L2 fill:#e76f51,color:#fff

Memory Storage Pipeline
    flowchart TD
        A["Conversation Turn<br/>(user + assistant messages)"] --> B["MemoryExtractor.ProcessResponse()"]
        B --> C["Build extraction prompt"]
        C --> D["LLM Call<br/><i>external_models: memory_extraction</i>"]
        D --> E["Parse JSON facts<br/><code>[]ExtractedFact</code>"]
        E --> F{"Similar memory<br/>already exists?"}
        F -- "Yes (score > 0.9)" --> G["Update existing<br/>memory"]
        F -- "No" --> H["Create new memory"]
        H --> I["extractTopic()<br/><i>keyword-based categorization</i>"]
        I --> J["Find or create<br/>category node"]
        J --> K["Set ParentID,<br/>Abstract (L0),<br/>Overview (L1)"]
        K --> L["GenerateEmbedding()<br/><i>BERT model</i>"]
        L --> M[("Store in Milvus<br/><small>content, embedding,<br/>user_id, parent_id,<br/>is_category, group_id,<br/>visibility</small>")]
        style B fill:#d4a373,color:#000
        style D fill:#bc6c25,color:#fff
        style M fill:#264653,color:#fff

Two-Phase Hierarchical Retrieval
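The two score formulas used by this retrieval flow, `α·child + (1-α)·parent` for drill-down and `referrer.Score × 0.8 + directScore × 0.2` for linked memories, can be sketched in a few lines. The value of α is not stated in this description, so `alpha = 0.5` below is purely an assumption, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

const (
	alpha         = 0.5 // assumed; the PR description does not give the value
	referrerShare = 0.8 // weight of the referring result in link expansion
	directShare   = 0.2 // weight of the linked memory's own similarity
)

// propagate blends a child's score with its parent category's score.
func propagate(child, parent float64) float64 {
	return alpha*child + (1-alpha)*parent
}

// blendLinked scores a memory reached through a RelatedIDs cross-link.
func blendLinked(referrer, direct float64) float64 {
	return referrer*referrerShare + direct*directShare
}

// keep applies the threshold filter used after either formula.
func keep(score, threshold float64) bool {
	return score >= threshold
}

// almostEqual compares floats with a small tolerance.
func almostEqual(a, b float64) bool {
	return math.Abs(a-b) < 1e-9
}

func main() {
	leaf := propagate(0.8, 0.6)
	linked := blendLinked(0.9, 0.5)
	fmt.Printf("leaf=%.2f keep=%v\n", leaf, keep(leaf, 0.3))
	fmt.Printf("linked=%.2f keep=%v\n", linked, keep(linked, 0.3))
}
```

Because the referrer carries 80% of the blended score, a linked memory with low direct similarity can still pass the threshold when it hangs off a strong result, which is what lets link expansion cross category boundaries.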
    flowchart TD
        Q["User Query"] --> QR["Query Rewrite (optional)<br/><i>LLM call via memory_rewrite model</i>"]
        QR --> P1
        subgraph P1["PHASE 1: Broad Category Search"]
            direction TB
            S1["Milvus vector search<br/><small>threshold × 0.8 (relaxed)</small><br/><small>limit = max(categorySearchTopK, limit×4)</small>"]
            S1 --> Split{"IsCategory?"}
            Split -- "true" --> CatQ["Category nodes → <b>Priority Queue</b><br/><small>seeded if score ≥ threshold × 0.8</small>"]
            Split -- "false" --> Leaves["Leaf memories → <b>Collected</b><br/><small>if score ≥ threshold</small>"]
        end
        P1 --> P2
        subgraph P2["PHASE 2: Drill-Down with Score Propagation"]
            direction TB
            Pop["Pop top-scoring category<br/>from priority queue"] --> Search["Search children<br/><small>where ParentID == category.ID</small>"]
            Search --> ChildType{"Child type?"}
            ChildType -- "Category" --> Push["Push to priority queue<br/><small>with propagated score</small>"]
            ChildType -- "Leaf" --> Prop["Score Propagation:<br/><code>α·child + (1-α)·parent</code>"]
            Prop --> Thresh{"score ≥<br/>threshold?"}
            Thresh -- "Yes" --> Collect["Add to collected results"]
            Thresh -- "No" --> Discard["Discard"]
            Push --> Conv{"Top-K set<br/>unchanged for<br/>3 rounds?"}
            Collect --> Conv
            Conv -- "No" --> Pop
            Conv -- "Yes" --> Done["Convergence → stop"]
        end
        P2 --> TopK["Sort collected by score → Top-K"]
        TopK --> LinkExp
        subgraph LinkExp["PHASE 3: Graph Expansion (optional, follow_links: true)"]
            direction TB
            Scan["For each result, follow<br/><b>RelatedIDs</b> cross-links"] --> Fetch["Fetch linked memory<br/><small>store.Get(linkedID)</small>"]
            Fetch --> Score["Score with same pipeline:<br/><code>cosineSim(queryEmb, linked.Emb)</code><br/>+ hybrid fusion if enabled"]
            Score --> Blend["Blend:<br/><code>referrer.Score × 0.8 + directScore × 0.2</code>"]
            Blend --> LinkThresh{"blended ≥<br/>threshold?"}
            LinkThresh -- "Yes" --> LinkAdd["Add to results<br/><small>+ push to next-hop frontier</small>"]
            LinkThresh -- "No" --> LinkSkip["Skip"]
            LinkAdd --> Hop{"more hops?<br/><small>(up to MaxLinkDepth)</small>"}
            Hop -- "Yes" --> Scan
            Hop -- "No" --> LinkDone["Re-sort + trim to Top-K"]
        end
        LinkExp --> Inject["Format as system prompt context<br/><b>## User's Relevant Context</b>"]
        style TopK fill:#2d6a4f,color:#fff
        style Inject fill:#e76f51,color:#fff
        style LinkExp fill:none

Hybrid Scoring (applied at each phase)
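The two fusion modes named in this section, weighted sum and reciprocal rank fusion, can be sketched as below. The 0.7/0.2/0.1 weights are the defaults from this PR; `k = 60` in the usage is a commonly used RRF constant and an assumption here, as is the premise that BM25 and n-gram scores are already normalized to [0, 1] before weighting. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// weightedFuse combines the three signals with the default weights
// wV=0.7, wB=0.2, wN=0.1. Inputs are assumed to be normalized to [0, 1].
func weightedFuse(cos, bm25, ngram float64) float64 {
	return 0.7*cos + 0.2*bm25 + 0.1*ngram
}

// rrf computes reciprocal rank fusion: the sum over rankings of
// 1/(k + rank), with rank starting at 1 for the best item.
func rrf(rankings [][]string, k float64) map[string]float64 {
	scores := map[string]float64{}
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	return scores
}

// almostEqual compares floats with a small tolerance.
func almostEqual(a, b float64) bool {
	return math.Abs(a-b) < 1e-9
}

func main() {
	fmt.Printf("weighted=%.3f\n", weightedFuse(0.8, 0.5, 0.4))
	// One ranking per signal: vector, BM25, n-gram.
	fused := rrf([][]string{
		{"helm-deploy", "go-facts", "pesto"},
		{"helm-deploy", "pesto", "go-facts"},
		{"go-facts", "helm-deploy", "pesto"},
	}, 60)
	fmt.Printf("rrf=%v\n", fused)
}
```

RRF uses only rank positions, so it sidesteps the question of how to put unbounded BM25 scores on the same scale as cosine similarity; the weighted mode depends on that normalization being done well.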
    flowchart LR
        subgraph Signals["Three Scoring Signals"]
            direction TB
            V["🔢 Vector Cosine<br/><small>embedding similarity<br/>from Milvus ANN search</small>"]
            B["📖 BM25 Keyword<br/><small>TF-IDF term matching<br/>(MemBM25Index)</small>"]
            N["🔤 N-gram Jaccard<br/><small>character n-gram overlap<br/>(MemNgramIndex)</small>"]
        end
        subgraph Fusion["Score Fusion"]
            direction TB
            W["<b>Weighted</b><br/><code>wV·cos + wB·bm25 + wN·ngram</code><br/><small>default: 0.7 / 0.2 / 0.1</small>"]
            R["<b>RRF</b><br/><code>Σ 1/(k + rank_i)</code><br/><small>reciprocal rank fusion</small>"]
        end
        V --> Fusion
        B --> Fusion
        N --> Fusion
        Fusion --> Out["Fused Score<br/><small>used for ranking<br/>and threshold filtering</small>"]
        style V fill:#457b9d,color:#fff
        style B fill:#e9c46a,color:#000
        style N fill:#f4a261,color:#000
        style W fill:#2d6a4f,color:#fff
        style R fill:#2d6a4f,color:#fff
        style Out fill:#e76f51,color:#fff

Group-Level Memory Sharing
    flowchart TD
        subgraph Access["Visibility Levels"]
            direction LR
            U["🔒 <b>user</b><br/>Owner only"]
            G["👥 <b>group</b><br/>Same GroupID members"]
            P["🌐 <b>public</b><br/>Any user"]
        end
        subgraph Filter["Milvus Filter Expression"]
            F["<code>(user_id == 'alice')</code><br/><code>OR</code><br/><code>(group_id IN ['team-backend']</code><br/><code> AND visibility IN ['group','public'])</code>"]
        end
        Access --> Filter
        style U fill:#264653,color:#fff
        style G fill:#2a9d8f,color:#fff
        style P fill:#e9c46a,color:#000
        style F fill:#1b263b,color:#fff

Configuration
Key Source Files
| File | Description |
|---|---|
| pkg/memory/types.go | Memory struct: ParentID, IsCategory, Abstract, Overview, Visibility, RelatedIDs |
| pkg/memory/hierarchical_retrieve.go | expandViaLinks |
| pkg/memory/hybrid_score.go | MemBM25Index, MemNgramIndex, MemHybridScorer: score fusion |
| pkg/memory/inmemory_hierarchical.go | HierarchicalStore implementation |
| pkg/memory/milvus_hierarchical.go | HierarchicalStore implementation |
| pkg/memory/extractor.go | |
| pkg/memory/categorizer.go | |
| pkg/extproc/processor_req_body.go | |
| pkg/extproc/req_filter_memory.go | |
| pkg/config/config.go | MemoryPluginConfig with hierarchical + hybrid fields |

Evaluation Results
Three-Way Comparison: Flat vs Hierarchical vs Hierarchical+Hybrid
Dataset: 30 memories across 6 topic clusters (deployment, memory, safety, rag, architecture, evaluation), with one query per cluster. Retrieval at k=5, threshold 0.30.
Per-Query Precision@5
Averages
Deltas
Weight Sweep: Effect of BM25 and N-gram Weight
Hybrid Score Unit Test
Validates that BM25/n-gram fusion correctly boosts documents with exact keyword overlap.
Query: "Helm charts Kubernetes deployment"
Doc A (matching keywords) retains the highest fused score; docs without relevant terms are penalized.
E2E Integration Test
Seeds 5 topic memories (technology, cooking, travel, sports, music) through the full Envoy → ExtProc → LLM extraction → Milvus pipeline, then verifies retrieval in new sessions:
Cross-Document Link Expansion: Four-Way Strategy Comparison
Memories are organized across 4 categories (DevOps, Finance, ML, Compliance). Two cross-domain links are created via RelatedIDs:
Four retrieval strategies are tested against 2 queries designed to find the direct match AND the linked cross-domain memory:
Key findings:
- helm-deploy. Cannot reach the Finance subtree because there is no semantic path between "Kubernetes Helm charts" and "quarterly spend allocation."
- helm-deploy, follows its RelatedIDs to fetch finance-budget, scores it via embedding cosine similarity (0.826 blended), and adds it to results. 100% cross-category recall.
- TestFollowLinks_MultiHop: Chain of 3 memories linked a → b → c with decreasing semantic similarity to the query. With MaxLinkDepth=1, only a and b are found. With MaxLinkDepth=2, all three are found through two hops of traversal.

Related Work Context
The cross-document linking problem is well-studied in recent research:
Our RelatedIDs approach is lightweight by comparison: no NER, no entity resolution, no proposition extraction. Links are explicit metadata that can be set by the application, an LLM, or a human. The four-way test above demonstrates that this simple mechanism bridges the cross-category gap that neither tree traversal nor hybrid search can close on their own.

Interpretation
- v=0.7 b=0.2 n=0.1 or RRF improve weak clusters without degrading strong ones. Over-weighting BM25 (≥0.3) hurts clusters where keyword overlap is misleading.
- (follow_links: true) discovers cross-category memories that no tree-only or hybrid-only strategy can find. When the linked memory shares zero vocabulary with the query, only the explicit RelatedIDs link provides a retrieval path. Linked memories are scored with embedding cosine similarity (not hybrid, since BM25/n-gram would penalize the cross-domain vocabulary gap), blended with the referrer's score as the primary relevance signal. Multi-hop traversal (max_link_depth: 2+) extends reach along relation chains.
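The multi-hop behavior described for link expansion can be illustrated with a small breadth-first sketch. Everything here is hypothetical: `linkedMemory` stands in for a stored memory with a precomputed direct similarity, the store is a plain map, and only the 0.8/0.2 blend and the depth limit are taken from this description.

```go
package main

import (
	"fmt"
	"sort"
)

// linkedMemory is a hypothetical stand-in for a stored memory. DirectScore
// plays the role of the cosine similarity between query and memory.
type linkedMemory struct {
	DirectScore float64
	RelatedIDs  []string
}

// expandLinks follows RelatedIDs breadth-first for up to maxDepth hops,
// blending each linked memory's score with its referrer's score.
func expandLinks(store map[string]linkedMemory, seeds map[string]float64, maxDepth int, threshold float64) map[string]float64 {
	results := make(map[string]float64, len(seeds))
	frontier := make([]string, 0, len(seeds))
	for id, score := range seeds {
		results[id] = score
		frontier = append(frontier, id)
	}
	sort.Strings(frontier) // deterministic order when several seeds share a link
	for depth := 0; depth < maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, id := range frontier {
			for _, linkedID := range store[id].RelatedIDs {
				if _, seen := results[linkedID]; seen {
					continue // already scored; also prevents cycles
				}
				linked, ok := store[linkedID]
				if !ok {
					continue // dangling link
				}
				blended := results[id]*0.8 + linked.DirectScore*0.2
				if blended >= threshold {
					results[linkedID] = blended
					next = append(next, linkedID)
				}
			}
		}
		frontier = next
	}
	return results
}

func main() {
	// Chain a -> b -> c with decreasing similarity to the query.
	store := map[string]linkedMemory{
		"a": {DirectScore: 0.9, RelatedIDs: []string{"b"}},
		"b": {DirectScore: 0.5, RelatedIDs: []string{"c"}},
		"c": {DirectScore: 0.2},
	}
	seeds := map[string]float64{"a": 0.9}
	fmt.Println("depth 1:", len(expandLinks(store, seeds, 1, 0.3)), "results")
	fmt.Println("depth 2:", len(expandLinks(store, seeds, 2, 0.3)), "results")
}
```

In this toy chain, depth 1 reaches `a` and `b`, and depth 2 also reaches `c`, mirroring the `TestFollowLinks_MultiHop` scenario. A final sort and trim to Top-K would follow in the real flow.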