feat: hierarchical memory retrieval by rootfs · Pull Request #1374 · vllm-project/semantic-router

rootfs · 2026-02-23T14:27:36Z

Hierarchical Memory with Hybrid Retrieval

End-to-End Pipeline

flowchart LR
    subgraph Request["Request Path"]
        direction TB
        A[User Message] --> B[ExtProc: Request Body]
        B --> C[Query Rewrite<br/><i>LLM call</i>]
        C --> D[Memory Retrieval<br/><i>hierarchical + hybrid</i>]
        D --> E[Inject into<br/>System Prompt]
        E --> F[Route to LLM]
    end

    subgraph Response["Response Path"]
        direction TB
        G[LLM Response] --> H[ExtProc: Response Body]
        H --> I[Memory Extraction<br/><i>async, LLM call</i>]
        I --> J[Deduplication]
        J --> K[Categorize +<br/>Generate Embedding]
        K --> L[(Milvus Store)]
    end

    Request --> Response

    style D fill:#2d6a4f,color:#fff
    style I fill:#d4a373,color:#000
    style L fill:#264653,color:#fff

Hierarchical Memory Tree

graph TD
    Root["👤 User Memory Space"]

    Root --> Cat1["📁 Programming<br/><small>IsCategory=true</small><br/><small>L0 abstract: <i>Rust, Go, systems</i></small>"]
    Root --> Cat2["📁 Cooking<br/><small>IsCategory=true</small><br/><small>L0 abstract: <i>Italian, pasta, herbs</i></small>"]
    Root --> Cat3["📁 Travel<br/><small>IsCategory=true</small><br/><small>L0 abstract: <i>Japan, Asia</i></small>"]

    Cat1 --> Leaf1["📝 Rust facts<br/><small>L2: User learns Rust,<br/>uses cargo, likes borrow checker</small>"]
    Cat1 --> Leaf2["📝 Go facts<br/><small>L2: User deploys Go<br/>microservices on K8s</small>"]

    Cat2 --> Leaf3["📝 Pesto recipe<br/><small>L2: Signature dish is<br/>pesto pasta, every Friday</small>"]
    Cat2 --> Leaf4["📝 Bread baking<br/><small>L2: Bakes sourdough<br/>on weekends</small>"]

    Cat3 --> Leaf5["📝 Tokyo trip<br/><small>L2: Visited Shibuya,<br/>Tsukiji market</small>"]
    Cat3 --> Leaf6["📝 Kyoto trip<br/><small>L2: Visited temples,<br/>bamboo forest</small>"]

    Leaf1 -. "RelatedIDs<br/>(cross-link)" .-> Leaf5

    style Root fill:#1b263b,color:#fff
    style Cat1 fill:#2d6a4f,color:#fff
    style Cat2 fill:#2d6a4f,color:#fff
    style Cat3 fill:#2d6a4f,color:#fff
    style Leaf1 fill:#457b9d,color:#fff
    style Leaf2 fill:#457b9d,color:#fff
    style Leaf3 fill:#457b9d,color:#fff
    style Leaf4 fill:#457b9d,color:#fff
    style Leaf5 fill:#457b9d,color:#fff
    style Leaf6 fill:#457b9d,color:#fff

Multi-Tier Summaries

graph LR
    L0["<b>L0: Abstract</b><br/>Short phrase<br/><i>Fast candidate scoring</i>"] --> L1["<b>L1: Overview</b><br/>Paragraph<br/><i>Reranking & navigation</i>"] --> L2["<b>L2: Content</b><br/>Full detail<br/><i>Injected into LLM context</i>"]

    style L0 fill:#e9c46a,color:#000
    style L1 fill:#f4a261,color:#000
    style L2 fill:#e76f51,color:#fff

Memory Storage Pipeline

flowchart TD
    A["Conversation Turn<br/>(user + assistant messages)"] --> B["MemoryExtractor.ProcessResponse()"]

    B --> C["Build extraction prompt"]
    C --> D["LLM Call<br/><i>external_models: memory_extraction</i>"]
    D --> E["Parse JSON facts<br/><code>[]ExtractedFact</code>"]

    E --> F{"Similar memory<br/>already exists?"}
    F -- "Yes (score > 0.9)" --> G["Update existing<br/>memory"]
    F -- "No" --> H["Create new memory"]

    H --> I["extractTopic()<br/><i>keyword-based categorization</i>"]
    I --> J["Find or create<br/>category node"]
    J --> K["Set ParentID,<br/>Abstract (L0),<br/>Overview (L1)"]
    K --> L["GenerateEmbedding()<br/><i>BERT model</i>"]
    L --> M[("Store in Milvus<br/><small>content, embedding,<br/>user_id, parent_id,<br/>is_category, group_id,<br/>visibility</small>")]

    style B fill:#d4a373,color:#000
    style D fill:#bc6c25,color:#fff
    style M fill:#264653,color:#fff
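The dedup branch in this pipeline updates a memory when a similar one scores above 0.9 and otherwise creates a new one. It can be sketched in Go as below. This is a minimal illustration, not the code in `extractor.go`. It assumes the 0.9 cutoff is a cosine similarity between the new fact's embedding and each stored embedding. `storedMemory` and `decideUpsert` are hypothetical names. In the real pipeline the candidates would come from a Milvus vector search rather than a linear scan.

```go
package main

import (
	"fmt"
	"math"
)

// dedupThreshold mirrors the "score > 0.9" branch (assumed to be cosine similarity).
const dedupThreshold = 0.9

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// storedMemory is a hypothetical, trimmed-down memory record.
type storedMemory struct {
	ID        string
	Content   string
	Embedding []float64
}

// decideUpsert returns the ID of the existing memory to update, or "" when
// the extracted fact should be stored as a new memory.
func decideUpsert(factEmb []float64, existing []storedMemory) string {
	bestID, bestScore := "", 0.0
	for _, m := range existing {
		if s := cosine(factEmb, m.Embedding); s > bestScore {
			bestID, bestScore = m.ID, s
		}
	}
	if bestScore > dedupThreshold {
		return bestID
	}
	return ""
}

func main() {
	existing := []storedMemory{
		{ID: "m1", Content: "User learns Rust", Embedding: []float64{1, 0, 0}},
		{ID: "m2", Content: "User bakes sourdough", Embedding: []float64{0, 1, 0}},
	}
	fmt.Println(decideUpsert([]float64{0.99, 0.05, 0}, existing)) // m1: update existing
	fmt.Println(decideUpsert([]float64{0, 0, 1}, existing) == "") // true: create new
}
```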

Two-Phase Hierarchical Retrieval

flowchart TD
    Q["User Query"] --> QR["Query Rewrite (optional)<br/><i>LLM call via memory_rewrite model</i>"]

    QR --> P1

    subgraph P1["PHASE 1 — Broad Category Search"]
        direction TB
        S1["Milvus vector search<br/><small>threshold × 0.8 (relaxed)</small><br/><small>limit = max(categorySearchTopK, limit×4)</small>"]
        S1 --> Split{"IsCategory?"}
        Split -- "true" --> CatQ["Category nodes → <b>Priority Queue</b><br/><small>seeded if score ≥ threshold × 0.8</small>"]
        Split -- "false" --> Leaves["Leaf memories → <b>Collected</b><br/><small>if score ≥ threshold</small>"]
    end

    P1 --> P2

    subgraph P2["PHASE 2 — Drill-Down with Score Propagation"]
        direction TB
        Pop["Pop top-scoring category<br/>from priority queue"] --> Search["Search children<br/><small>where ParentID == category.ID</small>"]
        Search --> ChildType{"Child type?"}
        ChildType -- "Category" --> Push["Push to priority queue<br/><small>with propagated score</small>"]
        ChildType -- "Leaf" --> Prop["Score Propagation:<br/><code>α·child + (1-α)·parent</code>"]
        Prop --> Thresh{"score ≥<br/>threshold?"}
        Thresh -- "Yes" --> Collect["Add to collected results"]
        Thresh -- "No" --> Discard["Discard"]
        Push --> Conv{"Top-K set<br/>unchanged for<br/>3 rounds?"}
        Collect --> Conv
        Conv -- "No" --> Pop
        Conv -- "Yes" --> Done["Convergence → stop"]
    end

    P2 --> TopK["Sort collected by score → Top-K"]

    TopK --> LinkExp

    subgraph LinkExp["PHASE 3 — Graph Expansion (optional, follow_links: true)"]
        direction TB
        Scan["For each result, follow<br/><b>RelatedIDs</b> cross-links"] --> Fetch["Fetch linked memory<br/><small>store.Get(linkedID)</small>"]
        Fetch --> Score["Score with same pipeline:<br/><code>cosineSim(queryEmb, linked.Emb)</code><br/>+ hybrid fusion if enabled"]
        Score --> Blend["Blend:<br/><code>referrer.Score × 0.8 + directScore × 0.2</code>"]
        Blend --> LinkThresh{"blended ≥<br/>threshold?"}
        LinkThresh -- "Yes" --> LinkAdd["Add to results<br/><small>+ push to next-hop frontier</small>"]
        LinkThresh -- "No" --> LinkSkip["Skip"]
        LinkAdd --> Hop{"more hops?<br/><small>(up to MaxLinkDepth)</small>"}
        Hop -- "Yes" --> Scan
        Hop -- "No" --> LinkDone["Re-sort + trim to Top-K"]
    end

    LinkExp --> Inject["Format as system prompt context<br/><b>## User's Relevant Context</b>"]

    style TopK fill:#2d6a4f,color:#fff
    style Inject fill:#e76f51,color:#fff
    style LinkExp fill:none

Hybrid Scoring (applied at each phase)

flowchart LR
    subgraph Signals["Three Scoring Signals"]
        direction TB
        V["🔢 Vector Cosine<br/><small>embedding similarity<br/>from Milvus ANN search</small>"]
        B["📖 BM25 Keyword<br/><small>TF-IDF term matching<br/>(MemBM25Index)</small>"]
        N["🔤 N-gram Jaccard<br/><small>character n-gram overlap<br/>(MemNgramIndex)</small>"]
    end

    subgraph Fusion["Score Fusion"]
        direction TB
        W["<b>Weighted</b><br/><code>wV·cos + wB·bm25 + wN·ngram</code><br/><small>default: 0.7 / 0.2 / 0.1</small>"]
        R["<b>RRF</b><br/><code>Σ 1/(k + rank_i)</code><br/><small>reciprocal rank fusion</small>"]
    end

    V --> Fusion
    B --> Fusion
    N --> Fusion

    Fusion --> Out["Fused Score<br/><small>used for ranking<br/>and threshold filtering</small>"]

    style V fill:#457b9d,color:#fff
    style B fill:#e9c46a,color:#000
    style N fill:#f4a261,color:#000
    style W fill:#2d6a4f,color:#fff
    style R fill:#2d6a4f,color:#fff
    style Out fill:#e76f51,color:#fff
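Both fusion modes fit in a few lines. The weighted mode assumes the three signals are already normalized to comparable ranges. The RRF constant `k = 60` is a common choice in the RRF literature and is an assumption here, because the PR text only names "rrf (default)". A document missing from a signal's map gets no contribution from that signal. All names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// weights for weighted fusion; 0.7/0.2/0.1 are the defaults in the diagram.
type weights struct{ Vector, BM25, Ngram float64 }

// weightedFuse assumes cos, bm25, and ngram are normalized to comparable ranges.
func weightedFuse(w weights, cos, bm25, ngram float64) float64 {
	return w.Vector*cos + w.BM25*bm25 + w.Ngram*ngram
}

// ranks assigns 1-based ranks by descending score (ties broken by ID).
func ranks(scores map[string]float64) map[string]int {
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j]
	})
	r := make(map[string]int, len(ids))
	for i, id := range ids {
		r[id] = i + 1
	}
	return r
}

// rrfFuse computes, per document, the sum over signals of 1/(k + rank).
func rrfFuse(k float64, signals ...map[string]float64) map[string]float64 {
	out := map[string]float64{}
	for _, s := range signals {
		for id, rank := range ranks(s) {
			out[id] += 1 / (k + float64(rank))
		}
	}
	return out
}

func main() {
	cos := map[string]float64{"m1": 0.80, "m2": 0.75, "m3": 0.70}
	bm := map[string]float64{"m1": 1.0, "m2": 0.3, "m3": 0.0}
	ng := map[string]float64{"m1": 0.5, "m2": 0.2, "m3": 0.0}
	w := weights{Vector: 0.7, BM25: 0.2, Ngram: 0.1}
	for _, id := range []string{"m1", "m2", "m3"} {
		fmt.Printf("%s weighted=%.3f\n", id, weightedFuse(w, cos[id], bm[id], ng[id]))
	}
	fmt.Println(rrfFuse(60, cos, bm, ng))
}
```

Weighted fusion preserves score magnitudes, so the threshold keeps its meaning. RRF uses only ranks, which makes it robust to unnormalized BM25 scores, but its output is on a much smaller scale than cosine similarity.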

Group-Level Memory Sharing

flowchart TD
    subgraph Access["Visibility Levels"]
        direction LR
        U["🔒 <b>user</b><br/>Owner only"]
        G["👥 <b>group</b><br/>Same GroupID members"]
        P["🌐 <b>public</b><br/>Any user"]
    end

    subgraph Filter["Milvus Filter Expression"]
        F["<code>(user_id == 'alice')</code><br/><code>OR</code><br/><code>(group_id IN ['team-backend']</code><br/><code> AND visibility IN ['group','public'])</code>"]
    end

    Access --> Filter

    style U fill:#264653,color:#fff
    style G fill:#2a9d8f,color:#fff
    style P fill:#e9c46a,color:#000
    style F fill:#1b263b,color:#fff
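A small builder like the one below can produce the filter in the diagram. `groupFilter` is hypothetical. It is modeled on the `BuildGroupFilter(userID, groupIDs, includeGroup)` signature quoted in review, but not on that function's actual body. It uses Milvus's `in`, `&&`, and `||` operators. Its quote escaping is an assumption and should be checked against the Milvus expression docs. Like the code discussed in review, it returns only the owner clause when `groupIDs` is empty, so public memories outside the user's groups are not matched.

```go
package main

import (
	"fmt"
	"strings"
)

// quote wraps s in double quotes, escaping backslashes and quotes (assumed escaping).
func quote(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	s = strings.ReplaceAll(s, `"`, `\"`)
	return `"` + s + `"`
}

// groupFilter returns a Milvus boolean expression: the owner's memories, plus
// group/public memories from the listed groups when includeGroup is set.
func groupFilter(userID string, groupIDs []string, includeGroup bool) string {
	owner := "user_id == " + quote(userID)
	if !includeGroup || len(groupIDs) == 0 {
		return owner
	}
	quoted := make([]string, len(groupIDs))
	for i, g := range groupIDs {
		quoted[i] = quote(g)
	}
	return fmt.Sprintf(`(%s) || (group_id in [%s] && visibility in ["group", "public"])`,
		owner, strings.Join(quoted, ", "))
}

func main() {
	fmt.Println(groupFilter("alice", []string{"team-backend"}, true))
	fmt.Println(groupFilter("alice", nil, true))
}
```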

Configuration

# Per-decision plugin config (in decisions[].plugins[])
- type: "memory"
  configuration:
    enabled: true
    retrieval_limit: 10          # max memories to inject
    similarity_threshold: 0.30   # minimum score cutoff
    auto_store: true             # extract facts from conversations
    hierarchical_search: true    # two-phase category → drill-down
    max_depth: 3                 # max tree depth to traverse
    hybrid_search: true          # BM25 + n-gram fusion
    hybrid_mode: "weighted"      # "weighted" or "rrf"
    follow_links: true           # graph expansion via RelatedIDs cross-links
    max_link_depth: 1            # hops to follow (1 = direct links only)

Key Source Files

File	Role
`pkg/memory/types.go`	`Memory` struct: `ParentID`, `IsCategory`, `Abstract`, `Overview`, `Visibility`, `RelatedIDs`
`pkg/memory/hierarchical_retrieve.go`	Two-phase search: category scan → drill-down with score propagation + graph expansion via `expandViaLinks`
`pkg/memory/hybrid_score.go`	`MemBM25Index`, `MemNgramIndex`, `MemHybridScorer` — score fusion
`pkg/memory/inmemory_hierarchical.go`	In-memory `HierarchicalStore` implementation
`pkg/memory/milvus_hierarchical.go`	Milvus-backed `HierarchicalStore` implementation
`pkg/memory/extractor.go`	LLM-based fact extraction + deduplication
`pkg/memory/categorizer.go`	Topic extraction, abstract/overview generation, parent assignment
`pkg/extproc/processor_req_body.go`	Wires retrieval into ExtProc pipeline, injects memories
`pkg/extproc/req_filter_memory.go`	Query rewriting, hybrid config builder, memory formatting
`pkg/config/config.go`	`MemoryPluginConfig` with hierarchical + hybrid fields

Evaluation Results

Three-Way Comparison: Flat vs Hierarchical vs Hierarchical+Hybrid

Dataset: 30 memories across 6 topic clusters (deployment, memory, safety, rag, architecture, evaluation), with one query per cluster. Retrieval at k=5, threshold 0.30.

go test -v -tags milvus -run TestHybridHierarchical_ThreeWayComparison ./pkg/memory/

Per-Query Precision@5

Query	Cluster	Flat P@5	Hier-Cos P@5	Hier-Hybrid P@5
Kubernetes deployment pipeline with Helm...	deployment	0.60	0.60	0.60
Memory retrieval and retention scoring...	memory	0.20	0.20	0.40
Jailbreak detection and PII safety guardrails...	safety	0.80	0.80	0.80
Hybrid RAG search combine vector similarity, BM25...	rag	0.40	0.40	0.40
ExtProc signal engine architecture route requests...	architecture	0.80	0.80	0.80
Metrics in the end-to-end evaluation and benchmark...	evaluation	0.40	0.40	0.40

Averages

Method	Avg P@5	Avg R@5	Avg Purity
Flat (cosine)	0.5333	0.5333	0.5333
Hier (cosine)	0.5333	0.5333	0.5333
Hier (hybrid)	0.5667	0.5667	0.5667

Deltas

DELTA Precision:
  hier-cosine vs flat:      +0.0000  (+0.0%)
  hier-hybrid vs flat:      +0.0333  (+6.2%)
  hier-hybrid vs hier-cos:  +0.0333  (+6.2%)

DELTA Recall:
  hier-cosine vs flat:      +0.0000  (+0.0%)
  hier-hybrid vs flat:      +0.0333  (+6.2%)
  hier-hybrid vs hier-cos:  +0.0333  (+6.2%)

DELTA Purity:
  hier-cosine vs flat:      +0.0000  (+0.0%)
  hier-hybrid vs flat:      +0.0333  (+6.2%)
  hier-hybrid vs hier-cos:  +0.0333  (+6.2%)
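The averages and deltas follow directly from the per-query table. Here is a quick arithmetic check in Go, using values copied from the Precision@5 table above:

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// Order: deployment, memory, safety, rag, architecture, evaluation.
	flat := []float64{0.60, 0.20, 0.80, 0.40, 0.80, 0.40}
	hybrid := []float64{0.60, 0.40, 0.80, 0.40, 0.80, 0.40}
	f, h := mean(flat), mean(hybrid)
	// prints: flat=0.5333 hybrid=0.5667 abs=+0.0333 rel=+6.25%
	fmt.Printf("flat=%.4f hybrid=%.4f abs=%+.4f rel=%+.2f%%\n", f, h, h-f, 100*(h-f)/f)
}
```

The relative gain is 0.0333 / 0.5333 = 6.25%, which the report rounds to +6.2%. The three metrics have identical averages for each method, so the same deltas repeat for precision, recall, and purity.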

Weight Sweep: Effect of BM25 and N-gram Weight

go test -v -tags milvus -run TestHybridHierarchical_WeightSweep ./pkg/memory/

Weights	deployment	memory	safety	rag	architecture	evaluation	Avg P@K
pure-cosine (nil)	0.60	0.20	0.80	0.40	0.80	0.40	0.5333
v=1.0 b=0.0 n=0.0	0.60	0.20	0.80	0.40	0.80	0.40	0.5333
v=0.8 b=0.1 n=0.1	0.60	0.40	0.80	0.40	0.80	0.40	0.5667
v=0.7 b=0.2 n=0.1	0.60	0.40	0.80	0.40	0.80	0.40	0.5667
v=0.6 b=0.3 n=0.1	0.60	0.40	0.60	0.40	0.80	0.40	0.5333
v=0.5 b=0.3 n=0.2	0.60	0.40	0.60	0.40	0.80	0.40	0.5333
v=0.5 b=0.5 n=0.0	0.60	0.40	0.60	0.40	0.80	0.40	0.5333
v=0.4 b=0.4 n=0.2	0.60	0.60	0.60	0.40	0.80	0.40	0.5667
rrf (default)	0.60	0.40	0.80	0.40	0.80	0.40	0.5667

Hybrid Score Unit Test

Validates that BM25/n-gram fusion favors documents with exact keyword overlap. Every fused score is lower than its cosine score, but the keyword-matching document loses the least.

Query: "Helm charts Kubernetes deployment"

Doc	Content	Cosine	Fused	Delta
A	Kubernetes/Helm (exact terms)	0.800	0.767	-0.033
B	BM25 text (partial terms)	0.750	0.531	-0.219
C	cat/mat (no terms)	0.700	0.491	-0.209

Doc A (matching keywords) keeps the highest fused score and loses only 0.033. Docs B and C share few or none of the query terms and lose roughly 0.21 to 0.22 each.

E2E Integration Test

make test-retrieval-api    # 10/10 passed

Seeds 5 topic memories (technology, cooking, travel, sports, music) through the full Envoy → ExtProc → LLM extraction → Milvus pipeline, then verifies retrieval in new sessions:

Phase	Tests	Passed	What it validates
Phase 1: Seed	5	5	Messages accepted through /v1/responses
Phase 2: Storage	1	1	Memories extracted and stored in Milvus
Phase 3: Semantic retrieval	5	5	Queries in new sessions retrieve relevant memories (keywords appear only via injection)
Phase 4: Hybrid keyword	4	4	BM25 boosts exact-match queries

Cross-Document Link Expansion: Four-Way Strategy Comparison

go test -v -run TestRelatedIDs_CrossCategoryComparison ./pkg/memory/

Memories are organized across 4 categories (DevOps, Finance, ML, Compliance). Two cross-domain links are created via RelatedIDs:

DevOps "Helm charts deployment" ↔ Finance "quarterly spend allocation" (zero vocabulary overlap)
ML "GPU distributed training" ↔ Compliance "GDPR data retention" (zero vocabulary overlap)

Four retrieval strategies are tested against 2 queries designed to find the direct match AND the linked cross-domain memory:

Strategy	Algorithm	Direct Match	Cross-Category Linked
Tree-Cosine	hierarchical tree traversal, cosine scoring	2/2	0/2 (0%)
Tree-Hybrid	hierarchical tree traversal, BM25 + n-gram + cosine	2/2	0/2 (0%)
Tree-Cosine + Links	tree-cosine + RelatedIDs graph expansion	2/2	2/2 (100%)
Tree-Hybrid + Links	tree-hybrid + RelatedIDs graph expansion	2/2	1/2 (50%)

Key findings:

Tree-Cosine (similar to LLM-based tree traversal approaches): Drills into the DevOps subtree and finds helm-deploy. Cannot reach the Finance subtree because there is no semantic path between "Kubernetes Helm charts" and "quarterly spend allocation."
Tree-Hybrid (similar to hybrid dense+sparse search approaches): Even with BM25 and n-gram matching on top of cosine, there are zero shared keywords between the DevOps query and the Finance memory. Hybrid scoring cannot bridge vocabulary-disjoint domains.
Tree-Cosine + Links: After finding helm-deploy, follows its RelatedIDs to fetch finance-budget, scores it via embedding cosine similarity (0.826 blended), and adds it to results. 100% cross-category recall.
Tree-Hybrid + Links: Under hybrid scoring, the referrer's propagated score is lower (category nodes score lower on BM25), which reduces the blended link score. This strategy still finds 1/2 linked memories. That is a known tradeoff: hybrid's keyword penalty on category propagation shrinks the referrer's contribution to link blending.

TestFollowLinks_MultiHop: Chain of 3 memories linked a → b → c with decreasing semantic similarity to the query. With MaxLinkDepth=1, only a and b are found. With MaxLinkDepth=2, all three are found through two hops of traversal.

Related Work Context

The cross-document linking problem is well-studied in recent research:

Cross-partition KG linking (BridgeRAG, ICLR 2026): Uses shared named entities as conduits between document-level knowledge graphs. Requires NER and entity resolution pipelines.
Hierarchical Lexical Graph (HLG, KDD 2025): Three-tier index with entity-relationship links across documents. Achieves +23.1% recall over chunk-based RAG. Requires proposition extraction.
Heterogeneous multi-store fusion (HetaRAG): Routes queries across vector, KG, full-text, and SQL stores. Cross-document recall comes from combining modalities.

Our RelatedIDs approach is lightweight by comparison: no NER, no entity resolution, no proposition extraction. Links are explicit metadata that can be set by the application, an LLM, or a human. In the four-way test above (two queries), this simple mechanism bridges the cross-category gap that neither tree traversal nor hybrid search closes on its own.

Interpretation

Hierarchical structure alone does not change results on a small, well-embedded dataset — the category drill-down converges to the same results as flat search.
Adding hybrid scoring (BM25 + n-gram) yields a measurable gain of +0.033 in average P@5 (+6.2% relative) by boosting documents that share exact terms with the query. The entire gain comes from the "memory" cluster, where semantic similarity alone is ambiguous and precision doubles from 0.20 to 0.40.
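Best weights: v=0.8 b=0.1 n=0.1, v=0.7 b=0.2 n=0.1, and RRF all lift the weak "memory" cluster (0.20 → 0.40) without degrading strong ones. A BM25 weight of 0.3 or more drops the "safety" cluster from 0.80 to 0.60, where keyword overlap is misleading. v=0.4 b=0.4 n=0.2 matches the best average only because "memory" rises to 0.60, which offsets the safety loss.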
Graph expansion (follow_links: true) discovers cross-category memories that no tree-only or hybrid-only strategy can find. When the linked memory shares zero vocabulary with the query, only the explicit RelatedIDs link provides a retrieval path. Linked memories are scored with embedding cosine similarity (not hybrid — BM25/n-gram would penalize the cross-domain vocabulary gap), blended with the referrer's score as the primary relevance signal. Multi-hop traversal (max_link_depth: 2+) extends reach along relation chains.
The E2E test confirms that the full pipeline (extraction, storage, hierarchical retrieval, hybrid scoring, and memory injection into the LLM system prompt) works through the Envoy ExtProc pipeline.

Signed-off-by: Huamin Chen <hchen@redhat.com>

netlify · 2026-02-23T14:27:45Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`bd4fe39`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/699cc4720a9f870008d8e51f
😎 Deploy Preview	https://deploy-preview-1374--vllm-semantic-router.netlify.app


github-actions · 2026-02-23T14:28:06Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `config`

Owners: @rootfs, @Xunzhuo
Files changed:

config/testing/config.memory-hierarchical.yaml
config/testing/envoy-retrieval-test.yaml

📁 `e2e`

Owners: @Xunzhuo
Files changed:

e2e/testing/mock-vllm-echo.py

📁 `Root Directory`

Owners: @rootfs, @Xunzhuo
Files changed:

scripts/test-retrieval-api.sh

📁 `src`

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

src/semantic-router/pkg/config/config.go
src/semantic-router/pkg/config/config_test.go
src/semantic-router/pkg/extproc/processor_req_body.go
src/semantic-router/pkg/extproc/req_filter_memory.go
src/semantic-router/pkg/extproc/req_filter_memory_test.go
src/semantic-router/pkg/memory/categorizer.go
src/semantic-router/pkg/memory/hierarchical_benchmark_test.go
src/semantic-router/pkg/memory/hierarchical_comparison_test.go
src/semantic-router/pkg/memory/hierarchical_retrieve.go
src/semantic-router/pkg/memory/hierarchical_test.go
src/semantic-router/pkg/memory/hybrid_hierarchical_comparison_test.go
src/semantic-router/pkg/memory/hybrid_score.go
src/semantic-router/pkg/memory/inmemory_hierarchical.go
src/semantic-router/pkg/memory/inmemory_store.go
src/semantic-router/pkg/memory/milvus_hierarchical.go
src/semantic-router/pkg/memory/milvus_store.go
src/semantic-router/pkg/memory/relations.go
src/semantic-router/pkg/memory/store.go
src/semantic-router/pkg/memory/testdata/evaluation_dataset.json
src/semantic-router/pkg/memory/testdata/evaluation_source.json
src/semantic-router/pkg/memory/testdata/generate_source_dataset.go
src/semantic-router/pkg/memory/types.go

📁 `tools`

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

tools/make/build-run-test.mk

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Copilot

Pull request overview

This PR implements a hierarchical memory retrieval system with hybrid scoring for the semantic router, enabling multi-tier memory organization with category-based search and BM25/n-gram/vector score fusion.

Changes:

Adds hierarchical memory structure with category nodes, parent-child relationships, and multi-level summaries (L0 abstract, L1 overview, L2 content)
Implements hybrid scoring that combines vector similarity, BM25 keyword matching, and character n-gram Jaccard similarity
Adds group-level memory sharing with visibility controls (user/group/public) and cross-memory relations

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.


File	Description
tools/make/build-run-test.mk	Adds makefile targets for running hierarchical memory tests
src/semantic-router/pkg/memory/types.go	Defines hierarchical memory types including visibility, relations, and hybrid config
src/semantic-router/pkg/memory/testdata/generate_source_dataset.go	Generator script for source code evaluation dataset
src/semantic-router/pkg/memory/testdata/evaluation_dataset.json	Evaluation dataset with 30 memories across 6 clusters
src/semantic-router/pkg/memory/store.go	Adds HierarchicalStore interface and type assertions
src/semantic-router/pkg/memory/relations.go	Implements automatic bidirectional memory relation linking
src/semantic-router/pkg/memory/milvus_store.go	Updates Milvus schema with hierarchical fields
src/semantic-router/pkg/memory/milvus_hierarchical.go	Milvus implementation of hierarchical retrieval
src/semantic-router/pkg/memory/inmemory_store.go	Adds relations map to in-memory store
src/semantic-router/pkg/memory/inmemory_hierarchical.go	In-memory implementation of hierarchical retrieval
src/semantic-router/pkg/memory/hybrid_score.go	BM25, n-gram indexing, and score fusion logic
src/semantic-router/pkg/memory/hybrid_hierarchical_comparison_test.go	Three-way comparison tests showing +6.2% improvement
src/semantic-router/pkg/memory/hierarchical_test.go	Unit tests for hierarchical retrieval components
src/semantic-router/pkg/memory/hierarchical_retrieve.go	Generic two-phase hierarchical retrieval algorithm
src/semantic-router/pkg/memory/hierarchical_comparison_test.go	Precision/recall comparison tests
src/semantic-router/pkg/memory/hierarchical_benchmark_test.go	Performance benchmarks for retrieval operations
src/semantic-router/pkg/memory/categorizer.go	Auto-categorization and summary generation logic
src/semantic-router/pkg/extproc/req_filter_memory_test.go	Tests for hybrid config wiring
src/semantic-router/pkg/extproc/req_filter_memory.go	Integrates hybrid scoring into request filter
src/semantic-router/pkg/extproc/processor_req_body.go	Adds hierarchical/group retrieval to request processor
src/semantic-router/pkg/config/config_test.go	Tests for new hierarchical/hybrid config fields
src/semantic-router/pkg/config/config.go	Adds hierarchical and hybrid config fields
scripts/test-retrieval-api.sh	End-to-end test script for hierarchical retrieval
e2e/testing/mock-vllm-echo.py	Echo mock for memory injection verification
config/testing/envoy-retrieval-test.yaml	Envoy config for retrieval tests
config/testing/config.memory-hierarchical.yaml	Router config for hierarchical testing


Copilot · 2026-02-23T14:33:46Z

 	MemoryTypeSemantic MemoryType = "semantic"

 	// MemoryTypeProcedural represents instructions, how-to, steps.
-	// Example: "To deploy payment-service: run npm build, then docker push"
 	MemoryTypeProcedural MemoryType = "procedural"

 	// MemoryTypeEpisodic represents session summaries, past events.
-	// Example: "On Dec 29 2024, user planned Hawaii vacation with $10K budget"
 	MemoryTypeEpisodic MemoryType = "episodic"


The example comments for these memory types were removed. Consider restoring brief inline examples to help developers understand the difference between semantic, procedural, and episodic memory types.

Copilot · 2026-02-23T14:33:46Z

 	schema := &entity.Schema{
 		CollectionName: m.collectionName,
-		Description:    "Agentic Memory storage for cross-session context",
+		Description:    "Agentic Memory storage with hierarchical organization and group sharing",


Corrected spelling of 'Agentic' to 'Agentive' or 'Agent-based'. 'Agentic' is not a standard English term in technical contexts.

Copilot · 2026-02-23T14:33:46Z

+SKIP_SEED="${SKIP_SEED:-0}"
+EXTRACTION_WAIT="${EXTRACTION_WAIT:-12}"
+
+USER_ID="retrieval_test_$(date +%s)"


The USER_ID uses epoch timestamp which may collide if tests run in parallel within the same second. Consider adding a random suffix or using date +%s%N for nanosecond precision.

Suggested change

USER_ID="retrieval_test_$(date +%s)"

USER_ID="retrieval_test_$(date +%s%N)"

Signed-off-by: Huamin Chen <hchen@redhat.com>

Copilot

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 8 comments.


Copilot · 2026-02-23T21:24:24Z

+// FusedScore computes the hybrid score for a single memory given its cosine
+// similarity and the query string. Returns the fused score.
+func (s *MemHybridScorer) FusedScore(memID string, cosineSim float32, query string) float32 {
+	bm25Scores := s.bm25.Score(query, s.cfg.BM25K1, s.cfg.BM25B)
+	ngramScores := s.ngram.Score(query)
+	return s.fuseOne(memID, cosineSim, bm25Scores, ngramScores)
+}


MemHybridScorer.FusedScore recomputes BM25 and n-gram scores for the entire corpus on every call, making callers that score many memories (e.g., hierarchical retrieval loops) effectively O(N²) per query. Cache per-query score maps inside MemHybridScorer or use the existing FusedScores batch API in callers to compute BM25/ngram once per query.

Copilot · 2026-02-23T21:24:24Z

+func BuildGroupFilter(userID string, groupIDs []string, includeGroup bool) string {
+	if !includeGroup || len(groupIDs) == 0 {
+		return fmt.Sprintf("user_id == \"%s\"", userID)
+	}


BuildGroupFilter drops public memories when includeGroup=true but groupIDs is empty because it falls back to user_id == .... That conflicts with the VisibilityPublic semantics ("any user") and with InMemoryStore.passesAccessFilter (which admits public when IncludeGroupLevel is true). Consider adding an explicit || visibility == "public" clause when includeGroup is enabled, regardless of groupIDs.

Copilot · 2026-02-23T21:24:24Z

+
+	// CrossGroup allows linking memories across group boundaries when both are
+	// visible to each other (group or public visibility).
+	CrossGroup bool


AutoLinkOptions.CrossGroup is currently unused (AutoLinkNewMemory always searches with RetrieveOptions{UserID: newMem.UserID} and never broadens scope/filters for cross-group visibility). Either implement the cross-group retrieval logic or remove the option to avoid a misleading configuration knob.

Suggested change

// CrossGroup allows linking memories across group boundaries when both are
// visible to each other (group or public visibility).
CrossGroup bool

Copilot · 2026-02-23T21:24:25Z

+// +build ignore
+


This generator uses the legacy // +build ignore tag only. For Go 1.17+ compatibility and go vet/tooling, add the new build constraint form too (i.e., //go:build ignore plus the existing // +build ignore).

Copilot · 2026-02-23T21:24:25Z

+	// Define schema for agentic memory (v2: includes hierarchical + group fields)
 	schema := &entity.Schema{
 		CollectionName: m.collectionName,
-		Description:    "Agentic Memory storage for cross-session context",
+		Description:    "Agentic Memory storage with hierarchical organization and group sharing",
 		AutoID:         false,


ensureCollection now defines a v2 schema with new hierarchical fields, but when the collection already exists the function returns without validating/migrating the schema. Upgrades against an existing collection can silently miss required fields (group_id/parent_id/is_category/visibility/abstract), breaking hierarchical retrieval. Consider a versioned collection name, a schema check with a clear error, or an explicit migration path.

Copilot · 2026-02-23T21:24:25Z

+	s.mu.RLock()
+	defer s.mu.RUnlock()
+
+	queryEmbedding, err := GenerateEmbedding(opts.Query, s.embeddingConfig)


HierarchicalRetrieveWithConfig always generates queryEmbedding using s.embeddingConfig, and link expansion reuses that embedding. opts.LinkEmbeddingConfig is never honored in the InMemory hierarchical implementation, so callers can’t control the embedding model used for scoring linked memories as advertised by HierarchicalRetrieveOptions. Consider selecting the embedding config based on opts.LinkEmbeddingConfig when FollowLinks is enabled (or removing the option for in-memory).

Suggested change

queryEmbedding, err := GenerateEmbedding(opts.Query, s.embeddingConfig)

embeddingConfig := s.embeddingConfig
if opts.FollowLinks {
	embeddingConfig = opts.LinkEmbeddingConfig
}
queryEmbedding, err := GenerateEmbedding(opts.Query, embeddingConfig)

Copilot · 2026-02-23T21:24:25Z

+		metadata["overview"] = memory.Overview
+	}
 	metadataJSON, err := json.Marshal(metadata)
 	if err != nil {
 		return fmt.Errorf("failed to marshal metadata: %w", err)
 	}



Relations won’t persist in Milvus: StoreRelation/appendRelatedID only updates Memory.RelatedIDs and then calls upsert, but Store/upsert don’t serialize RelatedIDs into any Milvus column (metadata JSON also omits related_ids). Either add a dedicated RelatedIDs field in the collection schema or include RelatedIDs in the metadata JSON and make Get/Retrieve populate it consistently.

Copilot · 2026-02-23T21:24:26Z

+func (m *MilvusStore) HierarchicalRetrieveWithConfig(ctx context.Context, opts HierarchicalRetrieveOptions, cfg MilvusHierarchicalConfig) ([]*RetrieveResult, error) {
+	if !m.enabled {
+		return nil, fmt.Errorf("milvus store is not enabled")
+	}
+
+	opts.ApplyDefaults()
+	cfg.ApplyDefaults()
+
+	limit := opts.Limit
+	if limit <= 0 {
+		limit = m.config.DefaultRetrievalLimit
+	}
+	threshold := opts.Threshold
+	if threshold <= 0 {
+		threshold = m.config.DefaultSimilarityThreshold
+	}
+
+	if opts.Query == "" {
+		return nil, fmt.Errorf("query is required")
+	}
+	if opts.UserID == "" && !opts.IncludeGroupLevel {
+		return nil, fmt.Errorf("user id or group ids required")
+	}
+
+	logging.Debugf("MilvusStore.HierarchicalRetrieve: query='%s', user_id='%s', groups=%v, limit=%d",
+		truncateForLog(opts.Query, 60), opts.UserID, opts.GroupIDs, limit)
+
+	queryEmbedding, err := GenerateEmbedding(opts.Query, m.embeddingConfig)
+	if err != nil {
+		return nil, fmt.Errorf("failed to generate embedding: %w", err)
+	}
+
+	baseFilter := BuildGroupFilter(opts.UserID, opts.GroupIDs, opts.IncludeGroupLevel)
+
+	if len(opts.Types) > 0 {
+		typeFilter := "("
+		for i, memType := range opts.Types {
+			if i > 0 {


MilvusStore.HierarchicalRetrieveWithConfig ignores key options (opts.Hybrid and opts.FollowLinks/MaxLinkDepth/LinkEmbeddingConfig). Because MilvusStore implements HierarchicalStore, callers will take this path and never get hybrid fusion or link expansion in production. Either implement these features here (or explicitly reject them with an error) so behavior matches InMemory/generic retrieval and the plugin config wiring.

yehuditkerido · 2026-02-25T12:26:57Z

1. Graph Concepts Without Neo4j

You're implementing graph concepts here without the complexity of running a new database like Neo4j. Maybe we'll need to implement that in the future for multi-hop visualization or other graph benefits, but for now this is a great improvement.

Note: Tracking issue #1293 has Neo4j as P3.

2. Retention Scoring Integration

Future work mentioned in tracking issue #1293 is adding quality feedback to distinguish "accessed often" from "actually useful". That's complementary and doesn't need to block this PR.

3. Memory Type Routing

The Types field implementation looks good. This addresses the "Memory type routing" item from tracking issue #1293.
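For reference, the type restriction that the diff begins building in its loop over `opts.Types` amounts to one extra clause on the Milvus filter. Below is a minimal sketch. It assumes a scalar field named `memory_type`, a scope filter on `user_id`, and identifier-like type names such as `fact`; all three are hypothetical. It uses Milvus's `in` operator in place of a chain of `||` comparisons. Because the values end up inside a filter string, it validates them rather than escaping them.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validType restricts memory-type values to simple identifiers so they can be
// embedded in a filter expression without escaping.
var validType = regexp.MustCompile(`^[a-z_][a-z0-9_]*$`)

// buildTypeFilter ANDs an existing scope filter with a memory-type restriction,
// producing e.g. (user_id == "u1") && (memory_type in ["fact", "preference"]).
// The memory_type field name is an assumption.
func buildTypeFilter(baseFilter string, types []string) (string, error) {
	if len(types) == 0 {
		return baseFilter, nil
	}
	quoted := make([]string, 0, len(types))
	for _, t := range types {
		if !validType.MatchString(t) {
			return "", fmt.Errorf("invalid memory type %q", t)
		}
		quoted = append(quoted, `"`+t+`"`)
	}
	typeExpr := fmt.Sprintf("memory_type in [%s]", strings.Join(quoted, ", "))
	if baseFilter == "" {
		return typeExpr, nil
	}
	return fmt.Sprintf("(%s) && (%s)", baseFilter, typeExpr), nil
}

func main() {
	f, err := buildTypeFilter(`user_id == "u1"`, []string{"fact", "preference"})
	fmt.Println(f, err)
}
```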

Critical Issues

1. Category Creation Race Condition

Location: categorizer.go:1246-1253

There's a race here where two concurrent requests can both create the same category, causing Milvus primary key violations.

This will cause production failures under load.

Related: This is the "Concurrency handling" item from tracking issue #1293 (currently P2). For now, I think adding a lock around category creation will solve the issue.

2. Category Pruning Edge Case

Categories are naturally protected from pruning (broad semantic match → high access count). But what is the policy if one IS deleted? Its children would have a ParentID pointing to a deleted node. The intended behavior is worth documenting.
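One policy that could be documented is to re-parent the children of a pruned category instead of leaving them dangling. Below is a minimal sketch over an in-memory map. The `Node` type and `deleteCategory` are hypothetical. Against Milvus, re-parenting would presumably mean rewriting (for example, upserting) each child's `ParentID` before the category row is deleted. Cascading the delete to the children is the other obvious policy.

```go
package main

import "fmt"

// Node is a hypothetical memory-tree node shared by categories and leaves.
type Node struct {
	ID         string
	ParentID   string // "" means attached directly to the user's root
	IsCategory bool
}

// deleteCategory removes a category and re-parents its direct children to the
// deleted node's own parent, so no surviving node points at a missing ID.
func deleteCategory(nodes map[string]*Node, id string) error {
	cat, ok := nodes[id]
	if !ok {
		return fmt.Errorf("node %q not found", id)
	}
	if !cat.IsCategory {
		return fmt.Errorf("node %q is not a category", id)
	}
	for _, n := range nodes {
		if n.ParentID == id {
			n.ParentID = cat.ParentID
		}
	}
	delete(nodes, id)
	return nil
}

func main() {
	nodes := map[string]*Node{
		"cooking": {ID: "cooking", IsCategory: true},
		"pesto":   {ID: "pesto", ParentID: "cooking"},
		"bread":   {ID: "bread", ParentID: "cooking"},
	}
	if err := deleteCategory(nodes, "cooking"); err != nil {
		fmt.Println(err)
		return
	}
	for _, id := range []string{"pesto", "bread"} {
		fmt.Printf("%s -> parent %q\n", id, nodes[id].ParentID)
	}
}
```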

Performance Question

Do the new retrieval methods add latency? Could it become significant with two-phase retrieval?

This relates to "Load testing at scale" in tracking issue #1293.

rootfs · 2026-02-26T03:09:04Z

@yehudit1987 Yes, adding additional memory management is a good point. We can break it down into two approaches:

1. Router self-service: the router manages all chat history and injection.
2. Memory management sidecar: the chat history or external knowledge is managed by a sidecar, and the router just retrieves it.

I feel the second approach is more scalable, especially with vector, file, and memory storage unified in #1383. That way, we can build more memory structures, such as episodic and graph memory. This sidecar could also be exposed as a Claude skill.

Huamin Chen added 2 commits February 22, 2026 21:18

feat: add hierachical memory search

cbb3de4

Signed-off-by: Huamin Chen <hchen@redhat.com>

add hierachical tests

18ad117

Signed-off-by: Huamin Chen <hchen@redhat.com>

rootfs requested a review from Xunzhuo as a code owner February 23, 2026 14:27

rootfs marked this pull request as draft February 23, 2026 14:27

github-actions Bot assigned rootfs, wangchen615, Xunzhuo and yuluo-yx Feb 23, 2026

rootfs requested review from Copilot, yehuditkerido and yossiovadia February 23, 2026 14:28

Copilot AI reviewed Feb 23, 2026

Copilot started reviewing on behalf of rootfs February 23, 2026 14:40

Huamin Chen added 2 commits February 23, 2026 20:40

lint

b6df86e

Signed-off-by: Huamin Chen <hchen@redhat.com>

wire the cross link search

36608d7

Signed-off-by: Huamin Chen <hchen@redhat.com>

rootfs requested a review from Copilot February 23, 2026 21:14

Copilot started reviewing on behalf of rootfs February 23, 2026 21:15

persist relation id

bd4fe39

Signed-off-by: Huamin Chen <hchen@redhat.com>

Copilot AI reviewed Feb 23, 2026

-	USER_ID="retrieval_test_$(date +%s)"
+	USER_ID="retrieval_test_$(date +%s%N)"

-	queryEmbedding, err := GenerateEmbedding(opts.Query, s.embeddingConfig)
+	embeddingConfig := s.embeddingConfig
+	if opts.FollowLinks {
+		embeddingConfig = opts.LinkEmbeddingConfig
+	}
+	queryEmbedding, err := GenerateEmbedding(opts.Query, embeddingConfig)

rootfs commented Feb 23, 2026


netlify Bot commented Feb 23, 2026

✅ Deploy Preview for vllm-semantic-router ready!

github-actions Bot commented Feb 23, 2026

👥 vLLM Semantic Team Notification

📁 config

📁 e2e

📁 Root Directory

📁 src

📁 tools

🎉 Thanks for your contributions!

Copilot AI left a comment

Pull request overview

Reviewed changes

Copilot AI Feb 23, 2026

Copilot AI Feb 23, 2026

Copilot AI Feb 23, 2026

Copilot AI left a comment

Pull request overview

Copilot AI Feb 23, 2026

Copilot AI Feb 23, 2026

Copilot AI Feb 23, 2026

Copilot AI Feb 23, 2026

Copilot AI Feb 23, 2026

Copilot AI Feb 23, 2026

Copilot AI Feb 23, 2026

Copilot AI Feb 23, 2026

yehuditkerido commented Feb 25, 2026