feat: hierarchical memory retrieval #1374

Draft
rootfs wants to merge 5 commits into vllm-project:main from rootfs:memory-hier

@rootfs rootfs commented Feb 23, 2026

Hierarchical Memory with Hybrid Retrieval

End-to-End Pipeline

flowchart LR
    subgraph Request["Request Path"]
        direction TB
        A[User Message] --> B[ExtProc: Request Body]
        B --> C[Query Rewrite<br/><i>LLM call</i>]
        C --> D[Memory Retrieval<br/><i>hierarchical + hybrid</i>]
        D --> E[Inject into<br/>System Prompt]
        E --> F[Route to LLM]
    end

    subgraph Response["Response Path"]
        direction TB
        G[LLM Response] --> H[ExtProc: Response Body]
        H --> I[Memory Extraction<br/><i>async, LLM call</i>]
        I --> J[Deduplication]
        J --> K[Categorize +<br/>Generate Embedding]
        K --> L[(Milvus Store)]
    end

    Request --> Response

    style D fill:#2d6a4f,color:#fff
    style I fill:#d4a373,color:#000
    style L fill:#264653,color:#fff

Hierarchical Memory Tree

graph TD
    Root["👤 User Memory Space"]

    Root --> Cat1["📁 Programming<br/><small>IsCategory=true</small><br/><small>L0 abstract: <i>Rust, Go, systems</i></small>"]
    Root --> Cat2["📁 Cooking<br/><small>IsCategory=true</small><br/><small>L0 abstract: <i>Italian, pasta, herbs</i></small>"]
    Root --> Cat3["📁 Travel<br/><small>IsCategory=true</small><br/><small>L0 abstract: <i>Japan, Asia</i></small>"]

    Cat1 --> Leaf1["📝 Rust facts<br/><small>L2: User learns Rust,<br/>uses cargo, likes borrow checker</small>"]
    Cat1 --> Leaf2["📝 Go facts<br/><small>L2: User deploys Go<br/>microservices on K8s</small>"]

    Cat2 --> Leaf3["📝 Pesto recipe<br/><small>L2: Signature dish is<br/>pesto pasta, every Friday</small>"]
    Cat2 --> Leaf4["📝 Bread baking<br/><small>L2: Bakes sourdough<br/>on weekends</small>"]

    Cat3 --> Leaf5["📝 Tokyo trip<br/><small>L2: Visited Shibuya,<br/>Tsukiji market</small>"]
    Cat3 --> Leaf6["📝 Kyoto trip<br/><small>L2: Visited temples,<br/>bamboo forest</small>"]

    Leaf1 -. "RelatedIDs<br/>(cross-link)" .-> Leaf5

    style Root fill:#1b263b,color:#fff
    style Cat1 fill:#2d6a4f,color:#fff
    style Cat2 fill:#2d6a4f,color:#fff
    style Cat3 fill:#2d6a4f,color:#fff
    style Leaf1 fill:#457b9d,color:#fff
    style Leaf2 fill:#457b9d,color:#fff
    style Leaf3 fill:#457b9d,color:#fff
    style Leaf4 fill:#457b9d,color:#fff
    style Leaf5 fill:#457b9d,color:#fff
    style Leaf6 fill:#457b9d,color:#fff

Multi-Tier Summaries

graph LR
    L0["<b>L0: Abstract</b><br/>Short phrase<br/><i>Fast candidate scoring</i>"] --> L1["<b>L1: Overview</b><br/>Paragraph<br/><i>Reranking & navigation</i>"] --> L2["<b>L2: Content</b><br/>Full detail<br/><i>Injected into LLM context</i>"]

    style L0 fill:#e9c46a,color:#000
    style L1 fill:#f4a261,color:#000
    style L2 fill:#e76f51,color:#fff
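
The tree and tier diagrams above imply a node shape roughly like the one below. This is a minimal sketch, not the PR's actual type: the field names ParentID, IsCategory, Abstract, Overview, Visibility, and RelatedIDs are taken from this description, while `memoryNode`, `Content`, and `childrenOf` are hypothetical names introduced for illustration.

```go
package main

// memoryNode is a hypothetical stand-in for the PR's Memory struct.
type memoryNode struct {
	ID         string
	ParentID   string   // empty for top-level category nodes
	IsCategory bool     // true for category nodes, false for leaf memories
	Abstract   string   // L0: short phrase used for fast candidate scoring
	Overview   string   // L1: paragraph used for reranking and navigation
	Content    string   // L2: full detail injected into the LLM context
	Visibility string   // "user", "group", or "public"
	RelatedIDs []string // cross-links to memories in other subtrees
}

// childrenOf returns the nodes whose ParentID equals the given category ID.
func childrenOf(nodes []memoryNode, categoryID string) []memoryNode {
	var out []memoryNode
	for _, n := range nodes {
		if n.ParentID == categoryID {
			out = append(out, n)
		}
	}
	return out
}
```

Calling `childrenOf(nodes, "programming")` on a slice shaped like the tree above would return only the leaves filed under that category.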

Memory Storage Pipeline

flowchart TD
    A["Conversation Turn<br/>(user + assistant messages)"] --> B["MemoryExtractor.ProcessResponse()"]

    B --> C["Build extraction prompt"]
    C --> D["LLM Call<br/><i>external_models: memory_extraction</i>"]
    D --> E["Parse JSON facts<br/><code>[]ExtractedFact</code>"]

    E --> F{"Similar memory<br/>already exists?"}
    F -- "Yes (score > 0.9)" --> G["Update existing<br/>memory"]
    F -- "No" --> H["Create new memory"]

    H --> I["extractTopic()<br/><i>keyword-based categorization</i>"]
    I --> J["Find or create<br/>category node"]
    J --> K["Set ParentID,<br/>Abstract (L0),<br/>Overview (L1)"]
    K --> L["GenerateEmbedding()<br/><i>BERT model</i>"]
    L --> M[("Store in Milvus<br/><small>content, embedding,<br/>user_id, parent_id,<br/>is_category, group_id,<br/>visibility</small>")]

    style B fill:#d4a373,color:#000
    style D fill:#bc6c25,color:#fff
    style M fill:#264653,color:#fff
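
The deduplication branch in the storage pipeline above can be sketched as follows. The 0.9 cutoff comes from the diagram; `decideStore`, `storeAction`, and the idea of passing candidate scores as a map are assumptions made for this example, not the extractor's real API.

```go
package main

// dedupThreshold mirrors the "score > 0.9" branch in the storage diagram.
const dedupThreshold = 0.9

// storeAction is a hypothetical enum for the two outcomes of deduplication.
type storeAction int

const (
	createNew storeAction = iota
	updateExisting
)

// decideStore picks the best-scoring existing memory. It returns
// updateExisting with that memory's ID when the score is strictly above
// dedupThreshold, and createNew otherwise.
func decideStore(similar map[string]float64) (storeAction, string) {
	bestID, bestScore := "", 0.0
	for id, s := range similar {
		if s > bestScore {
			bestID, bestScore = id, s
		}
	}
	if bestScore > dedupThreshold {
		return updateExisting, bestID
	}
	return createNew, ""
}
```

Only the create branch continues to categorization and embedding in the diagram; the update branch reuses the existing memory.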

Two-Phase Hierarchical Retrieval

flowchart TD
    Q["User Query"] --> QR["Query Rewrite (optional)<br/><i>LLM call via memory_rewrite model</i>"]

    QR --> P1

    subgraph P1["PHASE 1 — Broad Category Search"]
        direction TB
        S1["Milvus vector search<br/><small>threshold × 0.8 (relaxed)</small><br/><small>limit = max(categorySearchTopK, limit×4)</small>"]
        S1 --> Split{"IsCategory?"}
        Split -- "true" --> CatQ["Category nodes → <b>Priority Queue</b><br/><small>seeded if score ≥ threshold × 0.8</small>"]
        Split -- "false" --> Leaves["Leaf memories → <b>Collected</b><br/><small>if score ≥ threshold</small>"]
    end

    P1 --> P2

    subgraph P2["PHASE 2 — Drill-Down with Score Propagation"]
        direction TB
        Pop["Pop top-scoring category<br/>from priority queue"] --> Search["Search children<br/><small>where ParentID == category.ID</small>"]
        Search --> ChildType{"Child type?"}
        ChildType -- "Category" --> Push["Push to priority queue<br/><small>with propagated score</small>"]
        ChildType -- "Leaf" --> Prop["Score Propagation:<br/><code>α·child + (1-α)·parent</code>"]
        Prop --> Thresh{"score ≥<br/>threshold?"}
        Thresh -- "Yes" --> Collect["Add to collected results"]
        Thresh -- "No" --> Discard["Discard"]
        Push --> Conv{"Top-K set<br/>unchanged for<br/>3 rounds?"}
        Collect --> Conv
        Conv -- "No" --> Pop
        Conv -- "Yes" --> Done["Convergence → stop"]
    end

    P2 --> TopK["Sort collected by score → Top-K"]

    TopK --> LinkExp

    subgraph LinkExp["PHASE 3 — Graph Expansion (optional, follow_links: true)"]
        direction TB
        Scan["For each result, follow<br/><b>RelatedIDs</b> cross-links"] --> Fetch["Fetch linked memory<br/><small>store.Get(linkedID)</small>"]
        Fetch --> Score["Score with same pipeline:<br/><code>cosineSim(queryEmb, linked.Emb)</code><br/>+ hybrid fusion if enabled"]
        Score --> Blend["Blend:<br/><code>referrer.Score × 0.8 + directScore × 0.2</code>"]
        Blend --> LinkThresh{"blended ≥<br/>threshold?"}
        LinkThresh -- "Yes" --> LinkAdd["Add to results<br/><small>+ push to next-hop frontier</small>"]
        LinkThresh -- "No" --> LinkSkip["Skip"]
        LinkAdd --> Hop{"more hops?<br/><small>(up to MaxLinkDepth)</small>"}
        Hop -- "Yes" --> Scan
        Hop -- "No" --> LinkDone["Re-sort + trim to Top-K"]
    end

    LinkExp --> Inject["Format as system prompt context<br/><b>## User's Relevant Context</b>"]

    style TopK fill:#2d6a4f,color:#fff
    style Inject fill:#e76f51,color:#fff
    style LinkExp fill:none
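
Phase 2 of the diagram above (best-first drill-down with score propagation) can be sketched with the standard `container/heap` package. This is an illustration under stated assumptions: `node`, `scored`, `catQueue`, and `drillDown` are hypothetical names, the value of α is a caller-supplied parameter because the description does not state it, the data is assumed to be a tree (no cycles), and the "unchanged for 3 rounds" convergence check is left out for brevity.

```go
package main

import (
	"container/heap"
	"sort"
)

// node is a hypothetical memory entry with a precomputed query similarity.
type node struct {
	ID         string
	ParentID   string
	IsCategory bool
	Score      float64 // direct similarity to the query
}

// scored pairs an ID with its (possibly propagated) score.
type scored struct {
	id    string
	score float64
}

// propagate blends a child's own score with its parent's score:
// alpha*child + (1-alpha)*parent.
func propagate(alpha, child, parent float64) float64 {
	return alpha*child + (1-alpha)*parent
}

// catQueue is a max-heap of categories ordered by score.
type catQueue []scored

func (q catQueue) Len() int           { return len(q) }
func (q catQueue) Less(i, j int) bool { return q[i].score > q[j].score }
func (q catQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *catQueue) Push(x any)        { *q = append(*q, x.(scored)) }
func (q *catQueue) Pop() any {
	old := *q
	n := len(old)
	x := old[n-1]
	*q = old[:n-1]
	return x
}

// drillDown pops categories best-first, propagates scores to their children,
// re-queues child categories, and collects leaves that clear the threshold.
// The collected leaves are returned sorted by score and trimmed to topK.
func drillDown(nodes []node, seeds []scored, alpha, threshold float64, topK int) []scored {
	q := catQueue(append([]scored(nil), seeds...))
	heap.Init(&q)
	var collected []scored
	for q.Len() > 0 {
		cat := heap.Pop(&q).(scored)
		for _, n := range nodes {
			if n.ParentID != cat.id {
				continue
			}
			s := propagate(alpha, n.Score, cat.score)
			if n.IsCategory {
				heap.Push(&q, scored{n.ID, s})
			} else if s >= threshold {
				collected = append(collected, scored{n.ID, s})
			}
		}
	}
	sort.Slice(collected, func(i, j int) bool { return collected[i].score > collected[j].score })
	if len(collected) > topK {
		collected = collected[:topK]
	}
	return collected
}
```

With α = 0.5, a leaf scoring 0.8 under a category scoring 0.6 propagates to 0.7, while a leaf scoring 0.1 under the same category propagates to 0.35 and is discarded at a 0.4 threshold.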

Hybrid Scoring (applied at each phase)

flowchart LR
    subgraph Signals["Three Scoring Signals"]
        direction TB
        V["🔢 Vector Cosine<br/><small>embedding similarity<br/>from Milvus ANN search</small>"]
        B["📖 BM25 Keyword<br/><small>TF-IDF term matching<br/>(MemBM25Index)</small>"]
        N["🔤 N-gram Jaccard<br/><small>character n-gram overlap<br/>(MemNgramIndex)</small>"]
    end

    subgraph Fusion["Score Fusion"]
        direction TB
        W["<b>Weighted</b><br/><code>wV·cos + wB·bm25 + wN·ngram</code><br/><small>default: 0.7 / 0.2 / 0.1</small>"]
        R["<b>RRF</b><br/><code>Σ 1/(k + rank_i)</code><br/><small>reciprocal rank fusion</small>"]
    end

    V --> Fusion
    B --> Fusion
    N --> Fusion

    Fusion --> Out["Fused Score<br/><small>used for ranking<br/>and threshold filtering</small>"]

    style V fill:#457b9d,color:#fff
    style B fill:#e9c46a,color:#000
    style N fill:#f4a261,color:#000
    style W fill:#2d6a4f,color:#fff
    style R fill:#2d6a4f,color:#fff
    style Out fill:#e76f51,color:#fff
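
The two fusion modes in the diagram above can be sketched as below. The weighted formula and the 0.7 / 0.2 / 0.1 defaults come from the diagram; the names `weights`, `weightedFuse`, `ranks`, and `rrfFuse` are hypothetical, the sketch assumes all three signals are already normalized to a comparable range, and the RRF constant `k` is left to the caller because the description does not state its value.

```go
package main

import "sort"

// weights holds the fusion weights; the diagram's defaults are 0.7/0.2/0.1.
type weights struct{ Vector, BM25, Ngram float64 }

// weightedFuse computes wV*cos + wB*bm25 + wN*ngram for one document.
func weightedFuse(w weights, cos, bm25, ngram float64) float64 {
	return w.Vector*cos + w.BM25*bm25 + w.Ngram*ngram
}

// ranks converts a score map into 1-based ranks (highest score is rank 1).
// Ties are broken by ID so the result is deterministic.
func ranks(scores map[string]float64) map[string]int {
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j]
	})
	out := make(map[string]int, len(ids))
	for i, id := range ids {
		out[id] = i + 1
	}
	return out
}

// rrfFuse sums 1/(k + rank) over every signal in which a document appears.
func rrfFuse(k float64, signals ...map[string]float64) map[string]float64 {
	fused := map[string]float64{}
	for _, sig := range signals {
		for id, r := range ranks(sig) {
			fused[id] += 1 / (k + float64(r))
		}
	}
	return fused
}
```

Weighted fusion depends on the raw score magnitudes, whereas RRF depends only on rank order, which is one reason the two modes can behave differently when one signal's scale dominates.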

Group-Level Memory Sharing

flowchart TD
    subgraph Access["Visibility Levels"]
        direction LR
        U["🔒 <b>user</b><br/>Owner only"]
        G["👥 <b>group</b><br/>Same GroupID members"]
        P["🌐 <b>public</b><br/>Any user"]
    end

    subgraph Filter["Milvus Filter Expression"]
        F["<code>(user_id == 'alice')</code><br/><code>OR</code><br/><code>(group_id IN ['team-backend']</code><br/><code> AND visibility IN ['group','public'])</code>"]
    end

    Access --> Filter

    style U fill:#264653,color:#fff
    style G fill:#2a9d8f,color:#fff
    style P fill:#e9c46a,color:#000
    style F fill:#1b263b,color:#fff
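
A filter string with the shape shown in the diagram above could be assembled as in this sketch. `accessFilter` is a hypothetical helper, not the PR's `BuildGroupFilter`; it copies the operator spelling and quoting from the diagram and does no escaping of IDs, which real code would need.

```go
package main

import (
	"fmt"
	"strings"
)

// accessFilter reproduces the shape of the filter expression in the diagram:
// the owner's memories, plus group/public memories from the given groups.
// IDs are inserted verbatim; escaping is out of scope for this sketch.
func accessFilter(userID string, groupIDs []string) string {
	owner := fmt.Sprintf("(user_id == '%s')", userID)
	if len(groupIDs) == 0 {
		return owner
	}
	quoted := make([]string, len(groupIDs))
	for i, g := range groupIDs {
		quoted[i] = "'" + g + "'"
	}
	return fmt.Sprintf("%s OR (group_id IN [%s] AND visibility IN ['group','public'])",
		owner, strings.Join(quoted, ","))
}
```

For user `alice` in group `team-backend`, the helper yields the same expression as the diagram; with no groups it falls back to the owner-only clause.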

Configuration

# Per-decision plugin config (in decisions[].plugins[])
- type: "memory"
  configuration:
    enabled: true
    retrieval_limit: 10          # max memories to inject
    similarity_threshold: 0.30   # minimum score cutoff
    auto_store: true             # extract facts from conversations
    hierarchical_search: true    # two-phase category → drill-down
    max_depth: 3                 # max tree depth to traverse
    hybrid_search: true          # BM25 + n-gram fusion
    hybrid_mode: "weighted"      # "weighted" or "rrf"
    follow_links: true           # graph expansion via RelatedIDs cross-links
    max_link_depth: 1            # hops to follow (1 = direct links only)
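
As a rough illustration of how these keys might be checked before use, the sketch below mirrors a subset of the YAML fields in a Go struct. `memoryPluginSketch` and its `validate` method are hypothetical; the PR's real MemoryPluginConfig may use different field names, and the specific validation rules here are assumptions rather than documented behavior.

```go
package main

import "fmt"

// memoryPluginSketch mirrors some of the YAML keys above.
// It is a hypothetical type, not the PR's MemoryPluginConfig.
type memoryPluginSketch struct {
	Enabled             bool
	RetrievalLimit      int     // retrieval_limit
	SimilarityThreshold float64 // similarity_threshold
	HybridSearch        bool    // hybrid_search
	HybridMode          string  // hybrid_mode: "weighted" or "rrf"
	FollowLinks         bool    // follow_links
	MaxLinkDepth        int     // max_link_depth
}

// validate applies assumed sanity rules to the configuration.
func (c memoryPluginSketch) validate() error {
	if c.RetrievalLimit <= 0 {
		return fmt.Errorf("retrieval_limit must be positive, got %d", c.RetrievalLimit)
	}
	if c.SimilarityThreshold < 0 || c.SimilarityThreshold > 1 {
		return fmt.Errorf("similarity_threshold must be in [0,1], got %v", c.SimilarityThreshold)
	}
	if c.HybridSearch && c.HybridMode != "weighted" && c.HybridMode != "rrf" {
		return fmt.Errorf("hybrid_mode must be \"weighted\" or \"rrf\", got %q", c.HybridMode)
	}
	if c.FollowLinks && c.MaxLinkDepth < 1 {
		return fmt.Errorf("max_link_depth must be at least 1 when follow_links is true")
	}
	return nil
}
```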

Key Source Files

| File | Role |
| --- | --- |
| pkg/memory/types.go | Memory struct: ParentID, IsCategory, Abstract, Overview, Visibility, RelatedIDs |
| pkg/memory/hierarchical_retrieve.go | Two-phase search: category scan → drill-down with score propagation + graph expansion via expandViaLinks |
| pkg/memory/hybrid_score.go | MemBM25Index, MemNgramIndex, MemHybridScorer (score fusion) |
| pkg/memory/inmemory_hierarchical.go | In-memory HierarchicalStore implementation |
| pkg/memory/milvus_hierarchical.go | Milvus-backed HierarchicalStore implementation |
| pkg/memory/extractor.go | LLM-based fact extraction + deduplication |
| pkg/memory/categorizer.go | Topic extraction, abstract/overview generation, parent assignment |
| pkg/extproc/processor_req_body.go | Wires retrieval into ExtProc pipeline, injects memories |
| pkg/extproc/req_filter_memory.go | Query rewriting, hybrid config builder, memory formatting |
| pkg/config/config.go | MemoryPluginConfig with hierarchical + hybrid fields |

Evaluation Results

Three-Way Comparison: Flat vs Hierarchical vs Hierarchical+Hybrid

Dataset: 30 memories across 6 topic clusters (deployment, memory, safety, rag, architecture, evaluation), with one query per cluster. Retrieval at k=5, threshold 0.30.

go test -v -tags milvus -run TestHybridHierarchical_ThreeWayComparison ./pkg/memory/

Per-Query Precision@5

| Query | Cluster | Flat P@5 | Hier-Cos P@5 | Hier-Hybrid P@5 |
| --- | --- | --- | --- | --- |
| Kubernetes deployment pipeline with Helm... | deployment | 0.60 | 0.60 | 0.60 |
| Memory retrieval and retention scoring... | memory | 0.20 | 0.20 | 0.40 |
| Jailbreak detection and PII safety guardrails... | safety | 0.80 | 0.80 | 0.80 |
| Hybrid RAG search combine vector similarity, BM25... | rag | 0.40 | 0.40 | 0.40 |
| ExtProc signal engine architecture route requests... | architecture | 0.80 | 0.80 | 0.80 |
| Metrics in the end-to-end evaluation and benchmark... | evaluation | 0.40 | 0.40 | 0.40 |

Averages

| Method | Avg P@5 | Avg R@5 | Avg Purity |
| --- | --- | --- | --- |
| Flat (cosine) | 0.5333 | 0.5333 | 0.5333 |
| Hier (cosine) | 0.5333 | 0.5333 | 0.5333 |
| Hier (hybrid) | 0.5667 | 0.5667 | 0.5667 |

Deltas

DELTA Precision:
  hier-cosine vs flat:      +0.0000  (+0.0%)
  hier-hybrid vs flat:      +0.0333  (+6.2%)
  hier-hybrid vs hier-cos:  +0.0333  (+6.2%)

DELTA Recall:
  hier-cosine vs flat:      +0.0000  (+0.0%)
  hier-hybrid vs flat:      +0.0333  (+6.2%)
  hier-hybrid vs hier-cos:  +0.0333  (+6.2%)

DELTA Purity:
  hier-cosine vs flat:      +0.0000  (+0.0%)
  hier-hybrid vs flat:      +0.0333  (+6.2%)
  hier-hybrid vs hier-cos:  +0.0333  (+6.2%)
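
The averages and deltas above follow directly from the per-query table. The sketch below recomputes them; `mean` and `relativeDelta` are hypothetical helper names, and the input values are the P@5 columns reported above.

```go
package main

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// relativeDelta returns the change from a to b as a fraction of a.
func relativeDelta(a, b float64) float64 {
	return (b - a) / a
}
```

Averaging the flat column (0.60, 0.20, 0.80, 0.40, 0.80, 0.40) gives 3.2 / 6 ≈ 0.5333, the hybrid column gives 3.4 / 6 ≈ 0.5667, and the relative change is 0.0333 / 0.5333 ≈ 0.0625, which the test output reports as +6.2%.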

Weight Sweep: Effect of BM25 and N-gram Weight

go test -v -tags milvus -run TestHybridHierarchical_WeightSweep ./pkg/memory/

| Weights | deployment | memory | safety | rag | architecture | evaluation | Avg P@K |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pure-cosine (nil) | 0.60 | 0.20 | 0.80 | 0.40 | 0.80 | 0.40 | 0.5333 |
| v=1.0 b=0.0 n=0.0 | 0.60 | 0.20 | 0.80 | 0.40 | 0.80 | 0.40 | 0.5333 |
| v=0.8 b=0.1 n=0.1 | 0.60 | 0.40 | 0.80 | 0.40 | 0.80 | 0.40 | 0.5667 |
| v=0.7 b=0.2 n=0.1 | 0.60 | 0.40 | 0.80 | 0.40 | 0.80 | 0.40 | 0.5667 |
| v=0.6 b=0.3 n=0.1 | 0.60 | 0.40 | 0.60 | 0.40 | 0.80 | 0.40 | 0.5333 |
| v=0.5 b=0.3 n=0.2 | 0.60 | 0.40 | 0.60 | 0.40 | 0.80 | 0.40 | 0.5333 |
| v=0.5 b=0.5 n=0.0 | 0.60 | 0.40 | 0.60 | 0.40 | 0.80 | 0.40 | 0.5333 |
| v=0.4 b=0.4 n=0.2 | 0.60 | 0.60 | 0.60 | 0.40 | 0.80 | 0.40 | 0.5667 |
| rrf (default) | 0.60 | 0.40 | 0.80 | 0.40 | 0.80 | 0.40 | 0.5667 |

Hybrid Score Unit Test

Validates that BM25/n-gram fusion ranks documents with exact keyword overlap above documents without it.

Query: "Helm charts Kubernetes deployment"

| Doc | Content | Cosine | Fused | Delta |
| --- | --- | --- | --- | --- |
| A | Kubernetes/Helm (exact terms) | 0.800 | 0.767 | -0.033 |
| B | BM25 text (partial terms) | 0.750 | 0.531 | -0.219 |
| C | cat/mat (no terms) | 0.700 | 0.491 | -0.209 |

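Doc A (matching keywords) retains the highest fused score and loses the least relative to its cosine score (-0.033); docs B and C, with partial or no matching terms, are penalized far more (-0.219 and -0.209).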

E2E Integration Test

make test-retrieval-api    # 10/10 passed

Seeds 5 topic memories (technology, cooking, travel, sports, music) through the full Envoy → ExtProc → LLM extraction → Milvus pipeline, then verifies retrieval in new sessions:

| Phase | Tests | Passed | What it validates |
| --- | --- | --- | --- |
| Phase 1: Seed | 5 | 5 | Messages accepted through /v1/responses |
| Phase 2: Storage | 1 | 1 | Memories extracted and stored in Milvus |
| Phase 3: Semantic retrieval | 5 | 5 | Queries in new sessions retrieve relevant memories (keywords appear only via injection) |
| Phase 4: Hybrid keyword | 4 | 4 | BM25 boosts exact-match queries |

Cross-Document Link Expansion: Four-Way Strategy Comparison

go test -v -run TestRelatedIDs_CrossCategoryComparison ./pkg/memory/

Memories are organized across 4 categories (DevOps, Finance, ML, Compliance). Two cross-domain links are created via RelatedIDs:

  • DevOps "Helm charts deployment" ↔ Finance "quarterly spend allocation" (zero vocabulary overlap)
  • ML "GPU distributed training" ↔ Compliance "GDPR data retention" (zero vocabulary overlap)

Four retrieval strategies are tested against 2 queries designed to find the direct match AND the linked cross-domain memory:

| Strategy | Algorithm | Direct Match | Cross-Category Linked |
| --- | --- | --- | --- |
| Tree-Cosine | hierarchical tree traversal, cosine scoring | 2/2 | 0/2 (0%) |
| Tree-Hybrid | hierarchical tree traversal, BM25 + n-gram + cosine | 2/2 | 0/2 (0%) |
| Tree-Cosine + Links | tree-cosine + RelatedIDs graph expansion | 2/2 | 2/2 (100%) |
| Tree-Hybrid + Links | tree-hybrid + RelatedIDs graph expansion | 2/2 | 1/2 (50%) |

Key findings:

  • Tree-Cosine (similar to LLM-based tree traversal approaches): Drills into the DevOps subtree and finds helm-deploy. Cannot reach the Finance subtree because there is no semantic path between "Kubernetes Helm charts" and "quarterly spend allocation."
  • Tree-Hybrid (similar to hybrid dense+sparse search approaches): Even with BM25 and n-gram matching on top of cosine, there are zero shared keywords between the DevOps query and the Finance memory. Hybrid scoring cannot bridge vocabulary-disjoint domains.
  • Tree-Cosine + Links: After finding helm-deploy, follows its RelatedIDs to fetch finance-budget, scores it via embedding cosine similarity (0.826 blended), and adds it to results. 100% cross-category recall.
  • Tree-Hybrid + Links: The referrer's propagated score is lower under hybrid scoring (category nodes score lower on BM25), reducing the blended link score. Still finds 1/2 linked memories — a known tradeoff where hybrid's keyword penalty on category propagation reduces the referrer's contribution to link blending.

TestFollowLinks_MultiHop: Chain of 3 memories linked a → b → c with decreasing semantic similarity to the query. With MaxLinkDepth=1, only a and b are found. With MaxLinkDepth=2, all three are found through two hops of traversal.
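
The multi-hop behavior described above can be sketched as a breadth-first expansion over RelatedIDs. The blend formula (referrer × 0.8 + direct × 0.2) comes from the retrieval diagram; `linkedMem`, `blend`, and `expandLinks` are hypothetical names, direct similarities are supplied as precomputed numbers rather than computed from embeddings, and the final re-sort and Top-K trim are omitted. When several referrers link to the same memory, this sketch keeps the first score it computes, which is a simplification.

```go
package main

import "sort"

// linkedMem is a hypothetical memory with a precomputed direct similarity
// to the query and its outgoing cross-links.
type linkedMem struct {
	ID         string
	Direct     float64
	RelatedIDs []string
}

// blend weights the referrer's score at 0.8 and the linked memory's own
// similarity at 0.2, as in the retrieval diagram.
func blend(referrer, direct float64) float64 {
	return referrer*0.8 + direct*0.2
}

// expandLinks follows RelatedIDs for up to maxDepth hops, starting from the
// initial results, and returns the initial results plus every linked memory
// whose blended score reaches the threshold.
func expandLinks(store map[string]linkedMem, results map[string]float64, threshold float64, maxDepth int) map[string]float64 {
	out := make(map[string]float64, len(results))
	frontier := make([]string, 0, len(results))
	for id, s := range results {
		out[id] = s
		frontier = append(frontier, id)
	}
	sort.Strings(frontier) // deterministic traversal order
	for hop := 0; hop < maxDepth && len(frontier) > 0; hop++ {
		var next []string
		for _, refID := range frontier {
			for _, linkedID := range store[refID].RelatedIDs {
				if _, seen := out[linkedID]; seen {
					continue
				}
				linked, ok := store[linkedID]
				if !ok {
					continue
				}
				if s := blend(out[refID], linked.Direct); s >= threshold {
					out[linkedID] = s
					next = append(next, linkedID)
				}
			}
		}
		frontier = next
	}
	return out
}
```

With a chain a → b → c, one hop reaches b only; a second hop reaches c because b was pushed onto the next-hop frontier.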

Related Work Context

The cross-document linking problem is well-studied in recent research:

  • Cross-partition KG linking (BridgeRAG, ICLR 2026): Uses shared named entities as conduits between document-level knowledge graphs. Requires NER and entity resolution pipelines.
  • Hierarchical Lexical Graph (HLG, KDD 2025): Three-tier index with entity-relationship links across documents. Achieves +23.1% recall over chunk-based RAG. Requires proposition extraction.
  • Heterogeneous multi-store fusion (HetaRAG): Routes queries across vector, KG, full-text, and SQL stores. Cross-document recall comes from combining modalities.

Our RelatedIDs approach is lightweight by comparison — no NER, no entity resolution, no proposition extraction. Links are explicit metadata that can be set by the application, an LLM, or a human. The four-way test above demonstrates that this simple mechanism bridges the cross-category gap that neither tree traversal nor hybrid search can close on their own.

Interpretation

  1. Hierarchical structure alone does not change results on a small, well-embedded dataset — the category drill-down converges to the same results as flat search.
  2. Adding hybrid scoring (BM25 + n-gram) raises average P@5 from 0.5333 to 0.5667, a +6.2% relative improvement, by boosting documents that share exact terms with the query. The gain comes from the "memory" cluster, where semantic similarity alone is ambiguous and precision doubled from 0.20 to 0.40.
  3. Optimal weights: v=0.7 b=0.2 n=0.1 (or v=0.8 b=0.1 n=0.1) and RRF improve the weak "memory" cluster without degrading strong ones. Raising the BM25 weight to 0.3 or above drops "safety" from 0.80 to 0.60, a cluster where keyword overlap is misleading; at v=0.4 b=0.4 n=0.2 a further gain on "memory" (0.60) offsets that loss in the average.
  4. Graph expansion (follow_links: true) discovers cross-category memories that no tree-only or hybrid-only strategy can find. When the linked memory shares zero vocabulary with the query, only the explicit RelatedIDs link provides a retrieval path. Linked memories are scored with embedding cosine similarity (not hybrid — BM25/n-gram would penalize the cross-domain vocabulary gap), blended with the referrer's score as the primary relevance signal. Multi-hop traversal (max_link_depth: 2+) extends reach along relation chains.
  5. The E2E test confirms the full pipeline works end-to-end: extraction, storage, hierarchical retrieval, hybrid scoring, and memory injection into the LLM system prompt all function correctly through the Envoy ExtProc pipeline.

Huamin Chen added 2 commits February 22, 2026 21:18
Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
@rootfs rootfs requested a review from Xunzhuo as a code owner February 23, 2026 14:27
@netlify

netlify Bot commented Feb 23, 2026

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit bd4fe39
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/699cc4720a9f870008d8e51f
😎 Deploy Preview https://deploy-preview-1374--vllm-semantic-router.netlify.app

@rootfs rootfs marked this pull request as draft February 23, 2026 14:27
@github-actions
Contributor

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 config

Owners: @rootfs, @Xunzhuo
Files changed:

  • config/testing/config.memory-hierarchical.yaml
  • config/testing/envoy-retrieval-test.yaml

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/testing/mock-vllm-echo.py

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • scripts/test-retrieval-api.sh

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/config/config.go
  • src/semantic-router/pkg/config/config_test.go
  • src/semantic-router/pkg/extproc/processor_req_body.go
  • src/semantic-router/pkg/extproc/req_filter_memory.go
  • src/semantic-router/pkg/extproc/req_filter_memory_test.go
  • src/semantic-router/pkg/memory/categorizer.go
  • src/semantic-router/pkg/memory/hierarchical_benchmark_test.go
  • src/semantic-router/pkg/memory/hierarchical_comparison_test.go
  • src/semantic-router/pkg/memory/hierarchical_retrieve.go
  • src/semantic-router/pkg/memory/hierarchical_test.go
  • src/semantic-router/pkg/memory/hybrid_hierarchical_comparison_test.go
  • src/semantic-router/pkg/memory/hybrid_score.go
  • src/semantic-router/pkg/memory/inmemory_hierarchical.go
  • src/semantic-router/pkg/memory/inmemory_store.go
  • src/semantic-router/pkg/memory/milvus_hierarchical.go
  • src/semantic-router/pkg/memory/milvus_store.go
  • src/semantic-router/pkg/memory/relations.go
  • src/semantic-router/pkg/memory/store.go
  • src/semantic-router/pkg/memory/testdata/evaluation_dataset.json
  • src/semantic-router/pkg/memory/testdata/evaluation_source.json
  • src/semantic-router/pkg/memory/testdata/generate_source_dataset.go
  • src/semantic-router/pkg/memory/types.go

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/build-run-test.mk


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Contributor

Copilot AI left a comment

Pull request overview

This PR implements a hierarchical memory retrieval system with hybrid scoring for the semantic router, enabling multi-tier memory organization with category-based search and BM25/n-gram/vector score fusion.

Changes:

  • Adds hierarchical memory structure with category nodes, parent-child relationships, and multi-level summaries (L0 abstract, L1 overview, L2 content)
  • Implements hybrid scoring that combines vector similarity, BM25 keyword matching, and character n-gram Jaccard similarity
  • Adds group-level memory sharing with visibility controls (user/group/public) and cross-memory relations
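
Of the three signals listed above, character n-gram Jaccard similarity is the simplest to illustrate. The sketch below is an assumption-laden example, not the PR's MemNgramIndex: `charNgrams` and `jaccard` are hypothetical names, and the choice of lowercasing and of n is left to the caller.

```go
package main

import "strings"

// charNgrams returns the set of character n-grams of the lowercased text.
func charNgrams(text string, n int) map[string]struct{} {
	runes := []rune(strings.ToLower(text))
	set := map[string]struct{}{}
	for i := 0; i+n <= len(runes); i++ {
		set[string(runes[i:i+n])] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B|, or 0 when both sets are empty.
func jaccard(a, b map[string]struct{}) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 0
	}
	inter := 0
	for g := range a {
		if _, ok := b[g]; ok {
			inter++
		}
	}
	return float64(inter) / float64(len(a)+len(b)-inter)
}
```

For trigrams, "helm" yields {hel, elm} and "help" yields {hel, elp}, so the two share one of three distinct trigrams and the similarity is 1/3.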

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tools/make/build-run-test.mk | Adds makefile targets for running hierarchical memory tests |
| src/semantic-router/pkg/memory/types.go | Defines hierarchical memory types including visibility, relations, and hybrid config |
| src/semantic-router/pkg/memory/testdata/generate_source_dataset.go | Generator script for source code evaluation dataset |
| src/semantic-router/pkg/memory/testdata/evaluation_dataset.json | Evaluation dataset with 30 memories across 6 clusters |
| src/semantic-router/pkg/memory/store.go | Adds HierarchicalStore interface and type assertions |
| src/semantic-router/pkg/memory/relations.go | Implements automatic bidirectional memory relation linking |
| src/semantic-router/pkg/memory/milvus_store.go | Updates Milvus schema with hierarchical fields |
| src/semantic-router/pkg/memory/milvus_hierarchical.go | Milvus implementation of hierarchical retrieval |
| src/semantic-router/pkg/memory/inmemory_store.go | Adds relations map to in-memory store |
| src/semantic-router/pkg/memory/inmemory_hierarchical.go | In-memory implementation of hierarchical retrieval |
| src/semantic-router/pkg/memory/hybrid_score.go | BM25, n-gram indexing, and score fusion logic |
| src/semantic-router/pkg/memory/hybrid_hierarchical_comparison_test.go | Three-way comparison tests showing +6.2% improvement |
| src/semantic-router/pkg/memory/hierarchical_test.go | Unit tests for hierarchical retrieval components |
| src/semantic-router/pkg/memory/hierarchical_retrieve.go | Generic two-phase hierarchical retrieval algorithm |
| src/semantic-router/pkg/memory/hierarchical_comparison_test.go | Precision/recall comparison tests |
| src/semantic-router/pkg/memory/hierarchical_benchmark_test.go | Performance benchmarks for retrieval operations |
| src/semantic-router/pkg/memory/categorizer.go | Auto-categorization and summary generation logic |
| src/semantic-router/pkg/extproc/req_filter_memory_test.go | Tests for hybrid config wiring |
| src/semantic-router/pkg/extproc/req_filter_memory.go | Integrates hybrid scoring into request filter |
| src/semantic-router/pkg/extproc/processor_req_body.go | Adds hierarchical/group retrieval to request processor |
| src/semantic-router/pkg/config/config_test.go | Tests for new hierarchical/hybrid config fields |
| src/semantic-router/pkg/config/config.go | Adds hierarchical and hybrid config fields |
| scripts/test-retrieval-api.sh | End-to-end test script for hierarchical retrieval |
| e2e/testing/mock-vllm-echo.py | Echo mock for memory injection verification |
| config/testing/envoy-retrieval-test.yaml | Envoy config for retrieval tests |
| config/testing/config.memory-hierarchical.yaml | Router config for hierarchical testing |


Comment on lines 14 to 20
MemoryTypeSemantic MemoryType = "semantic"

// MemoryTypeProcedural represents instructions, how-to, steps.
// Example: "To deploy payment-service: run npm build, then docker push"
MemoryTypeProcedural MemoryType = "procedural"

// MemoryTypeEpisodic represents session summaries, past events.
// Example: "On Dec 29 2024, user planned Hawaii vacation with $10K budget"
MemoryTypeEpisodic MemoryType = "episodic"

Copilot AI Feb 23, 2026

The example comments for these memory types were removed. Consider restoring brief inline examples to help developers understand the difference between semantic, procedural, and episodic memory types.

Copilot uses AI. Check for mistakes.
	schema := &entity.Schema{
		CollectionName: m.collectionName,
-		Description:    "Agentic Memory storage for cross-session context",
+		Description:    "Agentic Memory storage with hierarchical organization and group sharing",

Copilot AI Feb 23, 2026

Corrected spelling of 'Agentic' to 'Agentive' or 'Agent-based'. 'Agentic' is not a standard English term in technical contexts.

SKIP_SEED="${SKIP_SEED:-0}"
EXTRACTION_WAIT="${EXTRACTION_WAIT:-12}"

USER_ID="retrieval_test_$(date +%s)"

Copilot AI Feb 23, 2026

The USER_ID uses epoch timestamp which may collide if tests run in parallel within the same second. Consider adding a random suffix or using date +%s%N for nanosecond precision.

Suggested change
- USER_ID="retrieval_test_$(date +%s)"
+ USER_ID="retrieval_test_$(date +%s%N)"

Huamin Chen added 2 commits February 23, 2026 20:40
Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 8 comments.


Comment on lines +153 to +159
// FusedScore computes the hybrid score for a single memory given its cosine
// similarity and the query string. Returns the fused score.
func (s *MemHybridScorer) FusedScore(memID string, cosineSim float32, query string) float32 {
	bm25Scores := s.bm25.Score(query, s.cfg.BM25K1, s.cfg.BM25B)
	ngramScores := s.ngram.Score(query)
	return s.fuseOne(memID, cosineSim, bm25Scores, ngramScores)
}

Copilot AI Feb 23, 2026

MemHybridScorer.FusedScore recomputes BM25 and n-gram scores for the entire corpus on every call, making callers that score many memories (e.g., hierarchical retrieval loops) effectively O(N²) per query. Cache per-query score maps inside MemHybridScorer or use the existing FusedScores batch API in callers to compute BM25/ngram once per query.
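
One way to realize the per-query caching suggested here is sketched below. It is illustrative only: `corpusScorer` and `cachedScorer` are hypothetical types standing in for the BM25 and n-gram indexes, the cache holds a single query at a time, and it is not safe for concurrent use without additional locking.

```go
package main

// corpusScorer stands in for an index that scores the whole corpus for a
// query and returns a map from memory ID to score.
type corpusScorer func(query string) map[string]float64

// cachedScorer memoizes the corpus-wide score map for the most recent query,
// so repeated per-memory lookups for one query cost a single corpus pass.
type cachedScorer struct {
	score      corpusScorer
	cached     bool
	lastQuery  string
	lastScores map[string]float64
	calls      int // number of times the underlying scorer ran
}

// scoreFor returns the score of memID for query, recomputing the corpus
// scores only when the query differs from the cached one.
func (c *cachedScorer) scoreFor(query, memID string) float64 {
	if !c.cached || c.lastQuery != query {
		c.lastScores = c.score(query)
		c.lastQuery = query
		c.cached = true
		c.calls++
	}
	return c.lastScores[memID]
}
```

Scoring N memories for one query then triggers one corpus pass instead of N, which is the cost reduction the comment is asking for.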

Comment on lines +338 to +341
func BuildGroupFilter(userID string, groupIDs []string, includeGroup bool) string {
	if !includeGroup || len(groupIDs) == 0 {
		return fmt.Sprintf("user_id == \"%s\"", userID)
	}

Copilot AI Feb 23, 2026

BuildGroupFilter drops public memories when includeGroup=true but groupIDs is empty because it falls back to user_id == .... That conflicts with the VisibilityPublic semantics ("any user") and with InMemoryStore.passesAccessFilter (which admits public when IncludeGroupLevel is true). Consider adding an explicit || visibility == "public" clause when includeGroup is enabled, regardless of groupIDs.

Comment on lines +35 to +38

// CrossGroup allows linking memories across group boundaries when both are
// visible to each other (group or public visibility).
CrossGroup bool

Copilot AI Feb 23, 2026

AutoLinkOptions.CrossGroup is currently unused (AutoLinkNewMemory always searches with RetrieveOptions{UserID: newMem.UserID} and never broadens scope/filters for cross-group visibility). Either implement the cross-group retrieval logic or remove the option to avoid a misleading configuration knob.

Suggested change
// CrossGroup allows linking memories across group boundaries when both are
// visible to each other (group or public visibility).
CrossGroup bool

Comment on lines +1 to +2
// +build ignore

Copilot AI Feb 23, 2026

This generator uses the legacy // +build ignore tag only. For Go 1.17+ compatibility and go vet/tooling, add the new build constraint form too (i.e., //go:build ignore plus the existing // +build ignore).

Comment on lines +123 to 127
	// Define schema for agentic memory (v2: includes hierarchical + group fields)
	schema := &entity.Schema{
		CollectionName: m.collectionName,
-		Description:    "Agentic Memory storage for cross-session context",
+		Description:    "Agentic Memory storage with hierarchical organization and group sharing",
		AutoID:         false,

Copilot AI Feb 23, 2026

ensureCollection now defines a v2 schema with new hierarchical fields, but when the collection already exists the function returns without validating/migrating the schema. Upgrades against an existing collection can silently miss required fields (group_id/parent_id/is_category/visibility/abstract), breaking hierarchical retrieval. Consider a versioned collection name, a schema check with a clear error, or an explicit migration path.
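
The "schema check with a clear error" option could look roughly like this. The required field names are the ones listed in this comment; `missingFields` is a hypothetical helper, and how the existing collection's field names are obtained from Milvus is deliberately left out of the sketch.

```go
package main

import "sort"

// missingFields returns the required field names that are absent from the
// existing collection's schema, sorted alphabetically.
func missingFields(existing, required []string) []string {
	have := make(map[string]bool, len(existing))
	for _, f := range existing {
		have[f] = true
	}
	var missing []string
	for _, f := range required {
		if !have[f] {
			missing = append(missing, f)
		}
	}
	sort.Strings(missing)
	return missing
}
```

A caller could refuse to start, or trigger a migration, whenever the returned slice is non-empty, instead of silently running hierarchical retrieval against a v1 collection.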

s.mu.RLock()
defer s.mu.RUnlock()

queryEmbedding, err := GenerateEmbedding(opts.Query, s.embeddingConfig)

Copilot AI Feb 23, 2026

HierarchicalRetrieveWithConfig always generates queryEmbedding using s.embeddingConfig, and link expansion reuses that embedding. opts.LinkEmbeddingConfig is never honored in the InMemory hierarchical implementation, so callers can’t control the embedding model used for scoring linked memories as advertised by HierarchicalRetrieveOptions. Consider selecting the embedding config based on opts.LinkEmbeddingConfig when FollowLinks is enabled (or removing the option for in-memory).

Suggested change
-	queryEmbedding, err := GenerateEmbedding(opts.Query, s.embeddingConfig)
+	embeddingConfig := s.embeddingConfig
+	if opts.FollowLinks {
+		embeddingConfig = opts.LinkEmbeddingConfig
+	}
+	queryEmbedding, err := GenerateEmbedding(opts.Query, embeddingConfig)

Comment on lines +670 to 676
		metadata["overview"] = memory.Overview
	}
	metadataJSON, err := json.Marshal(metadata)
	if err != nil {
		return fmt.Errorf("failed to marshal metadata: %w", err)
	}


Copilot AI Feb 23, 2026

Relations won’t persist in Milvus: StoreRelation/appendRelatedID only updates Memory.RelatedIDs and then calls upsert, but Store/upsert don’t serialize RelatedIDs into any Milvus column (metadata JSON also omits related_ids). Either add a dedicated RelatedIDs field in the collection schema or include RelatedIDs in the metadata JSON and make Get/Retrieve populate it consistently.
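
The second option (carrying `RelatedIDs` in the metadata JSON) can be sketched as a round trip. This is only an illustration under assumptions: `memoryRecord`, `encodeMetadata`, and `decodeRelatedIDs` are hypothetical names, and the `related_ids` key follows the wording of the comment rather than any existing schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// memoryRecord is a hypothetical stand-in holding only the fields needed here.
type memoryRecord struct {
	Overview   string
	RelatedIDs []string
}

// encodeMetadata builds the metadata JSON and includes related_ids when present.
func encodeMetadata(m memoryRecord) (string, error) {
	metadata := map[string]interface{}{}
	if m.Overview != "" {
		metadata["overview"] = m.Overview
	}
	if len(m.RelatedIDs) > 0 {
		metadata["related_ids"] = m.RelatedIDs
	}
	b, err := json.Marshal(metadata)
	if err != nil {
		return "", fmt.Errorf("failed to marshal metadata: %w", err)
	}
	return string(b), nil
}

// decodeRelatedIDs restores RelatedIDs on the read path (Get/Retrieve).
func decodeRelatedIDs(metadataJSON string) ([]string, error) {
	var parsed struct {
		RelatedIDs []string `json:"related_ids"`
	}
	if err := json.Unmarshal([]byte(metadataJSON), &parsed); err != nil {
		return nil, fmt.Errorf("failed to unmarshal metadata: %w", err)
	}
	return parsed.RelatedIDs, nil
}

func main() {
	encoded, err := encodeMetadata(memoryRecord{Overview: "Rust facts", RelatedIDs: []string{"mem-2", "mem-7"}})
	if err != nil {
		panic(err)
	}
	ids, err := decodeRelatedIDs(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded, ids)
}
```

A dedicated column would make relations filterable in Milvus expressions, whereas the metadata route avoids a schema change; which trade-off fits is for the PR author to decide.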

Comment on lines +79 to +116
func (m *MilvusStore) HierarchicalRetrieveWithConfig(ctx context.Context, opts HierarchicalRetrieveOptions, cfg MilvusHierarchicalConfig) ([]*RetrieveResult, error) {
if !m.enabled {
return nil, fmt.Errorf("milvus store is not enabled")
}

opts.ApplyDefaults()
cfg.ApplyDefaults()

limit := opts.Limit
if limit <= 0 {
limit = m.config.DefaultRetrievalLimit
}
threshold := opts.Threshold
if threshold <= 0 {
threshold = m.config.DefaultSimilarityThreshold
}

if opts.Query == "" {
return nil, fmt.Errorf("query is required")
}
if opts.UserID == "" && !opts.IncludeGroupLevel {
return nil, fmt.Errorf("user id or group ids required")
}

logging.Debugf("MilvusStore.HierarchicalRetrieve: query='%s', user_id='%s', groups=%v, limit=%d",
truncateForLog(opts.Query, 60), opts.UserID, opts.GroupIDs, limit)

queryEmbedding, err := GenerateEmbedding(opts.Query, m.embeddingConfig)
if err != nil {
return nil, fmt.Errorf("failed to generate embedding: %w", err)
}

baseFilter := BuildGroupFilter(opts.UserID, opts.GroupIDs, opts.IncludeGroupLevel)

if len(opts.Types) > 0 {
typeFilter := "("
for i, memType := range opts.Types {
if i > 0 {

Copilot AI Feb 23, 2026

MilvusStore.HierarchicalRetrieveWithConfig ignores key options (opts.Hybrid and opts.FollowLinks/MaxLinkDepth/LinkEmbeddingConfig). Because MilvusStore implements HierarchicalStore, callers will take this path and never get hybrid fusion or link expansion in production. Either implement these features here (or explicitly reject them with an error) so behavior matches InMemory/generic retrieval and the plugin config wiring.
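
If implementing the features is out of scope for now, the "explicitly reject" path might look like the following sketch. The `hierarchicalOptions` struct is a hypothetical reduction of `HierarchicalRetrieveOptions`; modelling `Hybrid` and `FollowLinks` as booleans is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// hierarchicalOptions is a hypothetical subset of HierarchicalRetrieveOptions.
// Treating Hybrid and FollowLinks as booleans is an assumption.
type hierarchicalOptions struct {
	Hybrid      bool
	FollowLinks bool
}

// rejectUnsupported returns an error naming every requested option that the
// store would otherwise silently ignore.
func rejectUnsupported(opts hierarchicalOptions) error {
	var unsupported []string
	if opts.Hybrid {
		unsupported = append(unsupported, "Hybrid")
	}
	if opts.FollowLinks {
		unsupported = append(unsupported, "FollowLinks")
	}
	if len(unsupported) > 0 {
		return fmt.Errorf("milvus hierarchical retrieval does not support: %s", strings.Join(unsupported, ", "))
	}
	return nil
}

func main() {
	fmt.Println(rejectUnsupported(hierarchicalOptions{Hybrid: true, FollowLinks: true}))
	fmt.Println(rejectUnsupported(hierarchicalOptions{}))
}
```

Calling such a check right after `opts.ApplyDefaults()` would surface the gap to callers instead of returning results that look valid but skip fusion and link expansion.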

@yehuditkerido
Collaborator

yehuditkerido commented Feb 25, 2026

1. Graph Concepts Without Neo4j

You're implementing graph concepts here without the complexity of running a new database like Neo4j. Maybe we'll need to implement that in the future for multi-hop visualization or other graph benefits, but for now this is a great improvement.

Note: Tracking issue #1293 has Neo4j as P3.


2. Retention Scoring Integration

Future work mentioned in tracking issue #1293 is adding quality feedback to distinguish "accessed often" from "actually useful". That's complementary and doesn't need to block this PR.


3. Memory Type Routing

The Types field implementation looks good. This addresses the "Memory type routing" item from tracking issue #1293.


Critical Issues

1. Category Creation Race Condition

Location: categorizer.go:1246-1253

There's a race here where two concurrent requests can both create the same category, causing Milvus primary key violations.

This will cause production failures under load.

Related: This is the "Concurrency handling" item from tracking issue #1293 (currently P2). For now, I think adding a lock around category creation will solve the issue.
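
A minimal sketch of the lock idea, assuming a single router process: the lookup and the create happen under one mutex, so concurrent requests for the same category cannot both insert. `categoryStore`, `getOrCreate`, and `createConcurrently` are hypothetical names, and an in-memory map stands in for Milvus.

```go
package main

import (
	"fmt"
	"sync"
)

// categoryStore is a hypothetical in-memory stand-in for the category backend.
type categoryStore struct {
	mu         sync.Mutex
	categories map[string]string // "user/name" -> category ID
	creates    int               // number of actual inserts performed
}

func newCategoryStore() *categoryStore {
	return &categoryStore{categories: make(map[string]string)}
}

// getOrCreate checks and inserts under the same lock, closing the
// check-then-create window that causes duplicate inserts.
func (s *categoryStore) getOrCreate(userID, name string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	key := userID + "/" + name
	if id, ok := s.categories[key]; ok {
		return id
	}
	s.creates++
	id := fmt.Sprintf("cat-%d", s.creates)
	s.categories[key] = id
	return id
}

// createConcurrently issues n parallel requests for the same category and
// returns the set of distinct IDs observed.
func createConcurrently(s *categoryStore, userID, name string, n int) map[string]struct{} {
	ids := make(map[string]struct{})
	var idsMu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			id := s.getOrCreate(userID, name)
			idsMu.Lock()
			ids[id] = struct{}{}
			idsMu.Unlock()
		}()
	}
	wg.Wait()
	return ids
}

func main() {
	store := newCategoryStore()
	ids := createConcurrently(store, "u1", "Programming", 50)
	fmt.Println(len(ids), store.creates)
}
```

An in-process mutex only serializes requests within one replica. With several router replicas writing to the same Milvus collection, a deterministic category ID combined with upsert semantics, or some external coordination, would still be needed.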


2. Category Pruning Edge Case

Categories are naturally protected (broad semantic match → high access count). But what's the policy if one IS deleted? Its children would be left with a ParentID pointing to a deleted node. Worth documenting the intended behavior.
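
Whatever policy is chosen, detecting the condition is straightforward. The sketch below only finds leaves whose `ParentID` no longer resolves; it does not prescribe what to do with them. The `node` struct and `findOrphans` are hypothetical and use only the `ParentID` and `IsCategory` notions from this PR.

```go
package main

import "fmt"

// node is a hypothetical, reduced view of a stored memory.
type node struct {
	ID         string
	ParentID   string // empty for top-level nodes
	IsCategory bool
}

// findOrphans returns the IDs of nodes whose ParentID does not match any
// existing category node, in input order.
func findOrphans(nodes []node) []string {
	categories := make(map[string]struct{})
	for _, n := range nodes {
		if n.IsCategory {
			categories[n.ID] = struct{}{}
		}
	}
	var orphans []string
	for _, n := range nodes {
		if n.ParentID == "" {
			continue // top-level node, nothing to resolve
		}
		if _, ok := categories[n.ParentID]; !ok {
			orphans = append(orphans, n.ID)
		}
	}
	return orphans
}

func main() {
	nodes := []node{
		{ID: "cat-programming", IsCategory: true},
		{ID: "leaf-rust", ParentID: "cat-programming"},
		{ID: "leaf-pesto", ParentID: "cat-cooking"}, // cat-cooking was pruned
	}
	fmt.Println(findOrphans(nodes))
}
```

Possible follow-ups include re-attaching orphans to the user root, re-categorizing them, or pruning them with the parent; the thread leaves that choice open.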


Performance Question

  • Do the new retrieval methods add latency? Could it become significant for two-phase retrieval?

This relates to "Load testing at scale" in tracking issue #1293.


@rootfs
Collaborator Author

rootfs commented Feb 26, 2026

@yehudit1987 yes, adding additional memory management is a good point. We can break it down into two approaches:

  • Router self-service: the router manages all chat history and injection.
  • Memory management sidecar: chat history or external knowledge is managed by a sidecar, and the router just retrieves it.

I feel the second approach is more scalable, especially now that vector, file, and memory are unified in #1383. That way, we can build more memory structures such as episodic, graph, etc. This sidecar can also be exposed as a Claude skill.

6 participants