[Feat]: add scheduled memory pruning with two-path strategy by abdallahsamabd · Pull Request #1373 · vllm-project/semantic-router

abdallahsamabd · 2026-02-23T13:38:50Z

Implement MemoryBank-style retention scoring R=exp(-t/S) with two complementary pruning paths:

Path 1 (event-driven): async cap enforcement on Store() when a user exceeds max_memories_per_user
Path 2 (background sweep): a periodic time.Ticker goroutine prunes decayed memories for inactive users in batches

Includes Prometheus metrics, graceful shutdown, multi-replica support via prune_sweep_enabled flag, config template, and documentation.


netlify · 2026-02-23T13:38:56Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`e058375`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/69adfa8861deaf0008a8e77b
😎 Deploy Preview	https://deploy-preview-1373--vllm-semantic-router.netlify.app

github-actions · 2026-02-23T13:39:05Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `src`

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

src/semantic-router/pkg/config/runtime_config.go
src/semantic-router/pkg/extproc/router.go
src/semantic-router/pkg/extproc/router_build.go
src/semantic-router/pkg/extproc/server.go
src/semantic-router/pkg/memory/milvus_retry.go
src/semantic-router/pkg/memory/milvus_store.go
src/semantic-router/pkg/memory/milvus_store_prune.go
src/semantic-router/pkg/memory/prune_metrics.go
src/semantic-router/pkg/memory/pruner.go
src/semantic-router/pkg/memory/pruner_test.go
src/semantic-router/pkg/memory/score.go

📁 `tools`

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

tools/agent/structure-rules.yaml

📁 `website`

Owners: @Xunzhuo, @rootfs, @yuluo-yx
Files changed:

website/docs/installation/configuration.md
website/docs/proposals/agentic-memory.md

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

rootfs · 2026-02-23T18:57:24Z

@abdallahsamabd how is this different from #1313? Do you have any benchmarks on time-decay or quota-based pruning strategies, like those in the MemoryBank paper?

abdallahsamabd · 2026-02-23T19:16:19Z

@rootfs As described in ticket #1350, PruneUser is currently only callable programmatically; there is no automated job that runs it on a schedule.

Copilot

Pull request overview

This PR implements MemoryBank-style memory pruning with a retention scoring system (R=exp(-t/S)) and two complementary pruning strategies to prevent unbounded memory growth.

Changes:

Added event-driven cap enforcement (Path 1) that asynchronously prunes memories when users exceed max_memories_per_user on Store()
Implemented background sweep mechanism (Path 2) using a periodic ticker to prune decayed memories for inactive users in batches
Added Prometheus metrics for monitoring pruning activity, sweep performance, and error tracking

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.


File	Description
website/docs/proposals/agentic-memory.md	Updated feature status table to mark memory pruning and quotas as implemented
website/docs/installation/configuration.md	Added comprehensive documentation for memory pruning configuration, metrics, and multi-replica deployment
src/vllm-sr/cli/templates/config.template.yaml	Updated config template with new pruning parameters and two-path strategy explanation
src/semantic-router/pkg/memory/pruner_test.go	Added comprehensive test coverage for pruning functionality including cap enforcement, sweep operations, and edge cases
src/semantic-router/pkg/memory/pruner.go	Implemented background sweep goroutine with batch processing and graceful shutdown
src/semantic-router/pkg/memory/prune_metrics.go	Defined Prometheus metrics for tracking pruning operations and performance
src/semantic-router/pkg/memory/milvus_store.go	Added event-driven cap enforcement, helper methods for counting and querying stale memories
src/semantic-router/pkg/extproc/server.go	Added graceful shutdown of prune sweep goroutine in Stop()
src/semantic-router/pkg/extproc/router.go	Integrated prune sweep startup and added StopPruneSweep field to router
src/semantic-router/pkg/config/config.go	Added configuration fields for prune interval, batch size, and sweep enablement


Copilot · 2026-02-24T02:34:35Z

+
+	// Path 1: event-driven cap enforcement — async prune if user exceeds max_memories_per_user
+	if m.config.QualityScoring.MaxMemoriesPerUser > 0 {
+		go m.pruneIfOverCap(context.Background(), memory.UserID)


Using context.Background() in a goroutine ignores the parent context's cancellation and deadline. Consider propagating a detached context derived from the parent (e.g., using a custom function to extract values without cancellation) or document why ignoring cancellation is acceptable for async pruning.

Suggested change

-	go m.pruneIfOverCap(context.Background(), memory.UserID)
+	go m.pruneIfOverCap(ctx, memory.UserID)

abdallahsamabd · 2026-02-24

context.Background() is intentional here. The goroutine must outlive the HTTP/gRPC request: if we passed ctx, the pruning would be cancelled as soon as Store() returns to the caller.

rootfs · 2026-02-24T14:02:33Z

@abdallahsamabd @yehudit1987 Let's have a design review on memory pruning.

Pruning is not just a storage issue; it is a semantic issue too: if related context is pruned, the remaining memory could be corrupted.
Pruning needs to scale with the number of users.

abdallahsamabd · 2026-02-25T20:08:56Z

Hi @rootfs @yehudit1987,
please review this design document:
memory-pruning-design.html
Thanks.

rootfs · 2026-02-25T20:30:28Z

@abdallahsamabd thanks for writing the design doc.

Memory injection has to be handled with care. Since the router makes decisions on behalf of users, injecting conflicting, wrong, or stale memory will have poor consequences (see this). This is the top concern at the moment; pruning comes after that.

For any injection and pruning strategy, we need to mitigate the risk by relying on well-validated, highly cited research rather than hand-wavy ideas. The MemoryBank solution makes that cut. If you can support any of your PRs on that basis, it would make them much stronger.

abdallahsamabd · 2026-02-25T22:06:55Z

@rootfs
The retention scoring and pruning in PR #1373 is directly based on the MemoryBank paper (Zhong et al., arXiv:2305.10250), which uses the Ebbinghaus forgetting curve for memory lifecycle management:

R = exp(-t/S), where t = days since last access, S = S0 + access_count
Memories that are retrieved frequently build up strength (higher S), decaying slower
Memories below the threshold R < 0.1 are pruned

Regarding memory injection, I opened issue #1386.

abdallahsamabd requested review from Xunzhuo and rootfs as code owners February 23, 2026 13:38

github-actions Bot assigned rootfs, wangchen615, Xunzhuo and yuluo-yx Feb 23, 2026

abdallahsamabd force-pushed the feat/1350 branch from 6dc1e37 to baf9f10 Compare February 23, 2026 13:56

abdallahsamabd force-pushed the feat/1350 branch from baf9f10 to b432711 Compare February 23, 2026 19:12

Xunzhuo requested a review from Copilot February 24, 2026 02:29

Copilot started reviewing on behalf of Xunzhuo February 24, 2026 02:34

Copilot AI reviewed Feb 24, 2026

abdallahsamabd force-pushed the feat/1350 branch 6 times, most recently from 7b5ed7f to 87c6c36 Compare February 24, 2026 14:01

rootfs added the hold label Feb 24, 2026

abdallahsamabd force-pushed the feat/1350 branch from 87c6c36 to 75bfc28 Compare February 24, 2026 22:51

abdallahsamabd force-pushed the feat/1350 branch 2 times, most recently from 91e9850 to ad1df7f Compare March 8, 2026 10:41

github-actions Bot assigned JaredforReal Mar 8, 2026

abdallahsamabd force-pushed the feat/1350 branch 6 times, most recently from b5a0888 to bfe7794 Compare March 8, 2026 20:41

feat(memory): add scheduled memory pruning with two-path strategy (vl…

e058375

…lm-project#1350) Signed-off-by: Abdallah Samara <abdallahsamabd@gmail.com>

abdallahsamabd force-pushed the feat/1350 branch from bfe7794 to e058375 Compare March 8, 2026 22:39

abdallahsamabd wants to merge 1 commit into vllm-project:main from abdallahsamabd:feat/1350


7 participants
