Add cross-model global budget rate limiting with Valkey provider by fbalicchia · Pull Request #1727 · vllm-project/semantic-router

fbalicchia · 2026-04-08T16:51:19Z

Purpose

What: Adds a Valkey-backed, cross-model global budget rate limiter that enforces a single spending envelope per user across all AI models (Haiku, Sonnet, Opus), regardless of which model serves the request. Also extends cost tracking to include prompt caching tokens (cache_read/cache_write).
Why: Envoy AI Gateway's native rate limiting is per-route (i.e. per-model), so a user with a $3/month budget could spend $3 on each model independently. With the Semantic Router dynamically selecting models, a true cross-model budget is essential to prevent budget overruns.
Modules affected: Router / CI/Build
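
The gap between per-route and global enforcement can be illustrated with a small in-memory sketch. It is not the PR's implementation: the `budget` type, its methods, and the key format are hypothetical, and amounts are in whole cents for simplicity.

```go
package main

import "fmt"

// budget tracks spend per key against a fixed limit (in cents).
// Hypothetical type for illustration; not part of the PR.
type budget struct {
	limit int64
	spent map[string]int64
}

func newBudget(limit int64) *budget {
	return &budget{limit: limit, spent: make(map[string]int64)}
}

// charge records cost under key and reports whether it fit within the limit.
func (b *budget) charge(key string, cost int64) bool {
	if b.spent[key]+cost > b.limit {
		return false
	}
	b.spent[key] += cost
	return true
}

func main() {
	models := []string{"haiku", "sonnet", "opus"}

	// Per-route: one counter per (user, model), so each model gets its own $3.
	perRoute := newBudget(300)
	// Global: one counter per user, shared by every model.
	global := newBudget(300)

	var routeTotal, globalTotal int64
	for _, m := range models {
		if perRoute.charge("alice:"+m, 300) {
			routeTotal += 300
		}
		if global.charge("alice", 300) {
			globalTotal += 300
		}
	}
	fmt.Println(routeTotal, globalTotal) // prints "900 300"
}
```

Keying the counter by user alone is what turns three independent $3 envelopes into a single one.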

Key changes

valkey_provider.go — new valkey-limiter rate limit provider using Valkey INCRBY with TTL-based windows. Cost expressed in CEL units ($10⁻⁸) for parity with AI Gateway formulas. Built on the valkey-glide Go client.
Prompt caching cost tracking — extended TokenUsage, response usage parsing, and ModelPricing config with CacheReadPer1M / CacheWritePer1M rates (Anthropic & OpenAI formats).
Config additions — DB field on rate limit provider config, GetModelPricingFull() helper, provider type renamed from redis-limiter to valkey-limiter.
Streaming support — extractStreamingUsage and reportStreamingUsageMetrics now propagate cache token counts.
Dockerfile fix — restructured Dockerfile.extproc for reliable multi-arch (amd64 + arm64) builds including nlp-binding.
Build fix — removed duplicate resolveModelConfig in helper.go.
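
As a rough illustration of the cost unit, the sketch below converts token counts and per-1M-token prices into units of $10⁻⁸. Only CacheReadPer1M and CacheWritePer1M are named in this description; the other field names, the `costUnits` helper, and the assumption that the prompt count excludes cached tokens are illustrative and may differ from the PR.

```go
package main

import (
	"fmt"
	"math"
)

// tokenUsage and modelPricing are simplified stand-ins for the PR's
// TokenUsage and ModelPricing types. Field names other than
// CacheReadPer1M and CacheWritePer1M are assumptions.
type tokenUsage struct {
	Prompt     int64 // assumed to exclude cached tokens
	Completion int64
	CacheRead  int64
	CacheWrite int64
}

type modelPricing struct {
	PromptPer1M     float64 // USD per 1M tokens
	CompletionPer1M float64
	CacheReadPer1M  float64
	CacheWritePer1M float64
}

// costUnits returns the request cost in units of 10^-8 USD.
// tokens * (USD / 1e6 tokens) * (1e8 units / USD) = tokens * rate * 100.
func costUnits(u tokenUsage, p modelPricing) int64 {
	sum := float64(u.Prompt)*p.PromptPer1M +
		float64(u.Completion)*p.CompletionPer1M +
		float64(u.CacheRead)*p.CacheReadPer1M +
		float64(u.CacheWrite)*p.CacheWritePer1M
	return int64(math.Round(sum * 100))
}

func main() {
	u := tokenUsage{Prompt: 1000, Completion: 500, CacheRead: 2000}
	p := modelPricing{PromptPer1M: 3, CompletionPer1M: 15, CacheReadPer1M: 0.3, CacheWritePer1M: 3.75}
	fmt.Println(costUnits(u, p)) // prints "1110000", i.e. $0.0111
}
```

Working in integer units keeps the counter compatible with INCRBY, which only accepts integer increments.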

Test Plan

go test ./... -run TestValkey — unit tests for the Valkey limiter (valkey_provider_test.go)
go test ./... -run TestCELParity — verifies CEL cost calculations match Envoy AI Gateway formulas (cel_parity_test.go)
e2e/scripts/test_ratelimit_e2e.sh — end-to-end rate limit validation against a running Valkey instance
docker buildx build --platform linux/amd64,linux/arm64 -f Dockerfile.extproc . — confirms multi-arch build succeeds

Test Result

All unit tests pass; CEL parity tests confirm cost calculations match the Envoy AI Gateway formulas within rounding tolerance.
Multi-arch Docker build completes successfully.
Follow-up: E2E test requires a running Valkey instance; CI integration for this is not yet wired up.

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

netlify · 2026-04-08T16:51:29Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`341428a`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/69d96355a10a0c000980cc30
😎 Deploy Preview	https://deploy-preview-1727--vllm-semantic-router.netlify.app

github-actions · 2026-04-08T16:51:37Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `config`

Owners: @rootfs, @Xunzhuo
Files changed:

config/config.yaml

📁 `src/semantic-router`

Owners: @rootfs, @Xunzhuo, @szedan-rh, @yehuditkerido, @abdallahsamabd, @asaadbalum, @liavweiss, @noalimoy
Files changed:

src/semantic-router/pkg/cache/valkey_cache_helpers.go
src/semantic-router/pkg/cache/valkey_cache_integration_test.go
src/semantic-router/pkg/config/config.go
src/semantic-router/pkg/config/helper.go
src/semantic-router/pkg/config/helper_provider.go
src/semantic-router/pkg/config/model_config_types.go
src/semantic-router/pkg/extproc/processor_req_body_prepare.go
src/semantic-router/pkg/extproc/processor_res_body_streaming.go
src/semantic-router/pkg/extproc/processor_res_cache.go
src/semantic-router/pkg/extproc/processor_res_usage.go
src/semantic-router/pkg/extproc/processor_res_usage_test.go
src/semantic-router/pkg/extproc/router_resolvers.go
src/semantic-router/pkg/ratelimit/cel_parity_test.go
src/semantic-router/pkg/ratelimit/local_provider.go
src/semantic-router/pkg/ratelimit/provider.go
src/semantic-router/pkg/ratelimit/ratelimit_test.go
src/semantic-router/pkg/ratelimit/valkey_provider.go
src/semantic-router/pkg/ratelimit/valkey_provider_test.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

github-actions · 2026-04-08T16:51:55Z

✅ Supply Chain Security Report — All Clear

Scanner	Status	Findings
AST Codebase Scan (Py, Go, JS/TS, Rust)	✅	27 finding(s) — MEDIUM: 21 · LOW: 6
AST PR Diff Scan	✅	No issues detected
Regex Fallback Scan	✅	No issues detected

Scanned at 2026-04-10T20:58:18.789Z

daric93

Thanks for this PR. There are a few issues that need to be addressed before merging:

This PR modifies the existing Valkey cache (valkey_cache.go, valkey_cache_helpers.go) in ways that appear unrelated to the rate limiter feature. These changes (TEXT→TAG schema change, FT.SEARCH replaced with reverse-lookup keys, validation code removed) introduce new complexity and reduce parity with the Milvus backend. If the rate limiter needs additional cache capabilities, it could add new methods rather than changing existing logic that has no issues.
Bug fix should not be blocked: The router_components.go fix that wires up the Valkey config into createSemanticCache is important. Please consider merging it as a separate small PR so it doesn't get delayed by iteration on the rate limiter. Update: it has been split out into #1737.
Atomicity concerns: Both the new pending:* keys in the cache and the INCRBY + EXPIRE in the rate limiter lack atomicity guarantees.

See inline comments for details.
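
To make the INCRBY + EXPIRE concern concrete: if the two commands are sent separately and the client fails between them, the counter can be left without a TTL and the window never resets. One common remedy is to run both in a single server-side script. The sketch below shows such a script as a string constant, plus an in-memory model of the same semantics. It is an illustration under assumptions, not necessarily what this PR ended up doing; `memStore`, `incrWithTTL`, and the key name are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// incrWithTTLScript increments the counter and sets the TTL only if the key
// has none, as one atomic unit on the server.
// KEYS[1] = counter key, ARGV[1] = increment, ARGV[2] = window in seconds.
const incrWithTTLScript = `
local v = redis.call('INCRBY', KEYS[1], ARGV[1])
if redis.call('TTL', KEYS[1]) < 0 then
  redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return v`

// entry and memStore model the script's behaviour in memory (illustration only).
type entry struct {
	value    int64
	expireAt int64 // Unix seconds
}

type memStore struct {
	mu   sync.Mutex
	data map[string]*entry
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string]*entry)}
}

// incrWithTTL increments key and starts a new window when none is active,
// all under one lock, so the counter never exists without an expiry.
func (s *memStore) incrWithTTL(key string, delta, windowSec, now int64) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.data[key]
	if !ok || now >= e.expireAt {
		e = &entry{expireAt: now + windowSec}
		s.data[key] = e
	}
	e.value += delta
	return e.value
}

func main() {
	s := newMemStore()
	fmt.Println(s.incrWithTTL("budget:alice", 40, 60, 0))  // prints "40"
	fmt.Println(s.incrWithTTL("budget:alice", 25, 60, 30)) // prints "65"
	fmt.Println(s.incrWithTTL("budget:alice", 10, 60, 61)) // prints "10" (new window)
}
```

Note that later increments within a window do not extend the expiry, which keeps the window fixed rather than sliding.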

daric93

Note: the router_components.go Valkey config wiring fix has been split out into #1737 so it can be merged independently.

fbalicchia · 2026-04-10T20:55:53Z

@daric93 thanks for the review. Here's a brief recap; let me know if I missed anything.

Reverted valkey_cache.go to main: removed the reverse-lookup pending:* keys and restored FT.SEARCH-based pending entry resolution (to be submitted as a separate PR).
Added debug logging when no ratelimit rule matches a request (fail-open) in both LocalLimiter and ValkeyLimiterProvider, to help operators catch misconfigured rule sets.
Replaced the manual reverse-scan in parseHostPort with net.SplitHostPort for correct IPv6 address handling.

Dockerfile.extproc was also reverted.
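
For readers unfamiliar with the IPv6 issue: splitting on the last colon by hand breaks on bracketed IPv6 literals, whereas net.SplitHostPort understands them. A minimal sketch follows; the function name matches the recap, but the signature and error handling are assumptions rather than the PR's actual code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// parseHostPort splits "host:port" into its parts.
// Signature and error wrapping are assumptions for illustration.
func parseHostPort(addr string) (string, int, error) {
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return "", 0, err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", 0, fmt.Errorf("invalid port %q: %w", portStr, err)
	}
	return host, port, nil
}

func main() {
	// The last address is an unbracketed IPv6 literal and is rejected.
	for _, addr := range []string{"localhost:6379", "[::1]:6379", "::1:6379"} {
		host, port, err := parseHostPort(addr)
		fmt.Println(host, port, err)
	}
}
```

Bracketed addresses such as [::1]:6379 yield host ::1 and port 6379, while ambiguous unbracketed forms return an error instead of a silently wrong split.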

fbalicchia added 10 commits April 8, 2026 18:15

make possible to build image on macOS

a5e132a

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

valy db configurable

c057836

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

add valkeys params

f7edb92

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

Take in cosideration cache price too

3281513

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

wip

e34740d

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

git commit add test

81f37b6

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

remove duplicate code

35a997f

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

use valkey istead of redis

3d03309

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

replace FT.SEARCH with direct key lookup for pending cache entries

24095d7

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

add month case in ParseUnit

917b084

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

fbalicchia requested review from Xunzhuo and rootfs as code owners April 8, 2026 16:51

github-actions bot assigned abdallahsamabd, asaadbalum, liavweiss, noalimoy, rootfs, samzong, szedan-rh, Xunzhuo, yehuditkerido and yuluo-yx Apr 8, 2026

fbalicchia mentioned this pull request Apr 8, 2026

Add cross-model global budget rate limiting with Valkey provider #1726

Closed

4 tasks

fbalicchia added 3 commits April 8, 2026 21:40

Fix: make golint happy

bb2da6b

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

remove blank line before error check in valkey_cache.go

599c1d6

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

yet an other formt fix

bfca243

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

fbalicchia force-pushed the feat/global-model-ratelimit-clean branch from bf67f8f to bfca243 Compare April 8, 2026 20:20

fbalicchia and others added 2 commits April 9, 2026 18:04

Update src/semantic-router/pkg/extproc/processor_res_cache.go

b756d7d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Filippo Balicchia <fbalicchia@gmail.com>

Apply copilot change

7fb6a36

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

daric93 reviewed Apr 9, 2026

src/semantic-router/pkg/cache/valkey_cache.go (outdated)

daric93 reviewed Apr 9, 2026

src/semantic-router/pkg/cache/valkey_cache.go (outdated)

daric93 reviewed Apr 9, 2026

src/semantic-router/pkg/cache/valkey_cache.go (outdated)

daric93 reviewed Apr 9, 2026

src/semantic-router/pkg/cache/valkey_cache.go

daric93 reviewed Apr 9, 2026

src/semantic-router/pkg/extproc/router_components.go

daric93 reviewed Apr 9, 2026

src/semantic-router/pkg/ratelimit/local_provider.go

daric93 reviewed Apr 9, 2026

src/semantic-router/pkg/ratelimit/valkey_provider.go

daric93 suggested changes Apr 9, 2026

daric93 reviewed Apr 9, 2026

fbalicchia added 5 commits April 9, 2026 23:02

Merge branch 'main' into feat/global-model-ratelimit-clean

b939324

Fix TTL race conditions in Valkey cache and ratelimit providers

dbad827

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

Fix TTL race in ratelimit, update cache index schema, and fix usage test

e973fd0

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>

Merge branch 'main' into feat/global-model-ratelimit-clean

9c62e3a

Fix: golint after merge

7a82228

Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>