
security: add max_evaluation_chars limit to prevent giant prompt DoS#1455

Open
yossiovadia wants to merge 2 commits into vllm-project:main from yossiovadia:fix/signal-eval-input-limits

Conversation

@yossiovadia
Collaborator

Summary

Fixes #1454 — signal evaluation latency grows super-linearly with prompt size (10K chars = 21s, 25K+ = timeout). A single client can make the router unresponsive.

Fix

Add max_evaluation_chars config with 8192-char default that truncates evaluation text before any signal processing:

# In router config (optional — default 8192)
max_evaluation_chars: 8192  # set to -1 to disable
  • Truncation at character level before compression/embedding/classification
  • 8192 chars ≈ 2K tokens, well within embedding model capacity
  • Does NOT truncate the request body — only text used for routing decisions
  • Complements prompt_compression (quality-aware NLP) with a hard safety bound

Changes

  • pkg/config/config.go: add MaxEvaluationChars field with documentation
  • pkg/extproc/req_filter_classification.go: truncate evaluationText before signal processing
2 files, 22 insertions.

Test plan

  • make build-router passes
  • golangci-lint — 0 issues on changed files
  • Default 8192 limit applied when config is omitted
  • Prompts under limit pass through unchanged
  • Prompts over limit truncated with warning log
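
The default-when-omitted behavior in the test plan can be sketched like this. The pointer-based "field omitted" detection and the names `routerConfig`/`effectiveLimit` are assumptions for illustration, not the PR's actual config code.

```go
package main

import "fmt"

// defaultMaxEvaluationChars mirrors the 8192-char default from the PR description.
const defaultMaxEvaluationChars = 8192

// routerConfig is a hypothetical stand-in for the router config struct;
// a nil pointer models "max_evaluation_chars omitted from the YAML".
type routerConfig struct {
	MaxEvaluationChars *int
}

// effectiveLimit applies the default when the field is omitted and passes
// explicit values (including -1, which disables the limit) through.
func effectiveLimit(c routerConfig) int {
	if c.MaxEvaluationChars == nil {
		return defaultMaxEvaluationChars
	}
	return *c.MaxEvaluationChars
}

func main() {
	fmt.Println(effectiveLimit(routerConfig{})) // default applied when omitted
	disabled := -1
	fmt.Println(effectiveLimit(routerConfig{MaxEvaluationChars: &disabled}))
}
```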

@netlify

netlify bot commented Mar 6, 2026

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 3fdacb8
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/69bb0acde5525c00080fddc4
😎 Deploy Preview https://deploy-preview-1455--vllm-semantic-router.netlify.app

@yossiovadia yossiovadia force-pushed the fix/signal-eval-input-limits branch from 111c271 to 9e8c459 Compare March 6, 2026 20:43
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/config/config.go
  • src/semantic-router/pkg/config/config_test.go
  • src/semantic-router/pkg/extproc/req_filter_classification.go

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/agent/structure-rules.yaml


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs
Collaborator

rootfs commented Mar 9, 2026

@yossiovadia truncating a long prompt introduces attention loss; please check the prompt compression feature #1437, which reduces the seq len

@rootfs rootfs added the hold label Mar 9, 2026
@yossiovadia
Collaborator Author

Prompt compression helps with classification quality but doesn't protect against DoS. In our testing (#1454), a 25K char prompt caused the router to become completely unresponsive — health endpoint stopped responding. This happened because:

  1. Compression is disabled by default — most deployments have no protection
  2. Even when enabled, compression itself is O(n²) on the input sentences — a giant prompt overwhelms the compression step before it even reaches signal evaluation

max_evaluation_chars is a hard safety bound that truncates before any processing (compression or classification). It's defense-in-depth — complements compression, doesn't replace it.

@yossiovadia yossiovadia force-pushed the fix/signal-eval-input-limits branch from 9e8c459 to 0d148c8 Compare March 11, 2026 21:57
…llm-project#1454)

Signal evaluation latency grows super-linearly with prompt size (10K
chars = 21s, 25K+ = timeout). Without a hard limit, a single client
can make the router unresponsive by sending large prompts.

Fix: add max_evaluation_chars config with 8192-char default.

- Truncate evaluationText before any signal processing (compression,
  embedding, classification)
- Default 8192 chars (~2K tokens) — within embedding model capacity
- Configurable: increase/decrease via config, or disable with -1
- Does NOT truncate the actual request body — only the text used for
  routing signal evaluation
- Complements existing prompt_compression (quality-aware NLP-based
  reduction) with a hard safety bound (simple char truncation)

Fixes vllm-project#1454

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
@yossiovadia yossiovadia force-pushed the fix/signal-eval-input-limits branch from 0d148c8 to bbd431e Compare March 18, 2026 19:49
…g 281-line function)

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
@yossiovadia yossiovadia force-pushed the fix/signal-eval-input-limits branch from a95fb8a to 3fdacb8 Compare March 18, 2026 20:27


Development

Successfully merging this pull request may close these issues.

security: no input size limit for signal evaluation — giant prompt DoS

5 participants