security: add max_evaluation_chars limit to prevent giant prompt DoS #1455

yossiovadia wants to merge 2 commits into vllm-project:main
Conversation
@yossiovadia truncating long prompts introduces attention loss; please check the prompt compression feature #1437, which reduces the sequence length.

Prompt compression helps with classification quality but doesn't protect against DoS. In our testing (#1454), a 25K char prompt caused the router to become completely unresponsive: the health endpoint stopped responding. This happened because:
…llm-project#1454)

Signal evaluation latency grows super-linearly with prompt size (10K chars = 21s, 25K+ = timeout). Without a hard limit, a single client can make the router unresponsive by sending large prompts.

Fix: add max_evaluation_chars config with 8192-char default.

- Truncate evaluationText before any signal processing (compression, embedding, classification)
- Default 8192 chars (~2K tokens), within embedding model capacity
- Configurable: increase/decrease via config, or disable with -1
- Does NOT truncate the actual request body, only the text used for routing signal evaluation
- Complements existing prompt_compression (quality-aware NLP-based reduction) with a hard safety bound (simple char truncation)

Fixes vllm-project#1454

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
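The truncation step described above can be sketched in a few lines of Go. This is a hypothetical illustration, not the PR's actual code: the function name `truncateForEvaluation` is invented here, and only the described behavior (hard cap at `maxChars`, `-1` disables the limit, request body untouched) is taken from the PR.

```go
package main

import "fmt"

// truncateForEvaluation caps the text used for routing-signal evaluation.
// A negative maxChars disables the limit. The original request body is
// never modified; only the copy used for signal evaluation is shortened.
// (Hypothetical sketch of the behavior described in this PR.)
func truncateForEvaluation(text string, maxChars int) string {
	if maxChars < 0 || len(text) <= maxChars {
		return text
	}
	return text[:maxChars]
}

func main() {
	// Simulate the 25K-char prompt from the issue report.
	long := make([]byte, 25000)
	for i := range long {
		long[i] = 'a'
	}
	out := truncateForEvaluation(string(long), 8192)
	fmt.Println(len(out)) // prints 8192
}
```

Note that slicing by byte index can split a multi-byte UTF-8 rune at the boundary; since the truncated text is only fed to signal evaluation (not returned to the client), a torn final rune is likely acceptable, but a rune-aware cut would be a safe refinement.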
…g 281-line function)

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>

Summary
Fixes #1454 — signal evaluation latency grows super-linearly with prompt size (10K chars = 21s, 25K+ = timeout). A single client can make the router unresponsive.
Fix

Add `max_evaluation_chars` config with 8192-char default that truncates evaluation text before any signal processing. Complements `prompt_compression` (quality-aware NLP) with a hard safety bound.

Changes

- `pkg/config/config.go`: add `MaxEvaluationChars` field with documentation
- `pkg/extproc/req_filter_classification.go`: truncate `evaluationText` before signal processing

2 files, 22 insertions.
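The config side of the change could look roughly like the following. Only the field name `MaxEvaluationChars`, the 8192 default, and the `-1` sentinel come from the PR; the struct name `RouterConfig`, the yaml tag, and the `EffectiveMaxEvaluationChars` helper are assumptions made for this sketch.

```go
package main

import "fmt"

// RouterConfig is a hypothetical stand-in for the router's config struct.
type RouterConfig struct {
	// MaxEvaluationChars caps how many characters of the prompt are used
	// for routing-signal evaluation. 0 (unset) falls back to the default;
	// -1 disables the limit entirely.
	MaxEvaluationChars int `yaml:"max_evaluation_chars"`
}

// EffectiveMaxEvaluationChars resolves the configured value, applying the
// 8192-char default (~2K tokens, within embedding model capacity).
func (c *RouterConfig) EffectiveMaxEvaluationChars() int {
	if c.MaxEvaluationChars == 0 {
		return 8192
	}
	return c.MaxEvaluationChars
}

func main() {
	var c RouterConfig // unset: default applies
	fmt.Println(c.EffectiveMaxEvaluationChars()) // prints 8192
}
```

Treating zero as "unset" keeps an empty config file safe by default, while still letting operators raise, lower, or disable the bound explicitly.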
Test plan

- `make build-router` passes
- `golangci-lint`: 0 issues on changed files