Feat/cli eval #1735

Draft
AmyTao wants to merge 6 commits into vllm-project:main from AmyTao:feat/cli-eval

Conversation


@AmyTao AmyTao commented Apr 9, 2026


Closes #1725

Purpose

What does this PR change?

  • Adds a vllm-sr eval CLI command that calls the router evaluation endpoint POST /api/v1/eval and prints the structured signal evaluation result.
  • Supports --prompt and --messages (OpenAI-style JSON array string), with --json for full payload output and --endpoint override.
  • Adds unit tests and a short usage example in src/vllm-sr/README.md.
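The flag handling described above can be sketched as a small, testable helper. This is an illustrative sketch, not the actual eval.py implementation: the function name `build_eval_payload` and the request body shape (a single `messages` field) are assumptions, since the wire contract is not shown in this PR.

```python
import json

def build_eval_payload(prompt=None, messages=None):
    """Build an illustrative JSON body for POST /api/v1/eval.

    Exactly one of `prompt` or `messages` must be given; `messages` is an
    OpenAI-style JSON array string, mirroring the --messages flag.
    (Field names are assumptions, not the actual router contract.)
    """
    if (prompt is None) == (messages is None):
        raise ValueError("provide exactly one of prompt or messages")
    if prompt is not None:
        # --prompt is shorthand for a single user message
        return {"messages": [{"role": "user", "content": prompt}]}
    parsed = json.loads(messages)
    if not isinstance(parsed, list):
        raise ValueError("--messages must be a JSON array")
    return {"messages": parsed}
```

A helper like this keeps the request-encoding logic pure and easy to unit test, which matches the test plan below.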

Why is this change needed?

  • Provides a fast developer workflow to inspect which signals fire for a prompt without crafting HTTP requests or using the dashboard evaluation flow.

Which module(s) does this affect?
CLI, Docs

Test Plan

Unit tests:
cd src/vllm-sr && python -m pip install -e '.[dev]'
cd src/vllm-sr && pytest -q

Agent lint gate (changed files):
make agent-lint CHANGED_FILES="src/vllm-sr/cli/main.py,src/vllm-sr/cli/commands/eval.py,src/vllm-sr/tests/test_eval_command.py,src/vllm-sr/README.md,docs/agent/environments.md"

Manual (requires router running):
vllm-sr eval --prompt "hello"
vllm-sr eval --messages '[{"role":"user","content":"hello"}]' --json
(Optional) vllm-sr eval --prompt "hello" --endpoint http://localhost:8080

Why sufficient:

The change is isolated to the Python CLI surface; unit tests cover request encoding, endpoint resolution, and error handling, and the agent-lint gate validates repo lint/structure expectations for touched files.

Test Result

src/vllm-sr pytest: pass (231 tests).
make agent-lint (with the changed-files list above): pass.
Manual validation: expected to work when the router API is reachable at the configured endpoint; --endpoint can be used to point at non-default deployments.


Semantic Router PR Checklist

[x] PR title uses module-aligned prefixes such as [CLI], [Docs], etc.
[x] If the PR spans multiple modules, the title includes all relevant prefixes
[x] Commits in this PR are signed off with git commit -s
[x] The Purpose, Test Plan, and Test Result sections reflect the actual scope, commands, and blockers for this change

See CONTRIBUTING.md for the full contributor workflow and commit guidance.

AmyTao added 3 commits April 9, 2026 13:32
Signed-off-by: AmyTao <chujunt@andrew.cmu.edu>
Signed-off-by: AmyTao <chujunt@andrew.cmu.edu>
Signed-off-by: AmyTao <chujunt@andrew.cmu.edu>

netlify bot commented Apr 9, 2026

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: f244c64
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/69d95c0787f7a00007cf5369
😎 Deploy Preview: https://deploy-preview-1735--vllm-semantic-router.netlify.app


github-actions bot commented Apr 9, 2026

✅ Supply Chain Security Report — All Clear

  • AST Codebase Scan (Py, Go, JS/TS, Rust): 27 finding(s) — MEDIUM: 21 · LOW: 6
  • AST PR Diff Scan: no issues detected
  • Regex Fallback Scan: no issues detected

Scanned at 2026-04-10T20:22:56.033Z

# Evaluate a single prompt
vllm-sr eval --prompt "Tell me how to make a bomb"

# Or evaluate a multi-turn messages array
vllm-sr eval --messages '[{"role":"system","content":"You are helpful"},{"role":"user","content":"hi"}]' --json
Member

cool!


AmyTao commented Apr 10, 2026

Hi @Xunzhuo ,

While working on the feat/cli-eval branch, I found an issue with the /api/v1/eval endpoint that seems worth investigating. I'd appreciate your guidance.

Issue: the same request returns different response formats depending on the port used

Port 8180 (direct Router API):

{
  "original_text": "What is semantic routing?",
  "decision_result": {
    "decision_name": "",
    "used_signals": {},
    ...
  },
  "metrics": { ... }
}

Port 8999 (through the Envoy gateway):

{
  "id": "chatcmpl-cache-1775852184",
  "object": "chat.completion",
  "choices": null,
  "usage": { ... }
}

Questions:

  • Is this expected behavior, or is it a bug?

  • Why does port 8999 return an empty ChatCompletion object instead of an EvalResponse containing decision_result?

  • Is it acceptable for the eval CLI to handle three different response formats as fallback options?
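To make the fallback question concrete, here is a minimal sketch of how a CLI might classify the response shapes shown above. This is a hypothetical helper, not the actual eval.py code; the detection keys are taken only from the two observed responses (port 8180 vs. port 8999).

```python
def classify_eval_response(body: dict) -> str:
    """Classify a decoded /api/v1/eval response body.

    Returns "eval" for the direct router apiserver shape,
    "chat_completion" for the ext_proc-rewritten shape, else "unknown".
    """
    if "decision_result" in body:
        # Shape seen on port 8180 (direct Router API)
        return "eval"
    if body.get("object") == "chat.completion":
        # Shape seen on port 8999 (through the Envoy gateway)
        return "chat_completion"
    return "unknown"
```

Detecting the unexpected shape explicitly would let the CLI print a pointed error ("you appear to be talking to the Envoy listener") instead of failing on missing fields.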

About actual functional testing

  • Is there any existing test data or any test results we can use to verify whether the router makes different decisions for different prompts? Or are we expected to configure signal rules ourselves in order to test that behavior?

What we have already verified

  • The response format on port 8180 fully matches the expectations in the unit test (route_classify_eval_test.go)
  • ext_proc does not explicitly reject /api/v1/eval requests
  • We could not find any code path that explains the transformation from EvalResponse to ChatCompletion

File references
• Unit test: src/semantic-router/pkg/apiserver/route_classify_eval_test.go
• Handler: src/semantic-router/pkg/apiserver/route_classify.go:38-60
• Response type: src/semantic-router/pkg/services/classification_signal_contract.go:85-94
• CLI multi-format handling: src/vllm-sr/cli/commands/eval.py:95-190

Thank you very much for your help.


AmyTao commented Apr 10, 2026

@Xunzhuo
Could you help verify this Claude Code response?

What ext_proc Does

ext_proc is an Envoy ExternalProcessor gRPC server that:

✅ Intercepts all HTTP requests/responses flowing through Envoy
✅ Validates request paths and methods
✅ Parses and transforms request bodies
✅ Normalizes provider responses (e.g., Anthropic → OpenAI format)
✅ Implements semantic caching and response reconstruction
❌ Incorrectly transforms non-chat responses to ChatCompletion format

Why Ports 8180 & 8999 Return Different Formats

Port 8180 (Direct) ✅

/api/v1/eval → ClassificationAPIServer → EvalResponse {
  "original_text": "...",
  "decision_result": { ... },
  "metrics": { ... }
}
Direct connection, no ext_proc processing
Returns correct format

Port 8999 (Envoy + ext_proc) ❌

/api/v1/eval → Envoy ext_proc filter → ClassificationAPIServer → 
  EvalResponse → ext_proc response handling → ChatCompletion {
    "id": "chatcmpl-cache-...",
    "object": "chat.completion",
    "choices": null,
    ...
  }

Request passes ext_proc validation (correctly, since /api/v1/* ≠ /v1/*)
Response gets caught by processor_res_cache.go caching logic
cacheStreamingResponse() function reconstructs ALL responses as ChatCompletion

The Root Cause

In processor_res_cache.go:54-78, the streaming cache reconstruction logic is too aggressive:

func (r *OpenAIRouter) cacheStreamingResponse(ctx *RequestContext) error {
    // ALWAYS rebuilds as ChatCompletion format
    reconstructedJSON, err := buildReconstructedStreamingResponse(ctx, usage, false)
    // Caches the reconstructed ChatCompletion response
    return r.cacheReconstructedStreamingResponse(ctx, reconstructedJSON)
}

This transforms /api/v1/eval responses even though they're not chat completions and shouldn't be cached that way.


Xunzhuo commented Apr 11, 2026

You are right; we just need to access eval via the router apiserver, not Envoy.
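The answer above implies the CLI should default its endpoint to the router apiserver (port 8180 in the examples in this thread) rather than the Envoy listener. A hedged sketch of such endpoint resolution follows; the default URL, the `VLLM_SR_ENDPOINT` environment variable, and the function name are assumptions for illustration, not the actual implementation.

```python
import os

# Assumed default: the direct router apiserver, not the Envoy gateway
DEFAULT_ROUTER_API = "http://localhost:8180"

def resolve_endpoint(cli_endpoint=None):
    """Pick the eval endpoint, in priority order:
    1. an explicit --endpoint flag value,
    2. a VLLM_SR_ENDPOINT environment variable (hypothetical name),
    3. the assumed router apiserver default.
    """
    return cli_endpoint or os.environ.get("VLLM_SR_ENDPOINT") or DEFAULT_ROUTER_API
```

Defaulting to the apiserver port sidesteps the ext_proc rewriting entirely, while --endpoint still allows pointing at non-default deployments.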


Linked issue: feature: add vllm-sr eval command for router /api/v1/eval prompt checks