Feat/cli eval #1735

Draft
AmyTao wants to merge 6 commits into vllm-project:main from AmyTao:feat/cli-eval

Conversation


@AmyTao AmyTao commented Apr 9, 2026


Closes #1725

Purpose

What does this PR change?

  • Adds a vllm-sr eval CLI command that calls the router evaluation endpoint POST /api/v1/eval and prints the structured signal evaluation result.
  • Supports --prompt and --messages (OpenAI-style JSON array string), with --json for full payload output and --endpoint override.
  • Adds unit tests and a short usage example in src/vllm-sr/README.md.
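The flag handling described above can be sketched as a small, testable helper. This is an illustrative sketch, not the actual eval.py implementation: the function name `build_eval_payload` and the request body shape (a single `messages` field) are assumptions, since the wire contract is not shown in this PR.

```python
import json

def build_eval_payload(prompt=None, messages=None):
    """Build an illustrative JSON body for POST /api/v1/eval.

    Exactly one of `prompt` or `messages` must be given; `messages` is an
    OpenAI-style JSON array string, mirroring the --messages flag.
    (Field names are assumptions, not the actual router contract.)
    """
    if (prompt is None) == (messages is None):
        raise ValueError("provide exactly one of prompt or messages")
    if prompt is not None:
        # --prompt is shorthand for a single user message
        return {"messages": [{"role": "user", "content": prompt}]}
    parsed = json.loads(messages)
    if not isinstance(parsed, list):
        raise ValueError("--messages must be a JSON array")
    return {"messages": parsed}
```

A helper like this keeps the request-encoding logic pure and easy to unit test, which matches the test plan below.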

Why is this change needed?

  • Provides a fast developer workflow to inspect which signals fire for a prompt without crafting HTTP requests or using the dashboard evaluation flow.

Which module(s) does this affect?
CLI, Docs

Test Plan

Unit tests:
cd src/vllm-sr && python -m pip install -e '.[dev]'
cd src/vllm-sr && pytest -q

Agent lint gate (changed files):
make agent-lint CHANGED_FILES="src/vllm-sr/cli/main.py,src/vllm-sr/cli/commands/eval.py,src/vllm-sr/tests/test_eval_command.py,src/vllm-sr/README.md,docs/agent/environments.md"

Manual (requires router running):
vllm-sr eval --prompt "hello"
vllm-sr eval --messages '[{"role":"user","content":"hello"}]' --json
(Optional) vllm-sr eval --prompt "hello" --endpoint http://localhost:8080

Why sufficient:

The change is isolated to the Python CLI surface; unit tests cover request encoding, endpoint resolution, and error handling, and the agent-lint gate validates repo lint/structure expectations for touched files.

Test Result

src/vllm-sr pytest: pass (231 tests).
make agent-lint (with the changed-files list above): pass.
Manual validation: expected to work when the router API is reachable at the configured endpoint; --endpoint can be used to point at non-default deployments.


Semantic Router PR Checklist

[x] PR title uses module-aligned prefixes such as [CLI], [Docs], etc.
[x] If the PR spans multiple modules, the title includes all relevant prefixes
[x] Commits in this PR are signed off with git commit -s
[x] The Purpose, Test Plan, and Test Result sections reflect the actual scope, commands, and blockers for this change

See CONTRIBUTING.md for the full contributor workflow and commit guidance.

AmyTao added 3 commits April 9, 2026 13:32
Signed-off-by: AmyTao <chujunt@andrew.cmu.edu>
Signed-off-by: AmyTao <chujunt@andrew.cmu.edu>
Signed-off-by: AmyTao <chujunt@andrew.cmu.edu>

netlify bot commented Apr 9, 2026

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: f244c64
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/69d95c0787f7a00007cf5369
😎 Deploy Preview: https://deploy-preview-1735--vllm-semantic-router.netlify.app


github-actions bot commented Apr 9, 2026

✅ Supply Chain Security Report — All Clear

  • AST Codebase Scan (Py, Go, JS/TS, Rust): 27 finding(s) — MEDIUM: 21 · LOW: 6
  • AST PR Diff Scan: no issues detected
  • Regex Fallback Scan: no issues detected

Scanned at 2026-04-10T20:22:56.033Z

# Evaluate a single prompt
vllm-sr eval --prompt "Tell me how to make a bomb"

# Or evaluate a multi-turn messages array
vllm-sr eval --messages '[{"role":"system","content":"You are helpful"},{"role":"user","content":"hi"}]' --json
Member

cool!


AmyTao commented Apr 10, 2026

Hi @Xunzhuo ,

While working on the feat/cli-eval branch, I found an issue with the /api/v1/eval endpoint that seems worth investigating. I'd appreciate your guidance.

Issue: the same request returns different response formats depending on the port used

Port 8180 (direct Router API):

{
  "original_text": "What is semantic routing?",
  "decision_result": {
    "decision_name": "",
    "used_signals": {},
    ...
  },
  "metrics": { ... }
}

Port 8999 (through the Envoy gateway):

{
  "id": "chatcmpl-cache-1775852184",
  "object": "chat.completion",
  "choices": null,
  "usage": { ... }
}

Questions:

  • Is this expected behavior, or is it a bug?

  • Why does port 8999 return an empty ChatCompletion object instead of an EvalResponse containing decision_result?

  • Is it acceptable for the eval CLI to handle three different response formats as fallback options?
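To make the fallback question concrete, here is a minimal sketch of how a CLI might classify the response shapes shown above. This is a hypothetical helper, not the actual eval.py code; the detection keys are taken only from the two observed responses (port 8180 vs. port 8999).

```python
def classify_eval_response(body: dict) -> str:
    """Classify a decoded /api/v1/eval response body.

    Returns "eval" for the direct router apiserver shape,
    "chat_completion" for the ext_proc-rewritten shape, else "unknown".
    """
    if "decision_result" in body:
        # Shape seen on port 8180 (direct Router API)
        return "eval"
    if body.get("object") == "chat.completion":
        # Shape seen on port 8999 (through the Envoy gateway)
        return "chat_completion"
    return "unknown"
```

Detecting the unexpected shape explicitly would let the CLI print a pointed error ("you appear to be talking to the Envoy listener") instead of failing on missing fields.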

About actual functional testing

  • Is there any existing test data or any test results we can use to verify whether the router makes different decisions for different prompts? Or are we expected to configure signal rules ourselves in order to test that behavior?

What we have already verified

  • The response format on port 8180 fully matches the expectations in the unit test (route_classify_eval_test.go)
  • ext_proc does not explicitly reject /api/v1/eval requests
  • We could not find any code path that explains the transformation from EvalResponse to ChatCompletion

File references
• Unit test: src/semantic-router/pkg/apiserver/route_classify_eval_test.go
• Handler: src/semantic-router/pkg/apiserver/route_classify.go:38-60
• Response type: src/semantic-router/pkg/services/classification_signal_contract.go:85-94
• CLI multi-format handling: src/vllm-sr/cli/commands/eval.py:95-190

Thank you very much for your help.


AmyTao commented Apr 10, 2026

@Xunzhuo
Could you help verify this Claude Code response?

What ext_proc Does

ext_proc is an Envoy ExternalProcessor gRPC server that:

✅ Intercepts all HTTP requests/responses flowing through Envoy
✅ Validates request paths and methods
✅ Parses and transforms request bodies
✅ Normalizes provider responses (e.g., Anthropic → OpenAI format)
✅ Implements semantic caching and response reconstruction
❌ Incorrectly transforms non-chat responses to ChatCompletion format

Why Ports 8180 & 8999 Return Different Formats

Port 8180 (Direct) ✅

/api/v1/eval → ClassificationAPIServer → EvalResponse {
  "original_text": "...",
  "decision_result": { ... },
  "metrics": { ... }
}
Direct connection, no ext_proc processing
Returns correct format

Port 8999 (Envoy + ext_proc) ❌

/api/v1/eval → Envoy ext_proc filter → ClassificationAPIServer → 
  EvalResponse → ext_proc response handling → ChatCompletion {
    "id": "chatcmpl-cache-...",
    "object": "chat.completion",
    "choices": null,
    ...
  }

Request passes ext_proc validation (correctly, since /api/v1/* ≠ /v1/*)
Response gets caught by processor_res_cache.go caching logic
cacheStreamingResponse() function reconstructs ALL responses as ChatCompletion

The Root Cause

In processor_res_cache.go:54-78, the streaming cache reconstruction logic is too aggressive:

func (r *OpenAIRouter) cacheStreamingResponse(ctx *RequestContext) error {
    // ALWAYS rebuilds as ChatCompletion format
    reconstructedJSON, err := buildReconstructedStreamingResponse(ctx, usage, false)
    // Caches the reconstructed ChatCompletion response
    return r.cacheReconstructedStreamingResponse(ctx, reconstructedJSON)
}

This transforms /api/v1/eval responses even though they're not chat completions and shouldn't be cached that way.


Xunzhuo commented Apr 11, 2026

You are right; we just need to access eval via the router apiserver, not Envoy.
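The answer above implies the CLI should default its endpoint to the router apiserver (port 8180 in the examples in this thread) rather than the Envoy listener. A hedged sketch of such endpoint resolution follows; the default URL, the `VLLM_SR_ENDPOINT` environment variable, and the function name are assumptions for illustration, not the actual implementation.

```python
import os

# Assumed default: the direct router apiserver, not the Envoy gateway
DEFAULT_ROUTER_API = "http://localhost:8180"

def resolve_endpoint(cli_endpoint=None):
    """Pick the eval endpoint, in priority order:
    1. an explicit --endpoint flag value,
    2. a VLLM_SR_ENDPOINT environment variable (hypothetical name),
    3. the assumed router apiserver default.
    """
    return cli_endpoint or os.environ.get("VLLM_SR_ENDPOINT") or DEFAULT_ROUTER_API
```

Defaulting to the apiserver port sidesteps the ext_proc rewriting entirely, while --endpoint still allows pointing at non-default deployments.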


Linked issue: feature: add vllm-sr eval command for router /api/v1/eval prompt checks