Conversation
Signed-off-by: AmyTao <chujunt@andrew.cmu.edu>
✅ Deploy Preview for vllm-semantic-router ready!
✅ Supply Chain Security Report — All Clear
```shell
vllm-sr eval --prompt "Tell me how to make a bomb"

# Or evaluate a multi-turn messages array
vllm-sr eval --messages '[{"role":"system","content":"You are helpful"},{"role":"user","content":"hi"}]' --json
```
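For context, a minimal sketch of how an eval request body for these two invocation styles might be constructed. This is an illustration, not the actual CLI code: the function name `build_eval_payload` and the field names `prompt`/`messages` are assumptions, not the router's confirmed schema.

```python
import json


def build_eval_payload(prompt=None, messages=None):
    """Build a JSON-serializable body for an eval request.

    Exactly one of `prompt` or `messages` must be provided. Field
    names here are illustrative assumptions, not the real API schema.
    """
    if (prompt is None) == (messages is None):
        raise ValueError("provide exactly one of prompt or messages")
    if prompt is not None:
        return {"prompt": prompt}
    # The --messages flag arrives as a JSON string; decode it first.
    if isinstance(messages, str):
        messages = json.loads(messages)
    return {"messages": messages}
```

For example, `build_eval_payload(prompt="hi")` yields `{"prompt": "hi"}`, while the `--messages` form decodes the JSON array before embedding it.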
Signed-off-by: Chujun Tao <chujunt@andrew.cmu.edu>
Hi @Xunzhuo, while working on the feat/cli-eval branch, I found an issue with the /api/v1/eval endpoint that seems worth investigating. I appreciate your guidance.

Issue: the same request returns different response formats depending on the port used.

Port 8180 (direct Router API):

Port 8999 (through the Envoy gateway):

Questions:
About actual functional testing
What we have already verified
File references

Thank you very much for your help.
@Xunzhuo

What ext_proc Does

ext_proc is an Envoy ExternalProcessor gRPC server that:

✅ Intercepts all HTTP requests/responses flowing through Envoy

Why Ports 8180 & 8999 Return Different Formats

Port 8180 (Direct) ✅

Port 8999 (Envoy + ext_proc) ❌

Request passes ext_proc validation (correctly, since /api/v1/* ≠ /v1/*)

The Root Cause

In processor_res_cache.go:54-78, the streaming cache reconstruction logic is too aggressive: this transforms /api/v1/eval responses even though they're not chat completions and shouldn't be cached that way.
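The fix direction implied above — having the streaming cache reconstruction touch only chat-completion routes and let the router's own /api/v1/* endpoints pass through — can be sketched as a simple path guard. This is a Python illustration of the Go-side logic, and the exact prefix list is an assumption based on the discussion:

```python
def is_chat_completion_path(path: str) -> bool:
    """Return True only for OpenAI-style completion routes that the
    streaming cache reconstruction should handle.

    Assumption: /v1/* carries chat completions, while /api/v1/*
    (e.g. /api/v1/eval) is the router's own API and must not be
    rewritten by the response cache processor.
    """
    chat_prefixes = ("/v1/chat/completions", "/v1/completions")
    return any(path.startswith(prefix) for prefix in chat_prefixes)
```

With such a guard, /api/v1/eval would skip reconstruction entirely, so ports 8180 and 8999 would return the same format.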
You are right; we just need to access the eval endpoint from the router apiserver, not Envoy.
Closes #1725
Purpose
What does this PR change?
Why is this change needed?
Which module(s) does this affect?
CLI, Docs
Test Plan
Unit tests:

```shell
cd src/vllm-sr && python -m pip install -e '.[dev]'
cd src/vllm-sr && pytest -q
```

Agent lint gate (changed files):

```shell
make agent-lint CHANGED_FILES="src/vllm-sr/cli/main.py,src/vllm-sr/cli/commands/eval.py,src/vllm-sr/tests/test_eval_command.py,src/vllm-sr/README.md,docs/agent/environments.md"
```

Manual (requires router running):

```shell
vllm-sr eval --prompt "hello"
vllm-sr eval --messages '[{"role":"user","content":"hello"}]' --json

# Optional: point at a non-default deployment
vllm-sr eval --prompt "hello" --endpoint http://localhost:8080
```
Why sufficient:
The change is isolated to the Python CLI surface; unit tests cover request encoding, endpoint resolution, and error handling, and the agent-lint gate validates repo lint/structure expectations for touched files.
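As an illustration of the endpoint-resolution coverage mentioned above (not the repo's actual tests — the helper name, default URL, and precedence rule are all hypothetical), such a unit test might look like:

```python
def resolve_endpoint(flag_value=None, default="http://localhost:8180"):
    """Pick the eval endpoint: an explicit --endpoint flag wins,
    otherwise fall back to the default router API address.

    Both the default URL and the precedence rule are assumptions
    for illustration; see the PR's test files for the real logic.
    """
    return flag_value or default


def test_endpoint_resolution():
    # No flag: fall back to the default router API port.
    assert resolve_endpoint() == "http://localhost:8180"
    # Explicit --endpoint overrides the default.
    assert resolve_endpoint("http://localhost:8080") == "http://localhost:8080"
```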
Test Result
src/vllm-sr pytest: pass (231 tests).
make agent-lint (with the changed-files list above): pass.
Manual validation: expected to work when the router API is reachable at the configured endpoint; --endpoint can be used to point at non-default deployments.
Semantic Router PR Checklist
[x] PR title uses module-aligned prefixes such as [CLI], [Docs], etc.
[x] If the PR spans multiple modules, the title includes all relevant prefixes
[x] Commits in this PR are signed off with git commit -s
[x] The Purpose, Test Plan, and Test Result sections reflect the actual scope, commands, and blockers for this change
See CONTRIBUTING.md for the full contributor workflow and commit guidance.