Open
Labels
kind/bug: Categorizes issue or PR as related to a bug.
needs-triage: Indicates an issue or PR lacks a triage label and requires one.
Description
- I think I found a bug in our routing sidecar and overall P/D implementation. I accidentally sent a request with an incorrect model name (the actual model name is deepseek-ai/DeepSeek-R1-DATE):
robertgshaw@Roberts-MacBook-Pro wide-ep-lws % curl -X POST -vvv http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1",
"messages": [
{
"role": "user",
"content": "Write a detailed technical explanation of how prefix cache aware routing improves throughput in a disaggregated prefill/decode LLM architecture. Include discussion of KV locality, expert parallel implications, and memory bandwidth."
}
],
"max_tokens": 1024,
"temperature": 0.2,
"top_p": 0.95
}'
- This manifested as the following on the client side (a 404 with an empty body):
* Connected to localhost (::1) port 8080
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 429
>
* upload completely sent off: 429 bytes
< HTTP/1.1 404 Not Found
< x-envoy-upstream-service-time: 7
< x-went-into-resp-headers: true
< date: Thu, 19 Feb 2026 18:05:33 GMT
< server: istio-envoy
< content-length: 0
<
* Connection #0 to host localhost left intact
- I get an error in the routing proxy (I think because the request to the P worker does not succeed):
{"level":"error","ts":"2026-02-19T18:05:33Z","logger":"proxy server on port 8000","msg":"request failed","code":404,"stacktrace":"github.com/llm-d/llm-d-inference-scheduler/pkg/sidecar/proxy.(*Server).runNIXLProtocolV2\n\t/workspace/pkg/sidecar/proxy/connector_nixlv2.go:111\ngithub.com/llm-d/llm-d-inference-scheduler/pkg/sidecar/proxy.(*Server).chatCompletionsHandler\n\t/workspace/pkg/sidecar/proxy/chat_completions.go:78\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2294\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2822\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3301\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2102"}
I think we need a better way to handle these validation errors in a P/D deployment, as they should not surface to the user as a bare 404 status.
Status
In review