[Bug][P/D] Request Validation Improper Handling #637

@robertgshaw2-redhat

Description

  • I think I found a bug in our routing sidecar and the overall P/D implementation. I accidentally sent a request with an incorrect model name (the actual model name is deepseek-ai/DeepSeek-R1-DATE):
robertgshaw@Roberts-MacBook-Pro wide-ep-lws % curl -X POST -vvv http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
      {
        "role": "user",
        "content": "Write a detailed technical explanation of how prefix cache aware routing improves throughput in a disaggregated prefill/decode LLM architecture. Include discussion of KV locality, expert parallel implications, and memory bandwidth."
      }
    ],
    "max_tokens": 1024,
    "temperature": 0.2,
    "top_p": 0.95
  }'
  • On the client side, this showed up as a 404 with an empty body:
* Connected to localhost (::1) port 8080
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 429
> 
* upload completely sent off: 429 bytes
< HTTP/1.1 404 Not Found
< x-envoy-upstream-service-time: 7
< x-went-into-resp-headers: true
< date: Thu, 19 Feb 2026 18:05:33 GMT
< server: istio-envoy
< content-length: 0
< 
* Connection #0 to host localhost left intact
  • The routing proxy logs the following error (I think because the request to the P worker does not succeed):
{"level":"error","ts":"2026-02-19T18:05:33Z","logger":"proxy server on port 8000","msg":"request failed","code":404,"stacktrace":"github.com/llm-d/llm-d-inference-scheduler/pkg/sidecar/proxy.(*Server).runNIXLProtocolV2\n\t/workspace/pkg/sidecar/proxy/connector_nixlv2.go:111\ngithub.com/llm-d/llm-d-inference-scheduler/pkg/sidecar/proxy.(*Server).chatCompletionsHandler\n\t/workspace/pkg/sidecar/proxy/chat_completions.go:78\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2294\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2822\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3301\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2102"}

I think we need a better way to handle these validation errors in a P/D deployment, since they should not return a 404 status to the user.
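
To make the suggestion concrete, here is a minimal sketch of one possible direction: when the request to the P worker fails, the sidecar relays the worker's status and error body to the client instead of an empty response. Everything here is an assumption for illustration, not the current sidecar code: the names clientError, relayPrefillError and apiError are hypothetical, and the JSON error envelope is only a guess at a reasonable shape. Whether the status itself should be remapped (for example to a 4xx validation error) is a policy choice this sketch leaves open.

```go
package main

import (
	"encoding/json"
	"io"
	"net/http"
)

// apiError is a hypothetical error payload; the field names are an
// assumption, not the sidecar's or vLLM's actual schema.
type apiError struct {
	Message string `json:"message"`
	Type    string `json:"type"`
	Code    int    `json:"code"`
}

// clientError decides what the client should see when the prefill request
// fails. It keeps the upstream status and body when a body is present, and
// synthesizes a JSON body otherwise so the response is never empty.
func clientError(status int, upstreamBody []byte) (int, []byte) {
	if len(upstreamBody) > 0 {
		return status, upstreamBody
	}
	payload := map[string]apiError{"error": {
		Message: "prefill request failed: " + http.StatusText(status),
		Type:    "upstream_error",
		Code:    status,
	}}
	body, err := json.Marshal(payload)
	if err != nil {
		// Marshalling this fixed structure should not fail; fall back anyway.
		return status, []byte(`{"error":{"message":"upstream error"}}`)
	}
	return status, body
}

// relayPrefillError writes a failed prefill response back to the client.
func relayPrefillError(w http.ResponseWriter, resp *http.Response) {
	defer resp.Body.Close()
	// A read error leaves upstreamBody empty or partial; clientError copes.
	upstreamBody, _ := io.ReadAll(resp.Body)
	status, body := clientError(resp.StatusCode, upstreamBody)

	// Reuse the upstream content type only when its body is passed through.
	contentType := "application/json"
	if ct := resp.Header.Get("Content-Type"); ct != "" && len(upstreamBody) > 0 {
		contentType = ct
	}
	w.Header().Set("Content-Type", contentType)
	w.WriteHeader(status)
	_, _ = w.Write(body)
}
```

With something like this, the failure branch of the handler would call relayPrefillError(w, resp) with the P worker's response, so the curl output above would at least carry a message explaining why the request was rejected.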

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    needs-triage: Indicates an issue or PR lacks a triage label and requires one.

    Type

    No type

    Projects

    Status

    In review

    Relationships

    None yet

    Development

    No branches or pull requests
