
Optimized request prefill error messages#652

Open
learner0810 wants to merge 1 commit into llm-d:main from learner0810:optimized-request-prefill-error-messages

Conversation

Contributor

@learner0810 learner0810 commented Feb 25, 2026

fix: #637

The complete error message should be returned to the upstream client rather than just an HTTP status code.

  • Before
~# curl -s http://inference-gateway-istio.lzj.svc.cluster.local:80/v1/chat/completions -H "Content-Type: application/json" -d '{
                "model": "Qwen/Qwen3-0.6B",
                "messages": [
                        {
                                "role": "user",
                                "content": [
                                        {
                                                "type": "text",
                                                "text": "Describe this image in one sentence."
                                        },
                                        {
                                                "type": "image_url",
                                                "image_url": {
                                                        "url": "https://cdn2.thecatapi.com/images/dui.jpg"
                                                }
                                        }
                                ]
                        }
                ]
        }' | jq .
  • Now
curl -s http://inference-gateway-istio.lzj.svc.cluster.local:80/v1/chat/completions -H "Content-Type: application/json" -d '{
                "model": "Qwen/Qwen3-0.6B",
                "messages": [
                        {
                                "role": "user",
                                "content": [
                                        {
                                                "type": "text",
                                                "text": "Describe this image in one sentence."
                                        },
                                        {
                                                "type": "image_url",
                                                "image_url": {
                                                        "url": "https://cdn2.thecatapi.com/images/dui.jpg"
                                                }
                                        }
                                ]
                        }
                ]
        }' | jq .
{
  "error": {
    "code": 400,
    "message": "/model-cache/Qwen3-0.6B is not a multimodal model None",
    "param": null,
    "type": "BadRequestError"
  }
}

Copilot AI review requested due to automatic review settings February 25, 2026 06:48
@github-actions github-actions bot requested review from nilig and shmuelk February 25, 2026 06:48
@learner0810 learner0810 force-pushed the optimized-request-prefill-error-messages branch from 1e8e153 to 67fe70c on February 25, 2026 06:49

Copilot AI left a comment


Pull request overview

This PR enhances error handling in the NIXL Protocol V2 connector by returning complete error messages and headers to upstream clients when prefill requests fail, rather than just HTTP status codes.

Changes:

  • Added header forwarding from prefill error responses to client responses
  • Added response body writing to return detailed error messages
  • Implemented proper error handling for write failures with logging


@learner0810 learner0810 force-pushed the optimized-request-prefill-error-messages branch from 67fe70c to a7bd2fd on February 25, 2026 07:15

  if isHTTPError(pw.statusCode) {
- 	s.logger.Error(err, "request failed", "code", pw.statusCode)
+ 	s.logger.Error(err, "request failed", "code", pw.statusCode, "body", pw.buffer.String())
Collaborator


The prefill request can fail for multiple reasons, and returning an error here might not be optimal.

What do you think about approaching this in the following way instead:

  • ignore prefill on failure (and not return its error to the caller) and
  • process the request with the decode node (removing the KV cache transfer or any other P related artifacts from the request)?

Since P/D disaggregation is an optimization, this would sidestep any prefiller issues and (might?) still return a response to the user.
I think this would be a safer way to handle prefill errors/failures (e.g., when the prefiller does not run the base model but the decoder does).

Contributor Author


I've adjusted some of the logic; please take another look. If it's a client-side error, there's no need to forward the request to decode; in all other cases, the request is forwarded to decode.
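The decision described above could be sketched as follows. The helper name `shouldFallBackToDecode` and the exact status classification are assumptions for illustration; the PR's code may differ.

```go
package main

import (
	"fmt"
	"net/http"
)

// shouldFallBackToDecode reports whether a failed prefill request is worth
// retrying on the decode node. Client errors (4xx), such as an invalid model
// name, would fail on decode too, so they are returned to the caller
// directly. (Hypothetical helper, not the PR's exact code.)
func shouldFallBackToDecode(statusCode int) bool {
	// Only non-4xx failures (e.g. a prefiller outage) justify falling back.
	return statusCode < http.StatusBadRequest || statusCode >= http.StatusInternalServerError
}

func main() {
	fmt.Println(shouldFallBackToDecode(http.StatusBadRequest))         // client error: surface to caller
	fmt.Println(shouldFallBackToDecode(http.StatusServiceUnavailable)) // prefiller issue: retry on decode
}
```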

Collaborator


A better overall solution might be to send multiple prefill targets in the "prefill header"; if one fails, we try another prefill node.

This is especially true with E/P/D, where the decode nodes are not allowed to do prefill.

@robertgshaw2-redhat

Contributor


In the specific case I was looking at, the error will occur in every pod because the model name was incorrect in my input.

I think we should take a simple approach to error handling at the current stage of the feature.

}
return
}
prefillSpan.End()
Collaborator


Unrelated to this PR:
if there is an error, is prefillSpan still reported, and does it carry any valid and useful information back?
We might want to close the span as an error for observability.

/cc: @sallyom

@learner0810 learner0810 force-pushed the optimized-request-prefill-error-messages branch from a7bd2fd to 1cdb26c on February 25, 2026 09:34
@learner0810 learner0810 force-pushed the optimized-request-prefill-error-messages branch from 1cdb26c to dc578a0 on February 26, 2026 02:08
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
@learner0810 learner0810 force-pushed the optimized-request-prefill-error-messages branch from dc578a0 to 8daeff3 on February 27, 2026 02:12


Development

Successfully merging this pull request may close these issues.

[Bug][P/D] Request Validation Improper Handling

5 participants