
Optimized request prefill error messages#652

Open
learner0810 wants to merge 1 commit into llm-d:main from learner0810:optimized-request-prefill-error-messages

Conversation

Contributor

@learner0810 learner0810 commented Feb 25, 2026

fix: #637

The complete error message should be returned to the upstream client rather than just an HTTP status code.

  • Before
~# curl -s http://inference-gateway-istio.lzj.svc.cluster.local:80/v1/chat/completions -H "Content-Type: application/json" -d '{
                "model": "Qwen/Qwen3-0.6B",
                "messages": [
                        {
                                "role": "user",
                                "content": [
                                        {
                                                "type": "text",
                                                "text": "Describe this image in one sentence."
                                        },
                                        {
                                                "type": "image_url",
                                                "image_url": {
                                                        "url": "https://cdn2.thecatapi.com/images/dui.jpg"
                                                }
                                        }
                                ]
                        }
                ]
        }' | jq .
  • Now
curl -s http://inference-gateway-istio.lzj.svc.cluster.local:80/v1/chat/completions -H "Content-Type: application/json" -d '{
                "model": "Qwen/Qwen3-0.6B",
                "messages": [
                        {
                                "role": "user",
                                "content": [
                                        {
                                                "type": "text",
                                                "text": "Describe this image in one sentence."
                                        },
                                        {
                                                "type": "image_url",
                                                "image_url": {
                                                        "url": "https://cdn2.thecatapi.com/images/dui.jpg"
                                                }
                                        }
                                ]
                        }
                ]
        }' | jq .
{
  "error": {
    "code": 400,
    "message": "/model-cache/Qwen3-0.6B is not a multimodal model None",
    "param": null,
    "type": "BadRequestError"
  }
}

Copilot AI review requested due to automatic review settings February 25, 2026 06:48
@github-actions github-actions bot requested review from nilig and shmuelk February 25, 2026 06:48
@learner0810 learner0810 force-pushed the optimized-request-prefill-error-messages branch from 1e8e153 to 67fe70c on February 25, 2026 06:49

Copilot AI left a comment


Pull request overview

This PR enhances error handling in the NIXL Protocol V2 connector by returning complete error messages and headers to upstream clients when prefill requests fail, rather than just HTTP status codes.

Changes:

  • Added header forwarding from prefill error responses to client responses
  • Added response body writing to return detailed error messages
  • Implemented proper error handling for write failures with logging


@learner0810 learner0810 force-pushed the optimized-request-prefill-error-messages branch from 67fe70c to a7bd2fd on February 25, 2026 07:15

  if isHTTPError(pw.statusCode) {
- 	s.logger.Error(err, "request failed", "code", pw.statusCode)
+ 	s.logger.Error(err, "request failed", "code", pw.statusCode, "body", pw.buffer.String())
Collaborator


The prefill request can fail for multiple reasons, and returning an error here might not be optimal.

What do you think about approaching this in the following way instead:

  • ignore prefill on failure (and not return its error to the caller) and
  • process the request with the decode node (removing the KV cache transfer or any other P related artifacts from the request)?

Since P/D disaggregation is an optimization, this would sidestep any prefiller issues and (might?) still return a response to the user.
I think this would be a safer way to handle prefill errors/failures (e.g., when the prefiller does not run the base model but the decoder does).

Contributor Author


I've adjusted some of the logic; please take another look. If it's a client-side error, there's no need to forward the request to decode; in all other cases, the request is forwarded to decode.
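The decision described above could be sketched as follows. The helper name `shouldFallBackToDecode` and the exact status classification are assumptions for illustration; the PR's code may differ.

```go
package main

import (
	"fmt"
	"net/http"
)

// shouldFallBackToDecode reports whether a failed prefill request is worth
// retrying on the decode node. Client errors (4xx), such as an invalid model
// name, would fail on decode too, so they are returned to the caller
// directly. (Hypothetical helper, not the PR's exact code.)
func shouldFallBackToDecode(statusCode int) bool {
	// Only non-4xx failures (e.g. a prefiller outage) justify falling back.
	return statusCode < http.StatusBadRequest || statusCode >= http.StatusInternalServerError
}

func main() {
	fmt.Println(shouldFallBackToDecode(http.StatusBadRequest))         // client error: surface to caller
	fmt.Println(shouldFallBackToDecode(http.StatusServiceUnavailable)) // prefiller issue: retry on decode
}
```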

Collaborator


A better overall solution might be to send multiple prefill targets in the "prefill header"; if one fails, we try another prefill node.

This is especially true with E/P/D, where the decode nodes are not allowed to do prefill.

@robertgshaw2-redhat

Contributor


In the specific case I was looking at, the error will occur in every pod because the model name was incorrect in my input.

I think we should take a simple approach to error handling at the current stage of the feature.

}
return
}
prefillSpan.End()
Collaborator


Unrelated to this PR:
if there is an error, is prefillSpan still reported, and does it carry any valid and useful information back?
We might want to close the span as an error for observability.

/cc: @sallyom

@learner0810 learner0810 force-pushed the optimized-request-prefill-error-messages branch from a7bd2fd to 1cdb26c on February 25, 2026 09:34
@learner0810 learner0810 force-pushed the optimized-request-prefill-error-messages branch from 1cdb26c to dc578a0 on February 26, 2026 02:08
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
@learner0810 learner0810 force-pushed the optimized-request-prefill-error-messages branch from dc578a0 to 8daeff3 on February 27, 2026 02:12


Development

Successfully merging this pull request may close these issues.

[Bug][P/D] Request Validation Improper Handling

5 participants