Optimized request prefill error messages (#652)
Conversation
Pull request overview
This PR enhances error handling in the NIXL Protocol V2 connector by returning complete error messages and headers to upstream clients when prefill requests fail, rather than just HTTP status codes.
Changes:
- Added header forwarding from prefill error responses to client responses
- Added response body writing to return detailed error messages
- Implemented proper error handling for write failures with logging
```diff
 if isHTTPError(pw.statusCode) {
-	s.logger.Error(err, "request failed", "code", pw.statusCode)
+	s.logger.Error(err, "request failed", "code", pw.statusCode, "body", pw.buffer.String())
```
The prefill request can fail for multiple reasons, and returning an error here might not be optimal.
What do you think about approaching this in the following way instead:
- ignore the prefill failure (and do not return its error to the caller), and
- process the request with the decode node (removing the KV cache transfer and any other prefill-related artifacts from the request)?
Since P/D disaggregation is an optimization, this would sidestep any prefiller issues and might still return a response to the user.
I think this would be a safer way to handle prefill errors/failures (e.g., when only the prefiller does not run the base model but the decoder does).
I've adjusted some of the logic; please take another look. If it's a client-side error, there's no need to forward the request to decode. In all other cases, the request is forwarded to decode.
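The distinction described here hinges on the status class: a 4xx means the client's request itself is bad (e.g., a wrong model name), so retrying on the decode node cannot help, while server-side failures can still fall back to decode. A minimal sketch of that check (names are illustrative):

```go
package main

import (
	"fmt"
	"net/http"
)

// isClientError reports whether a prefill failure was caused by the client
// (4xx). Such errors should be returned directly; falling back to the
// decode node cannot fix a bad request.
func isClientError(code int) bool {
	return code >= 400 && code < 500
}

func main() {
	fmt.Println(isClientError(http.StatusBadRequest))         // true: return to client
	fmt.Println(isClientError(http.StatusServiceUnavailable)) // false: fall back to decode
}
```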
A better overall solution might be to send multiple prefill targets in the "prefill header"; if one fails, we try another prefill node.
This is especially true with E/P/D, where the decode nodes are not allowed to do prefill.
In the specific case I was looking at, the error occurred in every pod because the model name in my input was incorrect.
I think we should take a simplicity-first approach to error handling at the current stage of this feature.
```go
	}
	return
}
prefillSpan.End()
```
Unrelated to this PR:
If there is an error, is `prefillSpan` still reported, and does it carry any valid and useful information back?
We might want to close the span as an error for observability.
/cc: @sallyom
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
fix: #637
The complete error message should be returned to the upstream client rather than just an HTTP status code.