Track errors through the inference return path #3776
tdene wants to merge 6 commits into NVIDIA:main from
Conversation
```python
entry = self.requests[request_id]
request = entry.record[-1]
```
Should this be:

```python
request = self.requests[request_id]
entry = request.record[-1]
```
I understand what you mean.

But unfortunately, `self.requests` is a misnomer. `self.requests` is a `Dict[int, RequestEntry]`, where each `RequestEntry` contains a `DynamicInferenceRequestRecord`, and each `DynamicInferenceRequestRecord` contains a `list[DynamicInferenceRequest]`.

So if anything, we should be renaming `self.requests` to `self.request_entries`.
I see, can you make it this then?

```python
request_entry = self.requests[request_id]
request = request_entry.record[-1]
```
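For readers following the naming discussion above, the nesting can be sketched with hypothetical stand-in classes. These definitions are illustrative only, not the actual Megatron-Core types:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the types discussed above; fields are invented
# for illustration and do not match the real implementation.
@dataclass
class DynamicInferenceRequest:
    request_id: int

@dataclass
class DynamicInferenceRequestRecord:
    """Holds the history of request objects; record[-1] is the latest state."""
    requests: list

    def __getitem__(self, idx):
        return self.requests[idx]

@dataclass
class RequestEntry:
    record: DynamicInferenceRequestRecord

# self.requests maps request_id -> RequestEntry, hence the suggested rename
# to self.request_entries.
requests = {
    7: RequestEntry(DynamicInferenceRequestRecord([DynamicInferenceRequest(7)]))
}

request_entry = requests[7]
request = request_entry.record[-1]  # latest DynamicInferenceRequest
```

This is why indexing twice (`self.requests[request_id].record[-1]`) is needed to reach the actual request object.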
```python
# Send the reply immediately, because it may never get a chance to be sent again.
if self.use_coordinator and self.is_mp_coordinator:
    payload = msgpack.packb(
        [Headers.ENGINE_REPLY.value, [entry.record.serialize()]], use_bin_type=True
    )
```
Should `entry` here and down below be `request_entry`?
Ach, my bad. Resolved!
```python
        request.prompt_tokens.tolist()
    )
    request.generated_text = self.controller.tokenizer.detokenize(request.generated_tokens)
    entry.future.set_result(entry.record)
```
`entry` -> `request_entry` (2x)
```python
    return self.requests[request_id].record[-1]

def _handle_failed_request(self, request_id: int):
    """Handle a failed request by sending the reply immediately.
```
what's the reason exactly for needing to return failed requests immediately? you mentioned offline that a failure can prevent the next step from ever running. can you give an example of this?
If the first set of requests in the engines all fail (for example, because they're all too long), the coordinator never enters the forward step (because there are no active requests), and the whole system hangs.
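The hang described above can be sketched as follows. This is an illustrative stand-in for the coordinator's scheduling condition, not the real coordinator code; `run_steps` and its argument are invented for the example:

```python
def run_steps(active_requests):
    """Stand-in for the coordinator loop: forward steps only run while
    there are active requests, and replies are normally sent per step."""
    steps_run = 0
    while active_requests:
        active_requests.pop()  # stand-in for one forward step finishing a request
        steps_run += 1
    return steps_run

# If every initial request failed validation before becoming active,
# the active set starts empty, the forward step never runs, and no reply
# is ever produced unless the failure path replies immediately.
assert run_steps([]) == 0
```

Hence the need for `_handle_failed_request` to send the reply right away instead of relying on a later step.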
```python
    finished_request_records.append(failed_entry.record)
    failed_entry.future.set_result(failed_entry.record)
    assert (
        failed_entry.future.done()
    )
```
Isn't there a race condition between: 1) resolving the future in `_handle_failed_request()`, and 2) this `assert failed_entry.future.done()`? Does anything prevent this from running before the future is resolved?
There is not, because there are no async yield points in `_handle_failed_request` or even `_add_request`. All of `_add_request` can be considered to run atomically, so the future will be created and resolved before `async_bookkeep` gets a chance to run.
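The atomicity argument above can be demonstrated with a minimal `asyncio` sketch. The function names (`add_request`, `bookkeep`) are illustrative analogues, not the real methods:

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()

    # Analogue of _add_request: creates and resolves the future with no
    # await in between, so it runs atomically relative to other tasks.
    def add_request():
        future = loop.create_future()
        future.set_result("failed-record")  # e.g. validation failed immediately
        return future

    # Analogue of async_bookkeep: by the time this task is scheduled,
    # add_request has already finished, so the future is always done.
    async def bookkeep(future):
        assert future.done()  # cannot race with add_request
        return future.result()

    task = asyncio.create_task(bookkeep(add_request()))
    return await task

result = asyncio.run(main())
```

The event loop is single-threaded and only switches tasks at `await` points, which is what makes the `done()` assertion safe.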
```python
if self.rank == 0:
    warnings.warn(f"Request {request_id} failed to be added to the engine due to errors.")

request.add_event_fail()
```
In `async_bookkeep`, we used to also set `request.status = Status.FAILED`. We should update the status here too.
```python
request.prompt = self.controller.tokenizer.detokenize(
    request.prompt_tokens.tolist()
)
request.generated_text = self.controller.tokenizer.detokenize(request.generated_tokens)
```
Does `detokenize()` work fine even if `generated_tokens` is empty?
It was working in my tests (before the `request_entry` change :) ), but you're right, there's no guarantee it'll always work for all tokenizers. And there's also no point in doing this detokenization if there are no generated tokens.

Addressed!
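A minimal sketch of the guard discussed above. The `StubTokenizer` and `finalize_text` helper are invented for illustration; the actual fix lives inside the engine's bookkeeping path:

```python
class StubTokenizer:
    """Hypothetical tokenizer stand-in for the example."""
    def detokenize(self, tokens):
        return " ".join(str(t) for t in tokens)

def finalize_text(tokenizer, generated_tokens):
    # Only detokenize when there are tokens, rather than relying on every
    # tokenizer implementation handling an empty list gracefully.
    if generated_tokens:
        return tokenizer.detokenize(generated_tokens)
    return ""  # failed/empty request: skip the tokenizer call entirely

tok = StubTokenizer()
text = finalize_text(tok, [1, 2, 3])
empty = finalize_text(tok, [])
```

Skipping the call on empty input also avoids pointless work for requests that failed before generating anything.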
/claude test |