[Model Runner V2] Do not initialize sampler for non-last PP ranks#36824
WoosukKwon merged 7 commits into main
Code Review
This pull request introduces an optimization to skip the initialization of the Sampler and related classes for non-last pipeline parallel ranks and for pooling models. The changes correctly make the initialization of these components conditional, which avoids unnecessary resource allocation. All usages of these potentially uninitialized components are now properly guarded with conditional checks or assertions, ensuring runtime safety. The related modifications in input_batch.py to handle an optional output_bin_counts tensor are also implemented correctly. The changes are logical, well-contained, and effectively achieve the intended optimization.
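The review notes that input_batch.py now handles an optional output_bin_counts tensor. A minimal sketch of that pattern follows. It assumes a hypothetical InputBatchSketch class and uses plain Python lists in place of tensors, so it only illustrates the "allocate only when sampling, guard every use" idea. It is not vLLM's actual input batch code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputBatchSketch:
    """Hypothetical stand-in for an input batch; not vLLM's real class."""

    max_num_reqs: int
    vocab_size: int
    # Allocated only on ranks that actually sample (last PP rank of a
    # generative model); left as None everywhere else.
    output_bin_counts: Optional[list[list[int]]] = None

    @classmethod
    def create(cls, max_num_reqs: int, vocab_size: int, needs_sampling: bool):
        counts = None
        if needs_sampling:
            counts = [[0] * vocab_size for _ in range(max_num_reqs)]
        return cls(max_num_reqs, vocab_size, counts)

    def record_token(self, req_idx: int, token_id: int) -> None:
        # Guard: ranks without a sampler never track output token counts.
        if self.output_bin_counts is None:
            return
        self.output_bin_counts[req_idx][token_id] += 1
```

On a non-sampling rank, `create(..., needs_sampling=False)` skips the per-request, vocabulary-sized allocation entirely, and `record_token` becomes a no-op.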
Hi @WoosukKwon, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
njhill left a comment
Nice :)
We've also always initialized the sampler on every rank in V1; fixing that had been on my to-do list too.
I think we can similarly create the pooling_runner conditionally in load_model (it isn't needed on non-last PP ranks either).
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
  )
- if self.is_pooling_model:
+ if self.is_pooling_model and self.is_last_pp_rank:
      self.pooling_runner = PoolingRunner(self.model)
We may still need to get and store the supported tasks here, since they are queried from the front-end and the executor returns the result from rank 0.
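The concern is that rank 0 is not the last PP rank when the pipeline has more than one stage, so if supported tasks were only recorded alongside the pooling runner, rank 0 would have nothing to report. A minimal sketch of keeping the two separate, assuming hypothetical FakeModel, PoolingRunnerSketch, and ModelRunnerSketch classes (not vLLM's actual classes or method names):

```python
from typing import Optional


class FakeModel:
    """Hypothetical model that reports the pooling tasks it supports."""

    def supported_tasks(self):
        return ("embed", "classify")


class PoolingRunnerSketch:
    """Hypothetical stand-in for PoolingRunner."""

    def __init__(self, model):
        self.model = model


class ModelRunnerSketch:
    def __init__(self, model, pp_rank: int, pp_size: int, is_pooling_model: bool):
        self.model = model
        self.is_last_pp_rank = pp_rank == pp_size - 1
        self.is_pooling_model = is_pooling_model
        self.pooling_runner: Optional[PoolingRunnerSketch] = None
        # Recorded on every rank: the front-end query is answered from rank 0,
        # which is not the last PP rank when pp_size > 1.
        self.supported_tasks = tuple(model.supported_tasks())

    def load_model(self) -> None:
        # Only the last PP rank runs pooling, so only it builds the runner.
        if self.is_pooling_model and self.is_last_pp_rank:
            self.pooling_runner = PoolingRunnerSketch(self.model)

    def get_supported_tasks(self):
        return self.supported_tasks
```

With `pp_size=2`, rank 0 skips building the pooling runner but still answers the supported-tasks query correctly.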
Skip the initialization of Sampler (and a few sampling-related classes) on non-last PP ranks and for pooling models.
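The PR summary boils down to one condition: build the sampler only on the last PP rank of a generative model, and assert its presence wherever it is used. A minimal sketch of that pattern, assuming hypothetical SamplerSketch and RunnerSketch classes with a toy greedy sampler (not vLLM's Sampler or model runner):

```python
from typing import Optional


class SamplerSketch:
    """Hypothetical sampler: greedy argmax over a list of logits."""

    def sample(self, logits: list[float]) -> int:
        return max(range(len(logits)), key=logits.__getitem__)


class RunnerSketch:
    def __init__(self, pp_rank: int, pp_size: int, is_pooling_model: bool):
        self.is_last_pp_rank = pp_rank == pp_size - 1
        # Only the last PP rank of a generative model ever samples tokens,
        # so every other rank leaves the sampler unset.
        self.sampler: Optional[SamplerSketch] = None
        if self.is_last_pp_rank and not is_pooling_model:
            self.sampler = SamplerSketch()

    def sample(self, logits: list[float]) -> int:
        # Callers should only reach this on a rank that owns a sampler.
        assert self.sampler is not None, "sampler not initialized on this rank"
        return self.sampler.sample(logits)
```

The assertion turns an accidental call on the wrong rank into an immediate, descriptive failure rather than an `AttributeError` on `None`.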