Feat (vLLM): initial export support #1444
base: dev
Conversation
torch>=2.4
tqdm
transformers[sentencepiece]<5.0
vllm
I feel like vLLM should be an optional dependency.
Maybe we can do it in a similar way to what we did for lighteval/lm_eval
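One possible way to make vLLM optional, along the lines of the lighteval/lm_eval handling mentioned above, is to guard the import and only fail when the vLLM export path is actually requested. The sketch below is illustrative only; the names `VLLM_AVAILABLE` and `require_vllm` are hypothetical and not existing Brevitas helpers.

```python
# Sketch of an optional-dependency guard for vLLM (illustrative names,
# not existing Brevitas API). The import failure is deferred until the
# vLLM export path is actually used.
try:
    import vllm  # noqa: F401  # only needed for the vLLM export flow
    VLLM_AVAILABLE = True
except ImportError:
    VLLM_AVAILABLE = False


def require_vllm():
    """Raise a clear error at the point where vLLM is actually needed."""
    if not VLLM_AVAILABLE:
        raise ImportError(
            "vLLM is not installed. Install it (e.g. `pip install vllm`) "
            "to use the vLLM export path.")
```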
I'm leaving it in for now so that the tests run and I can see what else I'm breaking in the process, but I'll remove it before this PR is merged.
Reason for this PR
Initial support for vLLM export.
To do:
Changes Made in this PR
We re-use the existing inference quantizers for vLLM as well.
This is still fake-quantization style (a minimal sketch of the idea is shown below), but it should be faster than plain PyTorch execution, even in eager mode.
The same template could easily be extended to support real quantization, torch.compile, and so on.
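For context, the sketch below illustrates what fake quantization means here: values are rounded to the quantization grid but kept in floating point, so any backend (vLLM included) can run the model without dedicated integer kernels. The function is purely illustrative and is not the Brevitas implementation.

```python
import torch

def fake_quantize(x: torch.Tensor, scale: torch.Tensor, bits: int = 8) -> torch.Tensor:
    # Round to the signed integer grid, clamp to the representable range,
    # then scale back to float: the values are quantized, the dtype is not.
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), qmin, qmax)
    return q * scale
```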
Testing Summary
TBD