update benchmarking guide with latest results with vllm v1 #559
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: kaushikmitr. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Hi @kaushikmitr. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for gateway-api-inference-extension ready!
@danehans fyi
Can you add the vLLM deployment yaml used in this benchmark as well? And also the output json files, as an example of what people should expect from the benchmark.
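As a rough illustration of what consuming such an output file might look like, here is a hedged Python sketch. The field names (`results`, `latency_s`, `output_tokens`) are hypothetical, not the tool's actual schema, which the requested example files would document:

```python
import json
from statistics import mean

# Hypothetical benchmark output; the real tool's schema may differ,
# and these field names are illustrative only.
sample_output = """
{
  "num_requests": 3,
  "results": [
    {"latency_s": 1.2, "output_tokens": 256},
    {"latency_s": 0.8, "output_tokens": 128},
    {"latency_s": 1.0, "output_tokens": 192}
  ]
}
"""

data = json.loads(sample_output)
latencies = [r["latency_s"] for r in data["results"]]
tokens = [r["output_tokens"] for r in data["results"]]

# Aggregate metrics a reader might compute from such a file.
print(f"mean latency: {mean(latencies):.2f} s")   # -> mean latency: 1.00 s
print(f"total output tokens: {sum(tokens)}")      # -> total output tokens: 576
```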
@@ -93,6 +93,6 @@ This guide shows how to run the jupyter notebook using vscode.
   ```
1. Open the notebook `./tools/benchmark/benchmark.ipynb`, and run each cell. At the end you should
-  see a bar chart like below:
+  see a bar chart like below where **"ie"** represents inference extension. This chart is generated using this benchmarking tool with 10 vLLM (v1) model servers (H100 80 GB) and the ShareGPT dataset.
"inference extension" is the name of the project and not a specific extension. I assume the test was conducted with the Endpoint Selector Extension (ESE)? If so, s/represents inference extension/represents the endpoint selector inference extension/
yes that is correct
I just noticed a typo in Line 48. Can you update it? I was gonna send a PR, but since you are touching this file, you might as well fix it. Thank you!
This is the updated command:
SVC_IP=$(kubectl get service/vllm-llama2-7b -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
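For readers unfamiliar with kubectl's jsonpath syntax, the expression above walks the Service object's `status` field to pull out the load balancer's external IP. A minimal Python sketch of the equivalent lookup over the Service as JSON (the object shape matches the Kubernetes Service API; the IP value is made up for illustration):

```python
import json

# A trimmed Service object as `kubectl get service ... -o json` would
# return it; the IP here is a documentation-range placeholder.
service_json = """
{
  "kind": "Service",
  "metadata": {"name": "vllm-llama2-7b"},
  "status": {
    "loadBalancer": {
      "ingress": [{"ip": "203.0.113.10"}]
    }
  }
}
"""

svc = json.loads(service_json)
# Equivalent of jsonpath '{.status.loadBalancer.ingress[0].ip}'
svc_ip = svc["status"]["loadBalancer"]["ingress"][0]["ip"]
print(svc_ip)  # -> 203.0.113.10
```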
done!
@@ -93,6 +93,6 @@ This guide shows how to run the jupyter notebook using vscode.
   ```
1. Open the notebook `./tools/benchmark/benchmark.ipynb`, and run each cell. At the end you should
-  see a bar chart like below:
+  see a bar chart like below where **"ie"** represents inference extension. This chart is generated using this benchmarking tool with 10 vLLM (v1) model servers (H100 80 GB), llama2-7b, and the ShareGPT dataset.
Link to the source of the ShareGPT dataset.
Link to the reference page https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered
EDIT: nm, I thought this was a file download page, not the HF details page. Link is fine.
Also, discuss why we chose the cleaned split rather than the raw one, since it's not obvious at a casual glance; and if we use "ShareGPT" as a shorthand description, that is not entirely accurate.
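To make the cleaned-vs-raw distinction concrete, here is a toy sketch of the kind of filtering a "cleaned" split applies — dropping records with leftover HTML markup or unbounded length. This is illustrative only; the actual ShareGPT cleaning pipeline differs, and the threshold below is an arbitrary assumption:

```python
import re

# Toy conversation records: raw ShareGPT-style dumps mix in HTML markup
# and entries too long for a fixed context window.
raw_conversations = [
    {"id": "a", "text": "What is Kubernetes?"},
    {"id": "b", "text": "<div>Hello <b>world</b></div>"},  # HTML residue
    {"id": "c", "text": "tell me a story " * 500},          # overly long
]

MAX_CHARS = 4000  # arbitrary length budget for this sketch
html_tag = re.compile(r"<[^>]+>")

def is_clean(conv):
    # Keep only records free of HTML tags and within the length budget.
    return not html_tag.search(conv["text"]) and len(conv["text"]) <= MAX_CHARS

cleaned = [c for c in raw_conversations if is_clean(c)]
print([c["id"] for c in cleaned])  # -> ['a']
```

Only the plain, bounded-length record survives, which is roughly why a cleaned split is preferable for benchmarking: request lengths stay representative of real chat traffic rather than markup noise.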