update benchmarking guide with latest results with vllm v1 #559
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: kaushikmitr. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Hi @kaushikmitr. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for gateway-api-inference-extension ready!
@danehans fyi
Can you add the vLLM deployment yaml used in this benchmark as well? And also the output json files, as an example of what people should expect from the benchmark.
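As a rough illustration of what consuming such an output file might look like, here is a hedged Python sketch. The field names (`results`, `latency_s`, `output_tokens`) are hypothetical, not the tool's actual schema, which the requested example files would document:

```python
import json
from statistics import mean

# Hypothetical benchmark output; the real tool's schema may differ,
# and these field names are illustrative only.
sample_output = """
{
  "num_requests": 3,
  "results": [
    {"latency_s": 1.2, "output_tokens": 256},
    {"latency_s": 0.8, "output_tokens": 128},
    {"latency_s": 1.0, "output_tokens": 192}
  ]
}
"""

data = json.loads(sample_output)
latencies = [r["latency_s"] for r in data["results"]]
tokens = [r["output_tokens"] for r in data["results"]]

# Aggregate metrics a reader might compute from such a file.
print(f"mean latency: {mean(latencies):.2f} s")   # -> mean latency: 1.00 s
print(f"total output tokens: {sum(tokens)}")      # -> total output tokens: 576
```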
@@ -93,6 +93,6 @@ This guide shows how to run the jupyter notebook using vscode.
   ```
1. Open the notebook `./tools/benchmark/benchmark.ipynb`, and run each cell. At the end you should
-  see a bar chart like below:
+  see a bar chart like below where **"ie"** represents inference extension. This chart is generated using this benchmarking tool with 10 vLLM (v1) model servers (H100 80 GB) and the ShareGPT dataset.
"inference extension" is the name of the project and not a specific extension. I assume the test was conducted with the Endpoint Selector Extension (ESE)? If so, s/represents inference extension/represents the endpoint selector inference extension/
yes that is correct
I just noticed a typo in Line 48. Can you update it? I was gonna send a PR, but since you are touching this file, you might as well fix it. Thank you!
This is the updated command:
SVC_IP=$(kubectl get service/vllm-llama2-7b -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
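For readers unfamiliar with kubectl's jsonpath syntax, the expression above walks the Service object's `status` field to pull out the load balancer's external IP. A minimal Python sketch of the equivalent lookup over the Service as JSON (the object shape matches the Kubernetes Service API; the IP value is made up for illustration):

```python
import json

# A trimmed Service object as `kubectl get service ... -o json` would
# return it; the IP here is a documentation-range placeholder.
service_json = """
{
  "kind": "Service",
  "metadata": {"name": "vllm-llama2-7b"},
  "status": {
    "loadBalancer": {
      "ingress": [{"ip": "203.0.113.10"}]
    }
  }
}
"""

svc = json.loads(service_json)
# Equivalent of jsonpath '{.status.loadBalancer.ingress[0].ip}'
svc_ip = svc["status"]["loadBalancer"]["ingress"][0]["ip"]
print(svc_ip)  # -> 203.0.113.10
```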
done!
@@ -93,6 +93,6 @@ This guide shows how to run the jupyter notebook using vscode.
   ```
1. Open the notebook `./tools/benchmark/benchmark.ipynb`, and run each cell. At the end you should
-  see a bar chart like below:
+  see a bar chart like below where **"ie"** represents inference extension. This chart is generated using this benchmarking tool with 10 vLLM (v1) model servers (H100 80 GB), llama2-7b, and the ShareGPT dataset.
Link to the source of the ShareGPT dataset.
Link to the reference page https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered
EDIT: nm, I thought this was a file download page, not the HF details page. Link is fine.
Also, discuss why we chose the cleaned split rather than the raw one, since it's not obvious at a casual glance; and if we use "ShareGPT" as a shorthand description, that is not entirely accurate.
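To make the cleaned-vs-raw distinction concrete, here is a toy sketch of the kind of filtering a "cleaned" split applies — dropping records with leftover HTML markup or unbounded length. This is illustrative only; the actual ShareGPT cleaning pipeline differs, and the threshold below is an arbitrary assumption:

```python
import re

# Toy conversation records: raw ShareGPT-style dumps mix in HTML markup
# and entries too long for a fixed context window.
raw_conversations = [
    {"id": "a", "text": "What is Kubernetes?"},
    {"id": "b", "text": "<div>Hello <b>world</b></div>"},  # HTML residue
    {"id": "c", "text": "tell me a story " * 500},          # overly long
]

MAX_CHARS = 4000  # arbitrary length budget for this sketch
html_tag = re.compile(r"<[^>]+>")

def is_clean(conv):
    # Keep only records free of HTML tags and within the length budget.
    return not html_tag.search(conv["text"]) and len(conv["text"]) <= MAX_CHARS

cleaned = [c for c in raw_conversations if is_clean(c)]
print([c["id"] for c in cleaned])  # -> ['a']
```

Only the plain, bounded-length record survives, which is roughly why a cleaned split is preferable for benchmarking: request lengths stay representative of real chat traffic rather than markup noise.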