Added demo example for vLLM Server and shareGPT datagen component by SachinVarghese · Pull Request #37 · kubernetes-sigs/inference-perf

SachinVarghese · 2025-03-22T14:05:42Z

This PR adds an example notebook to run a vLLM server example
Fixes #35

k8s-ci-robot · 2025-03-22T14:05:48Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SachinVarghese

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [SachinVarghese]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

achandrasekar · 2025-03-25T04:46:58Z

inference_perf/datagen/hf_sharegpt_datagen.py

+                if (
+                    data is None
+                    or data[self.data_key] is None
+                    or len(data[self.data_key]) > self.max_num_turns


I believe we were filtering out conversations with less than 2 turns - looks like this got changed to filtering out conversations > 2 turns.

I had missed that. Fixed in 5f9b3a9

achandrasekar · 2025-03-25T04:59:22Z

inference_perf/loadgen/load_timer.py

        # Given a rate, yield a time to wait before the next request
        while True:
-            next_time += self._rand.uniform(0, 1 / self._rate)
+            next_time += self._rand.exponential(1 / self._rate)


Isn't uniform better for constant load timer? Isn't the main difference between this one and the Poisson one that constant load timer sends requests at uniform intervals between them?

Yes that is the core idea but I was getting incorrect timer results with uniform random function here. Updating to exponentials provide expected results. My recommendation is to use exponential for now and I will revisit this in a separate ticket. As part of a new ticket, I will also add some tests to validate this.

This works for now. But would be good to address in a follow up. From my past experience using this is that it models request rates correctly, but the arrival rate is not uniform within the time interval (second).

Added a ticket to address this here
Please assign this to me.

Signed-off-by: Sachin Varghese <sachin.mathew31@gmail.com>

achandrasekar · 2025-03-25T21:38:40Z

Thanks for putting this out! Having a first e2e demo is great!

/lgtm

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2025

k8s-ci-robot requested review from Jeffwan and achandrasekar March 22, 2025 14:05

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 22, 2025

SachinVarghese marked this pull request as ready for review March 22, 2025 14:06

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2025

k8s-ci-robot requested review from ArangoGutierrez and terrytangyuan March 22, 2025 14:06

SachinVarghese changed the title ~~Added demo example~~ Added demo example for vLLM Server and shareGPT datagen component Mar 22, 2025

achandrasekar reviewed Mar 25, 2025

View reviewed changes

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2025

SachinVarghese added 4 commits March 25, 2025 08:13

Added demo example

5a15e4b

Signed-off-by: Sachin Varghese <sachin.mathew31@gmail.com>

Share GPT data fixes added

7424b86

Signed-off-by: Sachin Varghese <sachin.mathew31@gmail.com>

lint fixes

480f75a

Signed-off-by: Sachin Varghese <sachin.mathew31@gmail.com>

review fixes

5f9b3a9

Signed-off-by: Sachin Varghese <sachin.mathew31@gmail.com>

SachinVarghese force-pushed the demo branch from 0594de5 to 5f9b3a9 Compare March 25, 2025 12:16

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2025

k8s-ci-robot assigned achandrasekar Mar 25, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 25, 2025

k8s-ci-robot merged commit 8d2b9b1 into kubernetes-sigs:main Mar 25, 2025
4 checks passed

SachinVarghese deleted the demo branch March 26, 2025 00:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added demo example for vLLM Server and shareGPT datagen component#37

Added demo example for vLLM Server and shareGPT datagen component#37
k8s-ci-robot merged 4 commits intokubernetes-sigs:mainfrom
SachinVarghese:demo

SachinVarghese commented Mar 22, 2025 •

edited

Loading

Uh oh!

k8s-ci-robot commented Mar 22, 2025

Uh oh!

achandrasekar Mar 25, 2025

Uh oh!

SachinVarghese Mar 25, 2025

Uh oh!

achandrasekar Mar 25, 2025

Uh oh!

SachinVarghese Mar 25, 2025

Uh oh!

achandrasekar Mar 25, 2025

Uh oh!

SachinVarghese Mar 26, 2025

Uh oh!

achandrasekar commented Mar 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

SachinVarghese commented Mar 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

k8s-ci-robot commented Mar 22, 2025

Uh oh!

achandrasekar Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

SachinVarghese Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

achandrasekar Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

SachinVarghese Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

achandrasekar Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

SachinVarghese Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

achandrasekar commented Mar 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

SachinVarghese commented Mar 22, 2025 •

edited

Loading