Conversation
```shell
mkdir -p "${MOUNT_PATH}/${MODEL_PATH}";
python -m pip install huggingface_hub;
hf auth login --token "${HF_TOKEN}";
hf download "${HF_MODEL_ID}" --local-dir "/cache/${MODEL_PATH}"
```
Does this skip the model download if it is already present in the path? I believe this is the entire idea of using the PVC to reuse models locally ... right?
I believe `hf download` detects whether or not the model has already been downloaded. Furthermore, `mkdir -p` accepts folders that already exist, so I think this is good.
`hf download` is intelligent and will skip the work. In the current implementation, the PVC is created locally and used only once; each parallel experiment runs in a different namespace. In principle, we can define a PVC over an existing PV that holds only one copy of the model. If we do this, we probably want to add a task before the parallel stage to download the model.
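A minimal sketch of such a pre-download task body, reusing the variables from the snippet above (the emptiness check and the consistent use of `MOUNT_PATH` are assumptions, not part of the PR):

```shell
# Download the model once, before the parallel experiments start.
# Skip the download entirely when the directory is already populated,
# although `hf download` would itself reuse previously fetched files.
mkdir -p "${MOUNT_PATH}/${MODEL_PATH}"
if [ -z "$(ls -A "${MOUNT_PATH}/${MODEL_PATH}")" ]; then
  hf download "${HF_MODEL_ID}" --local-dir "${MOUNT_PATH}/${MODEL_PATH}"
fi
```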
```
@@ -0,0 +1,40 @@
apiVersion: v2
```
Is this chart needed?
Can we simply use the inference-perf container as part of the StepAction, and configure it so that the step directly sends the inference request?
This approach eliminates the chart, along with the need for a separate harness pod (the Tekton step will run as part of some pod anyway).
The step has now been updated to use the llm-d-benchmark image directly. It isn't as clean as using the inference-perf image directly would be; however, it should be possible to modify it to be easier to use.
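For illustration, a `StepAction` along these lines might look like the sketch below; the image reference and script body are assumptions, not the PR's actual definition:

```yaml
apiVersion: tekton.dev/v1beta1
kind: StepAction
metadata:
  name: run-workload
spec:
  image: ghcr.io/llm-d/llm-d-benchmark:latest  # assumed image reference
  script: |
    #!/usr/bin/env bash
    set -euo pipefail
    # ... invoke the benchmark harness against the inference endpoint here ...
    echo "✅ workload completed"
```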
```shell
echo "✅ workload completed"
```

```yaml
- name: upload-results
```
This section needs to be completed.
Let us start by implementing HTTP(S)-based results upload to S3-compatible storage.
See the example here: https://www.ibm.com/docs/en/storage-scale/5.2.1?topic=storage-connectivity-cloud-object
This has been done; the results folder is tarred and uploaded to an S3-compatible bucket.
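A minimal sketch of that flow, assuming the AWS CLI is available in the step image and that `RESULTS_DIR`, `S3_BUCKET`, and `S3_ENDPOINT_URL` are provided (these variable names are hypothetical, not taken from the PR):

```shell
# Tar the results folder and push the archive to an S3-compatible endpoint.
tar -czf results.tar.gz -C "${RESULTS_DIR}" .
aws s3 cp results.tar.gz "s3://${S3_BUCKET}/results.tar.gz" \
  --endpoint-url "${S3_ENDPOINT_URL}"
```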
```
@@ -0,0 +1,150 @@
inferenceExtension:
```
I am guessing the user would be creating these values.yaml files for the experiment pipeline?
Today, the pipeline takes the location (URL) as input. It can take a stringified values file as well. When sweeping through values, the values file is overridden using `--set`.
An alternative is to use a (YAML) description of the desired environment to generate the values files. This seems to assume we can express it more simply than the values files do today. It is not clear to me that this is the case.
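For example, a sweep step might override a single key from the base values file like this (the release name, chart path, and overridden key are hypothetical):

```shell
# Install one sweep point, overriding one value from the base values file.
helm install exp-replicas-2 ./gaie-chart \
  -f gaie-values.yaml \
  --set inferenceExtension.replicas=2
```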
A _matrix_ based `Task` can be unrolled into multiple tasks to reduce the parallelism. The utility script `utility/transform-pr-parallel.py` does this as follows:

1. Unroll a single parameter into one `Task` per value. Each resulting Task defines a matrix over the remaining parameters.
Curious what the "unrolled" output looks like here.
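A hypothetical before/after, with invented task and parameter names (the actual pipeline's parameters may differ):

```yaml
# Before: a single pipeline task fans out over the full matrix.
- name: run-experiment
  taskRef:
    name: experiment-task  # hypothetical Task
  matrix:
    params:
      - name: replicas
        values: ["1", "2"]
      - name: qps
        values: ["10", "50"]
# After unrolling "replicas": one pipeline task per value, each still
# defining a matrix over the remaining parameter(s).
- name: run-experiment-replicas-1
  taskRef:
    name: experiment-task
  params:
    - name: replicas
      value: "1"
  matrix:
    params:
      - name: qps
        values: ["10", "50"]
- name: run-experiment-replicas-2
  taskRef:
    name: experiment-task
  params:
    - name: replicas
      value: "2"
  matrix:
    params:
      - name: qps
        values: ["10", "50"]
```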
1. Create a namespace where the Tekton pipeline will execute.

```shell
export $NAMESPACE=your_namespace
```

Suggested change:

```shell
export NAMESPACE=your_namespace
```
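(Presumably the namespace itself is then created with something like the following; this command is an assumption, not quoted from the README.)

```shell
kubectl create namespace "${NAMESPACE}"
```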
- the namespace (where the PipelineRun executes)
- s3 details: secret name, bucket name and endpoint URL

Run by creating the PipelineRun:
This appears as a one-liner after rendering. It may need to be un-indented.
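Presumably something along these lines (the file name is taken from the pipeline run yaml referenced elsewhere in this thread; the exact command is an assumption):

```shell
kubectl create -f pipeline/pipelinerun-matrix.yaml -n "${NAMESPACE}"
```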
```shell
kubectl apply -f pipeline/stepactions.yaml
```

Suggested change:

```shell
cd tekton-poc
kubectl apply -f pipeline/stepactions.yaml
```
This proof of concept currently implements a variation of the inference-scheduling [scenario](https://github.com/llm-d/llm-d-benchmark/blob/main/scenarios/guides/inference-scheduling.sh)/[experiment](https://github.com/llm-d/llm-d-benchmark/blob/main/experiments/inference-scheduling.yaml).

Suggested change (append):

To change the Inference Scheduling configs for the experiment, update `tekton-poc/examples/inference-scheduling/gaie-values.yaml`, then `git push` to your fork, and supply the new URL to `inference-scheduling/` for the `experimentBaseUrl` value in the [pipeline run yaml](./pipeline/pipelinerun-matrix.yaml#L46).
This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the stale label.
See `tekton-poc/README.md`.