
Bring in Config Explorer from llm-d-benchmark#125

Closed
namasl wants to merge 581 commits into main from ce-import


Conversation

@namasl
Collaborator

@namasl namasl commented Mar 27, 2026

Closes #120

maugustosilva and others added 30 commits October 29, 2025 12:40
* [Standup] Convert step 5 to python

Signed-off-by: maugustosilva <maugusto.silva@gmail.com>

* Small bugfixes

Signed-off-by: maugustosilva <maugusto.silva@gmail.com>

---------

Signed-off-by: maugustosilva <maugusto.silva@gmail.com>
* initial wva

---------

Signed-off-by: vezio <tyler.rimaldi@ibm.com>
Signed-off-by: Vezio <31221081+Vezio@users.noreply.github.com>
* Bump guidellm in container to devel build

* Update guidellm harness to use scenarios and fix convert script

* Add a few basic workload examples

* Bump GuideLLM

* GuideLLM now converts single dataset to a list automatically

* Install CPU version of torch to cut down on install size
* Quote annotations

* Also escape labels
Signed-off-by: Michael Kalantar <kalantar@us.ibm.com>
* Allow import and conversion of all runs in GuideLLM results

Signed-off-by: Nick Masluk <nick@randombytes.net>

* autopep8

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Update help

Signed-off-by: Nick Masluk <nick@randombytes.net>

---------

Signed-off-by: Nick Masluk <nick@randombytes.net>
* Add bounds to scenarios

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Rename column variables

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Add more bound handling functions, support bounds in plots

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Complete implementation

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Remove stale code

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Fix comment

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Address comments

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Fix imports

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Add constants.py link

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Fix missing concurrency handling in GuideLLM

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Return dict to label figures

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Update columns

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Fix constant name

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Add back hack

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Fix fstring

Signed-off-by: Nick Masluk <nick@randombytes.net>

---------

Signed-off-by: Nick Masluk <nick@randombytes.net>
Signed-off-by: vezio <tyler.rimaldi@ibm.com>
* Add more GuideLLM parameters to DataFrame

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Address comments

Signed-off-by: Nick Masluk <nick@randombytes.net>

---------

Signed-off-by: Nick Masluk <nick@randombytes.net>
* Enable bounds in i/o length in config UI

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Update

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Rm unused lines

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Address feedback

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

---------

Signed-off-by: Jing Chen <jing.chen2@ibm.com>
…ds (#492)

Signed-off-by: Nick Masluk <nick@randombytes.net>
* Allow symbolic link recursion for Python versions that support it

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Avoid merge conflict with HACK removal PR

Signed-off-by: Nick Masluk <nick@randombytes.net>

* Avoid multi-line f-string for Python 3.11

Signed-off-by: Nick Masluk <nick@randombytes.net>

---------

Signed-off-by: Nick Masluk <nick@randombytes.net>
* [Run] Removed `fmperf` as a supported harness

Since `fmperf` has stopped active development, we cease support for it

Additionally, removed "service account" definition from the benchmark
launcher pod

---------

Signed-off-by: maugustosilva <maugusto.silva@gmail.com>
…uated (#499)

Signed-off-by: Nick Masluk <nick@randombytes.net>
…pot (#500)

Signed-off-by: Nick Masluk <nick@randombytes.net>
If local `docker` or `podman` daemon is not running, do NOT select it as `LLMDBENCH_CONTROL_CCMD`

Also added a new helper script,
`setup/preprocess/set_nixl_environment.py`, to help set environment
variables for `RoCE/GDR`

Signed-off-by: maugustosilva <maugusto.silva@gmail.com>
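
The daemon check described in this commit could look roughly like the sketch below. It is not the repository's code: the function names, the candidate order, and the use of `<cmd> info` as the liveness probe are all assumptions made for illustration. The probe is injectable so the selection logic can be exercised without a container runtime installed.

```python
import shutil
import subprocess
from typing import Callable, Optional, Sequence


def daemon_responds(cmd: str) -> bool:
    """Return True if `<cmd> info` exits successfully (assumed liveness probe)."""
    if shutil.which(cmd) is None:
        return False  # binary is not on PATH at all
    try:
        result = subprocess.run(
            [cmd, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def pick_container_command(
    candidates: Sequence[str] = ("docker", "podman"),  # order is an assumption
    probe: Callable[[str], bool] = daemon_responds,
) -> Optional[str]:
    """Return the first candidate whose daemon answers, or None if none do."""
    for cmd in candidates:
        if probe(cmd):
            return cmd
    return None


if __name__ == "__main__":
    # The result would be the value exported as LLMDBENCH_CONTROL_CCMD.
    print(pick_container_command())
```

Returning `None` rather than a default leaves the caller free to fail loudly or skip container steps when no runtime is usable.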
* Initializing KubeCon tutorial

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Fix link and spelling

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Update

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Fix link

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Fix link

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Show broken links

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Fix link again

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Kubecon link

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Rm kind ref

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Address feedback

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

---------

Signed-off-by: Jing Chen <jing.chen2@ibm.com>
* [Standup] bug fix for failed k8s context in setup/functions.py

* [Standup] omit error message for a failed k8s context but replace to the correct context
* kubecon2025 -> tutorials

* Add basic guidellm tutorial

* Fix broken link
[Refactor] Remove no longer used functions on `functions.sh`
[Standup] Check for context files pointing to the wrong cluster (in
addition to simply non-functional "stale" files)

Signed-off-by: maugustosilva <maugusto.silva@gmail.com>
This function is still used by `run.sh`

Signed-off-by: maugustosilva <marcio.a.silva@ibm.com>
* Fill in PD tutorial

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

* Rm command

Signed-off-by: Jing Chen <jing.chen2@ibm.com>

---------

Signed-off-by: Jing Chen <jing.chen2@ibm.com>
Signed-off-by: Jing Chen <jing.chen2@ibm.com>
GKE precise prefix cache aware tutorial
This needed dependency was accidentally removed during the discontinuation of support for fmperf

Signed-off-by: maugustosilva <marcio.a.silva@ibm.com>
… (#514)

Added logic to use a CLI-provided namespace if the namespace cannot be detected from kubeconfig.

Signed-off-by: Pete Cheslock <pete.cheslock@redhat.com>
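
A minimal sketch of the fallback order described above, assuming the kubeconfig has already been parsed into a dictionary. The helper names are hypothetical and the actual change may be implemented differently. Only the standard kubeconfig layout (`current-context`, and `contexts[].context.namespace`) is relied on.

```python
from typing import Optional


def namespace_from_kubeconfig(kubeconfig: dict) -> Optional[str]:
    """Return the namespace of the current context, or None if it is not set."""
    current = kubeconfig.get("current-context")
    for entry in kubeconfig.get("contexts") or []:
        if entry.get("name") == current:
            return (entry.get("context") or {}).get("namespace") or None
    return None


def resolve_namespace(kubeconfig: dict, cli_namespace: Optional[str]) -> str:
    """Prefer the detected namespace and fall back to the CLI-provided one."""
    detected = namespace_from_kubeconfig(kubeconfig)
    if detected:
        return detected
    if cli_namespace:
        return cli_namespace
    raise ValueError("namespace not found in kubeconfig and none given on the CLI")


# Hypothetical kubeconfig whose current context has no namespace set.
config = {
    "current-context": "dev",
    "contexts": [{"name": "dev", "context": {"cluster": "c1", "user": "u1"}}],
}
print(resolve_namespace(config, "benchmark-ns"))
```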
… host details (#511)

Signed-off-by: Nick Masluk <nick@randombytes.net>
* Update vLLM parameters

* Add warmup step to vLLM benchmark

---------

Signed-off-by: Nick Masluk <nick@randombytes.net>
…516)

deployments

Also fixed a bug on step 10

Signed-off-by: maugustosilva <maugusto.silva@gmail.com>
diegocastanibm and others added 22 commits March 11, 2026 17:45
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
Controlled by the environment variable
`LLMDBENCH_VLLM_COMMON_EXTRA_INIT_CONTAINER_CONFIG` (and associated
`_MODELSERVICE_` equivalents), it allows the use of the main
llm-d-benchmark image to perform "pre-processing" tasks.

Converted all guides from mounting `/preprocess` from a `ConfigMap` to using an
`initContainer`

Reset version on all dependencies back to "auto"

Updated `OWNERS` file

Signed-off-by: maugustosilva <maugusto.silva@gmail.com>
Signed-off-by: maugustosilva <maugusto.silva@gmail.com>
Addresses #782 — CNCF and CI Audit - week of Mar 9 2026

- Add CODE_OF_CONDUCT.md
- Add SECURITY.md
- Add PR_SIGNOFF.md
- Add .prowlabels.yaml
- Add .github/CODEOWNERS

Signed-off-by: Andrew Anderson <andy@clubanderson.com>
* feat: allow disabling of prometheus service monitor

* feat: allow gateway resource configuration

* fix: document gateway env vars

* fix: small sim example

---------

Signed-off-by: MICHAEL DESMOND <mdesmond@us.ibm.com>
Signed-off-by: maugustosilva <maugusto.silva@gmail.com>
* remove redundant code

* add metrics summary into benchmark report v0.2

* fix the bug where METRICS_DIR was not set explicitly

* remove redundant package in Dockerfile
* make epp verbosity as 4 when monitoring mode is on

* add epp log analysis

* remove headers in python scripts
…n (#859)

Add support for running experiments multiple times and aggregating results
with mean and standard deviation to account for benchmark variability.

Changes:
- existing_stack/run_only.sh: Add -R/--repeat flag (default: 1) and
  LLMDBENCH_HARNESS_REPEAT env var. Each run gets a unique experiment ID
  ({uid}_{workload}_run{i}). After all runs, calls aggregation script.
- analysis/aggregate_runs.py: New script that reads Benchmark Report v0.2
  files from repeated runs and produces aggregated_summary.json and
  aggregated_summary.txt with mean, std, min, max for all metrics.

Backward compatible: --repeat 1 (default) produces identical behavior.

Closes #701
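
The repeat-and-aggregate flow in this commit can be illustrated with the sketch below. It is not `analysis/aggregate_runs.py`: the metric names and values are made up, the run index is assumed to start at 1, and sample standard deviation is assumed because the commit message only says "std". The Benchmark Report v0.2 file format is not modeled, so each run is represented as a flat dictionary of metrics.

```python
import json
import statistics
from collections import defaultdict


def experiment_ids(uid: str, workload: str, repeat: int) -> list[str]:
    """Build one ID per run following the {uid}_{workload}_run{i} pattern."""
    # Assumption: indices start at 1; the commit message does not say.
    return [f"{uid}_{workload}_run{i}" for i in range(1, repeat + 1)]


def aggregate(runs: list[dict[str, float]]) -> dict[str, dict[str, float]]:
    """Compute mean, std, min, and max for every metric across runs."""
    samples_by_metric = defaultdict(list)
    for run in runs:
        for name, value in run.items():
            samples_by_metric[name].append(value)

    summary = {}
    for name, samples in samples_by_metric.items():
        summary[name] = {
            "mean": statistics.fmean(samples),
            # Sample standard deviation needs at least two data points.
            "std": statistics.stdev(samples) if len(samples) > 1 else 0.0,
            "min": min(samples),
            "max": max(samples),
        }
    return summary


# Hypothetical metrics from three repeated runs, for illustration only.
runs = [
    {"ttft_ms": 100.0, "throughput_tok_s": 500.0},
    {"ttft_ms": 110.0, "throughput_tok_s": 520.0},
    {"ttft_ms": 120.0, "throughput_tok_s": 510.0},
]
summary = aggregate(runs)
print(experiment_ids("abc123", "chat", len(runs)))
print(json.dumps(summary, indent=2))
```

With a single run the standard deviation is reported as 0.0, which keeps the output shape identical to the multi-run case.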
…1279dae1329'

git-subtree-dir: src/config_explorer
git-subtree-mainline: 1d44e80
git-subtree-split: bb42822
Signed-off-by: Jing Chen <jing.chen2@ibm.com>
Remove refs to benchmark report in config explorer
@namasl namasl requested review from amito, anfredette and jgchn March 27, 2026 11:34
namasl and others added 3 commits March 27, 2026 07:38
Signed-off-by: Nick Masluk <nick@randombytes.net>
Signed-off-by: Nick Masluk <nick@randombytes.net>
@namasl namasl closed this Mar 27, 2026
@namasl namasl deleted the ce-import branch March 27, 2026 13:55

Development

Successfully merging this pull request may close these issues.

Migrate llm-d-benchmark.config_explorer into this repo