Skip to content

Commit 20dc122

Browse files
authored
fix: Removes Redundant Version References (llm-d#903)
* remove redundant chart version references --------- Signed-off-by: vezio <tyler.rimaldi@ibm.com> Signed-off-by: Tyler Vezio Rimaldi <31221081+Vezio@users.noreply.github.com>
1 parent 6b51473 commit 20dc122

4 files changed

Lines changed: 87 additions & 9 deletions

File tree

config/README.md

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -330,6 +330,73 @@ scenario:
330330
uriProtocol: hf # No PVC, no download job — fetch at runtime
331331
```
332332

333+
### Code path
334+
335+
1. `llmdbenchmark/standup/steps/step_04_model_namespace.py` — `_requires_pvc_download()` returns `False` when `uriProtocol != "pvc"`
336+
2. `config/templates/jinja/13_ms-values.yaml.j2` — conditionally generates `hf://` or `pvc://` URI
337+
3. `config/templates/jinja/04_download_job.yaml.j2` — only rendered/applied when protocol is `pvc`
338+
339+
## Chart Versions
340+
341+
All Helm chart and component versions are centralized in the `chartVersions` section of `defaults.yaml`. This is the single place to bump versions when upgrading components.
342+
343+
| Field | Default | Description |
344+
|---|---|---|
345+
| `chartVersions.istioBase` | `1.29.1` | Istio base chart version |
346+
| `chartVersions.istiod` | `1.29.1` | Istiod chart version (also used as gateway version) |
347+
| `chartVersions.llmDInfra` | `auto` | llm-d-infra Helm chart (auto-resolved via helm) |
348+
| `chartVersions.llmDModelservice` | `auto` | llm-d-modelservice Helm chart (auto-resolved via helm) |
349+
| `chartVersions.inferencePool` | `v1.3.0` | Inference pool chart version |
350+
| `chartVersions.gaie` | `v1.3.1` | GAIE chart version |
351+
| `chartVersions.wva` | `auto` | Workload Variant Autoscaler chart (auto-resolved) |
352+
| `chartVersions.kgateway` | `v2.2.1` | kgateway chart version |
353+
| `chartVersions.lws` | `0.8.0` | LeaderWorkerSet chart version |
354+
355+
Versions set to `auto` are resolved at plan time by `VersionResolver` using `helm search repo` or OCI registry queries (skopeo/crane). Fixed versions are used as-is.
356+
357+
### Overriding versions in a scenario
358+
359+
Add a `chartVersions` section to your scenario YAML. Only include the versions you want to change — the rest inherit from defaults:
360+
361+
```yaml
362+
scenario:
363+
- name: "my-upgrade-test"
364+
chartVersions:
365+
llmDModelservice: "0.5.0" # pin to specific version
366+
kgateway: "v2.3.0" # upgrade kgateway
367+
```
368+
369+
### Pinning all versions for reproducibility
370+
371+
To ensure a benchmark run is fully reproducible, pin every `auto` version to a specific value. Run `plan` first to see what `auto` resolves to, then copy those values into your scenario:
372+
373+
```yaml
374+
scenario:
375+
- name: "reproducible-bench"
376+
chartVersions:
377+
llmDInfra: "v1.4.0" # was auto
378+
llmDModelservice: "v0.4.9" # was auto
379+
wva: "0.5.1" # was auto
380+
```
381+
382+
### Upgrading Istio
383+
384+
Istio uses two charts (`istio-base` and `istiod`) that must be the same version. Override both:
385+
386+
```yaml
387+
chartVersions:
388+
istioBase: "1.30.0"
389+
istiod: "1.30.0"
390+
```
391+
392+
### How `auto` resolution works
393+
394+
1. For charts with a `helmRepositories` entry: queries the repo via `helm search repo` or OCI registry
395+
2. Falls back to `skopeo list-tags` or `crane ls` for OCI registries
396+
3. Selects the latest semver-compatible tag
397+
4. Resolved versions are logged during `plan`: `📦 Resolved chart llmDInfra to v1.4.0 (via repo URL)`
398+
399+
> **Note:** `auto` versions may change between runs as upstream charts release new versions. Pin versions in your scenario for consistent results across runs.
333400
## KV Transfer Configuration
334401

335402
The `vllmCommon.kvTransfer` section controls the `--kv-transfer-config` argument passed to the `vllm serve` command. This is how vLLM knows which KV cache transfer connector to use and how to configure it.

config/templates/values/defaults.yaml

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -378,6 +378,8 @@ chartVersions:
378378
inferencePool: v1.3.0
379379
gaie: v1.3.1
380380
wva: auto
381+
kgateway: v2.2.1
382+
lws: "0.8.0"
381383

382384
# ============================================================================
383385
# SERVICE ACCOUNT CONFIGURATION
@@ -484,7 +486,7 @@ gateway:
484486
className: istio
485487
provider: istio
486488
providerNamespace: istio-system
487-
version: *istio_version
489+
# Gateway version is derived from chartVersions.istiod at plan time
488490
logLevel: error
489491
service:
490492
type: NodePort
@@ -1209,15 +1211,15 @@ wva:
12091211
# ============================================================================
12101212
gatewayProviders:
12111213
kgateway:
1212-
chartVersion: v2.2.1
1214+
# Chart version managed in chartVersions.kgateway
12131215
namespace: kgateway-system
12141216
helmRepository: oci://cr.kgateway.dev/kgateway-dev/charts
12151217

12161218
# ============================================================================
12171219
# LWS (LeaderWorkerSet)
12181220
# ============================================================================
12191221
lws:
1220-
chartVersion: "0.8.0"
1222+
# Chart version managed in chartVersions.lws
12211223
namespace: lws-system
12221224
helmRepository: oci://ghcr.io/kubernetes-sigs/lws
12231225

llmdbenchmark/parser/version_resolver.py

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -318,7 +318,12 @@ def _resolve_chart_versions(self, values: dict, unresolved: list) -> None:
318318
unresolved.append(f"chartVersions.{chart_key}")
319319

320320
def _resolve_gateway_version(self, values: dict) -> None:
321-
"""Resolve gateway version from the istio chart version."""
321+
"""Resolve gateway version from the istio chart version.
322+
323+
Kept for backward compatibility with configs that still set
324+
``gateway.version``. New configs should use ``chartVersions.istiod``
325+
directly.
326+
"""
322327
gateway = values.get("gateway", {})
323328
if gateway.get("version") == "auto":
324329
istio_version = values.get("chartVersions", {}).get("istiod")
@@ -340,7 +345,8 @@ def has_unresolved(self, values: dict) -> list[str]:
340345
standalone = values.get("standalone", {}).get("image", {})
341346
if isinstance(standalone, dict) and standalone.get("tag") == "auto":
342347
unresolved.append("standalone.image.tag")
343-
if values.get("gateway", {}).get("version") == "auto":
348+
gw_ver = values.get("gateway", {}).get("version")
349+
if gw_ver == "auto":
344350
unresolved.append("gateway.version")
345351
wva = values.get("wva", {}).get("image", {})
346352
if isinstance(wva, dict) and wva.get("tag") == "auto":

llmdbenchmark/standup/steps/step_02_admin_prerequisites.py

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -339,7 +339,7 @@ def _install_lws_if_needed(
339339
)
340340
return
341341

342-
self._install_lws(cmd, lws_config, errors)
342+
self._install_lws(cmd, lws_config, errors, plan_config=plan_config)
343343

344344
def _install_prometheus_crds_if_needed(
345345
self,
@@ -422,7 +422,7 @@ def _apply_openshift_sccs(
422422

423423
def _install_kgateway(self, cmd: CommandExecutor, plan_config: dict, errors: list):
424424
kgw = plan_config.get("gatewayProviders", {}).get("kgateway", {})
425-
chart_version = kgw.get("chartVersion", "")
425+
chart_version = plan_config.get("chartVersions", {}).get("kgateway", "") or kgw.get("chartVersion", "")
426426
namespace = self._require_config(kgw, "namespace")
427427
helm_repo = kgw.get("helmRepository", "")
428428

@@ -502,8 +502,11 @@ def _install_istio(
502502
if not result.success:
503503
errors.append(f"Failed to install Istio via helmfile: {result.stderr}")
504504

505-
def _install_lws(self, cmd: CommandExecutor, lws_config: dict, errors: list):
506-
version = lws_config.get("chartVersion", "")
505+
def _install_lws(self, cmd: CommandExecutor, lws_config: dict, errors: list, plan_config: dict | None = None):
506+
version = ""
507+
if plan_config:
508+
version = plan_config.get("chartVersions", {}).get("lws", "")
509+
version = version or lws_config.get("chartVersion", "")
507510
namespace = self._require_config(lws_config, "namespace")
508511
helm_repo = lws_config.get("helmRepository", "")
509512

0 commit comments

Comments
 (0)