Commit ef0943c

support lws, sidecar disabled, ensure env defined
Signed-off-by: Michael Kalantar <kalantar@us.ibm.com>
1 parent 72d6888 commit ef0943c

File tree

3 files changed: +118 −4 lines


.claude/skills/convert-guide/SKILL.md

Lines changed: 10 additions & 0 deletions
@@ -183,6 +183,16 @@ Before displaying or writing files, verify:
 - Volumes include `preprocesses` configMap (should be first in list)
 - If guide has `inferenceExtension.pluginsCustomConfig`, then `LLMDBENCH_VLLM_MODELSERVICE_GAIE_CUSTOM_PLUGINS` is defined
 - If guide specifies explicit container image version, then `LLMDBENCH_LLMD_IMAGE_TAG` is set
+- If guide uses LeaderWorkerSet (LWS), then `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true` is set
+
+**Completeness Checklist (MANDATORY - never omit source configuration):**
+- **Environment variables**: If guide defines `env:` in decode/prefill containers, ALL env vars MUST be captured in `LLMDBENCH_VLLM_MODELSERVICE_DECODE_ENVVARS_TO_YAML` or `LLMDBENCH_VLLM_MODELSERVICE_PREFILL_ENVVARS_TO_YAML`
+- **Volume mounts**: If guide defines `volumeMounts:`, ALL mounts MUST be captured in `EXTRA_VOLUME_MOUNTS`
+- **Volumes**: If guide defines `volumes:`, ALL volumes MUST be captured in `EXTRA_VOLUMES`
+- **vLLM args**: If guide defines `args:`, ALL args MUST be captured in `EXTRA_ARGS`
+- **Container config**: If guide defines resources, securityContext, or other container config, it MUST be captured
+
+**CRITICAL**: The scenario file must be a complete representation of the guide. Configuration from the source guide should NEVER be silently dropped or omitted. If a value cannot be mapped, document it in a comment explaining why.
 
 **Source Documentation Checklist:**
 - Every environment variable has a `# SOURCE:` comment block
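To make the completeness checklist concrete, here is a hedged sketch of a converted fragment. `EXTRA_ARGS` is the document's shorthand (real scenario files use the framework's full prefixed names), and all guide values are invented for illustration:

```shell
# Sketch: capture ALL guide args and document anything unmappable instead of
# silently dropping it. EXTRA_ARGS is shorthand; real scenario files use the
# framework's prefixed LLMDBENCH_* variable names.

# SOURCE: guide decode container args: (both args captured)
EXTRA_ARGS="--max-model-len 16384 --enable-prefix-caching"

# NOT MAPPED: the (invented) guide sets terminationGracePeriodSeconds: 120;
# there is no direct variable, so it is documented in this comment rather
# than dropped.
echo "captured args: $EXTRA_ARGS"
```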

.claude/skills/convert-guide/references/mappings.md

Lines changed: 3 additions & 2 deletions
@@ -144,8 +144,9 @@ The ModelService Helm chart is deployed via `setup/steps/09_deploy_via_modelserv
 
 | Helm Path | LLMDBENCH Variable | ev[] Key | Notes |
 |-----------|-------------------|----------|-------|
-| `multinode` | `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE` | `vllm_modelservice_multinode` | Multi-node deployment |
+| `multinode` | `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE` | `vllm_modelservice_multinode` | Enable LeaderWorkerSet deployment (set `true` when guide uses LWS CRD) |
 | `routing.servicePort` | `LLMDBENCH_VLLM_COMMON_INFERENCE_PORT` | `vllm_common_inference_port` | Inference service port |
+| `routing.proxy.enabled` | `LLMDBENCH_LLMD_ROUTINGSIDECAR_ENABLED` | `llmd_routingsidecar_enabled` | Routing sidecar enablement flag |
 | `routing.proxy.connector` | `LLMDBENCH_LLMD_ROUTINGSIDECAR_CONNECTOR` | `llmd_routingsidecar_connector` | Routing connector type |
 | `routing.proxy.debugLevel` | `LLMDBENCH_LLMD_ROUTINGSIDECAR_DEBUG_LEVEL` | `llmd_routingsidecar_debug_level` | Debug level |
 | `accelerator.type` | `LLMDBENCH_VLLM_COMMON_ACCELERATOR_RESOURCE` | `vllm_common_accelerator_resource` | GPU resource type |

@@ -258,7 +259,7 @@ These Helm values have no direct LLMDBENCH equivalent and should be noted in com
 | `LLMDBENCH_VLLM_COMMON_PVC_MODEL_CACHE_SIZE` | `vllm_common_pvc_model_cache_size` | `300Gi` | PVC size for model cache |
 | `LLMDBENCH_VLLM_COMMON_INFERENCE_PORT` | `vllm_common_inference_port` | `8000` | Service port (proxy/sidecar listens here, forwards to vLLM) |
 | `LLMDBENCH_VLLM_COMMON_METRICS_PORT` | `vllm_common_metrics_port` | `8200` | vLLM container port (where vLLM actually listens via `--port`) |
-| `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE` | `vllm_modelservice_multinode` | `false` | Multi-node deployment |
+| `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE` | `vllm_modelservice_multinode` | `false` | Enable LeaderWorkerSet (LWS) for multi-pod coordination |
 | `LLMDBENCH_VLLM_MODELSERVICE_GATEWAY_CLASS_NAME` | `vllm_modelservice_gateway_class_name` | `istio` | Gateway class |
 | `LLMDBENCH_VLLM_MODELSERVICE_GAIE_PLUGINS_CONFIGFILE` | `vllm_modelservice_gaie_plugins_configfile` | `default-plugins.yaml` | GAIE plugins config |
 | `LLMDBENCH_HARNESS_NAME` | `harness_name` | `inference-perf` | Default load generator |
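The new `routing.proxy.enabled` row covers guides that disable the routing sidecar (per the commit message). A hedged sketch of the resulting scenario fragment, with an invented source guide:

```shell
# Hypothetical conversion of a guide that disables the routing sidecar.
# SOURCE (invented guide values.yaml):
#   routing:
#     proxy:
#       enabled: false
export LLMDBENCH_LLMD_ROUTINGSIDECAR_ENABLED=false
echo "routing sidecar enabled: $LLMDBENCH_LLMD_ROUTINGSIDECAR_ENABLED"
```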

.claude/skills/convert-guide/references/patterns.md

Lines changed: 105 additions & 2 deletions
@@ -68,18 +68,34 @@ The preprocesses configMap volume should be listed FIRST. The preprocesses mount
 
 ## Environment Variables
 
+**CRITICAL RULE**: ALL environment variables defined in the guide's `env:` section MUST be captured in the scenario file. Never silently drop env vars - they are often essential for the guide to function correctly (e.g., accelerator-specific settings, logging paths, feature flags).
+
 For container environment variables:
 
 ```bash
-export LLMDBENCH_VLLM_COMMON_ENVVARS_TO_YAML=$(mktemp)
-cat << EOF > $LLMDBENCH_VLLM_COMMON_ENVVARS_TO_YAML
+export LLMDBENCH_VLLM_MODELSERVICE_DECODE_ENVVARS_TO_YAML=$(mktemp)
+cat << EOF > $LLMDBENCH_VLLM_MODELSERVICE_DECODE_ENVVARS_TO_YAML
 - name: VLLM_LOGGING_LEVEL
   value: INFO
 - name: UCX_TLS
   value: "sm,cuda_ipc,cuda_copy,tcp"
 EOF
 ```
 
+### Mapping
+
+| Guide Section | LLMDBENCH Variable |
+|--------------|-------------------|
+| `decode.containers[].env` | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_ENVVARS_TO_YAML` |
+| `prefill.containers[].env` | `LLMDBENCH_VLLM_MODELSERVICE_PREFILL_ENVVARS_TO_YAML` |
+
+### Verification
+
+After generating the scenario file, verify:
+1. Count the env vars in the source guide's `env:` sections
+2. Count the env vars in the generated `ENVVARS_TO_YAML` blocks
+3. The counts must match (excluding any benchmark-framework-added vars, which should be documented)
+
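The three verification steps can be sketched as a small shell check (the file contents below are stand-ins for a real guide and its generated scenario file):

```shell
# Sketch of the verification step: compare env-var counts between the
# guide's env: section and the generated ENVVARS_TO_YAML block.
guide_env=$(mktemp)
scenario_env=$(mktemp)

cat << 'EOF' > "$guide_env"
- name: VLLM_LOGGING_LEVEL
  value: INFO
- name: UCX_TLS
  value: "sm,cuda_ipc,cuda_copy,tcp"
EOF
cp "$guide_env" "$scenario_env"   # a faithful conversion captures all vars

src=$(grep -c '^- name:' "$guide_env")
gen=$(grep -c '^- name:' "$scenario_env")
if [ "$src" -eq "$gen" ]; then
  echo "OK: $src env vars captured"
else
  echo "MISMATCH: guide=$src scenario=$gen"
fi
```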
 ## GAIE Custom Plugin Configuration
 
 **CRITICAL RULE**: When the guide contains `inferenceExtension.pluginsCustomConfig`, you MUST always define `LLMDBENCH_VLLM_MODELSERVICE_GAIE_CUSTOM_PLUGINS` in the scenario file, regardless of whether a preset file exists.
@@ -164,6 +180,93 @@ EOF
 - Assume preset files will have the same content as the guide's custom config
 - Omit custom config to avoid "duplication" - the scenario file should be complete
 
+## LeaderWorkerSet / Multinode Patterns
+
+When converting guides that use LeaderWorkerSet (LWS) for multi-node or multi-pod deployment:
+
+### Detection
+
+A guide uses LeaderWorkerSet if:
+- The kustomize manifests contain `kind: LeaderWorkerSet` resources
+- The manifest has fields like `leaderWorkerTemplate`, `workerTemplate`, `size`, or `LWS_*` environment variables
+- The vLLM command uses flags like `--data-parallel-address`, `--data-parallel-start-rank`, `--data-parallel-rpc-port`
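The first detection heuristic can be sketched as a grep over a guide's manifests (the manifest content here is an invented stand-in for a real guide):

```shell
# Sketch: scan a manifest for the LWS marker from the detection list above.
manifest=$(mktemp)
cat << 'EOF' > "$manifest"
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
spec:
  leaderWorkerTemplate:
    size: 2
EOF

if grep -q 'kind: LeaderWorkerSet' "$manifest"; then
  echo "guide uses LWS: set LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true"
fi
```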
193+
194+
### Support
195+
196+
**LeaderWorkerSet IS supported** by the llm-d-benchmark framework via the modelservice Helm chart. Set:
197+
198+
```bash
199+
export LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true
200+
```
201+
202+
This maps to `multinode: true` in the modelservice Helm chart, which enables LeaderWorkerSet-based deployment.
203+
204+
### Configuration Mapping
205+
206+
| LWS Manifest Field | LLMDBENCH Variable | Notes |
207+
|-------------------|-------------------|-------|
208+
| `spec.replicas` | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_REPLICAS` | Number of LWS groups |
209+
| `spec.leaderWorkerTemplate.size` | `LLMDBENCH_VLLM_COMMON_NUM_WORKERS_PARALLELISM` | Pods per LWS group |
210+
| `DP_SIZE_LOCAL` env var | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_DATA_LOCAL_PARALLELISM` | Data parallel per pod |
211+
| `TP_SIZE` env var | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_TENSOR_PARALLELISM` | Tensor parallel size |
212+
213+
### Template
214+
215+
```bash
216+
# =============================================================================
217+
# LeaderWorkerSet / Multinode Configuration
218+
# SOURCE: <path-to-lws-manifest>
219+
# Lines <line-numbers>:
220+
# spec.replicas: <value>
221+
# spec.leaderWorkerTemplate.size: <value>
222+
# =============================================================================
223+
export LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true
224+
225+
# Number of LWS groups (each group has size workers)
226+
export LLMDBENCH_VLLM_MODELSERVICE_DECODE_REPLICAS=<replicas>
227+
228+
# Number of pods per LWS group
229+
export LLMDBENCH_VLLM_COMMON_NUM_WORKERS_PARALLELISM=<lws-size>
230+
231+
# Data parallelism per pod
232+
export LLMDBENCH_VLLM_MODELSERVICE_DECODE_DATA_LOCAL_PARALLELISM=<dp_size_local>
233+
```
234+
235+
### LWS-Specific vLLM Arguments
236+
237+
When multinode is enabled, the modelservice Helm chart automatically handles LWS-specific vLLM arguments. You typically do NOT need to include these in `EXTRA_ARGS`:
238+
- `--data-parallel-address` (set automatically from LWS leader)
239+
- `--data-parallel-start-rank` (set automatically per pod)
240+
- `--data-parallel-rpc-port` (set automatically)
241+
242+
However, DO include these parallelism flags in `EXTRA_ARGS`:
243+
- `--tensor-parallel-size`
244+
- `--data-parallel-size-local` (maps to `DP_SIZE_LOCAL`)
245+
- `--data-parallel-size` (total DP = `LWS_GROUP_SIZE * DP_SIZE_LOCAL`)
246+
+
+### Complete Example
+
+```bash
+# Enable LeaderWorkerSet deployment
+export LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true
+
+# LWS group configuration
+export LLMDBENCH_VLLM_MODELSERVICE_DECODE_REPLICAS=1          # 1 LWS group
+export LLMDBENCH_VLLM_COMMON_NUM_WORKERS_PARALLELISM=2        # 2 pods per group
+
+# Per-pod parallelism
+export LLMDBENCH_VLLM_MODELSERVICE_DECODE_TENSOR_PARALLELISM=1
+export LLMDBENCH_VLLM_MODELSERVICE_DECODE_DATA_LOCAL_PARALLELISM=8  # 8 GPUs per pod
+
+# Total: 1 group × 2 pods × 8 GPUs = 16 GPUs for decode
+```
+
+### DO NOT
+
+- Add comments saying LWS is "not supported" by llm-d-benchmark
+- Skip multinode configuration when converting LWS-based guides
+- Manually set LWS-specific args that are auto-configured by the Helm chart
+
 ## Accelerator Patterns
 
 ### XPU (Intel GPU)
