Commit ef0943c

support lws, sidecar disabled, ensure env defined
Signed-off-by: Michael Kalantar <kalantar@us.ibm.com>
1 parent 72d6888 commit ef0943c

File tree

3 files changed: +118 −4 lines


.claude/skills/convert-guide/SKILL.md

Lines changed: 10 additions & 0 deletions
@@ -183,6 +183,16 @@ Before displaying or writing files, verify:
 - Volumes include `preprocesses` configMap (should be first in list)
 - If guide has `inferenceExtension.pluginsCustomConfig`, then `LLMDBENCH_VLLM_MODELSERVICE_GAIE_CUSTOM_PLUGINS` is defined
 - If guide specifies explicit container image version, then `LLMDBENCH_LLMD_IMAGE_TAG` is set
+- If guide uses LeaderWorkerSet (LWS), then `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true` is set
+
+**Completeness Checklist (MANDATORY - never omit source configuration):**
+- **Environment variables**: If guide defines `env:` in decode/prefill containers, ALL env vars MUST be captured in `LLMDBENCH_VLLM_MODELSERVICE_DECODE_ENVVARS_TO_YAML` or `LLMDBENCH_VLLM_MODELSERVICE_PREFILL_ENVVARS_TO_YAML`
+- **Volume mounts**: If guide defines `volumeMounts:`, ALL mounts MUST be captured in `EXTRA_VOLUME_MOUNTS`
+- **Volumes**: If guide defines `volumes:`, ALL volumes MUST be captured in `EXTRA_VOLUMES`
+- **vLLM args**: If guide defines `args:`, ALL args MUST be captured in `EXTRA_ARGS`
+- **Container config**: If guide defines resources, securityContext, or other container config, it MUST be captured
+
+**CRITICAL**: The scenario file must be a complete representation of the guide. Configuration from the source guide should NEVER be silently dropped or omitted. If a value cannot be mapped, document it in a comment explaining why.
 
 **Source Documentation Checklist:**
 - Every environment variable has a `# SOURCE:` comment block
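To make the completeness checklist concrete, here is a hedged sketch of a converted fragment. `EXTRA_ARGS` is the document's shorthand (real scenario files use the framework's full prefixed names), and all guide values are invented for illustration:

```shell
# Sketch: capture ALL guide args and document anything unmappable instead of
# silently dropping it. EXTRA_ARGS is shorthand; real scenario files use the
# framework's prefixed LLMDBENCH_* variable names.

# SOURCE: guide decode container args: (both args captured)
EXTRA_ARGS="--max-model-len 16384 --enable-prefix-caching"

# NOT MAPPED: the (invented) guide sets terminationGracePeriodSeconds: 120;
# there is no direct variable, so it is documented in this comment rather
# than dropped.
echo "captured args: $EXTRA_ARGS"
```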

.claude/skills/convert-guide/references/mappings.md

Lines changed: 3 additions & 2 deletions
@@ -144,8 +144,9 @@ The ModelService Helm chart is deployed via `setup/steps/09_deploy_via_modelserv
 
 | Helm Path | LLMDBENCH Variable | ev[] Key | Notes |
 |-----------|-------------------|----------|-------|
-| `multinode` | `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE` | `vllm_modelservice_multinode` | Multi-node deployment |
+| `multinode` | `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE` | `vllm_modelservice_multinode` | Enable LeaderWorkerSet deployment (set `true` when guide uses LWS CRD) |
 | `routing.servicePort` | `LLMDBENCH_VLLM_COMMON_INFERENCE_PORT` | `vllm_common_inference_port` | Inference service port |
+| `routing.proxy.enabled` | `LLMDBENCH_LLMD_ROUTINGSIDECAR_ENABLED` | `llmd_routingsidecar_enabled` | Routing sidecar enablement flag |
 | `routing.proxy.connector` | `LLMDBENCH_LLMD_ROUTINGSIDECAR_CONNECTOR` | `llmd_routingsidecar_connector` | Routing connector type |
 | `routing.proxy.debugLevel` | `LLMDBENCH_LLMD_ROUTINGSIDECAR_DEBUG_LEVEL` | `llmd_routingsidecar_debug_level` | Debug level |
 | `accelerator.type` | `LLMDBENCH_VLLM_COMMON_ACCELERATOR_RESOURCE` | `vllm_common_accelerator_resource` | GPU resource type |

@@ -258,7 +259,7 @@ These Helm values have no direct LLMDBENCH equivalent and should be noted in com
 | `LLMDBENCH_VLLM_COMMON_PVC_MODEL_CACHE_SIZE` | `vllm_common_pvc_model_cache_size` | `300Gi` | PVC size for model cache |
 | `LLMDBENCH_VLLM_COMMON_INFERENCE_PORT` | `vllm_common_inference_port` | `8000` | Service port (proxy/sidecar listens here, forwards to vLLM) |
 | `LLMDBENCH_VLLM_COMMON_METRICS_PORT` | `vllm_common_metrics_port` | `8200` | vLLM container port (where vLLM actually listens via `--port`) |
-| `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE` | `vllm_modelservice_multinode` | `false` | Multi-node deployment |
+| `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE` | `vllm_modelservice_multinode` | `false` | Enable LeaderWorkerSet (LWS) for multi-pod coordination |
 | `LLMDBENCH_VLLM_MODELSERVICE_GATEWAY_CLASS_NAME` | `vllm_modelservice_gateway_class_name` | `istio` | Gateway class |
 | `LLMDBENCH_VLLM_MODELSERVICE_GAIE_PLUGINS_CONFIGFILE` | `vllm_modelservice_gaie_plugins_configfile` | `default-plugins.yaml` | GAIE plugins config |
 | `LLMDBENCH_HARNESS_NAME` | `harness_name` | `inference-perf` | Default load generator |
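The new `routing.proxy.enabled` row covers guides that disable the routing sidecar (per the commit message). A hedged sketch of the resulting scenario fragment, with an invented source guide:

```shell
# Hypothetical conversion of a guide that disables the routing sidecar.
# SOURCE (invented guide values.yaml):
#   routing:
#     proxy:
#       enabled: false
export LLMDBENCH_LLMD_ROUTINGSIDECAR_ENABLED=false
echo "routing sidecar enabled: $LLMDBENCH_LLMD_ROUTINGSIDECAR_ENABLED"
```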

.claude/skills/convert-guide/references/patterns.md

Lines changed: 105 additions & 2 deletions
@@ -68,18 +68,34 @@ The preprocesses configMap volume should be listed FIRST. The preprocesses mount
 
 ## Environment Variables
 
+**CRITICAL RULE**: ALL environment variables defined in the guide's `env:` section MUST be captured in the scenario file. Never silently drop env vars - they are often essential for the guide to function correctly (e.g., accelerator-specific settings, logging paths, feature flags).
+
 For container environment variables:
 
 ```bash
-export LLMDBENCH_VLLM_COMMON_ENVVARS_TO_YAML=$(mktemp)
-cat << EOF > $LLMDBENCH_VLLM_COMMON_ENVVARS_TO_YAML
+export LLMDBENCH_VLLM_MODELSERVICE_DECODE_ENVVARS_TO_YAML=$(mktemp)
+cat << EOF > $LLMDBENCH_VLLM_MODELSERVICE_DECODE_ENVVARS_TO_YAML
 - name: VLLM_LOGGING_LEVEL
   value: INFO
 - name: UCX_TLS
   value: "sm,cuda_ipc,cuda_copy,tcp"
 EOF
 ```
 
+### Mapping
+
+| Guide Section | LLMDBENCH Variable |
+|--------------|-------------------|
+| `decode.containers[].env` | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_ENVVARS_TO_YAML` |
+| `prefill.containers[].env` | `LLMDBENCH_VLLM_MODELSERVICE_PREFILL_ENVVARS_TO_YAML` |
+
+### Verification
+
+After generating the scenario file, verify:
+1. Count the env vars in the source guide's `env:` sections
+2. Count the env vars in the generated `ENVVARS_TO_YAML` blocks
+3. The counts must match (excluding any benchmark-framework-added vars, which should be documented)
+
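The three verification steps can be sketched as a small shell check (the file contents below are stand-ins for a real guide and its generated scenario file):

```shell
# Sketch of the verification step: compare env-var counts between the
# guide's env: section and the generated ENVVARS_TO_YAML block.
guide_env=$(mktemp)
scenario_env=$(mktemp)

cat << 'EOF' > "$guide_env"
- name: VLLM_LOGGING_LEVEL
  value: INFO
- name: UCX_TLS
  value: "sm,cuda_ipc,cuda_copy,tcp"
EOF
cp "$guide_env" "$scenario_env"   # a faithful conversion captures all vars

src=$(grep -c '^- name:' "$guide_env")
gen=$(grep -c '^- name:' "$scenario_env")
if [ "$src" -eq "$gen" ]; then
  echo "OK: $src env vars captured"
else
  echo "MISMATCH: guide=$src scenario=$gen"
fi
```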
 ## GAIE Custom Plugin Configuration
 
 **CRITICAL RULE**: When the guide contains `inferenceExtension.pluginsCustomConfig`, you MUST always define `LLMDBENCH_VLLM_MODELSERVICE_GAIE_CUSTOM_PLUGINS` in the scenario file, regardless of whether a preset file exists.
@@ -164,6 +180,93 @@ EOF
 - Assume preset files will have the same content as the guide's custom config
 - Omit custom config to avoid "duplication" - the scenario file should be complete
 
+## LeaderWorkerSet / Multinode Patterns
+
+When converting guides that use LeaderWorkerSet (LWS) for multi-node or multi-pod deployment:
+
+### Detection
+
+A guide uses LeaderWorkerSet if:
+- The kustomize manifests contain `kind: LeaderWorkerSet` resources
+- The manifest has fields like `leaderWorkerTemplate`, `workerTemplate`, `size`, or `LWS_*` environment variables
+- The vLLM command uses flags like `--data-parallel-address`, `--data-parallel-start-rank`, `--data-parallel-rpc-port`
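The first detection heuristic can be sketched as a grep over a guide's manifests (the manifest content here is an invented stand-in for a real guide):

```shell
# Sketch: scan a manifest for the LWS marker from the detection list above.
manifest=$(mktemp)
cat << 'EOF' > "$manifest"
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
spec:
  leaderWorkerTemplate:
    size: 2
EOF

if grep -q 'kind: LeaderWorkerSet' "$manifest"; then
  echo "guide uses LWS: set LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true"
fi
```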
193+
194+
### Support
195+
196+
**LeaderWorkerSet IS supported** by the llm-d-benchmark framework via the modelservice Helm chart. Set:
197+
198+
```bash
199+
export LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true
200+
```
201+
202+
This maps to `multinode: true` in the modelservice Helm chart, which enables LeaderWorkerSet-based deployment.
203+
204+
### Configuration Mapping
205+
206+
| LWS Manifest Field | LLMDBENCH Variable | Notes |
207+
|-------------------|-------------------|-------|
208+
| `spec.replicas` | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_REPLICAS` | Number of LWS groups |
209+
| `spec.leaderWorkerTemplate.size` | `LLMDBENCH_VLLM_COMMON_NUM_WORKERS_PARALLELISM` | Pods per LWS group |
210+
| `DP_SIZE_LOCAL` env var | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_DATA_LOCAL_PARALLELISM` | Data parallel per pod |
211+
| `TP_SIZE` env var | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_TENSOR_PARALLELISM` | Tensor parallel size |
212+
213+
### Template
214+
215+
```bash
216+
# =============================================================================
217+
# LeaderWorkerSet / Multinode Configuration
218+
# SOURCE: <path-to-lws-manifest>
219+
# Lines <line-numbers>:
220+
# spec.replicas: <value>
221+
# spec.leaderWorkerTemplate.size: <value>
222+
# =============================================================================
223+
export LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true
224+
225+
# Number of LWS groups (each group has size workers)
226+
export LLMDBENCH_VLLM_MODELSERVICE_DECODE_REPLICAS=<replicas>
227+
228+
# Number of pods per LWS group
229+
export LLMDBENCH_VLLM_COMMON_NUM_WORKERS_PARALLELISM=<lws-size>
230+
231+
# Data parallelism per pod
232+
export LLMDBENCH_VLLM_MODELSERVICE_DECODE_DATA_LOCAL_PARALLELISM=<dp_size_local>
233+
```
234+
235+
### LWS-Specific vLLM Arguments
236+
237+
When multinode is enabled, the modelservice Helm chart automatically handles LWS-specific vLLM arguments. You typically do NOT need to include these in `EXTRA_ARGS`:
238+
- `--data-parallel-address` (set automatically from LWS leader)
239+
- `--data-parallel-start-rank` (set automatically per pod)
240+
- `--data-parallel-rpc-port` (set automatically)
241+
242+
However, DO include these parallelism flags in `EXTRA_ARGS`:
243+
- `--tensor-parallel-size`
244+
- `--data-parallel-size-local` (maps to `DP_SIZE_LOCAL`)
245+
- `--data-parallel-size` (total DP = `LWS_GROUP_SIZE * DP_SIZE_LOCAL`)
246+
+
+### Complete Example
+
+```bash
+# Enable LeaderWorkerSet deployment
+export LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true
+
+# LWS group configuration
+export LLMDBENCH_VLLM_MODELSERVICE_DECODE_REPLICAS=1          # 1 LWS group
+export LLMDBENCH_VLLM_COMMON_NUM_WORKERS_PARALLELISM=2        # 2 pods per group
+
+# Per-pod parallelism
+export LLMDBENCH_VLLM_MODELSERVICE_DECODE_TENSOR_PARALLELISM=1
+export LLMDBENCH_VLLM_MODELSERVICE_DECODE_DATA_LOCAL_PARALLELISM=8  # 8 GPUs per pod
+
+# Total: 1 group × 2 pods × 8 GPUs = 16 GPUs for decode
+```
+
+### DO NOT
+
+- Add comments saying LWS is "not supported" by llm-d-benchmark
+- Skip multinode configuration when converting LWS-based guides
+- Manually set LWS-specific args that are auto-configured by the Helm chart
+
 ## Accelerator Patterns
 
 ### XPU (Intel GPU)
