.claude/skills/convert-guide/SKILL.md
10 lines changed: 10 additions & 0 deletions
@@ -183,6 +183,16 @@ Before displaying or writing files, verify:
183
183
- Volumes include `preprocesses` configMap (should be first in list)
184
184
- If guide has `inferenceExtension.pluginsCustomConfig`, then `LLMDBENCH_VLLM_MODELSERVICE_GAIE_CUSTOM_PLUGINS` is defined
185
185
- If guide specifies explicit container image version, then `LLMDBENCH_LLMD_IMAGE_TAG` is set
186
+
- If guide uses LeaderWorkerSet (LWS), then `LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true` is set
187
+
188
+
**Completeness Checklist (MANDATORY - never omit source configuration):**
189
+
- **Environment variables**: If guide defines `env:` in decode/prefill containers, ALL env vars MUST be captured in `LLMDBENCH_VLLM_MODELSERVICE_DECODE_ENVVARS_TO_YAML` or `LLMDBENCH_VLLM_MODELSERVICE_PREFILL_ENVVARS_TO_YAML`
190
+
- **Volume mounts**: If guide defines `volumeMounts:`, ALL mounts MUST be captured in `EXTRA_VOLUME_MOUNTS`
191
+
- **Volumes**: If guide defines `volumes:`, ALL volumes MUST be captured in `EXTRA_VOLUMES`
192
+
- **vLLM args**: If guide defines `args:`, ALL args MUST be captured in `EXTRA_ARGS`
193
+
- **Container config**: If guide defines resources, securityContext, or other container config, it MUST be captured
194
+
195
+
**CRITICAL**: The scenario file must be a complete representation of the guide. Configuration from the source guide should NEVER be silently dropped or omitted. If a value cannot be mapped, document it in a comment explaining why.
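The completeness checklist above could be spot-checked mechanically. The following is a minimal sketch, not part of the framework: the function name `check_completeness`, the two file arguments, and the substring matching on variable names are all assumptions made for illustration. It only detects that a guide section has no counterpart at all; it does not verify that every individual entry was carried over.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of llm-d-benchmark): flag guide sections
# that have no counterpart variable in the scenario file.
# Usage: check_completeness <guide.yaml> <scenario.sh>
# Prints one "MISSING" line per unmapped section; returns 1 if any are missing.
check_completeness() {
  local guide="$1" scenario="$2" missing=0 pair key var
  # Each pair is "<guide key>=<substring expected in a scenario variable name>".
  for pair in \
    "env:=ENVVARS_TO_YAML" \
    "volumeMounts:=EXTRA_VOLUME_MOUNTS" \
    "volumes:=EXTRA_VOLUMES" \
    "args:=EXTRA_ARGS"; do
    key="${pair%%=*}"
    var="${pair#*=}"
    # Match the key only at the start of a (possibly indented) YAML line.
    if grep -Eq "^[[:space:]-]*${key}" "$guide" && ! grep -q "$var" "$scenario"; then
      echo "MISSING: guide defines '${key}' but scenario has no *${var}* variable"
      missing=1
    fi
  done
  return "$missing"
}
```

A non-zero return status means at least one section of the guide was dropped and needs either a mapping or an explanatory comment in the scenario file.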
186
196
187
197
**Source Documentation Checklist:**
188
198
- Every environment variable has a `# SOURCE:` comment block
.claude/skills/convert-guide/references/patterns.md
107 lines changed: 105 additions & 2 deletions
@@ -68,18 +68,34 @@ The preprocesses configMap volume should be listed FIRST. The preprocesses mount
68
68
69
69
## Environment Variables
70
70
71
+
**CRITICAL RULE**: ALL environment variables defined in the guide's `env:` section MUST be captured in the scenario file. Never silently drop env vars - they are often essential for the guide to function correctly (e.g., accelerator-specific settings, logging paths, feature flags).
1. Count the env vars in the source guide's `env:` sections
96
+
2. Count the env vars in the generated `ENVVARS_TO_YAML` blocks
97
+
3. The counts must match (excluding any benchmark-framework-added vars which should be documented)
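The count comparison in the steps above can be sketched in shell. This is an illustration under stated assumptions: the helper names `count_env_entries` and `compare_env_counts` are hypothetical, both inputs are assumed to contain only the relevant env entries (the guide's `env:` section and the body of one `ENVVARS_TO_YAML` block, each saved to its own file), and every variable is assumed to appear as a `- name:` list entry.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the env var count check.

# Count YAML list entries of the form "- name: FOO" read from stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_env_entries() {
  grep -cE '^[[:space:]]*-[[:space:]]+name:' || true
}

# Usage: compare_env_counts <guide-env-file> <scenario-env-file>
# Returns 1 and prints a MISMATCH line when the counts differ.
compare_env_counts() {
  local guide_env="$1" scenario_env="$2" g s
  g=$(count_env_entries < "$guide_env")
  s=$(count_env_entries < "$scenario_env")
  if [ "$g" -eq "$s" ]; then
    echo "OK: $g env vars in both"
  else
    echo "MISMATCH: guide has $g env vars, scenario has $s"
    return 1
  fi
}
```

Variables added by the benchmark framework would inflate the scenario count, so they need to be removed from the scenario input (or accounted for by hand) before comparing.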
98
+
83
99
## GAIE Custom Plugin Configuration
84
100
85
101
**CRITICAL RULE**: When the guide contains `inferenceExtension.pluginsCustomConfig`, you MUST always define `LLMDBENCH_VLLM_MODELSERVICE_GAIE_CUSTOM_PLUGINS` in the scenario file, regardless of whether a preset file exists.
@@ -164,6 +180,93 @@ EOF
164
180
- Assume preset files will have the same content as the guide's custom config
165
181
- Omit custom config to avoid "duplication" - the scenario file should be complete
166
182
183
+
## LeaderWorkerSet / Multinode Patterns
184
+
185
+
When converting guides that use LeaderWorkerSet (LWS) for multi-node or multi-pod deployment:
186
+
187
+
### Detection
188
+
189
+
A guide uses LeaderWorkerSet if:
190
+
- The kustomize manifests contain `kind: LeaderWorkerSet` resources
191
+
- The manifest has fields like `leaderWorkerTemplate`, `workerTemplate`, `size`, or `LWS_*` environment variables
192
+
- The vLLM command uses flags like `--data-parallel-address`, `--data-parallel-start-rank`, `--data-parallel-rpc-port`
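The detection criteria above can be approximated with a recursive `grep`. This is a rough sketch: the function name `uses_lws` and the directory argument are hypothetical, and the generic `size` field is deliberately left out because it would match far too many unrelated manifests.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if any file under <manifest-dir> shows
# one of the LeaderWorkerSet indicators listed above.
# Usage: uses_lws <manifest-dir>
uses_lws() {
  local dir="$1"
  grep -rqE \
    -e '^[[:space:]]*kind:[[:space:]]*LeaderWorkerSet' \
    -e 'leaderWorkerTemplate:' \
    -e 'workerTemplate:' \
    -e 'LWS_[A-Z_]+' \
    -e '--data-parallel-(address|start-rank|rpc-port)' \
    "$dir"
}

# Example use (the directory name is illustrative):
#   if uses_lws ./guide/manifests; then
#     echo 'export LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true'
#   fi
```

A match is only a hint that the guide is LWS-based; the manifests still need to be read to confirm it.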
193
+
194
+
### Support
195
+
196
+
**LeaderWorkerSet IS supported** by the llm-d-benchmark framework via the modelservice Helm chart. Set:
197
+
198
+
```bash
199
+
export LLMDBENCH_VLLM_MODELSERVICE_MULTINODE=true
200
+
```
201
+
202
+
This maps to `multinode: true` in the modelservice Helm chart, which enables LeaderWorkerSet-based deployment.
203
+
204
+
### Configuration Mapping
205
+
206
+
| LWS Manifest Field | LLMDBENCH Variable | Notes |
207
+
|-------------------|-------------------|-------|
208
+
| `spec.replicas` | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_REPLICAS` | Number of LWS groups |
209
+
| `spec.leaderWorkerTemplate.size` | `LLMDBENCH_VLLM_COMMON_NUM_WORKERS_PARALLELISM` | Pods per LWS group |
210
+
| `DP_SIZE_LOCAL` env var | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_DATA_LOCAL_PARALLELISM` | Data parallel per pod |
211
+
| `TP_SIZE` env var | `LLMDBENCH_VLLM_MODELSERVICE_DECODE_TENSOR_PARALLELISM` | Tensor parallel size |
When multinode is enabled, the modelservice Helm chart automatically handles LWS-specific vLLM arguments. You typically do NOT need to include these in `EXTRA_ARGS`:
238
+
- `--data-parallel-address` (set automatically from LWS leader)
239
+
- `--data-parallel-start-rank` (set automatically per pod)
240
+
- `--data-parallel-rpc-port` (set automatically)
241
+
242
+
However, DO include these parallelism flags in `EXTRA_ARGS`:
243
+
- `--tensor-parallel-size`
244
+
- `--data-parallel-size-local` (maps to `DP_SIZE_LOCAL`)
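One way to apply the split described above when copying a guide's vLLM arguments is to filter out the flags the chart sets itself. The sketch below is illustrative only: `filter_lws_args` is a hypothetical helper, and it assumes each flag and its value sit on one line (`--flag=value` or `--flag value`). The match is anchored on the full flag name so that `--data-parallel-size-local` is kept.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one vLLM argument per line on stdin and drop
# the three flags that the modelservice chart is described as setting
# automatically in multinode mode. All other lines pass through unchanged.
filter_lws_args() {
  grep -vE '^--data-parallel-(address|start-rank|rpc-port)([= ]|$)' || true
}

# Example use:
#   printf '%s\n' '--tensor-parallel-size=4' '--data-parallel-rpc-port=5555' \
#     | filter_lws_args
```

The surviving lines are the candidates for `EXTRA_ARGS`. If a guide puts a flag and its value on separate lines, the value line would survive this filter and has to be removed by hand.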