config/README.md: 43 lines changed (43 additions, 0 deletions)
@@ -287,6 +287,49 @@ This allows public models (e.g. `facebook/opt-125m`) to be deployed without a to
The `enabled` flag is auto-computed during plan rendering by `_resolve_hf_token()` in `render_plans.py`. It checks `HF_TOKEN`, `LLMDBENCH_HF_TOKEN`, and the scenario YAML in that order.
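The lookup order described above can be sketched roughly as follows. This is an illustration only, not the actual `_resolve_hf_token()` from `render_plans.py`: the function name `resolve_hf_token_sketch`, the `huggingface.token` key path in the scenario, and the shape of the returned dict are all assumptions made for this example.

```python
import os
from collections.abc import Mapping
from typing import Optional


def resolve_hf_token_sketch(
    scenario: Mapping,
    env: Optional[Mapping] = None,
) -> dict:
    """Return {"enabled": bool, "token": str or None} using the documented order.

    Hypothetical helper: the real logic lives in _resolve_hf_token() and may differ.
    """
    env = os.environ if env is None else env

    # 1. HF_TOKEN, then 2. LLMDBENCH_HF_TOKEN (environment variables, in that order).
    for var in ("HF_TOKEN", "LLMDBENCH_HF_TOKEN"):
        value = (env.get(var) or "").strip()
        if value:
            return {"enabled": True, "token": value}

    # 3. Fall back to the scenario YAML (assumed key path: huggingface.token).
    hf_section = scenario.get("huggingface")
    if isinstance(hf_section, Mapping):
        value = str(hf_section.get("token") or "").strip()
        if value:
            return {"enabled": True, "token": value}

    # No token found anywhere: public models can still deploy with enabled=False.
    return {"enabled": False, "token": None}
```

Passing `env` explicitly keeps the sketch easy to test without touching the real process environment.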
## Model Artifact Protocol (`modelservice.uriProtocol`)

Controls how the modelservice Helm chart locates and loads model weights. Set via `modelservice.uriProtocol` in your scenario or defaults.

| Protocol | `modelArtifacts.uri` Generated | PVC Created | Download Job | Model Loading |

3. The modelservice Helm chart downloads the model at pod startup time from HuggingFace Hub
4. For gated models, `huggingface.secretName` is passed as `authSecretName` so the chart can authenticate

This is useful for CI/CD (no PVC needed), quick testing, or when storage provisioning is unavailable.
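To make the `hf` flow more concrete, the sketch below builds the `modelArtifacts` values a renderer might hand to the chart. It is illustrative only: the helper `build_model_artifacts_hf`, the `hf://<model id>` URI form, and the dict layout are assumptions for this example rather than the code in `render_plans.py`. Only `modelArtifacts.uri`, `authSecretName`, and `huggingface.secretName` are taken from the text above, and the secret name `hf-secret` is a placeholder.

```python
from typing import Optional


def build_model_artifacts_hf(huggingface_id: str, secret_name: Optional[str] = None) -> dict:
    """Sketch of modelArtifacts values for uriProtocol 'hf' (hypothetical helper)."""
    if not huggingface_id:
        raise ValueError("a HuggingFace model id is required")

    # Assumed URI form. With this protocol the chart fetches the weights from
    # HuggingFace Hub at pod startup, so no PVC or download job is described here.
    artifacts = {"uri": f"hf://{huggingface_id}"}

    # Gated models: pass huggingface.secretName through as authSecretName.
    if secret_name:
        artifacts["authSecretName"] = secret_name
    return artifacts


print(build_model_artifacts_hf("facebook/opt-125m"))
print(build_model_artifacts_hf("meta-llama/Llama-3.1-8B", secret_name="hf-secret"))
```

Leaving `authSecretName` out entirely for public models mirrors the token-less deployment path mentioned earlier in this README.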
### Scenario example

```yaml
scenario:
  - name: "my-hf-deploy"
    model:
      name: facebook/opt-125m
      huggingfaceId: facebook/opt-125m
    modelservice:
      enabled: true
      uriProtocol: hf  # No PVC, no download job; fetch at runtime
```
## KV Transfer Configuration
The `vllmCommon.kvTransfer` section controls the `--kv-transfer-config` argument passed to the `vllm serve` command. This is how vLLM knows which KV cache transfer connector to use and how to configure it.
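As a rough illustration of what ends up on the command line, the sketch below serializes a connector config into the JSON string that `--kv-transfer-config` takes. The helper name `kv_transfer_args` and the example values are assumptions. The schema of `vllmCommon.kvTransfer` is not shown in this excerpt, so the function takes vLLM-level keys (`kv_connector`, `kv_role`) directly instead of guessing at the YAML field names.

```python
import json
import shlex


def kv_transfer_args(connector: str, role: str = "kv_both", **extra: object) -> list:
    """Build the argv fragment for `vllm serve` (hypothetical helper)."""
    config = {"kv_connector": connector, "kv_role": role, **extra}
    # The flag value is a JSON string, so serialize compactly and deterministically.
    return ["--kv-transfer-config", json.dumps(config, separators=(",", ":"), sort_keys=True)]


args = kv_transfer_args("NixlConnector")
# shlex.join quotes the JSON so it survives a shell command line intact.
print(shlex.join(["vllm", "serve", "facebook/opt-125m", *args]))
```

Building the argument as a list and quoting only at the end avoids hand-escaping the JSON braces and quotes.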