feat: add support for pvc when download large model from HF#232
feat: add support for pvc when download large model from HF#232zdtsw wants to merge 1 commit intollm-d-incubation:mainfrom
Conversation
- modelArtifacts.size can be set but if the node running the pod does not have enough storage it still can be a problem for the model to run. this is to add one new option "usePVC" so user can use it instead. - by default storageclass to "" which is cluster's default one, user can config to other if they want Signed-off-by: Wen Zhou <wenzhou@redhat.com>
There was a problem hiding this comment.
Pull request overview
Adds an opt-in Helm chart feature to back Hugging Face (hf://) model downloads with a PersistentVolumeClaim to avoid failures due to insufficient node ephemeral storage.
Changes:
- Introduce
modelArtifacts.usePVCand a newmodel-cache-pvc.yamltemplate to create and mount a PVC forhf://model storage. - Update volume generation logic to use
persistentVolumeClaimwhenusePVCis enabled (HF only). - Bump chart version to
v0.4.9and refresh generated example manifests / pre-commit hook revision.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| examples/output-xpu.yaml | Refresh example output (chart label version + whitespace). |
| examples/output-xpu-pd.yaml | Refresh example output (chart label version + whitespace). |
| examples/output-requester.yaml | Refresh example output (chart label version + whitespace). |
| examples/output-pvc.yaml | Refresh example output (chart label version). |
| examples/output-pvc-hf.yaml | Refresh example output (chart label version). |
| examples/output-pd.yaml | Refresh example output (chart label version + whitespace). |
| examples/output-pd-mnnvl.yaml | Refresh example output (chart label version + whitespace). |
| examples/output-heterogeneous-pd.yaml | Refresh example output (chart label version + whitespace). |
| examples/output-gaudi.yaml | Refresh example output (chart label version). |
| examples/output-dra.yaml | Refresh example output (chart label version). |
| examples/output-cpu.yaml | Refresh example output (chart label version + whitespace). |
| charts/llm-d-modelservice/values.yaml | Add modelArtifacts.usePVC and document intended PVC options. |
| charts/llm-d-modelservice/values.schema.tmpl.json | Add usePVC to schema template. |
| charts/llm-d-modelservice/values.schema.json | Add usePVC to dereferenced schema. |
| charts/llm-d-modelservice/templates/model-cache-pvc.yaml | New template to create <fullname>-model-cache PVC for HF cache. |
| charts/llm-d-modelservice/templates/_helpers.tpl | Switch HF model-storage volume to PVC when usePVC enabled; inject claim name. |
| charts/llm-d-modelservice/Chart.yaml | Chart version bump to v0.4.9. |
| .pre-commit-config.yaml | Update pre-commit-hooks revision. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -450,7 +455,8 @@ context is a pdSpec | |||
| {{- end -}} | |||
| {{- end -}} | |||
| {{- if $hasModelVolume }} | |||
There was a problem hiding this comment.
A single PVC name (<fullname>-model-cache) is being mounted into all pods that mount the model volume. With the default ReadWriteOnce access mode, this will commonly fail when decode/prefill land on different nodes, when replicas > 1, or in multinode (LWS) mode. Consider per-role/per-replica claims, requiring/setting ReadWriteMany, and/or surfacing a clear validation/error when usePVC is enabled with incompatible scheduling/scale.
| {{- if $hasModelVolume }} | |
| {{- if $hasModelVolume }} | |
| {{- $usePVC := default false .Values.modelArtifacts.usePVC }} | |
| {{- $pvcAccessMode := default "ReadWriteOnce" .Values.modelArtifacts.pvcAccessMode }} | |
| {{- $decodeParallelism := default 1 .Values.decode.parallelism }} | |
| {{- $prefillParallelism := default 1 .Values.prefill.parallelism }} | |
| {{- $lwsEnabled := default false .Values.lws.enabled }} | |
| {{- if and $usePVC (ne (upper $pvcAccessMode) "READWRITEMANY") (or (gt (int $decodeParallelism) 1) (gt (int $prefillParallelism) 1) $lwsEnabled) }} | |
| {{- fail "modelArtifacts.usePVC with a single PVC requires pvcAccessMode=ReadWriteMany when using parallelism > 1 or LWS; configure per-role/per-replica PVCs or change the access mode." }} | |
| {{- end }} |
| - {{ .Values.modelArtifacts.accessMode | default "ReadWriteOnce" }} | ||
| resources: | ||
| requests: | ||
| storage: {{ .Values.modelArtifacts.size | default "100Gi" }} |
There was a problem hiding this comment.
storage: {{ .Values.modelArtifacts.size | default "100Gi" }} is misleading because modelArtifacts.size already has a non-empty default in values.yaml (so default "100Gi" will never apply). Either remove the unused default or introduce a dedicated PVC sizing value (e.g., pvcSize) if you want a different default for PVC vs emptyDir.
| storage: {{ .Values.modelArtifacts.size | default "100Gi" }} | |
| storage: {{ .Values.modelArtifacts.size }} |
| "uri": { | ||
| "default": "hf://{{ .Values.modelArtifacts.name }}", | ||
| "description": " hf://model/name - model as defined on Hugging Face pvc://pvc_name/path/to/model - model on existing persistant storage volume oci:// not yet supported", | ||
| "required": [], | ||
| "title": "uri" | ||
| }, | ||
| "usePVC": { | ||
| "default": false, | ||
| "description": "Use a PersistentVolumeClaim instead of emptyDir for model storage (hf:// protocol only). Set to true for large models that exceed node's ephemeral storage capacity", | ||
| "title": "usePVC", | ||
| "type": "boolean" | ||
| } |
There was a problem hiding this comment.
Same as values.schema.json: modelArtifacts disallows unknown keys (additionalProperties: false), so if accessMode / storageClassName are intended to be user-settable for the new PVC feature, they need to be present in this schema template (likely by adding them as real keys in values.yaml and regenerating).
| # type: string | ||
| # @schema | ||
| # storageClassName -- Storage class for the PVC. Only used when usePVC: true (uses cluster default if not specified) | ||
| # storageClassName: "" |
There was a problem hiding this comment.
storageClassName is documented here but commented out, so it won’t exist in values.yaml and therefore won’t be allowed by values.schema.* (modelArtifacts has additionalProperties: false). If users set modelArtifacts.storageClassName, Helm schema validation will reject it; add this key (with a default like empty string) to values.yaml and regenerate the schemas.
| # storageClassName: "" | |
| storageClassName: "" |
There was a problem hiding this comment.
these two are commented out because usePVC by default is set to false then these wont take any effects, so better let user to uncomment it once they wanna set usePVC to true
| # type: string | ||
| # @schema | ||
| # accessMode -- Access mode for the PVC. Only used when usePVC: true (default: ReadWriteOnce) | ||
| # accessMode: ReadWriteOnce |
There was a problem hiding this comment.
Same issue as storageClassName: accessMode is referenced by the new PVC template, but it’s commented out in values.yaml and not present in the generated schema (modelArtifacts has additionalProperties: false). Add accessMode as a real values key and regenerate values.schema.tmpl.json / values.schema.json so users can configure it.
| # accessMode: ReadWriteOnce | |
| accessMode: ReadWriteOnce |
| "description": " hf://model/name - model as defined on Hugging Face pvc://pvc_name/path/to/model - model on existing persistant storage volume oci:// not yet supported", | ||
| "required": [], | ||
| "title": "uri" | ||
| }, | ||
| "usePVC": { | ||
| "default": false, | ||
| "description": "Use a PersistentVolumeClaim instead of emptyDir for model storage (hf:// protocol only). Set to true for large models that exceed node's ephemeral storage capacity", | ||
| "title": "usePVC", | ||
| "type": "boolean" | ||
| } |
There was a problem hiding this comment.
modelArtifacts schema sets additionalProperties: false, but the new PVC feature relies on additional keys (accessMode, storageClassName). As-is, users can’t set those values without failing schema validation; please add them to the schema (via values.yaml keys / helm-schema regen).
|
Why can't we use the existing modelArtifacts:
uri: pvc+hf://pvc-name/path/to/hf_hub_cache/namespace/modelID |
this requires user download model to the PVC first |
|
Why can't we manually add a pvc and manually mount it? That is, use |
You are right. I was not aware of wdyt? |
My vote goes to |
|
/review |
|
I feel like unless we have more demand for it; |
If it is not really helpful we can close this PR |
It is documented here: https://github.com/llm-d-incubation/llm-d-modelservice/blob/main/README.md?plain=1#L99. However, agree that it isn't very prominent. More examples contributed would also be good. Closing issue. |
Changes
modelArtifacts.sizecan be set but if the node running the pod does not have enough storage it still can be a problem for the model to download. this is to add one new option "usePVC" so user can use it instead.Ref: llm-d/llm-d#857