feat: add support for pvc when download large model from HF by zdtsw · Pull Request #232 · llm-d-incubation/llm-d-modelservice

zdtsw · 2026-03-04T17:27:33Z

Changes

modelArtifacts.size can be set but if the node running the pod does not have enough storage it still can be a problem for the model to download. this is to add one new option "usePVC" so user can use it instead.
this only applied to model with protocol "hf://" as "oci" and "pvc" does not require this
it creates a PVC named as "$fullname-model-cache" and injected into deployment
by default storageclass to "" which is to cluster's default one, user can config to other if they want but they will need to create this pre running helm chart

- modelArtifacts.size can be set but if the node running the pod does not have enough storage it still can be a problem for the model to run. this is to add one new option "usePVC" so user can use it instead. - by default storageclass to "" which is cluster's default one, user can config to other if they want Signed-off-by: Wen Zhou <wenzhou@redhat.com>

Copilot

Pull request overview

Adds an opt-in Helm chart feature to back Hugging Face (hf://) model downloads with a PersistentVolumeClaim to avoid failures due to insufficient node ephemeral storage.

Changes:

Introduce modelArtifacts.usePVC and a new model-cache-pvc.yaml template to create and mount a PVC for hf:// model storage.
Update volume generation logic to use persistentVolumeClaim when usePVC is enabled (HF only).
Bump chart version to v0.4.9 and refresh generated example manifests / pre-commit hook revision.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
examples/output-xpu.yaml	Refresh example output (chart label version + whitespace).
examples/output-xpu-pd.yaml	Refresh example output (chart label version + whitespace).
examples/output-requester.yaml	Refresh example output (chart label version + whitespace).
examples/output-pvc.yaml	Refresh example output (chart label version).
examples/output-pvc-hf.yaml	Refresh example output (chart label version).
examples/output-pd.yaml	Refresh example output (chart label version + whitespace).
examples/output-pd-mnnvl.yaml	Refresh example output (chart label version + whitespace).
examples/output-heterogeneous-pd.yaml	Refresh example output (chart label version + whitespace).
examples/output-gaudi.yaml	Refresh example output (chart label version).
examples/output-dra.yaml	Refresh example output (chart label version).
examples/output-cpu.yaml	Refresh example output (chart label version + whitespace).
charts/llm-d-modelservice/values.yaml	Add `modelArtifacts.usePVC` and document intended PVC options.
charts/llm-d-modelservice/values.schema.tmpl.json	Add `usePVC` to schema template.
charts/llm-d-modelservice/values.schema.json	Add `usePVC` to dereferenced schema.
charts/llm-d-modelservice/templates/model-cache-pvc.yaml	New template to create `<fullname>-model-cache` PVC for HF cache.
charts/llm-d-modelservice/templates/_helpers.tpl	Switch HF model-storage volume to PVC when `usePVC` enabled; inject claim name.
charts/llm-d-modelservice/Chart.yaml	Chart version bump to `v0.4.9`.
.pre-commit-config.yaml	Update `pre-commit-hooks` revision.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-04T17:39:06Z

charts/llm-d-modelservice/templates/_helpers.tpl

@@ -450,7 +455,8 @@ context is a pdSpec
    {{- end -}}
  {{- end -}}
  {{- if $hasModelVolume }}


A single PVC name (<fullname>-model-cache) is being mounted into all pods that mount the model volume. With the default ReadWriteOnce access mode, this will commonly fail when decode/prefill land on different nodes, when replicas > 1, or in multinode (LWS) mode. Consider per-role/per-replica claims, requiring/setting ReadWriteMany, and/or surfacing a clear validation/error when usePVC is enabled with incompatible scheduling/scale.

Suggested change

{{- if $hasModelVolume }}

{{- if $hasModelVolume }}

{{- $usePVC := default false .Values.modelArtifacts.usePVC }}

{{- $pvcAccessMode := default "ReadWriteOnce" .Values.modelArtifacts.pvcAccessMode }}

{{- $decodeParallelism := default 1 .Values.decode.parallelism }}

{{- $prefillParallelism := default 1 .Values.prefill.parallelism }}

{{- $lwsEnabled := default false .Values.lws.enabled }}

{{- if and $usePVC (ne (upper $pvcAccessMode) "READWRITEMANY") (or (gt (int $decodeParallelism) 1) (gt (int $prefillParallelism) 1) $lwsEnabled) }}

{{- fail "modelArtifacts.usePVC with a single PVC requires pvcAccessMode=ReadWriteMany when using parallelism > 1 or LWS; configure per-role/per-replica PVCs or change the access mode." }}

{{- end }}

Copilot · 2026-03-04T17:39:07Z

charts/llm-d-modelservice/templates/model-cache-pvc.yaml

+    - {{ .Values.modelArtifacts.accessMode | default "ReadWriteOnce" }}
+  resources:
+    requests:
+      storage: {{ .Values.modelArtifacts.size | default "100Gi" }}


storage: {{ .Values.modelArtifacts.size | default "100Gi" }} is misleading because modelArtifacts.size already has a non-empty default in values.yaml (so default "100Gi" will never apply). Either remove the unused default or introduce a dedicated PVC sizing value (e.g., pvcSize) if you want a different default for PVC vs emptyDir.

Suggested change

storage: {{ .Values.modelArtifacts.size | default "100Gi" }}

storage: {{ .Values.modelArtifacts.size }}

Copilot · 2026-03-04T17:39:07Z

charts/llm-d-modelservice/values.schema.tmpl.json

        "uri": {
          "default": "hf://{{ .Values.modelArtifacts.name }}",
          "description": "   hf://model/name - model as defined on Hugging Face   pvc://pvc_name/path/to/model - model on existing persistant storage volume   oci:// not yet supported",
          "required": [],
          "title": "uri"
+        },
+        "usePVC": {
+          "default": false,
+          "description": "Use a PersistentVolumeClaim instead of emptyDir for model storage (hf:// protocol only). Set to true for large models that exceed node's ephemeral storage capacity",
+          "title": "usePVC",
+          "type": "boolean"
        }


Same as values.schema.json: modelArtifacts disallows unknown keys (additionalProperties: false), so if accessMode / storageClassName are intended to be user-settable for the new PVC feature, they need to be present in this schema template (likely by adding them as real keys in values.yaml and regenerating).

Copilot · 2026-03-04T17:39:07Z

charts/llm-d-modelservice/values.yaml

+  # type: string
+  # @schema
+  # storageClassName -- Storage class for the PVC. Only used when usePVC: true (uses cluster default if not specified)
+  # storageClassName: ""


storageClassName is documented here but commented out, so it won’t exist in values.yaml and therefore won’t be allowed by values.schema.* (modelArtifacts has additionalProperties: false). If users set modelArtifacts.storageClassName, Helm schema validation will reject it; add this key (with a default like empty string) to values.yaml and regenerate the schemas.

Suggested change

# storageClassName: ""

storageClassName: ""

these two are commented out because usePVC by default is set to false then these wont take any effects, so better let user to uncomment it once they wanna set usePVC to true

Copilot · 2026-03-04T17:39:08Z

charts/llm-d-modelservice/values.yaml

+  # type: string
+  # @schema
+  # accessMode -- Access mode for the PVC. Only used when usePVC: true (default: ReadWriteOnce)
+  # accessMode: ReadWriteOnce


Same issue as storageClassName: accessMode is referenced by the new PVC template, but it’s commented out in values.yaml and not present in the generated schema (modelArtifacts has additionalProperties: false). Add accessMode as a real values key and regenerate values.schema.tmpl.json / values.schema.json so users can configure it.

Suggested change

# accessMode: ReadWriteOnce

accessMode: ReadWriteOnce

Copilot · 2026-03-04T17:39:08Z

charts/llm-d-modelservice/values.schema.json

                    "description": "   hf://model/name - model as defined on Hugging Face   pvc://pvc_name/path/to/model - model on existing persistant storage volume   oci:// not yet supported",
                    "required": [],
                    "title": "uri"
+                },
+                "usePVC": {
+                    "default": false,
+                    "description": "Use a PersistentVolumeClaim instead of emptyDir for model storage (hf:// protocol only). Set to true for large models that exceed node's ephemeral storage capacity",
+                    "title": "usePVC",
+                    "type": "boolean"
                }


modelArtifacts schema sets additionalProperties: false, but the new PVC feature relies on additional keys (accessMode, storageClassName). As-is, users can’t set those values without failing schema validation; please add them to the schema (via values.yaml keys / helm-schema regen).

poussa · 2026-03-05T07:49:51Z

Why can't we use the existing pvc+hf URI?

modelArtifacts:
  uri: pvc+hf://pvc-name/path/to/hf_hub_cache/namespace/modelID

zdtsw · 2026-03-05T08:10:13Z

Why can't we use the existing pvc+hf URI?
modelArtifacts:
  uri: pvc+hf://pvc-name/path/to/hf_hub_cache/namespace/modelID

this requires user download model to the PVC first
https://github.com/llm-d-incubation/llm-d-modelservice/blob/main/examples/pvc/README.md#use-hf-downloaded-models-with-pvcs

kalantar · 2026-03-06T15:18:22Z

Why can't we manually add a pvc and manually mount it? That is, use extraObjects to define the PVC. Add an entry in volumes and mount it where HF_HOME is set.

zdtsw · 2026-03-06T17:11:03Z

extraObjects

You are right. I was not aware of extraObjects when I made the change.
Still, I think flipping usePVC: true and setting size + keep HF_HOME path default is simpler than configuring extraObjects + decode.volumes + might need a different HF_HOME path

wdyt?

poussa · 2026-03-06T17:26:15Z

extraObjects

You are right. I was not aware of extraObjects when I made the change. Still, I think flipping usePVC: true and setting size + keep HF_HOME path default is simpler than configuring extraObjects + decode.volumes + might need a different HF_HOME path

wdyt?

My vote goes to extraObjects

clubanderson · 2026-03-07T04:31:05Z

/review

kalantar · 2026-03-09T13:20:58Z

I feel like unless we have more demand for it; extraObjects for now.

zdtsw · 2026-03-09T13:30:03Z

I feel like unless we have more demand for it; extraObjects for now.

If it is not really helpful we can close this PR
but we will need more documentation on this. I dont know if it is better address it in llm-d repo or here : llm-d/llm-d#857 (comment)

kalantar · 2026-03-09T13:34:51Z

I feel like unless we have more demand for it; extraObjects for now.

If it is not really helpful we can close this PR but we will need more documentation on this. I dont know if it is better address it in llm-d repo or here : llm-d/llm-d#857 (comment)

It is documented here: https://github.com/llm-d-incubation/llm-d-modelservice/blob/main/README.md?plain=1#L99. However, agree that it isn't very prominent. More examples contributed would also be good. Closing issue.

Copilot AI review requested due to automatic review settings March 4, 2026 17:27

Copilot started reviewing on behalf of zdtsw March 4, 2026 17:28 View session

Copilot AI reviewed Mar 4, 2026

View reviewed changes

zdtsw mentioned this pull request Mar 6, 2026

[Bug 🐛]: TPU recipe for Qwen 3 Coder 480B cannot download the model files llm-d/llm-d#857

Closed

kalantar closed this Mar 9, 2026

-  {{- if $hasModelVolume }}
+  {{- if $hasModelVolume }}
+  {{- $usePVC := default false .Values.modelArtifacts.usePVC }}
+  {{- $pvcAccessMode := default "ReadWriteOnce" .Values.modelArtifacts.pvcAccessMode }}
+  {{- $decodeParallelism := default 1 .Values.decode.parallelism }}
+  {{- $prefillParallelism := default 1 .Values.prefill.parallelism }}
+  {{- $lwsEnabled := default false .Values.lws.enabled }}
+  {{- if and $usePVC (ne (upper $pvcAccessMode) "READWRITEMANY") (or (gt (int $decodeParallelism) 1) (gt (int $prefillParallelism) 1) $lwsEnabled) }}
+  {{- fail "modelArtifacts.usePVC with a single PVC requires pvcAccessMode=ReadWriteMany when using parallelism > 1 or LWS; configure per-role/per-replica PVCs or change the access mode." }}
+  {{- end }}

	storage: {{ .Values.modelArtifacts.size \| default "100Gi" }}
	storage: {{ .Values.modelArtifacts.size }}

Conversation

zdtsw commented Mar 4, 2026

Changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

zdtsw Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

poussa commented Mar 5, 2026

Uh oh!

zdtsw commented Mar 5, 2026

Uh oh!

kalantar commented Mar 6, 2026

Uh oh!

zdtsw commented Mar 6, 2026

Uh oh!

poussa commented Mar 6, 2026

Uh oh!

clubanderson commented Mar 7, 2026

Uh oh!

kalantar commented Mar 9, 2026

Uh oh!

zdtsw commented Mar 9, 2026

Uh oh!

kalantar commented Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants