Skip to content

feat: add support for pvc when download large model from HF#232

Closed
zdtsw wants to merge 1 commit intollm-d-incubation:mainfrom
zdtsw:issue_857
Closed

feat: add support for pvc when download large model from HF#232
zdtsw wants to merge 1 commit intollm-d-incubation:mainfrom
zdtsw:issue_857

Conversation

@zdtsw
Copy link
Copy Markdown
Contributor

@zdtsw zdtsw commented Mar 4, 2026

Changes

  • modelArtifacts.size can be set but if the node running the pod does not have enough storage it still can be a problem for the model to download. this is to add one new option "usePVC" so user can use it instead.
  • this only applied to model with protocol "hf://" as "oci" and "pvc" does not require this
  • it creates a PVC named as "$fullname-model-cache" and injected into deployment
  • by default storageclass to "" which is to cluster's default one, user can config to other if they want but they will need to create this pre running helm chart

Ref: llm-d/llm-d#857

- modelArtifacts.size can be set but if the node running the pod does
  not have enough storage it still can be a problem for the model to
  run. this is to add one new option "usePVC" so user can use it
  instead.
- by default storageclass to "" which is cluster's default one, user can
  config to other if they want

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Copilot AI review requested due to automatic review settings March 4, 2026 17:27
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in Helm chart feature to back Hugging Face (hf://) model downloads with a PersistentVolumeClaim to avoid failures due to insufficient node ephemeral storage.

Changes:

  • Introduce modelArtifacts.usePVC and a new model-cache-pvc.yaml template to create and mount a PVC for hf:// model storage.
  • Update volume generation logic to use persistentVolumeClaim when usePVC is enabled (HF only).
  • Bump chart version to v0.4.9 and refresh generated example manifests / pre-commit hook revision.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
examples/output-xpu.yaml Refresh example output (chart label version + whitespace).
examples/output-xpu-pd.yaml Refresh example output (chart label version + whitespace).
examples/output-requester.yaml Refresh example output (chart label version + whitespace).
examples/output-pvc.yaml Refresh example output (chart label version).
examples/output-pvc-hf.yaml Refresh example output (chart label version).
examples/output-pd.yaml Refresh example output (chart label version + whitespace).
examples/output-pd-mnnvl.yaml Refresh example output (chart label version + whitespace).
examples/output-heterogeneous-pd.yaml Refresh example output (chart label version + whitespace).
examples/output-gaudi.yaml Refresh example output (chart label version).
examples/output-dra.yaml Refresh example output (chart label version).
examples/output-cpu.yaml Refresh example output (chart label version + whitespace).
charts/llm-d-modelservice/values.yaml Add modelArtifacts.usePVC and document intended PVC options.
charts/llm-d-modelservice/values.schema.tmpl.json Add usePVC to schema template.
charts/llm-d-modelservice/values.schema.json Add usePVC to dereferenced schema.
charts/llm-d-modelservice/templates/model-cache-pvc.yaml New template to create <fullname>-model-cache PVC for HF cache.
charts/llm-d-modelservice/templates/_helpers.tpl Switch HF model-storage volume to PVC when usePVC enabled; inject claim name.
charts/llm-d-modelservice/Chart.yaml Chart version bump to v0.4.9.
.pre-commit-config.yaml Update pre-commit-hooks revision.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -450,7 +455,8 @@ context is a pdSpec
{{- end -}}
{{- end -}}
{{- if $hasModelVolume }}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single PVC name (<fullname>-model-cache) is being mounted into all pods that mount the model volume. With the default ReadWriteOnce access mode, this will commonly fail when decode/prefill land on different nodes, when replicas > 1, or in multinode (LWS) mode. Consider per-role/per-replica claims, requiring/setting ReadWriteMany, and/or surfacing a clear validation/error when usePVC is enabled with incompatible scheduling/scale.

Suggested change
{{- if $hasModelVolume }}
{{- if $hasModelVolume }}
{{- $usePVC := default false .Values.modelArtifacts.usePVC }}
{{- $pvcAccessMode := default "ReadWriteOnce" .Values.modelArtifacts.pvcAccessMode }}
{{- $decodeParallelism := default 1 .Values.decode.parallelism }}
{{- $prefillParallelism := default 1 .Values.prefill.parallelism }}
{{- $lwsEnabled := default false .Values.lws.enabled }}
{{- if and $usePVC (ne (upper $pvcAccessMode) "READWRITEMANY") (or (gt (int $decodeParallelism) 1) (gt (int $prefillParallelism) 1) $lwsEnabled) }}
{{- fail "modelArtifacts.usePVC with a single PVC requires pvcAccessMode=ReadWriteMany when using parallelism > 1 or LWS; configure per-role/per-replica PVCs or change the access mode." }}
{{- end }}

Copilot uses AI. Check for mistakes.
- {{ .Values.modelArtifacts.accessMode | default "ReadWriteOnce" }}
resources:
requests:
storage: {{ .Values.modelArtifacts.size | default "100Gi" }}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storage: {{ .Values.modelArtifacts.size | default "100Gi" }} is misleading because modelArtifacts.size already has a non-empty default in values.yaml (so default "100Gi" will never apply). Either remove the unused default or introduce a dedicated PVC sizing value (e.g., pvcSize) if you want a different default for PVC vs emptyDir.

Suggested change
storage: {{ .Values.modelArtifacts.size | default "100Gi" }}
storage: {{ .Values.modelArtifacts.size }}

Copilot uses AI. Check for mistakes.
Comment on lines 656 to 667
"uri": {
"default": "hf://{{ .Values.modelArtifacts.name }}",
"description": " hf://model/name - model as defined on Hugging Face pvc://pvc_name/path/to/model - model on existing persistant storage volume oci:// not yet supported",
"required": [],
"title": "uri"
},
"usePVC": {
"default": false,
"description": "Use a PersistentVolumeClaim instead of emptyDir for model storage (hf:// protocol only). Set to true for large models that exceed node's ephemeral storage capacity",
"title": "usePVC",
"type": "boolean"
}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as values.schema.json: modelArtifacts disallows unknown keys (additionalProperties: false), so if accessMode / storageClassName are intended to be user-settable for the new PVC feature, they need to be present in this schema template (likely by adding them as real keys in values.yaml and regenerating).

Copilot uses AI. Check for mistakes.
# type: string
# @schema
# storageClassName -- Storage class for the PVC. Only used when usePVC: true (uses cluster default if not specified)
# storageClassName: ""
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storageClassName is documented here but commented out, so it won’t exist in values.yaml and therefore won’t be allowed by values.schema.* (modelArtifacts has additionalProperties: false). If users set modelArtifacts.storageClassName, Helm schema validation will reject it; add this key (with a default like empty string) to values.yaml and regenerate the schemas.

Suggested change
# storageClassName: ""
storageClassName: ""

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two are commented out because usePVC by default is set to false then these wont take any effects, so better let user to uncomment it once they wanna set usePVC to true

# type: string
# @schema
# accessMode -- Access mode for the PVC. Only used when usePVC: true (default: ReadWriteOnce)
# accessMode: ReadWriteOnce
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as storageClassName: accessMode is referenced by the new PVC template, but it’s commented out in values.yaml and not present in the generated schema (modelArtifacts has additionalProperties: false). Add accessMode as a real values key and regenerate values.schema.tmpl.json / values.schema.json so users can configure it.

Suggested change
# accessMode: ReadWriteOnce
accessMode: ReadWriteOnce

Copilot uses AI. Check for mistakes.
Comment on lines 1972 to 1981
"description": " hf://model/name - model as defined on Hugging Face pvc://pvc_name/path/to/model - model on existing persistant storage volume oci:// not yet supported",
"required": [],
"title": "uri"
},
"usePVC": {
"default": false,
"description": "Use a PersistentVolumeClaim instead of emptyDir for model storage (hf:// protocol only). Set to true for large models that exceed node's ephemeral storage capacity",
"title": "usePVC",
"type": "boolean"
}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modelArtifacts schema sets additionalProperties: false, but the new PVC feature relies on additional keys (accessMode, storageClassName). As-is, users can’t set those values without failing schema validation; please add them to the schema (via values.yaml keys / helm-schema regen).

Copilot uses AI. Check for mistakes.
@poussa
Copy link
Copy Markdown
Contributor

poussa commented Mar 5, 2026

Why can't we use the existing pvc+hf URI?

modelArtifacts:
  uri: pvc+hf://pvc-name/path/to/hf_hub_cache/namespace/modelID

@zdtsw
Copy link
Copy Markdown
Contributor Author

zdtsw commented Mar 5, 2026

Why can't we use the existing pvc+hf URI?

modelArtifacts:
  uri: pvc+hf://pvc-name/path/to/hf_hub_cache/namespace/modelID

this requires user download model to the PVC first
https://github.com/llm-d-incubation/llm-d-modelservice/blob/main/examples/pvc/README.md#use-hf-downloaded-models-with-pvcs

@kalantar
Copy link
Copy Markdown
Collaborator

kalantar commented Mar 6, 2026

Why can't we manually add a pvc and manually mount it? That is, use extraObjects to define the PVC. Add an entry in volumes and mount it where HF_HOME is set.

@zdtsw
Copy link
Copy Markdown
Contributor Author

zdtsw commented Mar 6, 2026

extraObjects

You are right. I was not aware of extraObjects when I made the change.
Still, I think flipping usePVC: true and setting size + keep HF_HOME path default is simpler than configuring extraObjects + decode.volumes + might need a different HF_HOME path

wdyt?

@poussa
Copy link
Copy Markdown
Contributor

poussa commented Mar 6, 2026

extraObjects

You are right. I was not aware of extraObjects when I made the change. Still, I think flipping usePVC: true and setting size + keep HF_HOME path default is simpler than configuring extraObjects + decode.volumes + might need a different HF_HOME path

wdyt?

My vote goes to extraObjects

@clubanderson
Copy link
Copy Markdown
Contributor

/review

@kalantar
Copy link
Copy Markdown
Collaborator

kalantar commented Mar 9, 2026

I feel like unless we have more demand for it; extraObjects for now.

@zdtsw
Copy link
Copy Markdown
Contributor Author

zdtsw commented Mar 9, 2026

I feel like unless we have more demand for it; extraObjects for now.

If it is not really helpful we can close this PR
but we will need more documentation on this. I dont know if it is better address it in llm-d repo or here : llm-d/llm-d#857 (comment)

@kalantar
Copy link
Copy Markdown
Collaborator

kalantar commented Mar 9, 2026

I feel like unless we have more demand for it; extraObjects for now.

If it is not really helpful we can close this PR but we will need more documentation on this. I dont know if it is better address it in llm-d repo or here : llm-d/llm-d#857 (comment)

It is documented here: https://github.com/llm-d-incubation/llm-d-modelservice/blob/main/README.md?plain=1#L99. However, agree that it isn't very prominent. More examples contributed would also be good. Closing issue.

@kalantar kalantar closed this Mar 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants