This Helm chart deploys ComfyUI, a powerful and modular Stable Diffusion GUI, on Kubernetes. The project is open source and welcomes community contributions.
The Helm chart is optimized for deploying ComfyUI with GPU support on Kubernetes clusters. It integrates seamlessly with a provided Dockerfile, which builds a CUDA-enabled container image based on Ubuntu 22.04, ensuring optimal performance for ComfyUI.
- **Dockerfile Integration:**
  - Builds ComfyUI from the official repository.
  - Installs the curated custom nodes needed by this environment during the image build instead of at container startup.
  - Installs necessary system and Python dependencies.
  - Sets the required environment variables (`NVIDIA_DRIVER_CAPABILITIES`, `LD_PRELOAD`, `PYTHONPATH`, `TORCH_CUDA_ARCH_LIST`) at the image layer. `NVIDIA_VISIBLE_DEVICES` is injected per pod by the NVIDIA device plugin and must not be set in `values.yaml`.
  - Runs the ComfyUI server with optimized settings:

    ```bash
    python3 main.py --listen 0.0.0.0 --port 8188 --cuda-malloc
    ```
- **Kubernetes Optimized:**
  - Deploys Kubernetes resources: Deployment, Service, Ingress, HPA, ServiceAccount.
  - Enables NVIDIA GPU support.
  - Exposes the application on port 8188.
  - Uses a generated service account name from the chart fullname helper (override via `serviceAccount.name` in `values.yaml`).
- **Customizable Image Settings:**
  Adjust Docker image details in `values.yaml`:

  ```yaml
  image:
    repository: ghcr.io/memenow/comfyui-helm
    tag: ""  # Defaults to .Chart.AppVersion (currently v2.0.0)
    pullPolicy: IfNotPresent
  ```
- **Flexible Service Exposure:**
  Supports ClusterIP, NodePort, LoadBalancer, and Ingress exposure types.
- **ERNIE-Image Ready:**
  - Follows the latest ComfyUI core, which includes native ERNIE-Image support.
  - Keeps the required text and tokenizer dependencies available in the container image.
- **Custom Nodes Baked In:**
  - Preinstalls `ComfyUI-Manager`, `ComfyUI-GGUF`, `ComfyUI-KJNodes`, `ComfyUI-LTXVideo`, `ComfyUI-SeedVR2_VideoUpscaler`, `ComfyUI-VideoHelperSuite`, `ComfyUI-WanVideoWrapper`, and `Nvidia_RTX_Nodes_ComfyUI`.
  - Installs each node's Python dependencies during the Docker build so the container can start cleanly without first-run bootstrap steps.
Prerequisites:

- Kubernetes cluster with NVIDIA GPU nodes.
- Helm v3 installed.
- Docker installed for image building.
- **(Maintainers only) Bump Chart Version:** When you ship a new image, bump `appVersion` in `Chart.yaml` to match the image tag and increment `version` per SemVer. Application users do not need to touch this file.
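  An illustrative `Chart.yaml` bump (the version numbers are hypothetical):

  ```yaml
  apiVersion: v2
  name: comfyui-helm
  version: 1.1.0        # chart version, incremented per SemVer
  appVersion: "v2.1.0"  # matches the tag of the newly shipped image
  ```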
- **Update Image Details:**
  Build the image locally with the GHCR-style name and tag that match the chart app version:

  ```bash
  docker build -t ghcr.io/memenow/comfyui-helm:v2.0.0 .
  ```

  You do not need to push the image if your cluster can access the local image cache.
  If you want a reproducible build, you can pin the ComfyUI core or any curated custom node to a specific revision:

  ```bash
  docker build \
    --build-arg COMFYUI_REF=<comfyui-commit> \
    --build-arg COMFYUI_LTXVIDEO_REF=<ltxvideo-ref> \
    -t ghcr.io/memenow/comfyui-helm:v2.0.0 .
  ```

  Modify `values.yaml` only if you need a different repository or tag:

  ```yaml
  image:
    repository: ghcr.io/memenow/comfyui-helm
    tag: ""  # Defaults to .Chart.AppVersion (currently v2.0.0)
    pullPolicy: IfNotPresent
  ```
- **Lint Chart (Optional):**

  ```bash
  helm lint .
  ```
- **Deploy or Upgrade Chart:**
  Install:

  ```bash
  helm install comfyui-helm .
  ```

  Upgrade an existing deployment:

  ```bash
  helm upgrade comfyui-helm .
  ```
- **Horizontal Pod Autoscaler (HPA):**
  - HPA is disabled by default. ComfyUI is GPU-bound, so CPU/memory utilization is a poor scaling signal: replicas added under CPU pressure typically end up `Pending` because no GPU is free, and the per-replica model load makes scale-up slow.
  - For real autoscaling on GPU workloads, prefer KEDA with the NVIDIA DCGM exporter (`DCGM_FI_DEV_GPU_UTIL`) or a queue-depth metric (see the sketch after this list). KEDA also supports scale-to-zero, which is useful when the cluster is shared with other workloads.
  - The simple Resource-based HPA is still wired up for users who want it:

    ```yaml
    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 5
      targetCPUUtilizationPercentage: 80
    ```
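  For the KEDA route, a minimal sketch, assuming Prometheus scrapes the NVIDIA DCGM exporter and that the Deployment is named `comfyui-helm`; the server address, query, and threshold are illustrative and must be adapted to your cluster:

  ```yaml
  apiVersion: keda.sh/v1alpha1
  kind: ScaledObject
  metadata:
    name: comfyui-gpu-scaler
  spec:
    scaleTargetRef:
      name: comfyui-helm                 # Deployment created by this chart
    minReplicaCount: 0                   # KEDA can scale to zero
    maxReplicaCount: 5
    triggers:
      - type: prometheus
        metadata:
          serverAddress: http://prometheus.monitoring.svc:9090  # assumed Prometheus endpoint
          query: avg(DCGM_FI_DEV_GPU_UTIL{pod=~"comfyui-helm.*"})
          threshold: "70"                # scale out above 70% average GPU utilization
  ```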
- **Service Exposure Options:**
  - ClusterIP: Default internal access; use port-forwarding for external access.
  - NodePort: Exposes the service externally on node ports.
  - LoadBalancer: Automatically provisions an external IP if supported.
  - Ingress: Enable and configure ingress in `values.yaml`.
ClusterIP (port-forward):

```bash
export POD_NAME=$(kubectl get pods -l "app.kubernetes.io/name=comfyui-helm" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward $POD_NAME 8188:8188
```

Visit http://127.0.0.1:8188.
NodePort: Retrieve the NodePort number:

```bash
kubectl get svc comfyui-helm -o=jsonpath='{.spec.ports[?(@.name=="http")].nodePort}'
```

Access via http://<node-ip>:<node-port>.
LoadBalancer: Get the external IP:

```bash
kubectl get svc comfyui-helm -o=jsonpath='{.status.loadBalancer.ingress[0].ip}'
```

Access via the external IP at port 8188.
Ingress: Configure ingress in `values.yaml`, ensure DNS is set up, then access using the configured hostname.
ComfyUI uses a WebSocket connection at `/ws` to stream progress updates and previews. Most ingress controllers buffer responses or apply short read timeouts that break this channel; set the controller-specific annotations below.
NGINX Ingress (ingress-nginx):

```yaml
ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    nginx.ingress.kubernetes.io/proxy-body-size: "200m"
```

Traefik (IngressRoute or `ingress.annotations`):

```yaml
ingress:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: "default-comfyui-buffering@kubernetescrd"
```

…and apply a Traefik middleware that disables response buffering and lifts the read timeout, as sketched below. For other controllers, look up their equivalents for "disable proxy buffering" and "WebSocket idle timeout".
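A sketch of the referenced middleware, assuming Traefik's Kubernetes CRD provider; the name and namespace must match the `default-comfyui-buffering@kubernetescrd` annotation above, and the available options vary by Traefik version:

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: comfyui-buffering
  namespace: default   # the "default-" prefix in the annotation refers to this namespace
spec:
  buffering:
    maxRequestBodyBytes: 209715200   # allow ~200 MiB uploads, mirroring the NGINX body-size setting
    memRequestBodyBytes: 2097152     # spill request bodies larger than 2 MiB to disk
```

Note that Traefik applies read timeouts at the entrypoint level (`transport.respondingTimeouts.readTimeout` in the static configuration) rather than per middleware.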
Models, generated outputs, and user state are lost on pod restart unless backed by a PersistentVolumeClaim. The chart exposes four opt-in mounts under `persistence.*`; each one creates its own claim (or reuses an `existingClaim`) and mounts it at the canonical ComfyUI path.
| Key | Mount path | Default size |
|---|---|---|
| `persistence.models` | `/app/ComfyUI/models` | 200Gi |
| `persistence.output` | `/app/ComfyUI/output` | 50Gi |
| `persistence.input` | `/app/ComfyUI/input` | 10Gi |
| `persistence.user` | `/app/ComfyUI/user` | 10Gi |
The default access mode is ReadWriteMany because the chart is intended to scale beyond a single replica. Use an RWX-capable backend such as AWS EFS / Mountpoint-S3 CSI, GCP Filestore, Azure Files, CephFS, or Longhorn RWX. If you only run a single replica on block storage, override `accessModes` with `[ReadWriteOnce]`.
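For instance, a minimal single-replica override on block storage (key layout follows the `persistence.*` schema above):

```yaml
persistence:
  models:
    enabled: true
    accessModes: [ReadWriteOnce]   # single replica; RWO block storage is fine
```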
Example (AWS EKS with the EFS CSI driver):

```yaml
persistence:
  models:
    enabled: true
    storageClass: efs-sc
    size: 500Gi
  output:
    enabled: true
    storageClass: efs-sc
    size: 100Gi
```

To attach a pre-provisioned claim instead of creating one, set `existingClaim`:

```yaml
persistence:
  models:
    enabled: true
    existingClaim: my-shared-models-pvc
```

To bring in additional model storage paths (for example, a separate volume per model family), use ComfyUI's `extra_model_paths.yaml` mechanism by mounting the file with the standard `volumes` / `volumeMounts` keys.
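A hypothetical sketch of that mechanism, assuming the chart forwards `volumes` / `volumeMounts` from `values.yaml` to the pod spec; the ConfigMap name and model paths are illustrative:

```yaml
# ConfigMap holding the extra paths file
apiVersion: v1
kind: ConfigMap
metadata:
  name: comfyui-extra-model-paths
data:
  extra_model_paths.yaml: |
    shared:
      checkpoints: /mnt/shared-models/checkpoints
      loras: /mnt/shared-models/loras
```

```yaml
# values.yaml: mount the file at the path ComfyUI reads on startup
volumes:
  - name: extra-model-paths
    configMap:
      name: comfyui-extra-model-paths
volumeMounts:
  - name: extra-model-paths
    mountPath: /app/ComfyUI/extra_model_paths.yaml
    subPath: extra_model_paths.yaml
```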
The chart is configured to satisfy the Pod Security Standards baseline profile out of the box:

- The container image runs as the unprivileged `comfyui` user (UID/GID 10001).
- Pod-level `securityContext` enforces `runAsNonRoot: true`, `seccompProfile: RuntimeDefault`, and an `fsGroup` so PVC mounts are writable by the user.
- Container-level `securityContext` drops all capabilities, disables privilege escalation, and is ready to be flipped to `readOnlyRootFilesystem: true` once writable paths are externalized via `persistence.*`.
- The ServiceAccount token is not projected into the pod (`automountServiceAccountToken: false`) because ComfyUI does not call the Kubernetes API.
Do not set `NVIDIA_VISIBLE_DEVICES` in `values.yaml`. Kubernetes does not expand shell variables in `env` values, and overriding it defeats the per-pod GPU isolation provided by the NVIDIA device plugin.

To run under the restricted profile, enable persistence so writable directories live on PVCs, then set `securityContext.readOnlyRootFilesystem: true`.
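An illustrative restricted-profile override (key names assume the chart's `persistence.*` and `securityContext` layout):

```yaml
persistence:
  models: { enabled: true }
  output: { enabled: true }
  input: { enabled: true }
  user: { enabled: true }
securityContext:
  readOnlyRootFilesystem: true   # safe once all writable paths live on PVCs
```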
`nodeSelector` defaults to empty so the pod lands on whichever node has a free `nvidia.com/gpu`. If your cluster does not advertise that resource cluster-wide, pin the pod to GPU nodes using either a manual label or the labels emitted by the NVIDIA GPU Operator plus Node Feature Discovery:

```yaml
# Manual label
nodeSelector:
  nvidia.com/gpu: "true"
```

```yaml
# NFD / GPU Operator
nodeSelector:
  feature.node.kubernetes.io/pci-10de.present: "true"
```

For mixed clusters running GPU and non-GPU workloads side by side, also consider:

- `priorityClassName` to preempt batch / training jobs when interactive ComfyUI requests come in.
- `topologySpreadConstraints` to spread replicas across zones and hosts when scaling out (see the sketch after this list).
- `podDisruptionBudget` (opt-in) so node drains preserve at least one ComfyUI pod.
- `networkPolicy` (opt-in) to restrict ingress to your ingress controller's namespace and limit egress to model registries.
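A hypothetical `topologySpreadConstraints` override, assuming the chart passes the key through to the pod spec:

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway   # prefer spreading, but still schedule when GPUs are scarce
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: comfyui-helm
```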
Beyond the device-plugin model, Kubernetes 1.35+ supports Dynamic Resource Allocation (DRA) with the NVIDIA DRA driver. Migrating to DRA is out of scope for this chart but is the recommended direction for clusters that need MIG, time-slicing, or richer GPU selectors.
Run the provided tests:

```bash
helm test comfyui-helm
```
This container image tracks the latest ComfyUI core. As of the April 2026 ERNIE-Image launch, current upstream ComfyUI includes native support for Baidu ERNIE-Image and the immediate follow-up fixes.
After rebuilding the image, ERNIE-Image models can be mounted into the normal ComfyUI model directories under `/app/ComfyUI/models`.
This project is open source and contributions are encouraged! Fork the repository, submit issues or feature requests, and create pull requests to improve this Helm chart.
See the LICENSE file for licensing details.
For more information on ComfyUI, visit the official ComfyUI GitHub repository.