1- # ComfyUI on Kubernetes (Talos)
1+ # ComfyUI on Kubernetes (Talos) with ArgoCD
22
3- This directory contains Kubernetes manifests to deploy ComfyUI with GPU support on Talos Linux.
3+ This directory contains Kubernetes manifests to deploy ComfyUI with GPU support on Talos Linux via ArgoCD.
44
55 ## Files Structure
66
77 - `namespace.yaml` - ComfyUI namespace
8- - `pvc.yaml` - Persistent Volume Claim for models and outputs (100GB)
8+ - `pvc.yaml` - Persistent Volume Claim for models and outputs (180GB, Longhorn single replica)
99 - `configmap.yaml` - Configuration for model paths
1010 - `deployment.yaml` - Main ComfyUI deployment with GPU support
1111 - `service.yaml` - ClusterIP and NodePort services
1212 - `httproute.yaml` - HTTPRoute configuration for Gateway API
1313 - `kustomization.yaml` - Kustomize configuration
14- - `setup-comfyui.sh` - Automated setup script
15- - `comfyui-manifests.yaml` - Single file with all manifests (for reference)
14+ - `setup-comfyui.sh` - Post-deployment setup script for models and workflows
15+ - `README.md` - This documentation
1616
1717 ## Prerequisites for Talos
1818
@@ -21,34 +21,38 @@ This directory contains Kubernetes manifests to deploy ComfyUI with GPU support
2121 ``` bash
2222 kubectl label nodes <your-gpu-node> accelerator=nvidia-gpu
2323 ```
24- 3. **Storage**: Configure appropriate storage class (default: `local-path`)
24+ 3. **Storage**: Longhorn configured (single replica for space efficiency)
2525 4. **Gateway API**: Ensure Gateway API is installed and configured in your cluster
26+ 5. **ArgoCD**: This setup assumes deployment via ArgoCD
2627
27- ## Quick Deployment
28+ ## Deployment Workflow
2829
29- ### Option 1: Using the Setup Script (Recommended)
30+ ### 1. ArgoCD Deployment
31+ ArgoCD will automatically deploy the manifests. Ensure your ArgoCD application points to this directory.
32+
33+ ### 2. Post-Deployment Setup (Models & Workflows)
34+ After ArgoCD deploys ComfyUI, run the setup script to install models and custom nodes:
3035
3136 ```bash
37+ # Navigate to the ComfyUI directory
38+ cd my-apps/ai/comfyui
39+
3240 # Make the script executable
3341 chmod +x setup-comfyui.sh
3442
35- # Run the complete setup
43+ # Run post-deployment setup
3644 ./setup-comfyui.sh
3745 ```
3846
39- ### Option 2: Manual Deployment
47+ ### 3. Manual Deployment (Alternative)
48+ If you need to deploy manually without ArgoCD:
4049
4150 ```bash
4251 # Apply all manifests using kustomize
4352 kubectl apply -k .
4453
45- # Or apply individual files
46- kubectl apply -f namespace.yaml
47- kubectl apply -f pvc.yaml
48- kubectl apply -f configmap.yaml
49- kubectl apply -f deployment.yaml
50- kubectl apply -f service.yaml
51- kubectl apply -f httproute.yaml
54+ # Then run the setup script
55+ ./setup-comfyui.sh
5256 ```
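
When deploying manually, the setup script presumably needs a running ComfyUI pod, so it can help to wait for the Deployment to become available between the two steps. The following is a minimal sketch, not part of the repository: the function name `wait_for_comfyui` and the 600s default timeout are assumptions, and `--all` is used so that the Deployment's name does not have to be guessed.

```shell
#!/usr/bin/env bash
# Sketch only: wait until every Deployment in the namespace reports Available.
# NAMESPACE comes from the README; the function name and timeout are assumptions.
NAMESPACE="comfyui"

wait_for_comfyui() {
  local timeout="${1:-600s}"
  # --all selects every Deployment in the namespace, so no name is hard-coded.
  kubectl wait --for=condition=Available deployment --all \
    -n "$NAMESPACE" --timeout="$timeout"
}
```

A possible use is `wait_for_comfyui && ./setup-comfyui.sh`, which runs the setup script only once the wait succeeds.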
5357
5458 ## Features
@@ -64,88 +68,73 @@ kubectl apply -f httproute.yaml
6468 - ComfyUI Essentials
6569 - Custom Scripts
6670 - RGThree Comfy
67-
68- ### Pre-downloaded Models
69- - **SDXL Base 1.0** - Main diffusion model
70- - **SDXL VAE** - Variational autoencoder
71- - **ControlNet Models** - Canny and OpenPose
72- - **RealESRGAN 4x** - Upscaling model
73-
74- ### Default Workflow
75- - Basic SDXL generation workflow
76- - Located at: `/opt/ComfyUI/user/default/workflows/basic_sdxl.json`
71+ - Flux-specific nodes (FluxTrainer, GGUF)
72+ - WAS Node Suite & Efficiency Nodes
73+
74+ ### Pre-downloaded Models (via setup script)
75+ - **Flux Dev BF16 & FP8** - Latest 12B parameter models for exceptional quality
76+ - **CyberRealistic Pony v11** - Popular photorealistic model
77+ - **SDXL Base 1.0** - Stable foundation model
78+ - **Flux & SDXL VAEs** - High-quality decoders
79+ - **Flux ControlNet** - Canny and Depth control
80+ - **Traditional ControlNet** - Canny and OpenPose
81+ - **RealESRGAN Upscalers** - General and Anime variants
82+ - **CyberRealistic Embeddings** - Optimized prompt tokens
83+
84+ ### Pre-configured Workflows
85+ - **Flux Dev Workflow** - High-quality photorealistic generation
86+ - **CyberRealistic Pony Workflow** - Versatile realistic content
7787
7888 ## Resource Requirements
7989
80- - **CPU**: 2-4 cores
81- - **Memory**: 8-16 GB
82- - **GPU**: 1x NVIDIA GPU
83- - **Storage**: 100GB for models and outputs
90+ - **CPU**: 2-8 cores
91+ - **Memory**: 8-32 GB
92+ - **GPU**: 1x NVIDIA GPU (optimized for 24GB VRAM)
93+ - **Storage**: 180GB Longhorn (single replica for efficiency)
8494
8595 ## Access Methods
8696
87- ### 1. NodePort (Direct Access)
97+ ### 1. HTTPRoute (Primary - Domain Access)
98+ - URL: `https://comfyui.vanillax.me`
99+ - Uses Gateway API with `gateway-internal`
100+
101+ ### 2. NodePort (Direct Access)
88102 ```bash
89103 # Access via node IP on port 30188
90104 http://<NODE_IP>:30188
91105 ```
92106
93- ### 2. Port Forward (Local Development)
107+ ### 3. Port Forward (Local Development)
94108 ```bash
95109 kubectl port-forward -n comfyui service/comfyui-service 8188:8188
96110 # Access at: http://localhost:8188
97111 ```
98112
99- ### 3. HTTPRoute (Domain Access)
100- Update `httproute.yaml` with your domain and gateway configuration:
101- ```yaml
102- hostnames:
103-   - comfyui.your-domain.com
104- parentRefs:
105-   - name: your-gateway-name
106-     namespace: gateway-system
107- ```
108- Then access via: `http://comfyui.your-domain.com`
109-
110113 ## Configuration
111114
112- ### Gateway Configuration
113- Update the HTTPRoute in `httproute.yaml`:
114- ```yaml
115- spec:
116-   parentRefs:
117-     - name: your-gateway-name
118-       namespace: gateway-system
119-   hostnames:
120-     - comfyui.your-domain.com
121- ```
122-
123- ### Storage Class
124- Default uses `local-path`. Update in `pvc.yaml`:
125- ```yaml
126- storageClassName: your-storage-class
127- ```
115+ ### Storage (Longhorn)
116+ - Single replica for space efficiency
117+ - 180GB capacity for models and outputs
118+ - Persistent across pod restarts
128119
129120 ### Resource Limits
130- Adjust in `deployment.yaml`:
121+ Optimized for 24GB GPU systems:
131122 ```yaml
132123 resources:
133124   requests:
134125     memory: "8Gi"
135126     cpu: "2"
136127     nvidia.com/gpu: 1
137128   limits:
138-     memory: "16Gi"
139-     cpu: "4"
129+     memory: "32Gi"
130+     cpu: "8"
140131     nvidia.com/gpu: 1
141132 ```
142133
143- ### Node Selection
144- Update node selector in `deployment.yaml`:
145- ```yaml
146- nodeSelector:
147-   accelerator: nvidia-gpu
148- ```
134+ ### Gateway Configuration
135+ HTTPRoute configured for:
136+ - Gateway: `gateway-internal` in `gateway` namespace
137+ - Domain: `comfyui.vanillax.me`
149138
150139 ## Monitoring
151140
@@ -160,10 +149,11 @@ kubectl get httproute -n comfyui
160149 ## Troubleshooting
161150
162151 1. **Pod not starting**: Check GPU node labels and availability
163- 2. **Storage issues**: Verify storage class and PVC status
164- 3. **Model download issues**: Check pod logs for download progress
152+ 2. **Storage issues**: Verify Longhorn status and PVC binding
153+ 3. **Model download issues**: Check pod logs during setup script execution
165154 4. **GPU not detected**: Ensure NVIDIA device plugin is running
166155 5. **HTTPRoute not working**: Check Gateway API installation and gateway configuration
156+ 6. **ArgoCD sync issues**: Verify kustomization.yaml and resource files
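
Several of the checks in the list above map onto standard `kubectl` queries. The sketch below gathers them into one helper. The function name `comfyui_diagnose` is hypothetical and not part of the repository, and only the `comfyui` namespace and the `accelerator=nvidia-gpu` label are taken from the README.

```shell
#!/usr/bin/env bash
# Sketch only: read-only checks matching the troubleshooting items above.
# comfyui_diagnose is a hypothetical helper name, not part of the repository.
NAMESPACE="comfyui"

comfyui_diagnose() {
  # Pod not starting: pod status and the node each pod was scheduled on
  kubectl get pods -n "$NAMESPACE" -o wide
  # Pod not starting: confirm at least one node carries the GPU label
  kubectl get nodes -l accelerator=nvidia-gpu
  # Storage issues: the PVC should report STATUS "Bound"
  kubectl get pvc -n "$NAMESPACE"
  # GPU not detected: look for nvidia.com/gpu in the node's Allocatable section
  kubectl describe nodes -l accelerator=nvidia-gpu
  # HTTPRoute not working: confirm the route exists
  kubectl get httproute -n "$NAMESPACE"
  # Recent events, oldest first, often name the failing component
  kubectl get events -n "$NAMESPACE" --sort-by=.lastTimestamp
}
```

Running `comfyui_diagnose` prints each result in turn. None of the commands modify cluster state.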
167157
168158 ## Customization
169159
@@ -180,10 +170,35 @@ Use ComfyUI Manager web interface or install manually:
180170 kubectl exec -n comfyui $POD_NAME -- bash -c "cd /opt/ComfyUI/custom_nodes && git clone <repo-url>"
181171 ```
182172
173+ ### Updating Models via Script
174+ Re-run the setup script to add new models:
175+ ```bash
176+ ./setup-comfyui.sh
177+ ```
178+
179+ ## ArgoCD Integration
180+
181+ ### Application Configuration
182+ Ensure your ArgoCD application configuration includes:
183+ ```yaml
184+ spec:
185+   source:
186+     path: my-apps/ai/comfyui
187+     repoURL: <your-repo>
188+     targetRevision: HEAD
189+   destination:
190+     namespace: comfyui
191+     server: https://kubernetes.default.svc
192+ ```
193+
194+ ### Sync Strategy
195+ - **Automatic Sync**: Recommended for seamless updates
196+ - **Manual Sync**: For controlled deployments
197+
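
The two sync strategies can also be selected from the `argocd` CLI. This is a hedged sketch: the application name `comfyui` is an assumption (the README does not state what the ArgoCD Application is called), and both helper function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: APP_NAME is an assumption; replace it with your Application's name.
APP_NAME="comfyui"

# Automatic sync: ArgoCD applies changes whenever the repository changes.
argocd_enable_autosync() {
  argocd app set "$APP_NAME" --sync-policy automated
}

# Manual sync: trigger one sync, then block until the app reports healthy.
argocd_manual_sync() {
  argocd app sync "$APP_NAME" && argocd app wait "$APP_NAME" --health
}
```

Both helpers assume you are already logged in with `argocd login`.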
183198 ## Notes
184199
185- - First startup takes time for model downloads
186- - ComfyUI Manager provides easy model and node management
187- - Models persist in the PVC across pod restarts
188- - The setup script handles initial configuration and model downloads
189- - HTTPRoute requires Gateway API to be installed in your cluster
200+ - **First startup**: Takes time for model downloads (~50GB+)
201+ - **ComfyUI Manager**: Provides easy model and node management via web UI
202+ - **Model persistence**: All models persist in Longhorn storage across restarts
203+ - **Setup script**: Only handles post-deployment configuration, not manifest deployment
204+ - **ArgoCD friendly**: All manifests are properly structured for GitOps workflows