| 1 | +# Project Overview |
| 2 | + |
| 3 | +This repository provisions and manages a GitOps-driven Kubernetes cluster using Talos OS and K3s, deployed onto Proxmox VMs. Argo CD orchestrates continuous delivery of infrastructure and application manifests, which are structured with Kustomize and Helm. |
| 4 | + |
| 5 | +Applications are categorized into `infrastructure`, `monitoring`, and `my-apps`, each managed by its own Argo CD `ApplicationSet` and GitOps workflow. |
| 6 | + |
| 7 | +# Folder Structure |
| 8 | + |
| 9 | +- `/bootstrap/`: Cluster bootstrap and Argo CD bootstrapping |
| 10 | +- `/infrastructure/`: Cluster controllers and components (Cilium, Longhorn, Vault plugin, CRDs) |
| 11 | +- `/monitoring/`: Prometheus, Grafana, Loki, and related stacks |
| 12 | +- `/my-apps/`: User applications deployed with GPU configs, Helm charts, and Gateway routing |
| 13 | +- `/apps/`, `/infrastructure/controllers/`: Subdirectories containing Kustomize or Helm-based deployments |
| 14 | +- `/terraform/`, `/packer/`, `/talos/`: VM provisioning and image building layers |
| 15 | + |
| 16 | +# Argo CD ApplicationSet Strategy |
| 17 | + |
| 18 | +## ApplicationSet: `infrastructure` |
| 19 | + |
| 20 | +```yaml |
| 21 | + - path: infrastructure/controllers/* |
| 22 | + - path: infrastructure/database/*/* |
| 23 | + - path: infrastructure/networking/* |
| 24 | + - path: infrastructure/storage/* |
| 25 | + - path: infrastructure/crds |
| 26 | +``` |
| 27 | + |
| 28 | +- Sync wave: `"1"` (after Argo CD bootstrap, before apps) |
| 29 | +- Namespace: `{{path.basename}}` |
| 30 | +- Project: `infrastructure` |
| 31 | +- Use `ignoreDifferences` for CRDs (e.g., `preserveUnknownFields`) |
| 32 | +- Sync options: |
| 33 | +  - `CreateNamespace=true` |
| 34 | +  - `ServerSideApply=true` |
| 35 | +  - `RespectIgnoreDifferences=true` |
| 36 | +  - `ApplyOutOfSyncOnly=true` |
| 37 | +- Retry: exponential backoff, max 5 attempts, 3m cap |
| 38 | + |
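The settings listed for the `infrastructure` ApplicationSet could be assembled roughly as follows. This is a minimal sketch, not the repository's actual manifest: the `argocd` namespace, the repository URL, the Application naming scheme, the placement of the sync-wave annotation, and the backoff `duration` and `factor` are all assumptions made for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: infrastructure
  namespace: argocd                     # assumption: Argo CD lives in "argocd"
spec:
  generators:
    - git:
        repoURL: https://example.com/your-org/your-repo.git  # placeholder URL
        revision: HEAD
        directories:                    # one Application per matching directory
          - path: infrastructure/controllers/*
          - path: infrastructure/database/*/*
          - path: infrastructure/networking/*
          - path: infrastructure/storage/*
          - path: infrastructure/crds
  template:
    metadata:
      name: 'infra-{{path.basename}}'   # hypothetical naming scheme
      annotations:
        argocd.argoproj.io/sync-wave: "1"   # assumption: wave set on the generated Application
    spec:
      project: infrastructure
      source:
        repoURL: https://example.com/your-org/your-repo.git  # placeholder URL
        targetRevision: HEAD
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      ignoreDifferences:
        - group: apiextensions.k8s.io
          kind: CustomResourceDefinition
          jsonPointers:
            - /spec/preserveUnknownFields
      syncPolicy:
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true
          - RespectIgnoreDifferences=true
          - ApplyOutOfSyncOnly=true
        retry:
          limit: 5                      # max 5 attempts
          backoff:
            duration: 5s                # assumption: initial delay
            factor: 2                   # assumption: doubling per attempt
            maxDuration: 3m             # 3m cap
```

`RespectIgnoreDifferences=true` only has an effect when `ignoreDifferences` is also set, which is why the two appear together here.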
| 39 | +## ApplicationSet: `monitoring` |
| 40 | + |
| 41 | +```yaml |
| 42 | + - path: monitoring/* |
| 43 | +``` |
| 44 | + |
| 45 | +- Sync wave: `"0"` (early sync) |
| 46 | +- Project: `monitoring` |
| 47 | +- Sync options and retry strategy similar to those of `infrastructure` |
| 48 | +- Uses `info` fields for Argo CD UI annotation: |
| 49 | +  ```yaml |
| 50 | +  info: |
| 51 | +    - name: Description |
| 52 | +      value: "Monitoring component: {{path.basename}}" |
| 53 | +  ``` |
| 54 | + |
| 55 | +## ApplicationSet: `my-apps` |
| 56 | + |
| 57 | +```yaml |
| 58 | + - path: my-apps/*/* |
| 59 | +``` |
| 60 | + |
| 61 | +- Sync wave: `"2"` (after infra + monitoring) |
| 62 | +- Project: `my-apps` |
| 63 | +- Used for user apps like `comfyui`, `ollama`, etc. |
| 64 | +- Auto namespace creation, Helm + Kustomize integration |
| 65 | +- The `ApplicationSet` dynamically generates one Application per app path |
| 66 | +- Supports GPU workloads, Gateway integration, custom storage |
| 67 | + |
| 68 | +# GitOps Best Practices |
| 69 | + |
| 70 | +- All ApplicationSets use the Git generator's `directories` option so that generated Applications mirror the directory layout in Git |
| 71 | +- Use declarative Kustomize overlays or Helm charts per app/environment |
| 72 | +- Every folder rendered must include a valid `kustomization.yaml` |
| 73 | +- Helm values go into `values.yaml` in app folders |
| 74 | +- All apps must support `kustomize build` locally |
| 75 | +- Use `ignoreDifferences` for Helm-managed labels, CRDs, and known noisy fields |
| 76 | + |
| 77 | +# Helm + Kustomize Integration |
| 78 | + |
| 79 | +Refer to `1password-connect` as a pattern: |
| 80 | + |
| 81 | +```yaml |
| 82 | +helmCharts: |
| 83 | +  - name: connect |
| 84 | +    repo: https://1password.github.io/connect-helm-charts |
| 85 | +    version: 2.0.2 |
| 86 | +    releaseName: 1password-connect |
| 87 | +    valuesFile: values.yaml |
| 88 | +    includeCRDs: true |
| 89 | +``` |
| 90 | + |
| 91 | +Patch Helm resources with Kustomize to: |
| 92 | +- Add `HTTPRoute` (Gateway API) |
| 93 | +- Mount PVCs or add ConfigMaps |
| 94 | +- Inject GPU tolerations and node selectors |
| 95 | + |
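As an illustration of patching Helm-rendered resources, the `kustomization.yaml` below inflates the chart, adds an `HTTPRoute` as an extra resource, and injects a GPU toleration with a JSON 6902 patch. It is a sketch under assumptions: the namespace, the target Deployment name (`connect`), the toleration effect, and the presence of an `httproute.yaml` next to the file are hypothetical, and the patch assumes the rendered Deployment has no existing `tolerations` list.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: 1password-connect            # hypothetical namespace

resources:
  - httproute.yaml                      # hypothetical file holding the HTTPRoute

helmCharts:
  - name: connect
    repo: https://1password.github.io/connect-helm-charts
    version: 2.0.2
    releaseName: 1password-connect
    valuesFile: values.yaml
    includeCRDs: true

patches:
  - target:
      kind: Deployment
      name: connect                     # hypothetical: name rendered by the chart
    patch: |-
      - op: add
        path: /spec/template/spec/tolerations
        value:
          - key: gpu
            operator: Equal
            value: "true"
            effect: NoSchedule          # assumption: taint effect
```

Rendering this locally requires Helm support to be switched on, for example with `kustomize build --enable-helm .`.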
| 96 | +# GPU Workload Standards |
| 97 | + |
| 98 | +Apps like `ollama` and `comfyui` follow these conventions: |
| 99 | + |
| 100 | +- GPU scheduling using: |
| 101 | +  - `nvidia.com/gpu` requests |
| 102 | +  - Tolerations for `gpu=true` |
| 103 | +  - Node selectors for `pci-0300_10de` |
| 104 | +- Runtime classes: `nvidia` |
| 105 | +- PVCs for `/root`, `/models`, etc. |
| 106 | +- ConfigMaps for GPU runtime tuning |
| 107 | +- Liveness and readiness probes with cold start tolerances |
| 108 | +- Gateway API exposure via `HTTPRoute` + custom hostnames |
| 109 | + |
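Put together, these GPU conventions might look like the Deployment fragment below. Treat it as a sketch: the image, port, probe timings, PVC name, toleration effect, and the full form of the node label (written here in the style of Node Feature Discovery labels) are assumptions, not values taken from the repository.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  labels:
    app: ollama
    app.kubernetes.io/name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
        app.kubernetes.io/name: ollama
    spec:
      runtimeClassName: nvidia
      nodeSelector:
        # assumption: full label key for the `pci-0300_10de` selector
        feature.node.kubernetes.io/pci-0300_10de.present: "true"
      tolerations:
        - key: gpu
          operator: Equal
          value: "true"
          effect: NoSchedule            # assumption: taint effect
      containers:
        - name: ollama
          image: ollama/ollama          # assumption: pin a specific tag in practice
          ports:
            - containerPort: 11434      # assumption: service port
          resources:
            limits:
              nvidia.com/gpu: 1         # extended resources are set under limits
          readinessProbe:
            httpGet:
              path: /
              port: 11434
            initialDelaySeconds: 30     # assumption: allowance for cold start
            periodSeconds: 10
            failureThreshold: 12
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-models    # hypothetical PVC name
```

For extended resources such as `nvidia.com/gpu`, Kubernetes requires the request to equal the limit, so specifying only the limit is sufficient.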
| 110 | +# App Structure Conventions |
| 111 | + |
| 112 | +Each app directory contains: |
| 113 | + |
| 114 | +- `namespace.yaml` |
| 115 | +- `deployment.yaml` |
| 116 | +- `pvc.yaml` |
| 117 | +- `service.yaml` |
| 118 | +- `httproute.yaml` (for Gateway) |
| 119 | +- Optional `configmap.yaml`, `secret.yaml`, `values.yaml` |
| 120 | +- Labeled with: |
| 121 | +  ```yaml |
| 122 | +  labels: |
| 123 | +    app: <name> |
| 124 | +    app.kubernetes.io/name: <name> |
| 125 | +    app.kubernetes.io/component: <component> |
| 126 | +  ``` |
| 127 | + |
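A `kustomization.yaml` tying such an app directory together could look like this. It is only a sketch: the app name `comfyui` is borrowed from the examples above, while the namespace, the component value, and the choice to apply labels through Kustomize rather than in each manifest are assumptions.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: comfyui                      # assumption: namespace named after the app

resources:
  - namespace.yaml
  - deployment.yaml
  - pvc.yaml
  - service.yaml
  - httproute.yaml

labels:
  - pairs:
      app: comfyui
      app.kubernetes.io/name: comfyui
      app.kubernetes.io/component: inference   # hypothetical component value
    includeSelectors: false             # label metadata only, leave selectors untouched
```

Keeping `includeSelectors: false` avoids rewriting Deployment selectors, which are immutable once the workload exists.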
| 128 | +# Cluster Tooling |
| 129 | + |
| 130 | +- Kubernetes: K3s |
| 131 | +- OS: Talos OS |
| 132 | +- GitOps: Argo CD + ApplicationSet |
| 133 | +- Networking: Cilium, MetalLB, Gateway API |
| 134 | +- Storage: Longhorn |
| 135 | +- Monitoring: Prometheus, Grafana, Loki |
| 136 | +- Secrets: Argo CD Vault Plugin + 1Password |
| 137 | +- GPU Workloads: RuntimeClass, tolerations, securityContext |