Commit 3fa0fb5

updates
1 parent 120319f commit 3fa0fb5

3 files changed

Lines changed: 172 additions & 51 deletions

docs/vpa-resource-optimization.md

Lines changed: 167 additions & 50 deletions
Original file line number | Diff line number | Diff line change
@@ -1,10 +1,10 @@
11
# VPA Resource Optimization Guide
22

3-
How to use VPA, Goldilocks, and Kyverno to right-size Kubernetes resource requests based on actual workload behavior.
3+
How to use VPA and Goldilocks to right-size Kubernetes resource requests based on actual workload behavior.
44

55
## TL;DR — Just Tell Me What To Do
66

7-
**Everything is automatic.** VPA is already watching every workload in the cluster. You don't need to set anything up.
7+
**Everything is automatic.** Goldilocks auto-creates VPA resources for every workload in the cluster. You don't need to set anything up.
88

99
### Step 1: Open the dashboard
1010

@@ -56,60 +56,124 @@ MEM:.status.recommendation.containerRecommendations[0].target.memory
5656

5757
---
5858

59-
## The Toolchain
60-
61-
| Tool | What It Does | Location |
62-
|------|-------------|----------|
63-
| **metrics-server** | Provides `metrics.k8s.io` API (CPU/memory data from kubelet) | `infrastructure/controllers/metrics-server/` |
64-
| **VPA** (Vertical Pod Autoscaler) | Analyzes metrics, generates resource recommendations | `infrastructure/controllers/vertical-pod-autoscaler/` |
65-
| **Goldilocks** | Auto-creates VPA resources for all workloads AND provides web dashboard to visualize recommendations | `infrastructure/controllers/goldilocks/` |
66-
67-
### How They Fit Together
59+
## Architecture
6860

6961
```
7062
kubelet /metrics/resource
71-
|
72-
v
63+
64+
7365
metrics-server (provides metrics.k8s.io API)
74-
|
75-
v
76-
VPA Recommender (reads metrics, writes recommendations to VPA status)
77-
^
78-
|
79-
Goldilocks Controller (on-by-default: true, auto-creates VPA for all workloads)
80-
|
81-
v
82-
VPA resources (one per workload, updateMode: "Off")
83-
|
84-
v
85-
Goldilocks Dashboard (reads VPA recommendations, shows per-namespace view)
86-
|
87-
v
66+
67+
68+
VPA Recommender (reads metrics, writes recommendations to VPA .status)
69+
70+
71+
Goldilocks Controller (on-by-default: "true")
72+
│ • watches all namespaces
73+
│ • auto-creates a VPA (updateMode: "Off") for every Deployment, StatefulSet, DaemonSet
74+
75+
VPA resources (one per workload, recommend-only)
76+
77+
78+
Goldilocks Dashboard (reads VPA .status, renders per-namespace view)
79+
80+
8881
Human reviews → updates values.yaml → Git push → ArgoCD applies
8982
```
9083

91-
**Key point**: Goldilocks with `on-by-default: "true"` auto-creates VPA resources for all Deployments, StatefulSets, and DaemonSets cluster-wide. No Kyverno policy or manual VPA resources needed.
84+
**Goldilocks is the sole VPA creator.** With `on-by-default: "true"`, it auto-creates VPA resources for all workloads cluster-wide. No manual VPA manifests needed.
85+
86+
---
87+
88+
## Components
89+
90+
| Component | Chart | Version | Namespace | Location |
91+
|-----------|-------|---------|-----------|----------|
92+
| **metrics-server** | `metrics-server/metrics-server` | | `kube-system` | `infrastructure/controllers/metrics-server/` |
93+
| **VPA** | `fairwinds-stable/vpa` | 4.10.1 | `vertical-pod-autoscaler` | `infrastructure/controllers/vertical-pod-autoscaler/` |
94+
| **Goldilocks** | `fairwinds-stable/goldilocks` | 10.3.0 | `goldilocks` | `infrastructure/controllers/goldilocks/` |
95+
96+
All three are deployed via the **Infrastructure ApplicationSet** (Wave 4).
97+
98+
### VPA Sub-Components
9299

93-
## Accessing the Dashboard
100+
| Component | Purpose |
101+
|-----------|---------|
102+
| **Recommender** | Analyzes metrics, generates recommendations |
103+
| **Updater** | Applies changes when mode is not Off (evicts pods or resizes them in place) |
104+
| **Admission Controller** | Sets resources on new pods when mode is not Off |
94105

95-
**Goldilocks Dashboard**: https://goldilocks.vanillax.me
106+
Currently the cluster runs VPA in **Off mode** — recommendations only, no automatic changes.
96107

97-
This is routed via the internal gateway (`gateway-internal`). No port-forward needed if you're on the LAN.
108+
---
109+
110+
## Goldilocks Dashboard
111+
112+
### Accessing the Dashboard
113+
114+
**URL**: https://goldilocks.vanillax.me (routed via `gateway-internal`, LAN/VPN only)
98115

99-
Fallback (if gateway is down):
116+
Fallback if the gateway is down:
100117
```bash
101118
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80
102119
# Open http://localhost:8080
103120
```
104121

105-
The dashboard shows every namespace with VPA-enabled workloads. For each container it displays:
106-
- Current resource requests/limits
107-
- VPA lower bound, target, and upper bound
108-
- Suggested `requests` and `limits` YAML you can copy-paste
122+
### What the Dashboard Shows
109123

110-
## Reading VPA Recommendations
124+
For each namespace, every workload with a VPA gets a card showing:
125+
- **Current requests/limits**: what's set in the workload spec
126+
- **Guaranteed QoS**: suggested `resources` YAML block with requests and limits both set to the VPA target
127+
- **Burstable QoS**: suggested `resources` YAML block with requests at the VPA lower bound and limits at the upper bound
128+
- **Lower bound, Target, Upper bound** per container
129+
130+
### Excluding Namespaces
131+
132+
With `on-by-default: "true"`, all namespaces are included. To exclude one:
133+
134+
```bash
135+
kubectl label namespace <ns> goldilocks.fairwinds.com/enabled=false
136+
```
137+
138+
---
139+
140+
## CLI Tools & Scripts
141+
142+
### vpa-report.sh
143+
144+
The `scripts/vpa-report.sh` script provides a formatted table of all VPA recommendations with human-readable values.
145+
146+
```bash
147+
# All namespaces
148+
./scripts/vpa-report.sh
149+
150+
# Single namespace
151+
./scripts/vpa-report.sh argocd
152+
```
153+
154+
Example output:
155+
```
156+
==========================================
157+
VPA Resource Recommendations Report
158+
==========================================
159+
160+
NAMESPACE WORKLOAD CONTAINER CPU TGT CPU RANGE MEM TGT MEM RANGE
161+
-------------------------------------------------------------------------------------------------------------------------------------------------
162+
argocd Deployment/argocd-server server 23m 12m-100m 175Mi 88Mi-700Mi
163+
argocd Deployment/argocd-repo-server repo-server 2975m 1488m-11900m 523Mi 262Mi-2.0Gi
164+
goldilocks Deployment/goldilocks-controller goldilocks 12m 6m-48m 64Mi 32Mi-256Mi
165+
...
166+
167+
Total: 42 containers with VPA recommendations
168+
169+
Action needed if your current request is:
170+
< lowerBound → INCREASE NOW (pod is being throttled)
171+
< target → INCREASE (under-provisioned)
172+
≈ target → KEEP (well-tuned)
173+
> 2x target → DECREASE (over-provisioned)
174+
```
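
The legend at the end of the report can be expressed as a small shell function. This is only a sketch, not the logic inside `scripts/vpa-report.sh`: `classify_request` is a hypothetical name, the inputs are plain integers in a single unit (millicores or Mi), and treating everything from `target` up to 2x `target` as KEEP is an assumption, since the legend does not define how wide "≈ target" is.

```bash
# classify_request CURRENT LOWER_BOUND TARGET
# All three arguments are integers in the same unit (e.g. millicores or Mi).
# Hypothetical helper: mirrors the report legend, with one assumption noted below.
classify_request() {
  current=$1; lower=$2; target=$3
  if [ "$current" -lt "$lower" ]; then
    echo "INCREASE NOW"          # below lowerBound
  elif [ "$current" -lt "$target" ]; then
    echo "INCREASE"              # between lowerBound and target
  elif [ "$current" -gt $((2 * target)) ]; then
    echo "DECREASE"              # more than 2x target
  else
    echo "KEEP"                  # assumption: target..2x target counts as "≈ target"
  fi
}

# argocd-server CPU from the sample output: lower bound 12m, target 23m
classify_request 10 12 23
classify_request 100 12 23
```

With the `argocd-server` CPU figures from the sample output, a request of 10m falls below the lower bound and a request of 100m exceeds twice the target.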
111175

112-
### Via kubectl
176+
### kubectl One-Liners
113177

114178
```bash
115179
# Quick overview: all VPA targets across the cluster
@@ -124,9 +188,18 @@ kubectl get vpa -n argocd -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{r
124188

125189
# Full detail for a specific VPA
126190
kubectl describe vpa <name> -n <namespace>
191+
192+
# Current resource usage vs requests (side-by-side comparison)
193+
kubectl top pods -n <namespace>
194+
kubectl get deploy <name> -n <ns> -o jsonpath='{.spec.template.spec.containers[0].resources}'
195+
kubectl get vpa <name> -n <ns> -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
127196
```
128197

129-
### Understanding the Four Values
198+
---
199+
200+
## Reading Recommendations
201+
202+
### The Four VPA Values
130203

131204
VPA recommendations include four values per container:
132205

@@ -144,8 +217,6 @@ VPA recommendations include four values per container:
144217
- `1073741824` = 1Gi
145218
- `1610612736` = 1.5Gi
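
For raw byte values like the ones listed above, a small helper can do the conversion. This is a minimal sketch: `bytes_to_human` is a hypothetical name (it is not taken from `scripts/vpa-report.sh`), and rounding to one decimal place is an assumption.

```bash
# bytes_to_human BYTES
# Converts a raw byte count into a binary-unit string (Ki, Mi, Gi, Ti).
# Hypothetical helper; the one-decimal rounding is an assumption.
bytes_to_human() {
  LC_ALL=C awk -v b="$1" 'BEGIN {
    split("Ki Mi Gi Ti", unit, " ")
    v = b; u = ""
    # Divide by 1024 until the value drops below one unit step.
    for (i = 1; i <= 4 && v >= 1024; i++) { v /= 1024; u = unit[i] }
    # Whole numbers print without a decimal point, others with one decimal.
    if (v == int(v)) printf "%d%s\n", v, u
    else             printf "%.1f%s\n", v, u
  }'
}

bytes_to_human 1073741824   # 1Gi
bytes_to_human 1610612736   # 1.5Gi
```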
146219

147-
## When to Change Resources
148-
149220
### Decision Matrix
150221

151222
| Situation | Action | Priority |
@@ -163,9 +234,13 @@ VPA recommendations include four values per container:
163234
- **Re-check after major changes** (new features, traffic spikes, version upgrades). VPA is backward-looking.
164235
- **Upper bounds stabilize over ~14 days**. They'll be very wide initially.
165236

166-
### How to Apply Changes
237+
---
238+
239+
## Applying Changes (GitOps Workflow)
167240

168-
1. Read the VPA recommendation (Goldilocks dashboard or kubectl)
241+
### Step-by-Step
242+
243+
1. Read the VPA recommendation (Goldilocks dashboard or `./scripts/vpa-report.sh`)
169244
2. Update the app's `values.yaml` with new resource requests
170245
3. Add a comment documenting the VPA data and reasoning:
171246

@@ -193,6 +268,8 @@ resources:
193268
| `limits.cpu` | 2-4x request (allows burst). Or omit entirely to let pods burst freely. |
194269
| `limits.memory` | 2-4x request (or match VPA `upperBound` if spikes are expected) |
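
As a rough illustration of the multiplier guidance in the table, the following sketch prints a `resources` block from a chosen request. `suggest_limits` is a hypothetical helper, the default multiplier of 2 is an assumption picked from the low end of the 2-4x range, and the caller supplies the request values (for example, a VPA target read from the report).

```bash
# suggest_limits REQUEST_CPU_MILLICORES REQUEST_MEMORY_MI [MULTIPLIER]
# Prints a resources block with limits set to MULTIPLIER x request.
# Hypothetical helper; MULTIPLIER defaults to 2 (an assumption within the 2-4x range).
suggest_limits() {
  cpu=$1; mem=$2; mult=${3:-2}
  cat <<EOF
resources:
  requests:
    cpu: ${cpu}m
    memory: ${mem}Mi
  limits:
    cpu: $((cpu * mult))m
    memory: $((mem * mult))Mi
EOF
}

# argocd-server targets from the sample report: 23m CPU, 175Mi memory
suggest_limits 23 175
```

If you prefer to let pods burst freely on CPU, as the table allows, drop the `limits.cpu` line from the output before pasting it into `values.yaml`.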
195270

271+
---
272+
196273
## Common Workload Patterns
197274

198275
### CPU-Bound (Helm rendering, image processing)
@@ -222,6 +299,8 @@ Example: argocd-server
222299
### GPU Workloads
223300
VPA tracks only CPU and memory, not GPU. Recommendations will show low CPU/memory because the compute runs on the GPU and model data sits in VRAM. Size CPU/memory for data loading, not inference.
224301
302+
---
303+
225304
## Real-World Example: ArgoCD Optimization
226305
227306
### Before (manual guesswork)
@@ -257,26 +336,46 @@ Total: 5.2 CPU, 2.8Gi memory
257336
258337
See `infrastructure/controllers/argocd/values.yaml` for the actual implementation with inline VPA documentation.
259338
260-
## Excluded Namespaces
261-
262-
Goldilocks can be configured to exclude namespaces via the `goldilocks.fairwinds.com/enabled=false` label. By default with `on-by-default: "true"`, all namespaces are included.
339+
---
263340
264-
## K8s 1.35: In-Place Pod Resize (Future)
341+
## In-Place Pod Resize (K8s 1.35)
265342
266343
This cluster runs K8s v1.35.1, where In-Place Pod Resize is GA. VPA supports `updateMode: "InPlaceOrRecreate"`, which resizes pods **without restarting them** when possible.
267344
268-
Currently we use `updateMode: "Off"` (manual review). When confident in VPA accuracy after 2-4 weeks of observation, you can switch individual workloads to `InPlaceOrRecreate`:
345+
Currently we use `updateMode: "Off"` (manual review via Goldilocks). Once you are confident in VPA accuracy after 2-4 weeks of observation, you can enable auto-tuning per workload.
346+
347+
### How to Enable
348+
349+
Goldilocks creates VPAs with `updateMode: "Off"`. To enable in-place resize for a specific workload, create a manual VPA that overrides the Goldilocks-managed one:
269350
270351
```yaml
271352
apiVersion: autoscaling.k8s.io/v1
272353
kind: VerticalPodAutoscaler
354+
metadata:
355+
name: my-app # Must match Goldilocks VPA name
356+
namespace: my-app
357+
labels:
358+
goldilocks.fairwinds.com/enabled: "false" # Prevent Goldilocks from overwriting
273359
spec:
360+
targetRef:
361+
apiVersion: apps/v1
362+
kind: Deployment
363+
name: my-app
274364
updatePolicy:
275365
updateMode: "InPlaceOrRecreate" # Live resize when possible
276366
```
277367

278368
**Start with non-critical workloads** (dev tools, media apps) before enabling on infrastructure.
279369

370+
### How It Works
371+
372+
1. VPA Updater watches pods with `InPlaceOrRecreate` mode
373+
2. If recommendation differs significantly from current resources, it patches the pod spec
374+
3. Kernel applies new CPU/memory limits **without restarting** the container (when supported)
375+
4. If in-place resize fails, pod is evicted and recreated with new resources
376+
377+
---
378+
280379
## Troubleshooting
281380

282381
### No recommendations showing
@@ -287,7 +386,8 @@ spec:
287386
### Goldilocks dashboard is empty
288387
- Check if Goldilocks controller is running: `kubectl get pods -n goldilocks`
289388
- Goldilocks is set to `on-by-default: "true"` — all namespaces should appear
290-
- VPA resources are created by Goldilocks automatically for all workloads
389+
- Check Goldilocks controller logs: `kubectl logs -n goldilocks -l app.kubernetes.io/name=goldilocks,app.kubernetes.io/component=controller`
390+
- Verify VPA CRDs are installed: `kubectl get crd verticalpodautoscalers.autoscaling.k8s.io`
291391

292392
### VPA recommendations seem too high/low
293393
- Not enough data — wait 7-14 days
@@ -300,12 +400,22 @@ spec:
300400
- Set `limits.memory` well above `requests.memory` (2-4x)
301401
- Check startup memory with `kubectl top pod` during pod init
302402

403+
### Duplicate VPA resources
404+
- Goldilocks is the sole VPA creator — if you see duplicates, check for manually created VPAs
405+
- Remove any hand-crafted VPA manifests from Git and let Goldilocks manage them
406+
407+
---
408+
303409
## Quick Reference
304410

305411
```bash
306412
# Goldilocks dashboard (LAN)
307413
https://goldilocks.vanillax.me
308414

415+
# Human-readable VPA report
416+
./scripts/vpa-report.sh
417+
./scripts/vpa-report.sh <namespace>
418+
309419
# All VPA recommendations (cluster-wide)
310420
kubectl get vpa -A -o custom-columns=\
311421
NS:.metadata.namespace,\
@@ -319,6 +429,13 @@ kubectl top pods -n <namespace>
319429
# Compare current requests vs VPA target
320430
kubectl get deploy <name> -n <ns> -o jsonpath='{.spec.template.spec.containers[0].resources}'
321431
kubectl get vpa <name> -n <ns> -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
432+
433+
# Check Goldilocks controller
434+
kubectl get pods -n goldilocks
435+
kubectl logs -n goldilocks -l app.kubernetes.io/name=goldilocks,app.kubernetes.io/component=controller
436+
437+
# Check VPA recommender
438+
kubectl logs -n vertical-pod-autoscaler -l app.kubernetes.io/component=recommender
322439
```
323440

324441
## Related Docs
@@ -329,5 +446,5 @@ kubectl get vpa <name> -n <ns> -o jsonpath='{.status.recommendation.containerRec
329446

330447
---
331448

332-
**Last Updated**: 2026-02-24
449+
**Last Updated**: 2026-02-28
333450
**Cluster**: talos-prod-cluster (K8s v1.35.1, Talos v1.12.4)

infrastructure/controllers/vertical-pod-autoscaler/README.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ VPA monitors actual CPU/memory usage and recommends optimal resource requests fo
44

55
## How It Works
66

7-
VPA is deployed in **Off mode** — it generates recommendations but does not apply them. A Kyverno ClusterPolicy (`vpa-auto-create`) automatically creates a VPA resource for every Deployment and StatefulSet in the cluster (excluding system namespaces).
7+
VPA is deployed in **Off mode** — it generates recommendations but does not apply them. Goldilocks (`infrastructure/controllers/goldilocks/`) with `on-by-default: "true"` automatically creates a VPA resource for every Deployment, StatefulSet, and DaemonSet in the cluster.
88

99
When you're ready to let VPA auto-tune, change the `updateMode` to `InPlaceOrRecreate` (K8s 1.35 GA feature — resizes pods without restarting them).
1010

my-apps/development/posthog/data-layer/kafka.yaml

Lines changed: 4 additions & 0 deletions
@@ -19,6 +19,10 @@ spec:
1919
app.kubernetes.io/name: posthog
2020
app.kubernetes.io/component: kafka
2121
spec:
22+
securityContext:
23+
fsGroup: 101
24+
runAsUser: 101
25+
runAsGroup: 101
2226
containers:
2327
- name: redpanda
2428
image: docker.redpanda.com/redpandadata/redpanda:v25.1.9
