Commit 9c18135

committed
up
1 parent 2cef054 commit 9c18135

3 files changed

Lines changed: 349 additions & 15 deletions

docs/argocd.md

Lines changed: 90 additions & 0 deletions
@@ -248,5 +248,95 @@ kubectl describe application my-apps-nginx-development -n argocd
248248
| **ArgoCD UI not accessible** | Check the `http-route.yaml` and the status of the Gateway API or ingress controller. |
249249
| **Nested kustomization Helm issues** | The 2025 structure flattens nested kustomizations to avoid `--enable-helm` inheritance issues. If you see Helm chart errors, ensure the chart is defined at the ApplicationSet target level, not nested. |
250250

251+
## 📊 ApplicationSet Sync Behavior
252+
253+
### Why All Apps Show "Syncing" on Every Commit
254+
255+
When you push any commit to the repository, you may notice in tools like Lens that all applications show activity or "re-syncing" status, even if you only changed one specific application (e.g., `nvidia-device-plugin`).
256+
257+
**This is expected behavior** and not a problem. Here's why:
258+
259+
#### How ApplicationSet Git Directory Generators Work
260+
261+
ApplicationSets with Git Directory Generators are designed to discover new applications by scanning directory patterns on each evaluation:
262+
263+
```yaml
generators:
  - git:
      repoURL: https://github.com/mitchross/talos-argocd-proxmox.git
      revision: main
      directories:
        - path: infrastructure/controllers/*
        - path: monitoring/*
        - path: my-apps/*/*
```
273+
274+
When a new commit is pushed:
275+
1. **ApplicationSet Controller** detects the change via webhook or polling
276+
2. **Directory scan** happens across ALL matching paths to discover any new applications
277+
3. **Application reconciliation** runs for each discovered app to check if it's out of sync
278+
4. **Sync evaluation** determines which apps actually need changes applied
279+
280+
#### What Actually Happens
281+
282+
- ✅ **Reconciliation/Evaluation**: All apps are checked (this is what you see in Lens)
283+
- ✅ **Sync Only Changed Apps**: Only apps with actual changes are synced to the cluster
284+
- ✅ **No Unnecessary Changes**: With `ApplyOutOfSyncOnly=true`, ArgoCD only applies changes to resources that differ
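
To predict which applications a commit should actually change, one option is to map the commit's changed files onto the directory patterns used by the generator. The helper below is a minimal sketch under stated assumptions: `changed_apps` is a hypothetical name, and it hard-codes the three patterns from this repository (`infrastructure/controllers/*`, `monitoring/*`, `my-apps/*/*`).

```bash
# Hypothetical helper: read changed file paths on stdin and print the
# application directories they fall under (one per line, de-duplicated).
# Assumes the three directory patterns used by this repo's ApplicationSets.
changed_apps() {
  awk -F/ '
    $1 == "my-apps" && NF >= 4 { print $1 "/" $2 "/" $3 }
    $1 == "monitoring" && NF >= 3 { print $1 "/" $2 }
    $1 == "infrastructure" && $2 == "controllers" && NF >= 4 { print $1 "/" $2 "/" $3 }
  ' | sort -u
}

# Usage (run inside the repository):
#   git diff --name-only HEAD~1 HEAD | changed_apps
```

Only the directories printed here should show real resource changes; every other app is merely re-evaluated.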
285+
286+
#### The Key Sync Option
287+
288+
All our ApplicationSets include this critical option:
289+
290+
```yaml
syncOptions:
  - ApplyOutOfSyncOnly=true # Only sync resources that actually changed
```
294+
295+
This means:
296+
- ArgoCD **evaluates** all apps to determine sync status
297+
- When a sync runs, ArgoCD **applies only the resources** that are out of sync instead of re-applying every resource in the app
298+
- The cluster resources themselves are **not modified** unless there are real changes
299+
300+
#### Why Not Use Path-Based Filtering?
301+
302+
**Git Directory Generators** need to scan all directories to discover new applications. There are alternatives, but they come with tradeoffs:
303+
304+
| Approach | Pros | Cons |
|----------|------|------|
| **Current (Directory Generator)** | Simple, auto-discovers new apps, no extra files needed | Evaluates all apps on every commit (but doesn't apply changes) |
| **Split ApplicationSets** | Only evaluates apps in changed areas | More ApplicationSet files to manage |
| **SCM Provider + Files** | More efficient change detection | Requires marker files in each app directory |
| **requeueAfterSeconds** | Less frequent evaluations | Just delays the problem, doesn't solve it |
310+
311+
#### What You See in Lens vs Reality
312+
313+
When viewing events in Lens:
314+
- **"Syncing" status**: ArgoCD is checking if the app needs updates (reconciliation)
315+
- **"Synced" status**: ArgoCD confirmed the app matches Git (no changes applied)
316+
- **"OutOfSync → Synced"**: ArgoCD actually applied changes to cluster resources
317+
318+
The first two are just status checks and don't modify cluster resources.
319+
320+
#### Verification
321+
322+
To verify that only changed apps are actually syncing resources:
323+
324+
```bash
# Watch ArgoCD application events
kubectl get events -n argocd --sort-by='.lastTimestamp' -w

# Check if resources are actually being applied
kubectl get events -A --sort-by='.lastTimestamp' | grep -v "argocd"

# View specific application sync history
argocd app history <app-name>
```
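
Another way to separate evaluation from real syncs is to look at when each Application last finished a sync operation. The sketch below assumes the Application status exposes that time at `.status.operationState.finishedAt`; `synced_since` is a hypothetical helper name, and the cutoff timestamp in the usage line is only an example.

```bash
# Hypothetical helper: print the names of apps whose last sync operation
# finished at or after the cutoff given as $1.
# Expects "NAME FINISHED_AT" lines on stdin. Timestamps are RFC 3339 UTC
# strings, so a plain string comparison orders them correctly.
synced_since() {
  awk -v cutoff="$1" '$2 != "<none>" && ($2 "") >= (cutoff "") { print $1 }'
}

# Usage (assumed field path; the cutoff is roughly your push time in UTC):
#   kubectl get applications -n argocd --no-headers \
#     -o custom-columns='NAME:.metadata.name,FINISHED:.status.operationState.finishedAt' \
#     | synced_since 2025-01-15T10:00:00Z
```

Apps that were only re-evaluated keep their older finish time and are filtered out.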
334+
335+
#### Conclusion
336+
337+
The "re-syncing" activity you see is **evaluation overhead**, not unnecessary resource changes. This is a reasonable tradeoff for the simplicity of automatic application discovery without marker files. The cluster resources themselves remain unchanged unless there are actual differences in Git.
338+
339+
If this evaluation overhead becomes a performance issue (it typically doesn't in homelabs), consider splitting ApplicationSets by area or setting `requeueAfterSeconds: 300` on the Git generator to poll less often (the default interval is 180 seconds).
340+
251341
### ArgoCD Self-Management
252342
```

docs/network.md

Lines changed: 257 additions & 13 deletions
@@ -2,22 +2,266 @@
22

33
## Overview
44

5+
This document describes the complete network architecture including external (Cloudflare-proxied) and internal (Firewalla DNS) traffic flows.
6+
7+
## Complete Network Topology
8+
59
```mermaid
6-
graph TD
7-
subgraph "Physical Topology"
8-
A[Internet Gateway] --> B[Switch]
9-
B --> C[Talos Node]
10+
graph TB
11+
subgraph Internet["🌐 Internet"]
12+
ExtUser[External User<br/>anywhere.com]
1013
end
11-
subgraph "Logical Topology"
12-
D[Internet] --> E[Cloudflare]
13-
E --> F[Cloudflare Tunnel]
14-
F --> G[Gateway API]
15-
G --> H[Cilium Service Mesh]
16-
H --> I[Kubernetes Service]
17-
I --> J[Pod]
14+
15+
subgraph Home["🏠 Home Network - 192.168.10.0/24"]
16+
Firewalla[Firewalla<br/>192.168.10.1<br/>Gateway & DNS]
17+
IntUser[Internal User<br/>Phone/Laptop]
18+
19+
subgraph Cluster["☸️ Talos Kubernetes Cluster"]
20+
ExtGW[External Gateway<br/>192.168.10.49<br/>TCP 80,443]
21+
IntGW[Internal Gateway<br/>192.168.10.50<br/>TCP 80,443,5432]
22+
23+
subgraph Services["Services"]
24+
ArgoCD[ArgoCD<br/>argocd-server:80]
25+
Llama[Llama WebUI<br/>ollama-webui:80]
26+
Immich[Immich<br/>immich:80]
27+
end
28+
end
29+
end
30+
31+
subgraph CF["☁️ Cloudflare"]
32+
CFDNS[Cloudflare DNS<br/>*.vanillax.me]
33+
CFProxy[Cloudflare Proxy<br/>HTTP/3 QUIC Enabled]
34+
CFTunnel[Cloudflare Tunnel<br/>cloudflared pod]
1835
end
19-
style C fill:#f9f,stroke:#333
20-
style H fill:#bbf,stroke:#333
36+
37+
%% External Flow - Goes through Cloudflare
38+
ExtUser -->|1. DNS: argocd.vanillax.me| CFDNS
39+
CFDNS -->|2. Returns Cloudflare IP| ExtUser
40+
ExtUser -->|3. HTTPS/QUIC Request| CFProxy
41+
CFProxy -->|4. Tunnel Connection| CFTunnel
42+
CFTunnel -->|5. HTTP to 192.168.10.49:80| ExtGW
43+
ExtGW -->|6. Route via HTTPRoute| ArgoCD
44+
45+
%% Internal Flow - Direct via Firewalla DNS
46+
IntUser -->|1. DNS: llama.vanillax.me| Firewalla
47+
Firewalla -->|2. Returns 192.168.10.50| IntUser
48+
IntUser -->|3. HTTPS Request<br/>NO CLOUDFLARE| IntGW
49+
IntGW -->|4. Route via HTTPRoute| Llama
50+
51+
%% Styling
52+
classDef external fill:#ff6b6b,stroke:#c92a2a,color:#fff
53+
classDef internal fill:#51cf66,stroke:#2f9e44,color:#fff
54+
classDef cloudflare fill:#f59f00,stroke:#e67700,color:#fff
55+
classDef gateway fill:#339af0,stroke:#1971c2,color:#fff
56+
57+
class ExtUser,ExtGW external
58+
class IntUser,IntGW,Firewalla internal
59+
class CFDNS,CFProxy,CFTunnel cloudflare
60+
class ArgoCD,Llama,Immich gateway
61+
```
62+
63+
## Traffic Flow Details
64+
65+
### 🔴 External Flow (Internet → Cloudflare → External Gateway)
66+
67+
```mermaid
68+
sequenceDiagram
69+
participant EU as External User<br/>(Internet)
70+
participant CF as Cloudflare DNS
71+
participant CFP as Cloudflare Proxy<br/>(HTTP/3 QUIC)
72+
participant CFT as Cloudflare Tunnel<br/>(cloudflared pod)
73+
participant EGW as External Gateway<br/>192.168.10.49
74+
participant SVC as K8s Service
75+
participant POD as Pod
76+
77+
Note over EU,POD: Example: argocd.vanillax.me from Internet
78+
79+
EU->>CF: DNS Query: argocd.vanillax.me
80+
CF->>EU: Cloudflare Proxy IP (104.x.x.x)
81+
82+
EU->>CFP: HTTPS/HTTP3 Request (may use QUIC)
83+
Note over CFP: QUIC/HTTP3 handled by Cloudflare<br/>Not your gateways
84+
85+
CFP->>CFT: Tunnel Connection (HTTP/2)
86+
Note over CFT: Runs in cluster<br/>Connects to Cloudflare
87+
88+
CFT->>EGW: HTTP Request to 192.168.10.49:80
89+
Note over EGW: Cilium Gateway<br/>Receives plain HTTP on port 80
90+
91+
EGW->>SVC: Route via HTTPRoute
92+
SVC->>POD: Forward to Pod
93+
POD->>SVC: Response
94+
SVC->>EGW: Response
95+
EGW->>CFT: Response
96+
CFT->>CFP: Response
97+
CFP->>EU: HTTPS/HTTP3 Response
98+
```
99+
100+
### 🟢 Internal Flow (Home Network → Firewalla DNS → Internal Gateway)
101+
102+
```mermaid
103+
sequenceDiagram
104+
participant IU as Internal User<br/>(Phone/Laptop)
105+
participant FW as Firewalla DNS<br/>192.168.10.1
106+
participant IGW as Internal Gateway<br/>192.168.10.50
107+
participant SVC as K8s Service
108+
participant POD as Pod
109+
110+
Note over IU,POD: Example: llama.vanillax.me from home network
111+
112+
IU->>FW: DNS Query: llama.vanillax.me
113+
Note over FW: Custom DNS Record<br/>*.vanillax.me → 192.168.10.50
114+
115+
FW->>IU: IP: 192.168.10.50
116+
117+
IU->>IGW: HTTPS Request (TCP 443)
118+
Note over IU,IGW: NO CLOUDFLARE<br/>Direct connection<br/>Client may attempt HTTP/3
119+
120+
Note over IGW: Cilium Gateway<br/>Terminates TLS<br/>cert-manager cert
121+
122+
IGW->>SVC: Route via HTTPRoute
123+
SVC->>POD: Forward to Pod
124+
POD->>SVC: Response
125+
SVC->>IGW: Response
126+
IGW->>IU: HTTPS Response
127+
```
128+
129+
## IP Address Allocation
130+
131+
### Physical Network (Home LAN)
132+
| Component | IP Address | Purpose | Ports |
|-----------|------------|---------|-------|
| Firewalla | 192.168.10.1 | Gateway, DNS, Firewall | DNS:53, HTTP:80, HTTPS:443 |
| Talos Nodes | 192.168.10.x | Kubernetes nodes | Various |
| **External Gateway** | **192.168.10.49** | Public-facing services via Cloudflare Tunnel | HTTP:80, HTTPS:443 |
| **Internal Gateway** | **192.168.10.50** | LAN-only services (no Cloudflare) | HTTP:80, HTTPS:443, TCP:5432 |
138+
139+
### Cluster Networks
140+
| Network | CIDR | Purpose |
|---------|------|---------|
| Pod Network | 10.14.0.0/16 | Cilium pod CIDR |
| Service Network | 10.43.0.0/16 | Kubernetes services |
| LoadBalancer Pool | 192.168.10.49-50 | Cilium L2 announcements |
145+
146+
## DNS Resolution Paths
147+
148+
### External Domains (Cloudflare DNS)
149+
```
User Query: argocd.vanillax.me
  → Cloudflare Authoritative DNS
  → Returns: Cloudflare Proxy IP (104.x.x.x)
  → Traffic flows through Cloudflare → Tunnel → External Gateway (192.168.10.49)
```
155+
156+
### Internal Domains (Firewalla Custom DNS)
157+
```
User Query: llama.vanillax.me (from home network)
  → Firewalla DNS (192.168.10.1)
  → Custom DNS Override: *.vanillax.me → 192.168.10.50
  → Returns: 192.168.10.50
  → Traffic flows directly: User → Internal Gateway (192.168.10.50)
  → NO CLOUDFLARE INVOLVED
```
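
To confirm which path a client will take, it can help to check what a given resolver returns. The following is a minimal sketch using the gateway addresses from the tables above; `classify_gateway_ip` is a hypothetical helper name, and the usage line assumes `dig` is installed on the client.

```bash
# Hypothetical helper: label a resolved IP according to the address plan
# documented above (192.168.10.49 external, 192.168.10.50 internal).
classify_gateway_ip() {
  case "$1" in
    192.168.10.50) echo "internal-gateway" ;;
    192.168.10.49) echo "external-gateway" ;;
    "")            echo "no-answer" ;;
    *)             echo "other" ;;
  esac
}

# Usage: ask the Firewalla resolver directly and classify the first answer.
#   classify_gateway_ip "$(dig +short llama.vanillax.me @192.168.10.1 | head -n 1)"
```

From the home network the expected label is `internal-gateway`; an `other` result suggests the client is using a public resolver and will be routed through Cloudflare.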
165+
166+
## ERR_QUIC_PROTOCOL_ERROR Root Cause Analysis
167+
168+
### Problem Statement
169+
When accessing **internal routes** like `argocd.vanillax.me` or `llama.vanillax.me` from the home network, browsers show `ERR_QUIC_PROTOCOL_ERROR`.
170+
171+
### Why This Happens (Internal Routes)
172+
173+
Even though internal routes **don't go through Cloudflare**, the error still occurs because:
174+
175+
1. **Browser Behavior**: Modern browsers (Chrome, Edge) remember that a domain supports HTTP/3 via **Alt-Svc headers** or **HTTPS DNS records**
176+
2. **Domain Matching**: When you access `argocd.vanillax.me` externally (via Cloudflare with HTTP/3), the browser caches that this hostname supports QUIC
177+
3. **Internal Access Attempt**: When you later access the same domain internally (via Firewalla DNS → 192.168.10.50), the browser still tries HTTP/3/QUIC
178+
4. **Gateway Limitation**: The Cilium Gateway has no HTTP/3 (QUIC) listener → Connection fails → ERR_QUIC_PROTOCOL_ERROR
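
One way to see whether an endpoint is advertising HTTP/3 through Alt-Svc is to inspect its response headers. This is a minimal sketch, not a full diagnosis: `advertises_h3` is a hypothetical helper name, and it only detects header-based advertisement (it does not see HTTPS DNS records or entries already cached in the browser).

```bash
# Hypothetical helper: read HTTP response headers on stdin and succeed
# (exit 0) if an Alt-Svc header mentions h3.
advertises_h3() {
  grep -i '^alt-svc:' | grep -q 'h3'
}

# Usage: compare the result from the home network and from outside.
#   curl -sI https://argocd.vanillax.me | advertises_h3 \
#     && echo "h3 advertised" || echo "no h3 advertisement"
```

If the header appears only when the request goes through Cloudflare, that is consistent with the browser learning about HTTP/3 externally and reusing it internally.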
179+
180+
### Traffic Path Analysis
181+
182+
#### Internal Route - argocd.vanillax.me (ERROR OCCURRED HERE)
183+
```
184+
Browser (at home) → DNS query to Firewalla
185+
186+
Firewalla DNS returns 192.168.10.50 (internal gateway)
187+
188+
Browser TRIED UDP 443 (QUIC) because it remembered HTTP/3 support from Cloudflare
189+
190+
❌ Internal Gateway (192.168.10.50) doesn't handle QUIC
191+
192+
ERR_QUIC_PROTOCOL_ERROR
193+
194+
FIX: Gateway now advertises ONLY h2,http/1.1 (no h3)
195+
Browser won't attempt QUIC anymore
196+
```
197+
198+
#### External Route - argocd.vanillax.me (via Cloudflare - Works Fine)
199+
```
200+
Browser (on internet) → DNS query to Cloudflare DNS
201+
202+
Cloudflare returns Cloudflare Proxy IP
203+
204+
Browser may use HTTP/3 QUIC to Cloudflare (handled by CF)
205+
206+
✅ Cloudflare terminates QUIC, sends HTTP/2 to Tunnel
207+
208+
Tunnel → External Gateway (192.168.10.49) via HTTP/2 → Service
209+
```
210+
211+
### Solution: Disable QUIC Advertisement on Gateways
212+
213+
Since **neither gateway needs QUIC support**:
214+
- External gateway receives HTTP/2 from Cloudflare Tunnel (not QUIC)
215+
- Internal gateway receives direct HTTPS from LAN clients (not QUIC)
216+
- QUIC/HTTP3 only happens between end users and Cloudflare's edge
217+
218+
**The fix**: Explicitly disable HTTP/3 advertisement by only advertising HTTP/2 and HTTP/1.1 via ALPN protocols.
219+
220+
#### Changes Made
221+
222+
1. **Internal Gateway** (`gw-internal.yaml:32`): Set ALPN to `h2,http/1.1` (no h3)
223+
2. **External Gateway** (not changed): Already uses `h2,http/1.1` by default
224+
3. **Cilium**: No Envoy HTTP/3 configuration needed
225+
226+
This prevents browsers from attempting QUIC connections to your gateways, which don't support it.
227+
228+
### Alternative Solutions (Not Recommended)
229+
230+
#### Option 1: Clear Browser QUIC Cache (Temporary)
231+
- **What**: Clear browser Alt-Svc cache: `chrome://net-internals/#sockets` → Flush socket pools
232+
- **Why**: Forces browser to forget HTTP/3 support
233+
- **Downside**: Temporary fix, comes back after external access
234+
235+
#### Option 2: Use Separate Domains
236+
- **What**: Use different domains for internal vs external (e.g., `argocd.local` vs `argocd.vanillax.me`)
237+
- **Why**: Browser won't confuse the two
238+
- **Downside**: More complex DNS management, different URLs
239+
240+
### Recommended Path Forward
241+
242+
1. **Apply the gateway changes** (already made)
243+
2. **Test internal access** to confirm QUIC errors are resolved
244+
3. **Clear browser QUIC cache** once to force re-negotiation
245+
4. **Verify** that browsers use HTTP/2 instead of attempting QUIC
246+
247+
### Validation Commands
248+
249+
```bash
# Apply gateway changes
kubectl apply -f infrastructure/networking/gateway/gw-internal.yaml

# Check gateway status
kubectl get gateway -n gateway gateway-internal -o yaml

# Test from internal network (verbose to see protocol negotiation)
curl -v https://argocd.vanillax.me

# Check negotiated protocol (should be HTTP/2 or HTTP/1.1, NOT h3)
curl -I https://argocd.vanillax.me

# Clear browser QUIC cache (Chrome/Edge)
# Navigate to: chrome://net-internals/#sockets
# Click "Flush socket pools"
```
22266

23267
## Declarative Networking with ArgoCD & Talos

my-apps/media/immich/values.yaml

Lines changed: 2 additions & 2 deletions
@@ -75,8 +75,8 @@ server:
7575
cpu: 500m
7676
memory: 512Mi
7777
limits:
78-
cpu: 2000m
79-
memory: 2Gi
78+
cpu: 4000m
79+
memory: 4Gi
8080

8181
service:
8282
main:
