  ## Overview

- The cluster uses two separate networks:
- 1. **Main LAN (192.168.10.0/24)** - 2.5G over switch - all cluster traffic, API, etc.
- 2. **Storage Network (172.31.250.0/24)** - 10G DAC point-to-point - fast NFS/iSCSI to TrueNAS
+ The cluster uses a single network with 10G switch infrastructure:
+ - **Main LAN (192.168.10.0/24)** - All cluster traffic via 10G switch
+ - **TrueNAS Storage** - 192.168.10.133 (10G connected via switch)

  ## Physical Topology

@@ -13,38 +13,33 @@ The cluster uses two separate networks:
  │                              NETWORK TOPOLOGY                               │
  ├─────────────────────────────────────────────────────────────────────────────┤
  │                                                                             │
- │   ┌─────────────────┐        10G DAC (Direct)         ┌─────────────────┐   │
- │   │  Proxmox        │◄───────────────────────────────►│  TrueNAS        │   │
- │   │  hp-server-1    │         172.31.250.2/24         │  192.168.10.133 │   │
- │   │                 │                ↕                │                 │   │
- │   │  vmbr1 (eno49)  │         172.31.250.1/24         │  enp67s0 (10G)  │   │
- │   │                 │          (no switch!)           │                 │   │
- │   └────────┬────────┘                                 └────────┬────────┘   │
- │            │                                                   │            │
- │     vmbr0  │ 192.168.10.14/24                   192.168.10.133 │            │
- │     (ens2) │                                                   │ (2.5G)     │
- │            │                                                   │            │
- │            ▼                                                   ▼            │
+ │   ┌─────────────────┐                                 ┌─────────────────┐   │
+ │   │  Proxmox        │                                 │  TrueNAS        │   │
+ │   │  hp-server-1    │                                 │  192.168.10.133 │   │
+ │   │  192.168.10.14  │                                 │                 │   │
+ │   └────────┬────────┘                                 └────────┬────────┘   │
+ │            │ 10G                                               │ 10G        │
+ │            │                                                   │            │
+ │            ▼                                                   ▼            │
  │   ┌────────────────────────────────────────────────────────────────────┐    │
- │   │                       2.5G SWITCH (Main LAN)                       │    │
- │   │                          192.168.10.0/24                           │    │
+ │   │                             10G SWITCH                             │    │
+ │   │                          192.168.10.0/24                           │    │
  │   └────────────────────────────────────────────────────────────────────┘    │
- │           │                 │                 │                 │           │
- │           ▼                 ▼                 ▼                 ▼           │
- │   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
- │   │ Control Plane│  │ Control Plane│  │ Control Plane│  │ Workers      │    │
- │   │ .237         │  │ .76          │  │ .140         │  │.164/.219/.159│    │
- │   └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘    │
+ │           │                 │                 │                 │           │
+ │           ▼                 ▼                 ▼                 ▼           │
+ │   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
+ │   │ Control Plane│  │ Control Plane│  │ Control Plane│  │ Workers      │    │
+ │   │ .237         │  │ .76          │  │ .140         │  │.164/.219/.159│    │
+ │   └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘    │
  │                                                                             │
  │    ┌──────────────────────────────────────────────────────────────────┐     │
  │    │                        GPU Worker VM 100                         │     │
- │    │      ┌─────────────────┐                ┌─────────────────┐      │     │
- │    │      │ net0 (ens18)    │                │ net1 (ens19)    │      │     │
- │    │      │ vmbr0 → Main LAN│                │ vmbr1 → 10G DAC │      │     │
- │    │      │ 192.168.10.x    │                │ 172.31.250.10   │      │     │
- │    │      │ (DHCP)          │                │ (Static)        │      │     │
- │    │      │ *** PRIMARY *** │                │ Storage only!   │      │     │
- │    │      └─────────────────┘                └─────────────────┘      │     │
+ │    │      ┌─────────────────┐                                         │     │
+ │    │      │ net0 (ens18)    │                                         │     │
+ │    │      │ vmbr0 → 10G LAN │                                         │     │
+ │    │      │ 192.168.10.x    │                                         │     │
+ │    │      │ (DHCP)          │                                         │     │
+ │    │      └─────────────────┘                                         │     │
  │    └──────────────────────────────────────────────────────────────────┘     │
  │                                                                             │
  └─────────────────────────────────────────────────────────────────────────────┘
@@ -58,79 +53,42 @@ The cluster uses two separate networks:
  |--------|-----|---------|
  | Router/Gateway | 192.168.10.1 | Default route |
  | Proxmox (hp-server-1) | 192.168.10.14 | Hypervisor |
- | TrueNAS | 192.168.10.133 | NAS (NFS/SMB/MinIO S3) |
+ | TrueNAS | 192.168.10.133 | NAS (NFS/SMB/MinIO S3) - 10G |
  | Control Plane 1 | 192.168.10.237 | K8s master |
  | Control Plane 2 | 192.168.10.76 | K8s master |
  | Control Plane 3 | 192.168.10.140 | K8s master |
  | Worker 1 | 192.168.10.164 | K8s worker |
  | Worker 2 | 192.168.10.219 | K8s worker |
  | Worker 3 | 192.168.10.159 | K8s worker |
- | GPU Worker | 192.168.10.x (DHCP) | K8s GPU worker - **must use this for kubelet** |
+ | GPU Worker | 192.168.10.x (DHCP) | K8s GPU worker |
  | Wyze Bridge | 192.168.10.46 | RTSP camera streams |
  | LoadBalancer Pool | 192.168.10.32-63 (/27) | Cilium L2 announcements |
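
The `/27` on the LoadBalancer Pool row can be sanity-checked with shell arithmetic: a /27 leaves 5 host bits, so the block holds 32 addresses, and a block starting at `.32` ends at `.63`. The helper names below are illustrative and not part of the cluster tooling.

```bash
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names) for checking a CIDR block's size.

# Number of addresses in an IPv4 block with the given prefix length.
cidr_size() {
  local prefix=$1
  echo $(( 1 << (32 - prefix) ))
}

# Last octet of the final address in a block, given the last octet of the
# block's first address. Only meaningful for prefixes of /24 or longer.
cidr_last_octet() {
  local start=$1 prefix=$2
  echo $(( start + (1 << (32 - prefix)) - 1 ))
}

echo "pool size: $(cidr_size 27)"
echo "pool end:  192.168.10.$(cidr_last_octet 32 27)"
```

The same two functions can be reused if the pool is ever resized, for example to a /26.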

- ### Storage Network (172.31.250.0/24)
-
- **Point-to-point 10G DAC - NO SWITCH**
-
- | Device | IP | Interface | Purpose |
- |--------|-----|-----------|---------|
- | TrueNAS | 172.31.250.1 | enp67s0 (10G SFP+) | Storage server |
- | Proxmox | 172.31.250.2 | eno49 → vmbr1 | Hypervisor |
- | GPU Worker VM | 172.31.250.10 | ens19 (net1) | Fast storage access |
-
- ## Critical Configuration Notes
-
- ### GPU Worker Dual-NIC Setup
-
- The GPU worker VM has two NICs:
- - **net0 (ens18)** → vmbr0 → Main LAN (192.168.10.x) - **PRIMARY for Kubernetes**
- - **net1 (ens19)** → vmbr1 → 10G Storage (172.31.250.x) - **Storage traffic only**
-
- **IMPORTANT**: Kubernetes/kubelet MUST register with the 192.168.10.x address, NOT the 172.31.250.x address. The 10G network is isolated and only reaches TrueNAS.
-
- ### Why This Matters
-
- If kubelet registers with 172.31.250.10:
- - ❌ Other nodes can't reach it (different subnet, no routing)
- - ❌ kubectl logs/exec fails (API server can't reach kubelet)
- - ❌ Pods scheduled there become unreachable
- - ❌ Services don't work
-
- ### Talos Configuration Requirements
+ ## Talos Configuration

  ```yaml
  machine:
    network:
      interfaces:
-       - interface: ens18 # Main LAN - must be primary
+       - interface: ens18
          dhcp: true
-         routes:
-           - network: 0.0.0.0/0 # Default route MUST go through main LAN
-             gateway: 192.168.10.1
-       - interface: ens19 # 10G storage - secondary
-         dhcp: false
-         addresses:
-           - 172.31.250.10/24
-         # NO default route here!
    kubelet:
-     nodeIP: <192.168.10.x> # Force kubelet to use main LAN IP
+     nodeIP:
+       validSubnets:
+         - 192.168.10.0/24
  ```
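
As a rough illustration of what the `validSubnets` entry expresses, the sketch below tests whether an address falls inside 192.168.10.0/24. It is not how Talos implements the filter; `ip_to_int` and `in_subnet` are hypothetical helper names, and the two sample addresses are the Worker 1 IP and the retired storage-network IP from this document.

```bash
#!/usr/bin/env bash
# Sketch only: subnet membership test in plain bash arithmetic.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeeds (exit 0) when the address lies inside the CIDR block.
in_subnet() {
  local ip=$1 cidr=$2
  local net=${cidr%/*} prefix=${cidr#*/}
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

for ip in 192.168.10.164 172.31.250.10; do
  if in_subnet "$ip" 192.168.10.0/24; then
    echo "$ip: inside validSubnets"
  else
    echo "$ip: outside validSubnets"
  fi
done
```

With a single NIC on the node, the subnet filter mostly acts as a guard: an address outside 192.168.10.0/24 would not be selected as the kubelet node IP.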

  ## Proxmox Bridge Configuration

  | Bridge | Physical NIC | CIDR | Purpose |
  |--------|--------------|------|---------|
- | vmbr0 | ens2 | 192.168.10.14/24 | Main LAN |
- | vmbr1 | eno49 | 172.31.250.2/24 | 10G DAC to TrueNAS |
+ | vmbr0 | ens2 | 192.168.10.14/24 | Main LAN (10G) |

  ## TrueNAS Network Configuration

  | Interface | IP | Speed | Purpose |
  |-----------|-----|-------|---------|
- | enp67s0 | 172.31.250.1/24 | 10G SFP+ DAC | Fast storage (Proxmox direct) |
- | enp67s0d1 | - | 10G SFP+ | Unused (second port) |
- | enx04421a41f284 | 192.168.10.133/24 | 2.5G USB | Main LAN access |
+ | enp67s0 | 192.168.10.133/24 | 10G SFP+ | Main LAN (via 10G switch) |

  ## Whitelisted Storage Access

@@ -141,31 +99,24 @@ The Cilium network policy allows these storage connections:
  | 192.168.10.133 | 2049, 111 | NFS |
  | 192.168.10.133 | 445 | SMB |
  | 192.168.10.133 | 9000 | MinIO S3 |
- | 172.31.250.1 | 2049, 445, 9000 | 10G storage (GPU worker only) |
+ | 192.168.10.133 | 30292, 30293 | RustFS |

  ## Troubleshooting

- ### GPU Worker Shows Wrong IP
-
- If `kubectl get nodes -o wide` shows 172.31.250.10 for GPU worker:
-
- 1. Check if DHCP is working on ens18
- 2. Verify default route goes through 192.168.10.1
- 3. Force kubelet nodeIP in Talos config
- 4. Reboot the node after config changes
-
- ### Can't Reach GPU Worker
+ ### Can't Reach Storage

  ```bash
- # From another node, test connectivity
- ping 192.168.10.x # Should work (main LAN)
- ping 172.31.250.10 # Will fail (different subnet, no routing)
+ # Test connectivity to TrueNAS
+ ping 192.168.10.133
+
+ # Test NFS mount
+ showmount -e 192.168.10.133
  ```
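
If ping succeeds but a service still fails, probing the whitelisted TCP ports one by one can narrow things down. The following is a minimal sketch, assuming bash (for its `/dev/tcp` redirection) and the coreutils `timeout` command; the function names are hypothetical, and the port list is copied from the Whitelisted Storage Access table.

```bash
#!/usr/bin/env bash
# Sketch: probe the whitelisted TrueNAS TCP ports from a machine on the LAN.

# Ports from the Whitelisted Storage Access table (NFS, SMB, MinIO S3, RustFS).
storage_ports() {
  printf '%s\n' 2049 111 445 9000 30292 30293
}

# Succeeds (exit 0) if a TCP connection to host:port opens within 2 seconds.
port_open() {
  local host=$1 port=$2
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print one status line per whitelisted port.
check_storage() {
  local host=$1 port
  for port in $(storage_ports); do
    if port_open "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} unreachable"
    fi
  done
}

# Usage: check_storage 192.168.10.133
```

Run from a workstation or node shell, this checks plain LAN reachability only; it does not exercise the Cilium policy, which applies to pod traffic.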

  ### Storage Performance Testing

  ```bash
- # Test 10G link from GPU worker to TrueNAS
- kubectl exec -n <ns> <gpu-pod> -- dd if=/dev/zero of=/mnt/nfs/test bs=1G count=1
+ # Test 10G link to TrueNAS
+ kubectl exec -n <ns> <pod> -- dd if=/dev/zero of=/mnt/nfs/test bs=1G count=1
  # Should see ~1GB/s+ throughput on 10G link
  ```
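
For context on the `~1GB/s+` figure, the raw line rate of a link gives an upper bound on what `dd` can report. The sketch below does that arithmetic for the old 2.5G and new 10G links; it ignores protocol overhead, and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: upper bounds implied by raw link speed (no protocol overhead).

# Link ceiling in MB/s (decimal, truncated), given the link speed in Mbit/s.
link_ceiling_mb_per_s() {
  local mbit=$1
  echo $(( mbit / 8 ))
}

# Minimum transfer time in milliseconds (truncated) for a payload of
# size_bytes at rate_mb_per_s.
min_transfer_ms() {
  local size_bytes=$1 rate_mb_per_s=$2
  echo $(( size_bytes * 1000 / (rate_mb_per_s * 1000000) ))
}

one_gib=$(( 1024 * 1024 * 1024 ))
for mbit in 2500 10000; do
  rate=$(link_ceiling_mb_per_s "$mbit")
  echo "${mbit} Mbit/s: ceiling ${rate} MB/s, 1 GiB in at least $(min_transfer_ms "$one_gib" "$rate") ms"
done
```

A result well above the 10G ceiling of 1250 MB/s would suggest the write was absorbed by a cache rather than sent over the wire.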