Commit 3fede26

up
1 parent ba410f7 commit 3fede26

4 files changed

Lines changed: 85 additions & 5 deletions

CLAUDE.md

Lines changed: 39 additions & 0 deletions
@@ -515,6 +515,45 @@ spec:
515515

516516
**Reference**: `infrastructure/storage/csi-driver-nfs/storage-class.yaml` (immich static PV)
517517

518+
### NFS 10G Performance Tuning (CRITICAL)
519+
520+
The Linux kernel (5.4+) defaults NFS `read_ahead_kb` to **128 KB**, which limits sequential NFS reads to ~140 MB/s regardless of link speed. The 128 KB readahead window is smaller than a single NFS READ (1 MB `rsize`), so the VFS keeps at most one READ in flight at a time.
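
A rough way to see where a ceiling of this size comes from: sequential throughput cannot exceed the bytes in flight divided by the time one round trip takes. The sketch below computes that bound. The helper name `readahead_ceiling_mbs` and the 1 ms per-request latency are illustrative assumptions, not measurements from this cluster.

```bash
# Hypothetical helper: readahead_ceiling_mbs WINDOW_KB LATENCY_MS
# Prints an upper bound on sequential throughput in MB/s (decimal megabytes)
# when at most WINDOW_KB KiB can be in flight per round trip of LATENCY_MS.
readahead_ceiling_mbs() {
  awk -v kb="$1" -v ms="$2" \
    'BEGIN { printf "%.0f\n", (kb * 1024) / (ms / 1000) / 1000000 }'
}

# Assumed 1 ms per-request latency (illustrative only):
readahead_ceiling_mbs 128 1      # default 128 KB window
readahead_ceiling_mbs 16384 1    # tuned 16 MB window
```

Under that assumed latency, the 128 KB window gives a bound of roughly 131 MB/s, in the same range as the figure above, while the 16 MB window moves the bound well past what a 10G link can carry.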
521+
522+
**Fix applied in Talos machine config** (`omni/cluster-template/cluster-template.yaml`):
523+
524+
| Setting | Purpose | Where |
525+
|---------|---------|-------|
526+
| `udev rule: ATTR{read_ahead_kb}="16384"` | Sets NFS readahead to 16MB on mount | `machine.udev.rules` (cluster patch) |
527+
| `siderolabs/nfsrahead` extension | Userspace `nfsrahead` tool + udev rule | `systemExtensions` (all node types) |
528+
| `sunrpc.tcp_slot_table_entries: "128"` | Max outstanding RPCs per connection | `machine.sysctls` (cluster patch) |
529+
| `net.ipv4.tcp_congestion_control: bbr` | Better congestion algorithm for 10G | `machine.sysctls` (cluster patch) |
530+
| NIC ring buffers = 8192 | Max ring buffer on Proxmox + TrueNAS | Applied on both hosts (persisted) |
531+
532+
**NFS mount options** (set per-PV via CSI `mountOptions`):
533+
- `nconnect=16` — 16 TCP connections per mount
534+
- `rsize=1048576` / `wsize=1048576` — 1MB per NFS READ/WRITE op
535+
- `nfsvers=4.1` — NFSv4.1 with session slots
536+
- `noatime` — skip access time updates
537+
538+
**Debugging NFS performance**:
539+
```bash
540+
# Check readahead (should be 16384, NOT 128)
541+
kubectl exec -n <ns> <pod> -- sh -c 'cat /sys/class/bdi/0:*/read_ahead_kb'
542+
543+
# Check sunrpc slot table (should be 128, NOT 2)
544+
kubectl exec -n <ns> <pod> -- cat /proc/sys/sunrpc/tcp_slot_table_entries
545+
546+
# Check mount options (verify nconnect=16, rsize=1048576)
547+
kubectl exec -n <ns> <pod> -- cat /proc/self/mountstats | grep -A3 "192.168.10.133"
548+
549+
# Full NFS stats (connection distribution, slot usage, RTT)
550+
kubectl exec -n <ns> <pod> -- cat /proc/self/mountstats
551+
552+
# Server-side debugging
553+
scripts/debug-nfs-server.sh # Run on TrueNAS SSH
554+
scripts/debug-nfs-client.sh # Run on Proxmox SSH
555+
```
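
The mount-option check above can be scripted so that a missing option is obvious. This is a minimal sketch: `check_nfs_opts` is a hypothetical helper (not one of the repo's scripts) that reads the `opts:` text from `/proc/self/mountstats` on stdin and reports any expected option that is absent. It assumes a single NFS mount in the input; with several mounts, an option present on any one of them counts as found.

```bash
# Hypothetical helper, not part of scripts/: reads mount options on stdin
# and prints each expected option (given as arguments) that is missing.
# Returns 0 when every expected option is present, 1 otherwise.
check_nfs_opts() {
  # Turn spaces, tabs, and newlines into commas so every option is comma-delimited.
  opts=$(tr -s ' \t\n' ',,,')
  missing=0
  for want in "$@"; do
    case ",$opts," in
      *",$want,"*) ;;                          # option found
      *) echo "missing: $want"; missing=1 ;;   # option absent
    esac
  done
  return "$missing"
}

# Example (namespace and pod are placeholders, as in the commands above):
# kubectl exec -n <ns> <pod> -- cat /proc/self/mountstats \
#   | grep -A1 "192.168.10.133" | grep "opts:" \
#   | check_nfs_opts nconnect=16 rsize=1048576 wsize=1048576
```

The helper only does exact string matching on `key=value` pairs, so it will flag `rsize=131072` as a missing `rsize=1048576` rather than trying to interpret the value.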
556+
518557
## Automated Backup & Restore with Kyverno
519558

520559
### The Magic Label Pattern

docs/network-topology.md

Lines changed: 45 additions & 3 deletions
@@ -115,7 +115,49 @@ showmount -e 192.168.10.133
115115
### Storage Performance Testing
116116

117117
```bash
118-
# Test 10G link to TrueNAS
119-
kubectl exec -n <ns> <pod> -- dd if=/dev/zero of=/mnt/nfs/test bs=1G count=1
120-
# Should see ~1GB/s+ throughput on 10G link
118+
# Test raw wire speed (should be ~9.4 Gbps)
119+
iperf3 -c 192.168.10.133
120+
121+
# Test NFS throughput from inside a pod
122+
kubectl exec -n <ns> <pod> -- dd if=/mnt/nfs/testfile of=/dev/null bs=1M status=progress
123+
124+
# Test NFS throughput from Proxmox host (bypasses VM layer)
125+
mount -t nfs -o nfsvers=4.1,nconnect=16,rsize=1048576,wsize=1048576 192.168.10.133:/mnt/BigTank/k8s/llama-cpp /mnt/nfstest
126+
dd if=/mnt/nfstest/testfile of=/dev/null bs=1M status=progress
121127
```
128+
129+
### NFS 10G Tuning
130+
131+
The default Linux kernel `read_ahead_kb` of 128 KB limits NFS sequential reads to ~140 MB/s on any link speed. The cluster applies these fixes via Talos machine config:
132+
133+
| Layer | Setting | Value |
134+
|-------|---------|-------|
135+
| **VFS readahead** | udev rule `ATTR{read_ahead_kb}` | 16384 (16MB) |
136+
| **NFS readahead** | `siderolabs/nfsrahead` extension | Installed on all nodes |
137+
| **RPC concurrency** | `sunrpc.tcp_slot_table_entries` | 128 (default was 2) |
138+
| **TCP congestion** | `net.ipv4.tcp_congestion_control` | bbr |
139+
| **TCP buffers** | `net.core.rmem_max` / `wmem_max` | 64MB |
140+
| **NIC ring buffers** | Proxmox + TrueNAS | 8192 (max) |
141+
| **NFS mount options** | Per-PV CSI mountOptions | `nconnect=16,rsize=1048576,wsize=1048576` |
142+
143+
**Verified performance** (from TrueNAS ARC-cached 4GB file):
144+
145+
| Layer | Speed |
146+
|-------|-------|
147+
| iperf3 (wire) | 9.4 Gb/s |
148+
| Proxmox host → NFS | 2.7 GB/s |
149+
| Talos VM → NFS (before tuning) | ~128 MB/s |
150+
151+
**Debug commands**:
152+
```bash
153+
# Verify readahead is 16384 (not 128)
154+
kubectl exec -n <ns> <pod> -- sh -c 'cat /sys/class/bdi/0:*/read_ahead_kb'
155+
156+
# Verify sunrpc slots are 128 (not 2)
157+
kubectl exec -n <ns> <pod> -- cat /proc/sys/sunrpc/tcp_slot_table_entries
158+
159+
# Full NFS mount stats (connections, slots, RTT)
160+
kubectl exec -n <ns> <pod> -- cat /proc/self/mountstats
161+
```
162+
163+
See `scripts/debug-nfs-server.sh` (TrueNAS) and `scripts/debug-nfs-client.sh` (Proxmox) for comprehensive debugging.

my-apps/ai/README.md

Lines changed: 1 addition & 1 deletion
@@ -277,7 +277,7 @@ All routes use `gateway-internal` (Cilium Gateway API). LLM and Open WebUI route
277277
| ComfyUI | NFS (static PV, CSI) | 250Gi | `192.168.10.133:/mnt/BigTank/k8s/comfyui` |
278278
| Open WebUI | Longhorn | 5Gi | Dynamic PVC |
279279

280-
NFS mounts use `nconnect=16` over 10G for fast model loading.
280+
NFS mounts use `nconnect=16` over 10G for fast model loading. Performance depends on Talos kernel tuning — `read_ahead_kb` must be 16384+ (set via udev rule in cluster template) and `sunrpc.tcp_slot_table_entries` must be 128+ (set via sysctl). Without these, NFS caps at ~140 MB/s regardless of link speed.
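
The two thresholds above lend themselves to a quick pass/fail check. A minimal sketch, assuming a hypothetical helper named `check_min` (not part of this repo) that compares one value read from the node against its required minimum:

```bash
# Hypothetical helper: check_min NAME ACTUAL MINIMUM
# Prints "ok" or "LOW" for the setting and returns non-zero when ACTUAL < MINIMUM.
check_min() {
  if [ "$2" -ge "$3" ]; then
    echo "ok: $1=$2"
  else
    echo "LOW: $1=$2 (want >= $3)"
    return 1
  fi
}

# Example: an untuned readahead value fails, a tuned slot table passes.
check_min read_ahead_kb 128 16384 || true
check_min sunrpc.tcp_slot_table_entries 128 128
```

In practice the ACTUAL argument would come from `/sys/class/bdi/0:*/read_ahead_kb` and `/proc/sys/sunrpc/tcp_slot_table_entries`, passed one value at a time.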
281281

282282
## Caveats
283283

omni/cluster-template/cluster-template.yaml

Lines changed: 0 additions & 1 deletion
@@ -119,7 +119,6 @@ machineClass:
119119
systemExtensions:
120120
- siderolabs/iscsi-tools
121121
- siderolabs/nfs-utils
122-
- siderolabs/nfsrahead
123122
- siderolabs/qemu-guest-agent
124123
- siderolabs/util-linux-tools
125124
- siderolabs/nonfree-kmod-nvidia-production
