Commit 5035de2

committed
cleanup
up
1 parent 1df9283 commit 5035de2

6 files changed

Lines changed: 23 additions & 20 deletions

.github/instructions/argocd.instructions.md

Lines changed: 11 additions & 8 deletions
@@ -33,18 +33,18 @@ spec:

  ### Three-Tier Application Discovery

  1. **Infrastructure** (`infrastructure-appset.yaml`):
- - Paths: `infrastructure/controllers/*`, `infrastructure/storage/*`, etc.
- - Sync wave: "1" (after ArgoCD, before apps)
- - Creates core cluster services
+ - Explicit path list (NOT glob discovery)
+ - Sync wave: "4" (after foundation, storage, and PVC Plumber)
+ - Creates core cluster services (cert-manager, Kyverno, GPU operators, databases, gateway, etc.)

  2. **Monitoring** (`monitoring-appset.yaml`):
  - Paths: `monitoring/*`
- - Sync wave: "0" (early deployment)
+ - Sync wave: "5" (after infrastructure)
  - Creates observability stack

  3. **Applications** (`my-apps-appset.yaml`):
  - Paths: `my-apps/*/*` (nested directories)
- - Sync wave: "2" (after infrastructure)
+ - Sync wave: "6" (after everything else)
  - Creates user applications

  ### Directory-Based Discovery
### Directory-Based Discovery
@@ -91,9 +91,12 @@ retry:
  ## Sync Waves and Dependencies

  ### Wave Ordering
- - Wave "0": Monitoring stack (Prometheus, Grafana)
- - Wave "1": Infrastructure (Cilium, Longhorn, cert-manager)
- - Wave "2": Applications (user workloads)
+ - Wave "0": Foundation (Cilium, ArgoCD, 1Password Connect, External Secrets, AppProjects)
+ - Wave "1": Storage (Longhorn, Snapshot Controller, VolSync)
+ - Wave "2": PVC Plumber (backup existence checker)
+ - Wave "4": Infrastructure ApplicationSet (cert-manager, Kyverno, GPU operators, databases, gateway)
+ - Wave "5": Monitoring ApplicationSet (Prometheus, Grafana, Loki)
+ - Wave "6": My-Apps ApplicationSet (user workloads)

  ### CRD Handling
  For infrastructure components that install CRDs:

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -809,7 +809,7 @@ kubectl exec -it gpu-pod -n app-name -- nvidia-smi

  - **[README.md](README.md)** - Bootstrap guide, architecture overview, and Omni/Proxmox setup
  - **[.github/copilot-instructions.md](.github/copilot-instructions.md)** - Detailed development patterns
- - **[.github/instructions/](/.github/instructions/)** - Domain-specific instructions (ArgoCD, GPU, Talos, standards)
+ - **[.github/instructions/](.github/instructions/)** - Domain-specific instructions (ArgoCD, GPU, Talos, standards)
  - **[docs/pvc-plumber-full-flow.md](docs/pvc-plumber-full-flow.md)** - Complete PVC backup/restore flow from bare metal to automatic disaster recovery
  - **[docs/backup-restore.md](docs/backup-restore.md)** - Detailed backup/restore workflow with architecture diagrams
  - **[docs/network-topology.md](docs/network-topology.md)** - Network architecture details

docs/argocd.md

Lines changed: 5 additions & 4 deletions
@@ -33,11 +33,12 @@ To solve the "chicken-and-egg" problem of bootstrapping a cluster (e.g., needing

  | Wave | Phase | Components | Description |
  |------|-------|------------|-------------|
- | **0** | **Foundation** | `cilium`, `1password-connect`, `external-secrets` | **Networking & Secrets**. The absolute minimum required for other pods to start and pull credentials. |
+ | **0** | **Foundation** | `cilium`, `argocd`, `1password-connect`, `external-secrets`, `projects` | **Networking & Secrets**. The absolute minimum required for other pods to start and pull credentials. |
  | **1** | **Storage** | `longhorn`, `snapshot-controller`, `volsync` | **Persistence**. Depends on Wave 0 for Pod-to-Pod communication and secrets. |
- | **2** | **System** | `cert-manager`, `gpu-operator`, `databases` | **Core Services**. Depends on Storage (PVCs) and Networking (Ingress/Gateway). |
- | **3** | **Observability** | `kube-prometheus-stack`, `loki` | **Monitoring**. Monitors the healthy stack. |
- | **4** | **User** | `my-apps/*` | **Workloads**. The actual applications running on the cluster. |
+ | **2** | **PVC Plumber** | `pvc-plumber` | **Backup checker**. Must be running before Kyverno policies in Wave 4 call its API. |
+ | **4** | **Infrastructure** | `cert-manager`, `kyverno`, `gpu-operator`, `databases`, `gateway`, etc. | **Core Services** via ApplicationSet (explicit path list). |
+ | **5** | **Monitoring** | `prometheus-stack`, `loki-stack`, `tempo` | **Observability** via ApplicationSet (discovers `monitoring/*`). |
+ | **6** | **User** | `my-apps/*/*` | **Workloads** via ApplicationSet (discovers `my-apps/*/*`). |

  ### How It Works
  Each `Application` resource in `infrastructure/controllers/argocd/apps/` is annotated with a sync wave:
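
The hunk ends before the file's own example, so here is a minimal sketch of what such an annotation looks like. It is not copied from the repository: the `argocd.argoproj.io/sync-wave` annotation is the standard Argo CD mechanism, but the repository URL, source path, project, and destination namespace are assumptions chosen to match the Wave 1 (Storage) row.

```yaml
# Illustrative sketch, not the repository's actual manifest.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: longhorn
  namespace: argocd
  annotations:
    # Lower waves sync first; Argo CD waits for a wave to become healthy
    # before starting the next one.
    argocd.argoproj.io/sync-wave: "1"
spec:
  project: default                                  # assumed project
  source:
    repoURL: https://example.com/org/repo.git       # placeholder URL
    targetRevision: HEAD
    path: infrastructure/storage/longhorn           # assumed path
  destination:
    server: https://kubernetes.default.svc
    namespace: longhorn-system                      # assumed namespace
```

The annotation value is a quoted string because Kubernetes annotation values must be strings.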

docs/network-policy.md

Lines changed: 3 additions & 3 deletions
@@ -63,7 +63,7 @@ Located at: `infrastructure/networking/cilium/policies/block-lan-access.yaml`
  | Pod-to-Pod (cluster) | **ALLOWED** | Inter-service communication |
  | Kube-apiserver | **ALLOWED** | Kubernetes operations |
  | DNS (CoreDNS) | **ALLOWED** | Name resolution |
- | TrueNAS (specific ports) | **ALLOWED** | NFS/SMB/MinIO storage |
+ | TrueNAS (specific ports) | **ALLOWED** | NFS/SMB/RustFS storage |
  | LoadBalancer IPs | **ALLOWED** | Cilium L2 announcements |

  ## Policy Architecture
@@ -73,7 +73,7 @@ graph TD
  subgraph "Egress Rules"
  Internet[Internet<br/>0.0.0.0/0 EXCEPT RFC1918]
  Cluster[Cluster Entities<br/>pods, nodes, apiserver]
- Storage[Whitelisted Storage<br/>TrueNAS: ports 2049,445,9000]
+ Storage[Whitelisted Storage<br/>TrueNAS: NFS,SMB,RustFS]
  LB[LoadBalancer Pool<br/>192.168.10.32/27]
  end

@@ -105,7 +105,7 @@ These specific IPs are allowed on specific ports only:

  | IP | Hostname | Allowed Ports | Purpose |
  |----|----------|---------------|---------|
- | 192.168.10.133 | TrueNAS | 2049 (NFS), 111 (RPC), 445 (SMB), 9000 (MinIO), 30292-30293 (RustFS) | Storage backend (10G) |
+ | 192.168.10.133 | TrueNAS | 2049 (NFS), 111 (RPC), 445 (SMB), 9000, 30292-30293 (RustFS S3) | Storage backend (10G) |
  | 192.168.10.46 | Wyze Bridge | 8554 (RTSP) | Camera streams for Frigate |
  | 192.168.10.14 | Proxmox | 8006 (API) | Omni/Terraform integration |
  | 192.168.10.32/27 | LB Pool | All | Cilium L2 LoadBalancer IPs |
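
As a rough illustration of how the TrueNAS row could be expressed in Cilium, the sketch below allows egress to one IP on the listed ports. It is not the content of `block-lan-access.yaml`: the policy name, the cluster-wide scope, the empty endpoint selector, and the choice of TCP for every port are assumptions; only the IP and port numbers come from the table.

```yaml
# Illustrative sketch, not the repository's actual policy.
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-truenas-storage      # hypothetical name
spec:
  endpointSelector: {}             # assumed: applies to all pods
  egress:
    - toCIDR:
        - 192.168.10.133/32        # TrueNAS
      toPorts:
        - ports:
            - port: "2049"         # NFS
              protocol: TCP
            - port: "111"          # RPC (TCP assumed)
              protocol: TCP
            - port: "445"          # SMB
              protocol: TCP
            - port: "9000"         # RustFS S3
              protocol: TCP
            - port: "30292"        # RustFS S3
              protocol: TCP
            - port: "30293"        # RustFS S3
              protocol: TCP
```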

docs/network-topology.md

Lines changed: 2 additions & 3 deletions
@@ -53,7 +53,7 @@ The cluster uses a single network with 10G switch infrastructure:
  |--------|-----|---------|
  | Router/Gateway | 192.168.10.1 | Default route |
  | Proxmox (hp-server-1) | 192.168.10.14 | Hypervisor |
- | TrueNAS | 192.168.10.133 | NAS (NFS/SMB/MinIO S3) - 10G |
+ | TrueNAS | 192.168.10.133 | NAS (NFS/SMB/RustFS S3) - 10G |
  | Control Plane 1 | 192.168.10.237 | K8s master |
  | Control Plane 2 | 192.168.10.76 | K8s master |
  | Control Plane 3 | 192.168.10.140 | K8s master |
@@ -98,8 +98,7 @@ The Cilium network policy allows these storage connections:
  |-------------|-------|---------|
  | 192.168.10.133 | 2049, 111 | NFS |
  | 192.168.10.133 | 445 | SMB |
- | 192.168.10.133 | 9000 | MinIO S3 |
- | 192.168.10.133 | 30292, 30293 | RustFS |
+ | 192.168.10.133 | 9000, 30292, 30293 | RustFS S3 (Loki, Tempo, pgBackRest) |

  ## Troubleshooting

monitoring/README.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ graph TB
  - **kube-state-metrics**: Kubernetes resource metrics

  ### 📝 Log Aggregation (loki-stack/)
- - **Loki**: Log aggregation and storage (SingleBinary mode with filesystem storage)
+ - **Loki**: Log aggregation and storage (SimpleScalable mode with RustFS S3 backend)
  - **Promtail**: Log collection agent
  - **Gateway**: HTTP access gateway

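
For context, a SimpleScalable deployment backed by an S3-compatible store might look roughly like the Helm values fragment below. This is a sketch under stated assumptions, not the repository's values file: it assumes the `grafana/loki` chart, and the bucket names, endpoint port, and plain-HTTP setting are all hypothetical. Credentials and the schema configuration are omitted.

```yaml
# Illustrative values fragment, assuming the grafana/loki Helm chart.
deploymentMode: SimpleScalable
loki:
  storage:
    type: s3
    bucketNames:
      chunks: loki-chunks          # hypothetical bucket names
      ruler: loki-ruler
      admin: loki-admin
    s3:
      # TrueNAS IP is from the docs above; which RustFS port Loki uses
      # is an assumption.
      endpoint: http://192.168.10.133:30292
      s3ForcePathStyle: true       # path-style access, typical for self-hosted S3
      insecure: true               # assumed plain HTTP on the LAN
```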
