Commit 3b2e9ad

mitchross and claude committed
Add Garage S3 storage design document
Design for deploying Garage distributed S3-compatible object storage:

- 3-instance StatefulSet with replication factor 2
- Longhorn PVCs (3Gi meta + 30Gi data per instance)
- Web UI at garage.vanillax.me
- PostSync init job for cluster layout
- Internal S3 API access only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 1b1df88 commit 3b2e9ad

1 file changed

Lines changed: 269 additions & 0 deletions

@@ -0,0 +1,269 @@
# Garage S3 Storage Design

**Date**: 2025-10-24
**Status**: Approved for Implementation
**Location**: `infrastructure/storage/garage/`

## Overview

Deploy Garage distributed S3-compatible object storage with Web UI for internal cluster storage needs. Garage provides geo-distributed, self-hosted S3 storage with replication factor 2 across 3 instances.

## Architecture

### Components

1. **Garage Backend**: StatefulSet with 3 replicas, S3 and Admin APIs
2. **Garage Web UI**: Web-based management interface
3. **Auto-initialization**: PostSync Job for cluster layout configuration

### Directory Structure

```
infrastructure/storage/garage/
├── backend/
│   ├── externalsecret.yaml      # 1Password: rpc_secret, admin_token
│   ├── configmap.yaml           # garage.toml configuration
│   ├── statefulset.yaml         # 3 replicas, Longhorn PVCs
│   ├── service-s3.yaml          # Port 3900 (S3 API)
│   ├── service-admin.yaml       # Port 3903 (Admin API)
│   ├── service-internal.yaml    # Headless for kubernetes_discovery
│   └── init-job.yaml            # PostSync hook (sync wave 1)
├── webui/
│   ├── deployment.yaml          # Web UI frontend (sync wave 2)
│   ├── service.yaml             # Port 3909 (named port)
│   └── httproute.yaml           # garage.vanillax.me
├── namespace.yaml
├── kustomization.yaml           # Lists all resources
└── README.md                    # Usage guide
```

## Storage Configuration

### PVC Sizing (per instance)
- **Metadata**: 3Gi (Longhorn, ReadWriteOnce)
- **Data**: 30Gi (Longhorn, ReadWriteOnce)
- **Total capacity**: 90Gi raw data capacity across 3 instances (about 45Gi usable, since replication factor 2 stores every object twice)

### Replication
- **Replication Factor**: 2
- **Minimum instances**: 2
- **Failure tolerance**: 1 instance can fail without data loss; per the Garage documentation, data stays readable at replication factor 2 while a node is down, but writes can fail until it returns
- **Data redundancy**: Each object stored on 2 instances
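
The sizing and replication figures above combine into a quick capacity estimate. A minimal sketch using only the numbers from this design; it ignores metadata overhead and filesystem reserve, so real usable space will be somewhat lower:

```bash
# Capacity estimate for the sizing above (values taken from this design).
instances=3
data_gi_per_instance=30
replication_factor=2

raw_gi=$(( instances * data_gi_per_instance ))
# Each object is stored replication_factor times, so usable space is raw / RF.
usable_gi=$(( raw_gi / replication_factor ))

echo "raw=${raw_gi}Gi usable=${usable_gi}Gi"
```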

## Network Architecture

### Services

| Service | Port | Type | Purpose | Exposure |
|---------|------|------|---------|----------|
| `garage-s3` | 3900 | ClusterIP | S3 API | Internal only |
| `garage-admin` | 3903 | ClusterIP | Admin/Web API | Internal only |
| `garage-internal` | 3901 | Headless | RPC/Discovery | Internal only |
| `garage-webui` | 3909 | ClusterIP | Web UI | HTTPRoute |

### External Access

- **Web UI**: `https://garage.vanillax.me` (gateway-internal)
- **S3 API**: Internal cluster access only via `garage-s3.garage.svc.cluster.local:3900`

### Connection Flow

```
User → https://garage.vanillax.me
  → HTTPRoute (gateway-internal)
  → garage-webui Service (3909)
  → garage-webui Pod
  → garage-admin Service (3903)
  → Garage StatefulSet Pods

Apps → garage-s3.garage.svc.cluster.local:3900
  → garage-s3 Service
  → Garage StatefulSet Pods
```

## Configuration

### Garage Backend (garage.toml)

Key settings:
- `replication_factor = 2`
- `db_engine = "lmdb"`
- `metadata_dir = "/mnt/meta"`
- `data_dir = "/mnt/data"`
- `[kubernetes_discovery]` enabled for pod auto-discovery
- `[admin]` API on port 3903 with token from 1Password
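
To show how the key settings above might fit together, here is a rough sketch of a shell helper that prints a candidate `garage.toml`. The helper name `render_garage_toml`, the bind addresses, the `/secrets/...` file paths, and the `skip_crd` value are assumptions for illustration; only the six settings listed above come from this design.

```bash
# Hypothetical helper: prints a garage.toml built from the key settings above.
# Bind addresses, secret file paths, and skip_crd are illustrative assumptions.
render_garage_toml() {
  cat <<'EOF'
metadata_dir = "/mnt/meta"
data_dir = "/mnt/data"
db_engine = "lmdb"
replication_factor = 2

rpc_bind_addr = "[::]:3901"
rpc_secret_file = "/secrets/rpc-secret"

[s3_api]
s3_region = "garage"
api_bind_addr = "[::]:3900"

[admin]
api_bind_addr = "[::]:3903"
admin_token_file = "/secrets/admin-token"

[kubernetes_discovery]
namespace = "garage"
service_name = "garage-internal"
skip_crd = false
EOF
}

render_garage_toml
```

Reading the secrets from mounted files rather than inlining them keeps the ConfigMap free of sensitive values; whether the actual ConfigMap does this is not stated in this design.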

### Web UI Environment Variables

- `API_BASE_URL`: `http://garage-admin.garage.svc.cluster.local:3903`
- `S3_ENDPOINT_URL`: `http://garage-s3.garage.svc.cluster.local:3900`
- `S3_REGION`: `garage`
- `API_ADMIN_KEY`: From ExternalSecret (1Password)

## Secrets Management

### 1Password Item: `s3-garage`

Two fields required:
1. **`rpc_secret`**: 32-byte secret, hex-encoded (64 characters), for RPC authentication between pods
   - Generate: `openssl rand -hex 32`
2. **`admin_token`**: 64+ character secure token for Admin API
   - Generate: `openssl rand -base64 48`
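
As a quick sanity check before pasting the values into 1Password, the two commands above can be run together and their lengths inspected. A minimal sketch, assuming `openssl` is installed locally; the variable names are illustrative:

```bash
# Generate the two 1Password fields and check their lengths.
rpc_secret="$(openssl rand -hex 32)"      # 32 random bytes, hex-encoded
admin_token="$(openssl rand -base64 48)"  # 48 random bytes, base64-encoded

echo "rpc_secret length:  ${#rpc_secret}"
echo "admin_token length: ${#admin_token}"
```

Both values come out at 64 characters: 32 bytes become 64 hex digits, and 48 bytes become 64 base64 characters with no padding.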

### ExternalSecret

Syncs from ClusterSecretStore `1password`:
- `rpc-secret` → Used in garage.toml
- `admin-token` → Used by Web UI and garage.toml admin section

## Deployment Flow

### ArgoCD Sync Waves

**Wave 0** (Default):
1. Namespace creation
2. ExternalSecret syncs from 1Password
3. ConfigMap with garage.toml
4. StatefulSet deploys (3 pods: garage-0, garage-1, garage-2)
5. Services created (s3, admin, internal)
6. Pods use kubernetes_discovery to find each other via headless service

**Wave 1** (PostSync Hook):
1. Init Job waits for all 3 pods to be ready
2. Connects each node to the cluster
3. Configures cluster layout with a zone and capacity for each node (the replication factor itself comes from garage.toml)
4. Applies the layout
5. Cluster operational
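
The layout steps of the init Job can be sketched as a dry run that only prints the `garage` CLI calls it would make. The function name `plan_layout`, the zone name `homelab`, the `30G` capacity argument, and the `node-id-N` values are placeholders, not values from this design; real node IDs would come from `garage node id` or `garage status`.

```bash
# Dry-run sketch of the init Job's layout step: prints the garage CLI calls
# instead of executing them. All argument values are placeholders.
plan_layout() {
  local zone="$1" capacity="$2"
  shift 2
  local node_id
  for node_id in "$@"; do
    # Stage a role for each node: zone (-z) and capacity (-c).
    echo "garage layout assign -z ${zone} -c ${capacity} ${node_id}"
  done
  # Review the staged changes, then commit them as layout version 1.
  echo "garage layout show"
  echo "garage layout apply --version 1"
}

plan_layout homelab 30G node-id-0 node-id-1 node-id-2
```

A real Job would run these commands rather than echo them, and would pick the next layout version instead of hard-coding 1 if a layout already exists.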
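
**Wave 2** (Applications):
1. Web UI Deployment starts
2. Web UI Service with named port created
3. HTTPRoute configured for garage.vanillax.me
4. Web UI connects to Admin API

Argo CD orders sync waves within a phase and runs PostSync hooks only after every Sync-phase resource has been applied and is healthy. If the init Job is annotated as a PostSync hook, it therefore runs after the wave 2 Web UI resources rather than between waves 0 and 2; running it in between would require a Sync-phase hook in wave 1 instead.

### ApplicationSet Discovery

- **Pattern**: `infrastructure/storage/*`
- **Application name**: `garage`
- **Namespace**: `garage` (auto-created)
- **Sync policy**: Automated with prune + selfHeal
- **Sync wave**: 1 (infrastructure tier)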

## Post-Deployment Tasks

### 1. Verify Cluster Status

```bash
kubectl exec -n garage garage-0 -- garage status
```

Expected output: 3 connected nodes with configured layout

### 2. Access Web UI

Navigate to `https://garage.vanillax.me`
- Admin token automatically configured from 1Password
- Should see cluster status, buckets, and keys

### 3. Create First S3 Bucket

Via Web UI or CLI:
```bash
kubectl exec -n garage garage-0 -- garage bucket create my-bucket
```

### 4. Create Access Keys

Via Web UI or CLI:
```bash
kubectl exec -n garage garage-0 -- garage key create my-app-key
```

### 5. Test S3 Access

From application pods:
- **Endpoint**: `http://garage-s3.garage.svc.cluster.local:3900`
- **Region**: `garage`
- **Access Key**: From Web UI
- **Secret Key**: From Web UI
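
One way to exercise these settings is a small wrapper around the AWS CLI. This is a sketch under assumptions: the pod has the `aws` CLI installed, the wrapper name `garage_s3` is hypothetical, and the credential placeholders must be replaced with the real key ID and secret. The key also needs permission on the bucket first, which the Garage CLI grants with `garage bucket allow --read --write my-bucket --key my-app-key`.

```bash
# Connection settings from this design.
export GARAGE_S3_ENDPOINT="http://garage-s3.garage.svc.cluster.local:3900"
export AWS_DEFAULT_REGION="garage"
# Placeholders: substitute the key ID and secret issued for my-app-key.
export AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID:-REPLACE_WITH_KEY_ID}"
export AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY:-REPLACE_WITH_SECRET}"

# Hypothetical wrapper so every call targets Garage instead of AWS.
garage_s3() {
  aws --endpoint-url "$GARAGE_S3_ENDPOINT" s3 "$@"
}

# Example calls (run from a pod inside the cluster):
#   garage_s3 ls
#   garage_s3 cp ./hello.txt s3://my-bucket/hello.txt
```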

## Validation Checklist

Pre-deployment:
- [ ] 1Password item `s3-garage` created with `rpc_secret` and `admin_token`
- [ ] DNS record for `garage.vanillax.me` points to gateway (if needed)
- [ ] Longhorn storage class available and healthy

Post-deployment:
- [ ] All 3 Garage pods running and ready
- [ ] Init job completed successfully (check logs)
- [ ] `garage status` shows 3 connected nodes
- [ ] Cluster layout applied with replication factor 2
- [ ] Web UI accessible at `https://garage.vanillax.me`
- [ ] Can create buckets via Web UI
- [ ] Can create access keys via Web UI
- [ ] S3 API responds to requests from cluster
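
The first post-deployment check could be scripted with a small filter over `kubectl get pods` output. A minimal sketch; the function name `count_ready_pods` and the `app=garage` label in the usage comment are assumptions, not names from this design:

```bash
# Counts pods whose READY column is n/n and whose STATUS is Running.
# Expects `kubectl get pods --no-headers` output on stdin.
count_ready_pods() {
  awk '{ split($2, r, "/"); if (r[1] == r[2] && $3 == "Running") n++ } END { print n + 0 }'
}

# Example against the cluster (the label selector is hypothetical):
#   kubectl get pods -n garage -l app=garage --no-headers | count_ready_pods
```

A result of 3 corresponds to the "All 3 Garage pods running and ready" item; anything lower points at the Troubleshooting section.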

## Backup Strategy

**Longhorn PVC Backups**:
- Leverage Longhorn's built-in snapshot and backup features
- Both meta and data PVCs will be backed up
- Backs up raw data volumes (6 PVCs total: 3 meta + 3 data)

**Future Enhancement**:
- Garage supports S3-to-S3 replication
- Could replicate to external S3/Garage cluster for disaster recovery
- Document this approach if needed later

## Troubleshooting

### Init Job Fails

Check logs:
```bash
kubectl logs -n garage job/garage-init
```

Common issues:
- Pods not ready yet (job will retry)
- RPC secret mismatch
- Network connectivity between pods

### Web UI Can't Connect

Check:
1. ExternalSecret synced: `kubectl get externalsecret -n garage`
2. Secret created: `kubectl get secret garage-secrets -n garage`
3. Deployment logs: `kubectl logs -n garage deployment/garage-webui`
4. Admin API accessible: `kubectl exec -n garage garage-0 -- curl localhost:3903/health` (requires `curl` in the container; if the Garage image does not ship it, send the same request from a temporary debug pod to `garage-admin.garage.svc.cluster.local:3903`)

### Pods Not Discovering Each Other

Check:
1. Headless service exists: `kubectl get svc garage-internal -n garage`
2. StatefulSet DNS working: `kubectl exec -n garage garage-0 -- nslookup garage-internal.garage.svc.cluster.local` (likewise requires `nslookup` in the container)
3. RPC connectivity: `kubectl exec -n garage garage-0 -- garage node connect <node-id>@<host>:3901`

## Dependencies

- **Longhorn**: Storage class for PVCs
- **External Secrets Operator**: 1Password integration
- **Gateway API**: HTTPRoute for Web UI
- **ArgoCD**: ApplicationSet discovery and sync

## Future Enhancements

1. **External S3 Access**: Add HTTPRoute for S3 API if needed
2. **S3 Web Hosting**: Enable port 3902 for static website hosting
3. **Monitoring**: Add ServiceMonitor for Prometheus metrics
4. **Horizontal Scaling**: Add more Garage instances (requires layout reconfiguration)
5. **Backup Replication**: Configure S3-to-S3 replication to external cluster

## References

- Garage Documentation: https://garagehq.deuxfleurs.fr/
- Garage Web UI: https://github.com/khairul169/garage-webui
- Docker Image: `dxflrs/garage:v2.1.0`
- Web UI Image: `khairul169/garage-webui:latest`
- Kubernetes Cookbook: https://garagehq.deuxfleurs.fr/documentation/cookbook/kubernetes/
