| name | deploy-t4-cs3-over-ceph |
|---|---|
| description | Deploy (or clean-reinstall) Together T4 (gns control-plane + s3-proxy data-plane) and CS3 (s3-cache-proxy) on the transporter k3s cluster, backed by a Rook-Ceph RGW running on a separate cluster. Covers the two-cluster topology, the two data paths, helm-driven install with hand-created secrets (TLS from certs1, postgres, ECR pull secrets in us-west-2), wiping+reseeding gns postgres, and pointing everything at the live Ceph RGW. Use for "deploy t4 and cs3 over ceph", "reinstall gns/s3-proxy/cs3", "redeploy T4 on the transporter". |
- Compute/T4: `<USER>@<TRANSPORTER_IP>` (`<TRANSPORTER_HOST>`, plus the `.74` worker), k3s. Namespaces: `gns`, `s3-proxy`, `cs3`, `t4`. `KUBECONFIG=~/k3s.yaml` (the API is also reachable from a laptop at `https://<TRANSPORTER_IP>:6443`). `docker` is podman; passwordless sudo. helm v3 is installed at `/usr/local/bin/helm`.
- Storage/Ceph: `<CEPH_USER>@<CEPH_IP>` (`<CEPH_HOST>`), Rook-Ceph in namespace `rook-ceph` (use sudo for kubectl). Object store `my-store`; RGW is reachable cross-cluster via the NodePort `http://<CEPH_RGW_NODEPORT>` (the in-cluster name `rook-ceph-rgw-my-store.rook-ceph.svc:80` does NOT resolve from the transporter).
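
Since only the NodePort works cross-cluster, it can be worth confirming from the transporter that it answers before pointing anything at it. A minimal sketch, assuming `curl` is installed on the transporter; `rgw_reachable` is a hypothetical helper name. Any HTTP status counts as reachable, because the point is only to prove the network path, not the credentials.

```shell
# Hypothetical helper: succeeds if some HTTP server answers at the URL.
# Any status code (200, 403, ...) counts; a failed connection makes curl
# print 000 and exit non-zero, which counts as unreachable.
rgw_reachable() {
  local url=$1 code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || return 1
  [ "$code" != "000" ]
}

# Usage, on the transporter:
#   rgw_reachable "http://<CEPH_RGW_NODEPORT>" && echo "RGW NodePort OK"
```
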

Data paths:
- T4: client → s3-proxy (UMS auth) → GNS (bucket → site → store routing) → Ceph RGW. s3-proxy gets the store URL from GNS and the S3 creds from its `s3-proxy-backends` secret, keyed by store name `ceph`.
- CS3: `cs3` (s3-cache-proxy) → Ceph RGW directly (`-s3-endpoint=http://<CEPH_RGW_NODEPORT>`, creds in `rgw-creds`). cs3 integrates with the T4 control plane (ENG-90005).

Images (ECR, us-west-2):
- gns: `<T4_ECR_ACCOUNT>.dkr.ecr.us-west-2.amazonaws.com/t4-gns:latest`
- s3-proxy: `<T4_ECR_ACCOUNT>.dkr.ecr.us-west-2.amazonaws.com/t4-s3-proxy:latest`
- cs3: `<CS3_ECR_ACCOUNT>.dkr.ecr.us-west-2.amazonaws.com/s3-cache-proxy:latest`

Charts in the repo: `together-t4/gns/deploy/helm/gns`, `together-t4/s3-proxy/deploy/helm/s3-proxy`, `cs3/helm`. GNS TLS material is in `together-t4/certs1/` (`ca-cert.pem`, `gns-server.crt`/`.key`, `gns-client.crt`/`.key`, `proxy.crt`/`.key`). Drive helm either from the host (copy the charts and certs over first) or locally against the kubeconfig.

0. ECR pull secrets (us-west-2): `ecr-pull` in `gns` and `s3-proxy`, `ecr-creds` in `cs3`. Generate the token locally, then `kubectl create secret docker-registry … --dry-run=client -o yaml | kubectl apply -f -`. ECR tokens expire after 12 h; refresh `ecr-creds` if cs3 pods hit `403 Forbidden` or `ImagePullBackOff`.
1. GNS (full teardown and wipe):
   ```shell
   helm uninstall gns gns-postgres -n gns ; kubectl delete ns gns ; kubectl create ns gns
   # recreate secrets: ecr-pull (us-west-2); gns-tls (tls.crt=gns-server.crt, tls.key=gns-server.key, ca-cert.pem=ca-cert.pem);
   # gns-postgres-secret (connection-string=postgres://gns:<GNS_PG_PASSWORD>@gns-postgres-postgresql.gns.svc.cluster.local:5432/gns?sslmode=disable)
   helm repo add bitnami https://charts.bitnami.com/bitnami; helm install gns-postgres bitnami/postgresql -n gns \
     --set auth.username=gns --set auth.password=<GNS_PG_PASSWORD> --set auth.database=gns --set architecture=standalone --set primary.persistence.size=10Gi
   helm install gns ./gns -n gns --set image.repository=…/t4-gns --set image.tag=latest --set image.pullPolicy=Always \
     --set 'imagePullSecrets[0].name=ecr-pull' --set config.postgres.secretName=gns-postgres-secret \
     --set tls.secretName=gns-tls --set config.tlsEnabled=true
   ```
   gns auto-migrates the schema on boot. Then seed the `ceph` store via [[gns-seed-ceph-store]]; the chart's seed job uses an mTLS bootstrap, but we seed via SQL to be safe.
2. s3-proxy (preserve the external `s3-proxy-gns-tls` mTLS client-cert secret; don't delete the namespace):
   - Recreate `ecr-pull` for us-west-2, and set `s3-proxy-backends` to `{"ceph":{"URL":"http://<CEPH_RGW_NODEPORT>","AWSAccessId":<ceph>,"AWSSecretId":<ceph>}}` using the working Ceph creds from cs3's `rgw-creds` (the March creds were dead).
   - `helm uninstall s3-proxy -n s3-proxy && helm install s3-proxy ./s3-proxy -n s3-proxy` with image `…/t4-s3-proxy:latest`, `config.gns.serverAddress=gns.gns.svc.cluster.local:9090`, `config.gns.serverName=gns.gns.svc.cluster.local`, `config.siteName=transporter`, `config.ums.endpoint=<UMS_ENDPOINT>`, `config.ums.serviceKey=<ums service key>`, `backendsSecretName=s3-proxy-backends`, `proxyTLS.secretName=s3-proxy-tls`, `service.type=LoadBalancer`.
   - Healthy log lines: `Connected to GNS via mTLS`, `defaultStore: ceph`.
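3. cs3 (already wired to Ceph): `kubectl set image ds/cs3 s3-cache-proxy=…/s3-cache-proxy:latest -n cs3`, then `kubectl rollout status ds/cs3 -n cs3`. If the DaemonSet already references `:latest`, `set image` leaves the pod template unchanged and triggers no rollout; use `kubectl rollout restart ds/cs3 -n cs3` instead (new pods re-pull only if the image pull policy is `Always`, the default for `:latest` unless the chart overrides it).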
4. Verify end-to-end → [[s3-proxy-ceph-e2e]].
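
Step 0 can be scripted roughly as below. A sketch, assuming the AWS CLI is configured locally with pull access to both ECR accounts and that `kubectl` targets the transporter; `ecr_registry` and `refresh_ecr_secret` are hypothetical helper names. Rendering with `--dry-run=client` and piping into `kubectl apply` makes the call idempotent, so the same command both creates the secret and refreshes an expired 12 h token.

```shell
# Hypothetical helper: ECR registry host for an account in us-west-2.
ecr_registry() {
  printf '%s.dkr.ecr.us-west-2.amazonaws.com' "$1"
}

# Hypothetical helper: create or refresh a docker-registry pull secret.
# Usage: refresh_ecr_secret <namespace> <secret-name> <ecr-account-id>
refresh_ecr_secret() {
  local ns=$1 name=$2 account=$3 token
  # Short-lived (12 h) ECR password; the username is always "AWS".
  token=$(aws ecr get-login-password --region us-west-2) || return 1
  # Render client-side, then apply: re-runs update instead of failing.
  kubectl create secret docker-registry "$name" -n "$ns" \
    --docker-server="$(ecr_registry "$account")" \
    --docker-username=AWS \
    --docker-password="$token" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Usage:
#   refresh_ecr_secret gns      ecr-pull  <T4_ECR_ACCOUNT>
#   refresh_ecr_secret s3-proxy ecr-pull  <T4_ECR_ACCOUNT>
#   refresh_ecr_secret cs3      ecr-creds <CS3_ECR_ACCOUNT>
```
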
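
The two secrets named in the step 1 comments can be recreated the same way. A sketch, assuming it runs from the repo root (so `together-t4/certs1/` resolves), that `GNS_PG_PASSWORD` is exported, and that the gns chart accepts a plain Opaque secret carrying those three keys; `create_gns_secrets` is a hypothetical name. A password with URL-reserved characters would need percent-encoding inside the connection string.

```shell
# Hypothetical helper: recreate gns-tls and gns-postgres-secret in ns gns.
# Assumes cwd is the repo root and GNS_PG_PASSWORD is exported.
create_gns_secrets() {
  local certs=together-t4/certs1
  if [ -z "${GNS_PG_PASSWORD:-}" ]; then
    echo "GNS_PG_PASSWORD is not set" >&2
    return 1
  fi
  # Server cert/key plus CA, under the key names listed in step 1.
  kubectl create secret generic gns-tls -n gns \
    --from-file=tls.crt="$certs/gns-server.crt" \
    --from-file=tls.key="$certs/gns-server.key" \
    --from-file=ca-cert.pem="$certs/ca-cert.pem" \
    --dry-run=client -o yaml | kubectl apply -f - || return 1
  # DSN for the service created by the gns-postgres bitnami release.
  kubectl create secret generic gns-postgres-secret -n gns \
    --from-literal=connection-string="postgres://gns:${GNS_PG_PASSWORD}@gns-postgres-postgresql.gns.svc.cluster.local:5432/gns?sslmode=disable" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```
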
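
For step 2, the `s3-proxy-backends` payload can be rendered and applied with a small helper. This is a sketch under several assumptions: the data key the chart reads inside `s3-proxy-backends` is not recorded here, so it is passed in explicitly (inspect the existing secret with `kubectl get secret s3-proxy-backends -n s3-proxy -o jsonpath='{.data}'`); the access and secret keys are copied by hand from cs3's `rgw-creds`; and they contain no characters needing JSON escaping (typical RGW keys are alphanumerics plus `/` and `+`). `ceph_backends_json` and `apply_backends_secret` are hypothetical names.

```shell
# Hypothetical helper: render the backends map, keyed by store name "ceph".
# Assumes the values contain no quotes or backslashes (no JSON escaping done).
ceph_backends_json() {
  printf '{"ceph":{"URL":"%s","AWSAccessId":"%s","AWSSecretId":"%s"}}' "$1" "$2" "$3"
}

# Hypothetical helper: write the JSON under a caller-supplied data key.
# Usage: apply_backends_secret <data-key> <rgw-url> <access-key> <secret-key>
apply_backends_secret() {
  local key=$1
  shift
  kubectl create secret generic s3-proxy-backends -n s3-proxy \
    --from-literal="$key=$(ceph_backends_json "$@")" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```
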
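
The healthy-log check at the end of step 2 can be made scriptable. A sketch that reads logs on stdin and succeeds only when both markers appear; `s3proxy_logs_healthy` is a hypothetical helper, and the deployment name `s3-proxy` in the usage line is an assumption based on the release name.

```shell
# Hypothetical helper: succeed only if both step-2 healthy markers
# appear somewhere in the log text read from stdin.
s3proxy_logs_healthy() {
  local logs
  logs=$(cat)
  printf '%s\n' "$logs" | grep -qF 'Connected to GNS via mTLS' &&
    printf '%s\n' "$logs" | grep -qF 'defaultStore: ceph'
}

# Usage (deployment name assumed):
#   kubectl logs deploy/s3-proxy -n s3-proxy | s3proxy_logs_healthy && echo "s3-proxy healthy"
```
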

Gotchas:
- SSH to `.73` is intermittently flaky; use `-o ConnectTimeout=25 -o ServerAliveInterval=5` and retry.
- zsh does NOT word-split unquoted variables, so don't put ssh flags in a `$VAR`; inline them.
- `helm get values <release>` is the source of truth for re-creating config faithfully.
- RBAC: `<USER>` can create/delete namespaces, patch deployments and daemonsets, and create secrets; it cannot read others' helm releases without helm installed.
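
The two SSH gotchas combine into one helper: flags written inline (so zsh's lack of word splitting does not matter) plus a bounded retry. A sketch; `ssh_retry` is a hypothetical name. It retries only when ssh exits with 255, which is ssh's own status for connection or protocol errors; any other status is the remote command's and is returned as-is. A session that drops mid-command also exits 255, so wrap only commands that are safe to re-run.

```shell
# Hypothetical helper. Usage: ssh_retry <attempts> <ssh args...>
ssh_retry() {
  local attempts=$1 i=1 rc
  shift
  while [ "$i" -le "$attempts" ]; do
    # Flags are inline, not in a $VAR, so zsh and bash behave the same.
    ssh -o ConnectTimeout=25 -o ServerAliveInterval=5 "$@"
    rc=$?
    # 255 = ssh itself failed; anything else is the remote command's status.
    [ "$rc" -ne 255 ] && return "$rc"
    echo "ssh attempt $i/$attempts failed, retrying" >&2
    i=$((i + 1))
    sleep 5
  done
  return 255
}

# Example: ssh_retry 5 <USER>@<TRANSPORTER_IP> 'kubectl get pods -A'
```
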
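
Following the `helm get values` gotcha, a release's user-supplied values can be snapshotted before any uninstall and fed back on reinstall. A sketch assuming helm v3 and a local chart path; `reinstall_from_values` is a hypothetical helper. Without `--all`, `helm get values` returns only the values that were explicitly set, which is what a faithful re-create needs; `-o yaml` drops the table-format header so the file is valid input for `-f`. Secrets referenced by name (for example `s3-proxy-backends`) are not captured and must still exist.

```shell
# Hypothetical helper.
# Usage: reinstall_from_values <release> <namespace> <chart-path> [snapshot-file]
reinstall_from_values() {
  local release=$1 ns=$2 chart=$3
  local snapshot="${4:-${release}-values.yaml}"
  # Save only user-supplied values (what was passed via --set / -f).
  helm get values "$release" -n "$ns" -o yaml > "$snapshot" || return 1
  helm uninstall "$release" -n "$ns" || return 1
  # The snapshot stays on disk, so a failed install can be retried by hand.
  helm install "$release" "$chart" -n "$ns" -f "$snapshot"
}

# Example: reinstall_from_values s3-proxy s3-proxy ./s3-proxy
```
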