Gitea migration #1614
- Upstream Gitea chart v12.5.3 with values-prod.yaml
- Local chart for AVP-injected secrets (db, obs, security, oauth, smtp)
- NetworkPolicy restricting ingress to nginx + prometheus
- Built-in PostgreSQL for initial testing (RDS cutover later)
- OBS storage for LFS, attachments, packages
- Valkey for session/cache/queue
- Security hardened (read-only root, drop all caps, non-root)
- ArgoCD applications targeting Gitea_migration branch
- Add PostgreSQL auth credentials (password required by Bitnami chart)
- Add SYS_RESOURCE capability for Valkey (needed for background jobs)
- Allow egress to in-namespace PostgreSQL in NetworkPolicy
- PostgreSQL: set usePasswordFiles=false (file mount permission issue)
- Valkey: set seccompProfile=Unconfined (io_uring needed for background jobs)
- Chart appends -rootless when rootless=true, so tag should be just '1.25.5'
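A minimal sketch of the tag fix, assuming the upstream chart exposes `image.rootless` and `image.tag` (if the values are passed through a wrapper chart, these keys would sit under the dependency's name):

```yaml
# values-prod.yaml (sketch; key layout assumed from the upstream Gitea chart)
image:
  rootless: true   # the chart appends "-rootless" to the tag on its own
  tag: "1.25.5"    # so the suffix must not be repeated here
```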
…, increase resources
- readOnlyRootFilesystem was blocking initdb writes to paths not covered by emptyDir
- resourcesPreset small gives enough memory for initdb
ArgoCD CMP cannot resolve remote chart dependencies at render time. Vendor gitea-12.5.3.tgz so helm template works in the sidecar plugin.
- Set imagePullPolicy: Always to force re-pull of correct amd64 image (nodes cached arm64 image with IfNotPresent policy)
- Enable volumePermissions for PostgreSQL to fix SGID bit on data dir (fsGroup sets SGID which causes initdb to fail)
fsGroupChangePolicy: Always re-applies SGID bit on the data directory AFTER init-chmod-data clears it. OnRootMismatch prevents this. Also deleted the stale PVC to start with clean permissions.
fsGroup mechanism sets SGID on fresh volumes even with OnRootMismatch. initdb refuses to run on dirs with SGID bit set. Disable podSecurityContext entirely and rely on volumePermissions init container for ownership.
…Context
Root cause: without fsGroup, emptyDir mounts (conf dir) are root:root and PG can't write pg_hba.conf. With containerSecurityContext enabled, chart defaults add seccompProfile/capabilities that suppress error output.
Fix: enable podSecurityContext with fsGroup:1001 (for emptyDir write access) and disable containerSecurityContext (avoids restrictive defaults, image already runs as uid 1001).
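The fix described here could look roughly like this in the values file. This is a sketch that assumes the Bitnami PostgreSQL subchart is configured under a top-level `postgresql` key and uses its `primary.*` security-context settings:

```yaml
# Sketch only; the `postgresql.primary` nesting is an assumption about the subchart alias.
postgresql:
  primary:
    podSecurityContext:
      enabled: true
      fsGroup: 1001      # gives uid 1001 write access to emptyDir mounts such as the conf dir
    containerSecurityContext:
      enabled: false     # skip the chart's default seccompProfile/capabilities settings
```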
All 5 nodes are 88-99% CPU-requested. 500m doesn't fit anywhere. 250m is sufficient for initial deployment, can increase later.
The ingress-nginx controller pod runs in the 'default' namespace, not 'ingress-nginx'. Selecting by namespace name 'ingress-nginx' meant no traffic was allowed in (HTTPS returned 504). Switch to podSelector with empty namespaceSelector so the rule matches the controller pod by label regardless of which namespace it lives in.
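A sketch of the rule shape described above. The policy name, the controller's pod label, and the Gitea HTTP port are assumptions, not values taken from this PR:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: gitea-allow-ingress-nginx      # hypothetical name
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: gitea
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Both selectors live in ONE list item, so the rule means
        # "pods with this label, in any namespace".
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx   # assumed controller label
      ports:
        - protocol: TCP
          port: 3000                   # assumed Gitea HTTP port
```

Putting `namespaceSelector` and `podSelector` in separate list items would instead allow all pods in all namespaces, so the single-item form matters.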
DISABLE_REGISTRATION=false + ALLOW_ONLY_EXTERNAL_REGISTRATION=true permits new users to be created on first OIDC login through Zitadel, while blocking the local sign-up form. SHOW_REGISTRATION_BUTTON=false keeps the manual register link off the UI.
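In Helm values this combination might be written as follows, assuming the chart maps `gitea.config.<section>` onto app.ini sections and that all three keys belong to `[service]`:

```yaml
gitea:
  config:
    service:
      DISABLE_REGISTRATION: false             # registration stays possible...
      ALLOW_ONLY_EXTERNAL_REGISTRATION: true  # ...but only through an external provider (OIDC)
      SHOW_REGISTRATION_BUTTON: false         # hide the manual register link
```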
CCE CoreDNS pods use label k8s-app=coredns, not k8s-app=kube-dns. Init container failed with DNS i/o timeout when resolving gitea-otcinfra2-postgresql.gitea.svc.cluster.local. Use matchExpressions to allow both label values for portability.
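A sketch of a DNS egress rule that accepts both label values. The `kube-system` namespace label is an assumption about where the DNS pods run:

```yaml
# Fragment of a NetworkPolicy spec (sketch)
egress:
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system   # assumed DNS namespace
        podSelector:
          matchExpressions:
            - key: k8s-app
              operator: In
              values: [kube-dns, coredns]   # matches either label value
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53
```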
…tworkPolicy
Two related issues blocking the configure-gitea init container:
1. Init container loops with 'Admin account gitea_admin already exist.
Running update to sync password... password does not meet complexity
requirements'. The Vault-stored password is 32 alphanumeric chars,
missing the special-character class required by Gitea's
PASSWORD_COMPLEXITY=lower,upper,digit,spec rule. Set
gitea.admin.passwordMode=initialOnlyNoReset so the chart only sets
the admin password on first creation and skips updates on subsequent
reconciles. The admin already exists in PG from previous run, so the
update path is no longer needed.
2. The local-chart NetworkPolicy could not be made permissive enough on
this CCE cluster: even with 0.0.0.0/0 egress, an empty {} egress rule,
and the correct k8s-app=coredns selector, the gitea pod still failed
DNS resolution and TCP to the PG service IP, while a busybox pod
with the same labels on the same node worked. Deleting the policy
on-cluster immediately unblocked init. Drop the template from the
chart so ArgoCD does not recreate it. Egress restriction can be
revisited later with a working policy implementation.
Migration of VM repo data (~82G in /var/lib/gitea/data/gitea-repositories
on gitea2) into the K8s PVC required two changes:
1. persistence.size: 50Gi -> 120Gi
The on-disk repo tree alone is ~82G; 120Gi gives ~25G headroom for
growth and working space (repo-archive cache, indexers, packages).
The PVC was online-resized via Everest CSI (allowVolumeExpansion=true)
from 50Gi to 120Gi without restarting the pod.
2. persistence.volumeName: pvc-0602d752-4ce6-4623-afac-3eca0bc7db3d
Pin the chart-rendered PVC to the specific PV that now holds the
migrated data. Combined with:
- persistentVolumeReclaimPolicy=Retain on the PV
- helm.sh/resource-policy=keep annotation on the PVC
- explicit claimName=gitea-shared-storage
this guarantees that uninstalling+reinstalling the chart, or deleting
and recreating the PVC, will rebind the new PVC to this exact disk
instead of provisioning a fresh empty volume.
If the PV is ever recreated or replaced, update volumeName accordingly.
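Taken together, the persistence settings above might look like this in the values file. The sketch assumes the upstream chart's `persistence.*` keys; the Retain reclaim policy is set on the PV object itself, not through these values:

```yaml
persistence:
  enabled: true
  claimName: gitea-shared-storage
  size: 120Gi
  volumeName: pvc-0602d752-4ce6-4623-afac-3eca0bc7db3d   # pins the PVC to the migrated PV
  annotations:
    helm.sh/resource-policy: keep   # Helm leaves the PVC in place on uninstall
```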
Adds [oauth2_client] section to app.ini via Helm values:
ENABLE_AUTO_REGISTRATION = true
ACCOUNT_LINKING = auto
USERNAME = nickname
UPDATE_AVATAR = false
Why: existing users authenticate via Keycloak OIDC and have no local password. When they log in for the first time via the new Zitadel OIDC provider, Gitea must bind the Zitadel sub to their existing user row (matched by email) instead of refusing login or creating a duplicate account. ACCOUNT_LINKING=auto performs that bind transparently as long as exactly one existing user has the matching email.
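Expressed as Helm values, the section could look like this, assuming the chart renders `gitea.config.<section>` into app.ini:

```yaml
gitea:
  config:
    oauth2_client:
      ENABLE_AUTO_REGISTRATION: true
      ACCOUNT_LINKING: auto     # bind the external identity to an existing user by email
      USERNAME: nickname
      UPDATE_AVATAR: false
```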
Without an explicit scopes flag the chart only sends 'openid' to Zitadel, so the ID token contains only 'sub' and Gitea fails first login with 'missing fields: email,nickname'. Add the standard OIDC scopes plus 'groups' (used by groupClaimName: groups for adminGroup mapping).
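A sketch of an OAuth source entry with explicit scopes. The source name and discovery URL are placeholders, and the separator inside the `scopes` string is an assumption that should be checked against the chart and the `gitea admin auth add-oauth` help:

```yaml
gitea:
  oauth:
    - name: zitadel                # hypothetical source name
      provider: openidConnect
      autoDiscoverUrl: https://zitadel.example.com/.well-known/openid-configuration   # placeholder URL
      scopes: "openid profile email groups"   # separator format is an assumption
      groupClaimName: groups
      adminGroup: "admin"
```

The `profile` scope is what normally carries `nickname`, and `email` carries the address, which would account for the two missing fields.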
Add kubernetes.io/elb.id and elb.pass-through annotations to the SSH LoadBalancer service so port 2222 attaches a listener to the shared OTC ELB (510d12e5-...) at 80.158.58.167. Without elb.id the service was created but no listener was registered, so SSH (git clone) hung.
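A sketch of the SSH service values with these annotations, assuming the chart's `service.ssh` block. The `"true"` value for the pass-through annotation is an assumption:

```yaml
service:
  ssh:
    type: LoadBalancer
    port: 2222
    externalTrafficPolicy: Local
    annotations:
      kubernetes.io/elb.class: union
      kubernetes.io/elb.id: 510d12e5-a578-46e5-acb0-32bc0ffcb04c   # shared OTC ELB
      kubernetes.io/elb.pass-through: "true"                       # value assumed
```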
Port 25 uses STARTTLS (explicit TLS), not implicit TLS (smtps/port 465). Using smtps on port 25 caused repeated TLS handshake failures, which T-Systems SMTP counted as failed auth attempts and eventually blocked the cluster egress IP (80.158.44.171).
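A sketch of the corresponding mailer settings, assuming Gitea's `PROTOCOL` values `smtp+starttls` and `smtps`. The host name is a placeholder:

```yaml
gitea:
  config:
    mailer:
      ENABLED: true
      PROTOCOL: smtp+starttls       # explicit TLS upgrade; smtps would attempt implicit TLS
      SMTP_ADDR: smtp.example.com   # placeholder host
      SMTP_PORT: 25
```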
ArgoCD's CMP runs 'helm dependency update' on every manifest generation. When dl.gitea.com/charts/ is temporarily unreachable, helm outputs an HTML error page which gets piped to argocd-vault-plugin, causing:
Error: invalid character '<' looking for beginning of value
Fix: commit Chart.lock + charts/gitea-12.5.3.tgz so helm uses the local vendored chart and never needs to hit the remote registry.
e817e83 to 8977019
| adminGroup: "admin" | ||
| restrictedGroup: "" | ||
| metrics: |
Is there a reason we have metrics twice?
| @@ -0,0 +1,6 @@ | |||
| dependencies: | |||
Is it intentional to store the chart locally in kubernetes/helm_charts/upstream/gitea/charts/gitea-12.5.3.tgz while also referring to the origin chart at dl.gitea.com?
| passwordMode: initialOnlyNoReset | ||
| config: | ||
| APP_NAME: "Open Telekom Cloud: git" |
| auth: | ||
| username: gitea | ||
| database: gitea | ||
| password: "gitea-pg-temp-pass" |
Even for the initial setup, the password should be fetched from Vault (and changed).
| REQUIRE_SIGNIN_VIEW: true | ||
| mailer: | ||
| ENABLED: true |
| ENABLE_NOTIFY_MAIL: true | ||
| api: | ||
| ENABLE_SWAGGER: false |
| apiVersion: v1 | ||
| kind: Secret | ||
| metadata: | ||
| name: gitea-db-secret |
Shouldn't the namespace also be set in the secret templates, like in the Zitadel secrets?
namespace: {{ .Values.namespace | default .Release.Namespace }}
| architecture: standalone | ||
| global: | ||
| valkey: | ||
| password: "" |
Why? Even if it's an internal service, other pods could read the data from it if they were compromised.
| Common labels | ||
| */}} | ||
| {{- define "gitea-additional-manifests.labels" -}} | ||
| app.kubernetes.io/name: gitea |
Is there no need to also add a helm.sh/chart: reference, like in the other helper templates?
| externalTrafficPolicy: Local | ||
| annotations: | ||
| kubernetes.io/elb.class: union | ||
| kubernetes.io/elb.id: 510d12e5-a578-46e5-acb0-32bc0ffcb04c |
Is this hardcoded reference necessary?
| SCHEME: https | ||
| # Security: Rate limiting | ||
| api.rate_limit: |
I didn't see this documented in the Helm chart values or anywhere else. Is it working?
…y T-Systems ENABLED: false to prevent Gitea from hammering the blocked IP. Re-enable once T-Systems unblocks the IP.
No description provided.