Commit d5fd2e0

Migrate Element/Matrix from Ansible to Helm/ArgoCD (#1556)
Summary

Replaces the legacy Ansible-managed Matrix/Synapse deployment with a fully declarative Helm chart deployment managed by ArgoCD. Migrates the existing matrix.otc-service.com instance (users, rooms, message history) from the old cluster (otcinfra) to the new cluster (otcinfra2).

Changes

Added: Helm Charts

Upstream Element chart (upstream/element/): wraps matrix-stack v26.2.3 from oci://ghcr.io/element-hq/ess-helm, deploying:
- Synapse v1.125.0 with external PostgreSQL RDS (SSL verify-ca)
- Element Web v1.11.86 (dark theme, branded "OTC Chat")
- MatrixRTC with LiveKit SFU v1.9.1 (forced TCP for OTC security group compatibility)
- Well-Known delegation with base domain redirect
- HAProxy + Redis

Local Element chart (local/element/): supplementary templates for:
- Element Call deployment with LiveKit integration
- SFU TCP LoadBalancer service (workaround for OTC ELB PreferDualStack incompatibility)
- Matrix media PersistentVolume (SFS Turbo NFS)
- Vault-injected secrets (signing key, registration secret, DB credentials, SSL CA)
- Init DB job for database bootstrapping

Maubot Kustomize (kustomize/maubot/): standalone Maubot deployment with prod overlay

Removed: Ansible Roles & Playbooks
- playbooks/roles/matrix/ (8 files, ~3200 lines): Synapse K8s role with homeserver.yaml.j2 template
- playbooks/roles/maubot/ (5 files, ~340 lines): Maubot K8s role with config template
- playbooks/service-matrix.yaml
- playbooks/service-maubot.yaml

Key Configuration

| Component      | Value                                             |
|----------------|---------------------------------------------------|
| Server name    | matrix.otc-service.com (preserved from legacy)    |
| Authentication | Zitadel OIDC SSO (allow_existing_users: true)     |
| Images         | All mirrored to quay.io/opentelekomcloud/         |
| TLS            | cert-manager with letsencrypt-prod ClusterIssuer  |
| WebRTC         | ICE forced to TCP (OTC SG blocks UDP)             |
| Database       | External RDS with SSL verify-ca                   |

Migration Details
- Database dump/restore from old PostgreSQL to new RDS
- DNS records updated: matrix, element, call, matrixrtc .otc-service.com → 80.158.58.167
- Original signing key preserved to maintain federation identity
- 3 users, 24 rooms, 112,482 events migrated

Reviewed-by: SebastianGode
1 parent 70a75e0 commit d5fd2e0

29 files changed

Lines changed: 1000 additions & 3534 deletions
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+apiVersion: v2
+name: element-additional-manifests
+description: Additional manifests for Element (Matrix) deployment - secrets, PV/PVC, Element Call
+version: 0.1.0
Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
+{{- if .Values.elementCall.enabled }}
+{{- /*
+Element Call - Standalone video/voice calling frontend.
+
+This deploys the Element Call web application at call.otc-service.com.
+It provides a dedicated URL for video conferencing without needing
+the full Element Web chat client.
+
+Element Call is also available as a built-in widget in Element Web -
+this standalone deployment is optional but useful for sharing call links
+with external users.
+*/ -}}
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: element-call-config
+  labels:
+    app.kubernetes.io/name: element-call
+data:
+  config.json: |
+    {
+      "default_server_config": {
+        "m.homeserver": {
+          "base_url": "https://{{ .Values.elementCall.homeserverHost | default "matrix.otc-service.com" }}",
+          "server_name": "{{ .Values.elementCall.homeserverHost | default "matrix.otc-service.com" }}"
+        }
+      },
+      "livekit": {
+        "livekit_service_url": "https://{{ .Values.elementCall.livekitHost | default "matrixrtc.otc-service.com" }}/sfu/get"
+      }
+    }
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: element-call
+  labels:
+    app.kubernetes.io/name: element-call
+    app.kubernetes.io/component: frontend
+spec:
+  replicas: {{ .Values.elementCall.replicas | default 1 }}
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: element-call
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: element-call
+    spec:
+      imagePullSecrets:
+        - name: quay-pull-secret
+      containers:
+        - name: element-call
+          image: {{ .Values.elementCall.image | default "quay.io/opentelekomcloud/element-call:v0.17.0" }}
+          imagePullPolicy: IfNotPresent
+          ports:
+            - name: http
+              containerPort: 8080
+              protocol: TCP
+          livenessProbe:
+            httpGet:
+              path: /
+              port: http
+            initialDelaySeconds: 5
+            periodSeconds: 10
+          readinessProbe:
+            httpGet:
+              path: /
+              port: http
+            initialDelaySeconds: 5
+            periodSeconds: 10
+          resources:
+            requests:
+              cpu: 50m
+              memory: 64Mi
+            limits:
+              memory: 128Mi
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop:
+                - ALL
+          volumeMounts:
+            - name: tmp
+              mountPath: /tmp
+            - name: config
+              mountPath: /app/config.json
+              subPath: config.json
+              readOnly: true
+      volumes:
+        - name: tmp
+          emptyDir: {}
+        - name: config
+          configMap:
+            name: element-call-config
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: element-call
+  labels:
+    app.kubernetes.io/name: element-call
+spec:
+  type: ClusterIP
+  ports:
+    - name: http
+      port: 8080
+      targetPort: http
+      protocol: TCP
+  selector:
+    app.kubernetes.io/name: element-call
+---
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: element-call
+  labels:
+    app.kubernetes.io/name: element-call
+  annotations:
+    cert-manager.io/cluster-issuer: letsencrypt-prod
+    cert-manager.io/duration: "8760h"
+    cert-manager.io/renew-before: "720h"
+spec:
+  ingressClassName: nginx
+  tls:
+    - hosts:
+        - {{ .Values.elementCall.host | default "call.otc-service.com" }}
+      secretName: element-call-tls
+  rules:
+    - host: {{ .Values.elementCall.host | default "call.otc-service.com" }}
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service:
+                name: element-call
+                port:
+                  number: 8080
+{{- end }}
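For reference, a sketch of what the element-call-config ConfigMap in this template should render to when neither elementCall.homeserverHost nor elementCall.livekitHost is set, so that both `default` fallbacks apply. This is hand-rendered from the template text, not captured `helm template` output, so treat it as an illustration rather than verified output.

```yaml
# Hand-rendered from the template defaults; not captured helm output.
apiVersion: v1
kind: ConfigMap
metadata:
  name: element-call-config
  labels:
    app.kubernetes.io/name: element-call
data:
  config.json: |
    {
      "default_server_config": {
        "m.homeserver": {
          "base_url": "https://matrix.otc-service.com",
          "server_name": "matrix.otc-service.com"
        }
      },
      "livekit": {
        "livekit_service_url": "https://matrixrtc.otc-service.com/sfu/get"
      }
    }
```

Both base_url and server_name derive from the same homeserverHost value, so the template as written assumes the Matrix server name and the homeserver's public hostname are identical.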
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+{{- /*
+One-shot Job: creates the "synapse" database on the external RDS instance
+if it does not already exist. Runs before Synapse starts (via an ArgoCD PreSync hook).
+
+Credentials come from the same AVP-injected secrets used by Synapse.
+*/ -}}
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: init-synapse-db
+  labels:
+    app.kubernetes.io/name: matrix
+    app.kubernetes.io/component: init-db
+  annotations:
+    argocd.argoproj.io/hook: PreSync
+    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
+spec:
+  backoffLimit: 4
+  ttlSecondsAfterFinished: 300
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: matrix
+        app.kubernetes.io/component: init-db
+    spec:
+      restartPolicy: OnFailure
+      imagePullSecrets:
+        - name: quay-pull-secret
+      containers:
+        - name: init-db
+          image: postgres:16-alpine
+          env:
+            - name: PGHOST
+              value: "192.168.14.3"
+            - name: PGPORT
+              value: "5432"
+            - name: PGUSER
+              value: "<path:secret/data/postgres/element/admin#postgres-username>"
+            - name: PGPASSWORD
+              value: "<path:secret/data/postgres/element/admin#postgres-password>"
+            - name: PGSSLMODE
+              value: "require"
+          command:
+            - /bin/sh
+            - -ec
+            - |
+              echo "Checking if database 'synapse' exists..."
+              if psql -d postgres -tc "SELECT 1 FROM pg_database WHERE datname = 'synapse'" | grep -q 1; then
+                echo "Database 'synapse' already exists, skipping."
+              else
+                echo "Creating database 'synapse' with UTF8/C locale..."
+                psql -d postgres -c "CREATE DATABASE synapse ENCODING 'UTF8' LC_COLLATE='C' LC_CTYPE='C' TEMPLATE=template0;"
+                echo "Database 'synapse' created successfully."
+              fi
+          resources:
+            requests:
+              cpu: 50m
+              memory: 32Mi
+            limits:
+              memory: 64Mi
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+{{- /*
+NFS PersistentVolume for Synapse media store.
+Uses existing SFS Turbo NFS share at 192.168.171.186.
+This PV is referenced by the PVC "matrix-media-pvc" which Synapse uses
+via synapse.media.storage.existingClaim.
+*/ -}}
+apiVersion: v1
+kind: PersistentVolume
+metadata:
+  name: matrix-nfs
+  labels:
+    app.kubernetes.io/name: matrix
+    app.kubernetes.io/component: media-storage
+spec:
+  capacity:
+    storage: {{ .Values.mediaStorage.capacity | default "500Gi" | quote }}
+  accessModes:
+    - ReadWriteMany
+  mountOptions:
+    - vers=3
+    - nolock
+    - noresvport
+    - timeo=600
+    - tcp
+  nfs:
+    server: {{ .Values.mediaStorage.nfs.server | quote }}
+    path: {{ .Values.mediaStorage.nfs.path | quote }}
+  persistentVolumeReclaimPolicy: Retain
+  storageClassName: {{ .Values.mediaStorage.storageClassName | default "csi-sfsturbo-retain" | quote }}
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: matrix-media-pvc
+  labels:
+    app.kubernetes.io/name: matrix
+    app.kubernetes.io/component: media-storage
+spec:
+  accessModes:
+    - ReadWriteMany
+  resources:
+    requests:
+      storage: {{ .Values.mediaStorage.capacity | default "500Gi" | quote }}
+  volumeName: matrix-nfs
+  storageClassName: {{ .Values.mediaStorage.storageClassName | default "csi-sfsturbo-retain" | quote }}
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+{{- /*
+Kubernetes Secrets for the Matrix Stack.
+Values are injected by ArgoCD Vault Plugin (AVP) from HashiCorp Vault.
+
+Vault paths:
+  secret/postgres/element/admin -> postgres-username, postgres-password
+  secret/postgres/ca            -> cert (RDS CA certificate)
+  secret/element/synapse        -> signing-key, registration-shared-secret,
+                                   macaroon-secret-key, form-secret
+*/ -}}
+apiVersion: v1
+kind: Secret
+metadata:
+  name: matrix-secrets
+  labels:
+    app.kubernetes.io/name: matrix
+    app.kubernetes.io/component: synapse
+type: Opaque
+stringData:
+  # -- Synapse signing key (full ed25519 key)
+  signing-key: "<path:secret/data/element/synapse#signing-key>"
+
+  # -- Registration shared secret
+  registration-shared-secret: "<path:secret/data/element/synapse#registration-shared-secret>"
+
+  # -- Macaroon secret key
+  macaroon-secret-key: "<path:secret/data/element/synapse#macaroon-secret-key>"
+
+  # -- Form secret
+  form-secret: "<path:secret/data/element/synapse#form-secret>"
+---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: matrix-db-credentials
+  labels:
+    app.kubernetes.io/name: matrix
+    app.kubernetes.io/component: database
+type: Opaque
+stringData:
+  # -- PostgreSQL password for synapse user (from RDS)
+  password: "<path:secret/data/postgres/element/admin#postgres-password>"
+---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: matrix-db-ssl-ca-crt
+  labels:
+    app.kubernetes.io/name: matrix
+    app.kubernetes.io/component: database
+type: Opaque
+stringData:
+  # -- RDS CA certificate for SSL connections
+  ca.crt: "<path:secret/data/postgres/ca#cert>"
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+{{/*
+SFU TCP LoadBalancer Service
+----------------------------
+Managed here (not in the upstream matrix-stack chart) because OTC CCE
+requires:
+  1. kubernetes.io/elb.id annotation pointing to an existing union ELB
+  2. ipFamilyPolicy: SingleStack (union ELBs reject IPv6 / DualStack)
+
+The upstream chart hard-codes ipFamilyPolicy: PreferDualStack, so we
+disable its rtcTcp service and provide our own.
+*/}}
+apiVersion: v1
+kind: Service
+metadata:
+  name: {{ .Release.Name }}-matrix-rtc-sfu-tcp
+  namespace: {{ .Release.Namespace }}
+  labels:
+    app.kubernetes.io/component: matrix-rtc-voip-server
+    app.kubernetes.io/instance: {{ .Release.Name }}-matrix-rtc-sfu-rtc
+    app.kubernetes.io/managed-by: Helm
+    app.kubernetes.io/name: matrix-rtc-sfu-rtc
+    app.kubernetes.io/part-of: matrix-stack
+  annotations:
+    # Share the existing ingress-nginx ELB (80.158.58.167)
+    kubernetes.io/elb.id: "510d12e5-a578-46e5-acb0-32bc0ffcb04c"
+    kubernetes.io/elb.class: union
+    kubernetes.io/elb.pass-through: onlyLocal
+spec:
+  type: LoadBalancer
+  externalTrafficPolicy: Local
+  # OTC union ELBs do not support IPv6; must be SingleStack IPv4
+  ipFamilyPolicy: SingleStack
+  ipFamilies:
+    - IPv4
+  selector:
+    app.kubernetes.io/instance: {{ .Values.sfuReleaseName }}-matrix-rtc-sfu
+  ports:
+    - name: rtc-tcp
+      port: 30881
+      targetPort: 30881
+      protocol: TCP
+      nodePort: 30881
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# -- Element Call standalone frontend deployment
+# Element Call is already available as a widget inside Element Web.
+# Enable this only if you want a dedicated call.otc-service.com URL
+# for standalone video conferencing (without the full chat client).
+elementCall:
+  enabled: true
+  image: quay.io/opentelekomcloud/element-call:v0.17.0
+  replicas: 1
+  host: call.otc-service.com
+
+# -- Upstream chart release name (needed for cross-chart service selectors)
+# The SFU TCP LoadBalancer service lives in this (local) chart but must
+# select pods deployed by the upstream chart, which has a different
+# Helm release name.
+sfuReleaseName: element-otcinfra2
+
+# -- NFS PersistentVolume for Synapse media store
+mediaStorage:
+  nfs:
+    server: "192.168.171.186"
+    path: "/"
+  capacity: "500Gi"
+  storageClassName: "csi-sfsturbo-retain"
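The Element Call ConfigMap template in this commit also reads elementCall.homeserverHost and elementCall.livekitHost, which are not listed in these values and therefore fall back to the template defaults (matrix.otc-service.com and matrixrtc.otc-service.com). A minimal override sketch for a different environment might look like the following; the example.org hostnames are hypothetical placeholders, not values from this repository.

```yaml
# Hypothetical override file; all example.org hostnames are placeholders.
elementCall:
  enabled: true
  # Public hostname used by the element-call Ingress
  host: call.example.org
  # Read by the element-call ConfigMap template (base_url and server_name)
  homeserverHost: matrix.example.org
  # Read by the element-call ConfigMap template (livekit_service_url)
  livekitHost: matrixrtc.example.org
```

Because Helm merges nested maps from override files into the chart's values.yaml, keys omitted here (image, replicas) should keep the values shown in the diff.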
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+apiVersion: v2
+name: element
+description: >-
+  Wrapper chart for Element's matrix-stack (ESS Community).
+  Deploys Synapse, Element Web, MatrixRTC (LiveKit + lk-jwt-service),
+  Well-Known Delegation, HAProxy, and optionally MAS and PostgreSQL.
+version: 0.1.0
+
+dependencies:
+  - name: matrix-stack
+    version: "26.2.3"
+    repository: "oci://ghcr.io/element-hq/ess-helm"
